Functions of a Matrix: Theory, Applications and Computation

Research Matters

February 25, 2009

Nick Higham

Director of Research

School of Mathematics


Nick Higham

School of Mathematics

The University of Manchester

http://www.ma.man.ac.uk/~higham/


Outline

1 Examples

2 History & Properties

3 Applications

4 Methods for Matrix Square Root

MIMS Nick Higham Matrix Functions 2 / 37

http://www.mims.manchester.ac.uk/


Matrix Square Root Example

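Find a matrix $X$ such that

\[
X^2 = A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

A solution is

\[
X = \begin{bmatrix} 1 & 1/2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

All square roots are given by $\pm X$ and

\[
Y = \pm U \begin{bmatrix} 1 & 1/2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} U^{-1},
\qquad
U = \begin{bmatrix} a & b & d \\ 0 & a & 0 \\ 0 & e & c \end{bmatrix}.
\]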
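The square root family above can be checked in exact rational arithmetic. The following is a minimal sketch, not part of the original slides: the helper names `matmul`, `inverse` and `nonprimary_root` are illustrative, and the parameter values are arbitrary choices with $a$ and $c$ nonzero so that $U$ is nonsingular.

```python
from fractions import Fraction as F

def matmul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse(M):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(M)
    aug = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

A = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
X = [[1, F(1, 2), 0], [0, 1, 0], [0, 0, 1]]
M = [[1, F(1, 2), 0], [0, 1, 0], [0, 0, -1]]   # middle factor of Y

def nonprimary_root(a, b, c, d, e):
    """Y = U M U^{-1} for the parametrized U of the slide (needs a, c != 0)."""
    U = [[a, b, d], [0, a, 0], [0, e, c]]
    return matmul(matmul(U, M), inverse(U))

Y = nonprimary_root(1, 2, 3, 4, 5)             # arbitrary illustrative parameters
print(matmul(X, X) == A, matmul(Y, Y) == A, Y != X)
```

The check works because $M^2 = A$ and every $U$ of the stated form commutes with $A$, so $Y^2 = U A U^{-1} = A$.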









Root Oddities (1)

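$B_n^2 = I_n$, where

\[
B_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -1 & -2 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & -1 \end{bmatrix}.
\]

Arises in BDF solvers for ODEs.

Turnbull (1927): $A_n^3 = I_n$, where

\[
A_4 = \begin{bmatrix} -1 & 1 & -1 & 1 \\ -3 & 2 & -1 & 0 \\ -3 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{bmatrix}.
\]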








Root Oddities (2)

$C_n^2 = I$, where

\[
C_4 = 2^{-3/2} \begin{bmatrix} 1 & 3 & 3 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -3 & 3 & -1 \end{bmatrix}.
\]

Hill (1932): US patent for involutory matrices in cryptography.

Bauer (2002): “since then the value of mathematical methods in cryptology has been unchallenged.”
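The three identities on the Root Oddities slides can be confirmed with integer arithmetic. This is a small sketch added for illustration; the helper `matmul` and the name `C4_scaled` are not from the slides. The factor $2^{-3/2}$ is handled by noting that its square is $1/8$, so $C_4^2 = I$ is equivalent to the integer part squaring to $8I$.

```python
def matmul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I4 = [[int(i == j) for j in range(4)] for i in range(4)]

B4 = [[1, 1, 1, 1], [0, -1, -2, -3], [0, 0, 1, 3], [0, 0, 0, -1]]
A4 = [[-1, 1, -1, 1], [-3, 2, -1, 0], [-3, 1, 0, 0], [-1, 0, 0, 0]]
# C4 = 2**(-3/2) * C4_scaled; the scalar is kept separate to stay in integers.
C4_scaled = [[1, 3, 3, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -3, 3, -1]]

b_is_square_root_of_identity = matmul(B4, B4) == I4
a_is_cube_root_of_identity = matmul(A4, matmul(A4, A4)) == I4
c_is_involutory = matmul(C4_scaled, C4_scaled) == [[8 * x for x in row] for row in I4]

print(b_is_square_root_of_identity, a_is_cube_root_of_identity, c_is_involutory)
```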








Logarithm Example

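Find a real log of $A = -I_{2n}$, i.e., a real solution of $e^X = A$.

For a real log, map the eigenvalues in pairs $\{-1,-1\} \to \{(2k+1)\pi i,\, -(2k+1)\pi i\}$.

Let $H = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. All solutions are

\[
X = \pi\, U \operatorname{diag}\bigl((2k_1+1)H,\, (2k_2+1)H,\, \dots,\, (2k_n+1)H\bigr)\, U^{-1},
\]

for real nonsingular $U$. Thus, e.g., with $k_1 = 0$, $k_2 = 1$ and integer $U$,

\[
e^X = -I_4 \quad\text{for}\quad
X = \pi \begin{bmatrix} 39 & 20 & 12 & 6 \\ -55 & -28 & -15 & -10 \\ 7 & 3 & 0 & 4 \\ -71 & -36 & -24 & -11 \end{bmatrix}.
\]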
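One way to check the $4\times 4$ example without computing a matrix exponential is the following sketch (an added illustration, with hypothetical helper names). With $k_1 = 0$ and $k_2 = 1$ the matrix $M = X/\pi$ should have eigenvalues $\pm i$ and $\pm 3i$, so it should satisfy $(M^2 + I)(M^2 + 9I) = 0$. That polynomial has distinct roots, hence $M$ is diagonalizable, and since $e^{\pm\pi i} = e^{\pm 3\pi i} = -1$ it follows that $e^{\pi M} = -I_4$.

```python
def matmul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def plus_scaled_identity(P, s):
    """Return P + s*I."""
    return [[P[i][j] + (s if i == j else 0) for j in range(len(P))]
            for i in range(len(P))]

# M = X / pi, the integer matrix from the slide.
M = [[39, 20, 12, 6],
     [-55, -28, -15, -10],
     [7, 3, 0, 4],
     [-71, -36, -24, -11]]

M2 = matmul(M, M)
annihilated = matmul(plus_scaled_identity(M2, 1), plus_scaled_identity(M2, 9))
is_zero = all(x == 0 for row in annihilated for x in row)
print(is_zero)
```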













Cayley and Sylvester

Term “matrix” coined in 1850 by James Joseph Sylvester, FRS (1814–1897).

Matrix algebra developed by Arthur Cayley, FRS (1821–1895). Memoir on the Theory of Matrices (1858).




Cayley and Sylvester on Matrix Functions

Cayley considered matrix square roots in his 1858 memoir.

Tony Crilly, Arthur Cayley: Mathematician Laureate of the Victorian Age, 2006.

Sylvester (1883) gave the first definition of $f(A)$ for general $f$.

Karen Hunger Parshall, James Joseph Sylvester. Jewish Mathematician in a Victorian World, 2006.




Two Definitions

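Definition (Cauchy integral formula)

\[
f(A) = \frac{1}{2\pi i} \int_\Gamma f(z)\,(zI - A)^{-1}\, dz,
\]

where $f$ is analytic on and inside a closed contour $\Gamma$ enclosing the spectrum $\lambda(A)$.

Definition (Schwerdtfeger, 1938)

For $A$ with distinct eigenvalues $\lambda_1, \dots, \lambda_s$ with indices $n_i$,

\[
f(A) = \sum_{i=1}^{s} A_i \sum_{j=0}^{n_i-1} \frac{f^{(j)}(\lambda_i)}{j!}\,(A - \lambda_i I)^j
     = \sum_{i=1}^{s} \sum_{j=0}^{n_i-1} f^{(j)}(\lambda_i)\, Z_{ij},
\]

where the $A_i$ are the Frobenius covariants and the $Z_{ij}$ depend on $A$ but not on $f$.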




Matrices in Applied Mathematics

Frazer, Duncan & Collar, Aerodynamics Division of NPL: aircraft flutter, matrix structural analysis.

Elementary Matrices & Some Applications to Dynamics and Differential Equations, 1938. Emphasizes importance of $e^A$.

Arthur Roderick Collar, FRS (1908–1986): “First book to treat matrices as a branch of applied mathematics”.




Function of 2× 2 Triangular Matrix

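\[
f\!\left(\begin{bmatrix} \lambda_1 & t_{12} \\ 0 & \lambda_2 \end{bmatrix}\right) =
\begin{cases}
\begin{bmatrix} f(\lambda_1) & t_{12}\,\dfrac{f(\lambda_2) - f(\lambda_1)}{\lambda_2 - \lambda_1} \\ 0 & f(\lambda_2) \end{bmatrix}, & \lambda_1 \ne \lambda_2, \\[3ex]
\begin{bmatrix} f(\lambda) & t_{12}\, f'(\lambda) \\ 0 & f(\lambda) \end{bmatrix}, & \lambda_1 = \lambda_2 = \lambda.
\end{cases}
\]

The (1,2) element is given by $t_{12}\, f[\lambda_2, \lambda_1]$ always.

Inaccurate if $\lambda_1 \approx \lambda_2$.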








Log of 2× 2 Triangular Matrix

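\[
\log\lambda_2 - \log\lambda_1
 = \log\!\left(\frac{\lambda_2}{\lambda_1}\right) + 2\pi i\, \mathcal{U}(\log\lambda_2 - \log\lambda_1)
 = \log\!\left(\frac{1+z}{1-z}\right) + 2\pi i\, \mathcal{U}(\log\lambda_2 - \log\lambda_1),
\]

where

\[
\mathcal{U}(z) = \left\lceil \frac{\operatorname{Im} z - \pi}{2\pi} \right\rceil,
\qquad z = \frac{\lambda_2 - \lambda_1}{\lambda_2 + \lambda_1}.
\]

\[
\operatorname{atanh}(z) := \frac{1}{2} \log\!\left(\frac{1+z}{1-z}\right),
\]

\[
f_{12} = t_{12}\, \frac{2\operatorname{atanh}(z) + 2\pi i\, \mathcal{U}(\log\lambda_2 - \log\lambda_1)}{\lambda_2 - \lambda_1}.
\]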
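The benefit of the atanh form shows up when the two eigenvalues are close. The sketch below is an added illustration restricted to real positive $\lambda_1, \lambda_2$, an assumption under which the unwinding term is zero; the function names and test values are hypothetical. It compares the plain divided difference of the logarithm with the atanh formula, using `math.log1p` as a reference.

```python
import math

def log_dd_naive(lam1, lam2):
    """Divided difference (log lam2 - log lam1) / (lam2 - lam1), computed directly."""
    return (math.log(lam2) - math.log(lam1)) / (lam2 - lam1)

def log_dd_atanh(lam1, lam2):
    """Same quantity via 2*atanh(z) / (lam2 - lam1), z = (lam2 - lam1)/(lam2 + lam1).

    Assumes real lam1, lam2 > 0, so the unwinding number term vanishes.
    """
    z = (lam2 - lam1) / (lam2 + lam1)
    return 2.0 * math.atanh(z) / (lam2 - lam1)

lam1 = 2.0
lam2 = 2.0 + 1e-10                            # nearly equal eigenvalues
h = lam2 - lam1                               # exact in floating point here
reference = math.log1p(h / lam1) / h          # log(lam2/lam1)/h without cancellation

naive_err = abs(log_dd_naive(lam1, lam2) / reference - 1.0)
atanh_err = abs(log_dd_atanh(lam1, lam2) / reference - 1.0)
print(naive_err, atanh_err)
```

Multiplying either value by $t_{12}$ gives the (1,2) element of the logarithm. The naive version subtracts two nearly equal logarithms, so its relative error is typically many orders of magnitude larger than that of the atanh version.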








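Function of Block Triangular Matrix

Recall the Fréchet derivative $L$:

\[
f(X + E) - f(X) - L(X, E) = o(\|E\|).
\]

Theorem

\[
f\!\left(\begin{bmatrix} X & E \\ 0 & X \end{bmatrix}\right) =
\begin{bmatrix} f(X) & L(X, E) \\ 0 & f(X) \end{bmatrix}.
\]

Application: the iteration

\[
X_{k+1} = \tfrac{1}{2}\bigl(X_k + X_k^{-1} A\bigr), \qquad X_0 = A
\]

converges to $A^{1/2}$. Apply it to $\begin{bmatrix} A & E \\ 0 & A \end{bmatrix}$ and read off the (1,2) block to get an iteration for the Fréchet derivative.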
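The block trick can be tried in the simplest setting $n = 1$, where $A = a$ and $E = e$ are scalars and the Fréchet derivative of the square root is $e/(2\sqrt{a})$. The following is a minimal sketch under that assumption; the helper names and the values of `a` and `e` are illustrative only.

```python
def matmul2(P, Q):
    """Product of two 2x2 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(P):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det],
            [-P[1][0] / det, P[0][0] / det]]

def newton_sqrt(A, iters=30):
    """X_{k+1} = (X_k + X_k^{-1} A) / 2 with X_0 = A, for a 2x2 matrix A."""
    X = [row[:] for row in A]
    for _ in range(iters):
        XinvA = matmul2(inv2(X), A)
        X = [[0.5 * (X[i][j] + XinvA[i][j]) for j in range(2)] for i in range(2)]
    return X

a, e = 9.0, 0.6                      # scalar "matrix" A = a and direction E = e
R = newton_sqrt([[a, e], [0.0, a]])  # iterate on the block matrix [[A, E], [0, A]]

sqrt_a = R[0][0]                     # (1,1) block: a^{1/2}
frechet = R[0][1]                    # (1,2) block: L(a, e) = e / (2 sqrt(a))
print(sqrt_a, frechet)
```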













Toolbox of Matrix Functions

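\[
\frac{d^2 y}{dt^2} + A y = 0, \qquad y(0) = y_0, \quad y'(0) = y'_0
\]

has solution

\[
y(t) = \cos(\sqrt{A}\, t)\, y_0 + (\sqrt{A})^{-1} \sin(\sqrt{A}\, t)\, y'_0.
\]

But

\[
\begin{bmatrix} y' \\ y \end{bmatrix}
 = \exp\!\left(\begin{bmatrix} 0 & -tA \\ t I_n & 0 \end{bmatrix}\right)
   \begin{bmatrix} y'_0 \\ y_0 \end{bmatrix}.
\]

In software we want to be able to evaluate interesting $f$ at matrix arguments as well as scalar arguments.

MATLAB has expm, logm, sqrtm, funm.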










The Average Eye

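First order character of optical system characterized by transference matrix

\[
T = \begin{bmatrix} S & \delta \\ 0 & 1 \end{bmatrix} \in \mathbb{R}^{5\times 5},
\]

where $S \in \mathbb{R}^{4\times 4}$ is symplectic:

\[
S^T J S = J = \begin{bmatrix} 0 & I_2 \\ -I_2 & 0 \end{bmatrix}.
\]

Average $m^{-1}\sum_{i=1}^{m} T_i$ is not a transference matrix.

Harris (2005) proposes the average $\exp\bigl(m^{-1}\sum_{i=1}^{m} \log(T_i)\bigr)$.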




Markov Models

Let $P$ be the transition probability matrix for a discrete-time Markov process.

If $P$ is the transition matrix for 1 year, $P(1/12) = P^{1/12} = e^{\frac{1}{12}\log P}$ is the matrix for 1 month.

Problem: $\log P$, $P^{1/k}$ may have wrong sign patterns ⇒ “regularize”.

In credit risk, $P$ is strictly diagonally dominant.
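For a two-state chain the root can be written down in closed form, which gives a small worked instance of the annual-to-monthly conversion. This sketch is an added illustration and not a method from the slides: it assumes $P = \begin{bmatrix} 1-a & a \\ b & 1-b \end{bmatrix}$ with $a + b > 0$ and second eigenvalue $1 - a - b > 0$, and the function names and numbers are hypothetical.

```python
def matmul2(P, Q):
    """Product of two 2x2 matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def stochastic_root_2x2(P, k):
    """k-th root of P = [[1-a, a], [b, 1-b]] from its spectral decomposition.

    Assumes a + b > 0 and lam = 1 - a - b > 0, so lam**(1/k) is real and positive.
    """
    a, b = P[0][1], P[1][0]
    lam = 1.0 - a - b
    # P = Pi + lam*(I - Pi), where both rows of Pi equal the stationary distribution.
    pi = [b / (a + b), a / (a + b)]
    mu = lam ** (1.0 / k)
    return [[pi[j] + mu * ((1.0 if i == j else 0.0) - pi[j]) for j in range(2)]
            for i in range(2)]

P_year = [[0.9, 0.1], [0.2, 0.8]]          # illustrative annual transition matrix
P_month = stochastic_root_2x2(P_year, 12)

# Multiply the monthly matrix out twelve times to recover the annual one.
check = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(12):
    check = matmul2(check, P_month)
print(P_month, check)
```

Here the monthly matrix is again stochastic. When the second eigenvalue is negative the assumption in the docstring fails, which is one simple way the sign-pattern difficulty can arise.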




Email from a Power Company

The problem has arisen through proposed methodology on which the company will incur charges for use of an electricity network....

I have the use of a computer and Microsoft Excel....

I have an Excel spreadsheet containing the transition matrix of how a company’s [Standard & Poor’s] credit rating changes from one year to the next. I’d like to be working in eighths of a year, so the aim is to find the eighth root of the matrix.




HIV to AIDS Transition

Estimated 6-month transition matrix.

Four AIDS-free states and 1 AIDS state.

2077 observations (Charitos et al., 2008).

\[
P = \begin{bmatrix}
0.8149 & 0.0738 & 0.0586 & 0.0407 & 0.0120 \\
0.5622 & 0.1752 & 0.1314 & 0.1169 & 0.0143 \\
0.3606 & 0.1860 & 0.1521 & 0.2198 & 0.0815 \\
0.1676 & 0.0636 & 0.1444 & 0.4652 & 0.1592 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}.
\]

Want to estimate the 1-month transition matrix.

\[
\Lambda(P) = \{1,\ 0.9644,\ 0.4980,\ 0.1493,\ -0.0043\}.
\]

N. J. Higham and L. Lin. On pth roots of stochastic matrices, LAA, 2011.




Phi Functions: Definition

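\[
\varphi_0(z) = e^z, \qquad
\varphi_1(z) = \frac{e^z - 1}{z}, \qquad
\varphi_2(z) = \frac{e^z - 1 - z}{z^2}, \qquad \dots
\]

\[
\varphi_{k+1}(z) = \frac{\varphi_k(z) - 1/k!}{z}.
\]

\[
\varphi_k(z) = \sum_{j=0}^{\infty} \frac{z^j}{(j+k)!}.
\]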
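The three descriptions of $\varphi_k$ above (closed forms, recurrence, series) can be compared numerically for a scalar argument. The sketch below is an added illustration; the truncation at 40 terms and the sample point `z = 0.3` are arbitrary choices suited only to moderate $|z|$.

```python
import math

def phi(k, z, terms=40):
    """phi_k(z) from the truncated series sum_{j>=0} z**j / (j + k)!."""
    return sum(z ** j / math.factorial(j + k) for j in range(terms))

z = 0.3
closed_forms = [math.exp(z),
                (math.exp(z) - 1.0) / z,
                (math.exp(z) - 1.0 - z) / z ** 2]
series_values = [phi(k, z) for k in range(3)]

# Largest violation of phi_{k+1}(z) = (phi_k(z) - 1/k!) / z over k = 0..4.
recurrence_gap = max(
    abs(phi(k + 1, z) - (phi(k, z) - 1.0 / math.factorial(k)) / z)
    for k in range(5)
)
print(closed_forms, series_values, recurrence_gap)
```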




Phi Functions: Solving DEs

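$y \in \mathbb{C}^n$, $A \in \mathbb{C}^{n\times n}$.

\[
\frac{dy}{dt} = Ay, \quad y(0) = y_0 \quad\Rightarrow\quad y(t) = e^{At} y_0.
\]

\[
\frac{dy}{dt} = Ay + b, \quad y(0) = 0 \quad\Rightarrow\quad y(t) = t\,\varphi_1(tA)\, b.
\]

\[
\frac{dy}{dt} = Ay + ct, \quad y(0) = 0 \quad\Rightarrow\quad y(t) = t^2 \varphi_2(tA)\, c.
\]

...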










Exponential Integrators

Consider

\[
y' = Ly + N(y).
\]

$N(y(t)) \approx N(y(0))$ implies

\[
y(t) \approx e^{tL} y_0 + t\,\varphi_1(tL)\, N(y(0)).
\]

Exponential Euler method:

\[
y_{n+1} = e^{hL} y_n + h\,\varphi_1(hL)\, N(y_n).
\]

Lawson (1967); recent resurgence.
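A scalar version of the exponential Euler method makes the formula concrete. The sketch below is an added illustration with hypothetical names and parameter values. It uses a constant nonlinearity $N(y) = c$, for which the approximation $N(y(t)) \approx N(y(0))$ holds exactly, so the method should reproduce the exact solution up to rounding even for a stiff value of $L$.

```python
import math

def phi1(z):
    """phi_1(z) = (e^z - 1)/z, using expm1 to avoid cancellation near z = 0."""
    return 1.0 if z == 0.0 else math.expm1(z) / z

def exponential_euler(L, N, y0, h, steps):
    """Scalar exponential Euler: y_{n+1} = e^{hL} y_n + h*phi_1(hL)*N(y_n)."""
    y = y0
    for _ in range(steps):
        y = math.exp(h * L) * y + h * phi1(h * L) * N(y)
    return y

L, c, y0, t = -50.0, 2.0, 1.0, 1.0           # stiff linear part, constant N(y) = c
approx = exponential_euler(L, lambda y: c, y0, h=0.1, steps=10)
exact = math.exp(t * L) * y0 + t * phi1(t * L) * c
print(approx, exact)
```

With $hL = -5$ the classical explicit Euler method would amplify errors by $|1 + hL| = 4$ per step, whereas the exponential variant treats the linear part exactly.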




Implementation of Exponential Integrators

\[
u'(t) = A u(t) + g(t, u(t)), \qquad u(0) = u_0, \quad t \ge 0.
\]

Let $u_k = g^{(k-1)}(t, u(t))\,|_{t=0}$ and $\varphi_\ell(z) = \sum_{k=0}^{\infty} z^k/(k+\ell)!$.

We need to compute

\[
u(t) = e^{tA} u_0 + \sum_{k=1}^{p} \varphi_k(tA)\, t^k u_k.
\]




Evaluating Sum of Phi Functions

Theorem (Al-Mohy & H, 2010)

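Let $A \in \mathbb{C}^{n\times n}$, $U = [u_1, u_2, \dots, u_p] \in \mathbb{C}^{n\times p}$, $\tau \in \mathbb{C}$, and define

\[
B = \begin{bmatrix} A & U \\ 0 & J \end{bmatrix} \in \mathbb{C}^{(n+p)\times(n+p)},
\qquad
J = \begin{bmatrix} 0 & I_{p-1} \\ 0 & 0 \end{bmatrix} \in \mathbb{C}^{p\times p}.
\]

Then for $X = e^{\tau B}$ we have

\[
X(1\colon n,\, n+j) = \sum_{k=1}^{j} \tau^k \varphi_k(\tau A)\, u_{j-k+1}, \qquad j = 1\colon p.
\]

\[
u(t) = \begin{bmatrix} I_n & 0 \end{bmatrix}
\exp\!\left( t \begin{bmatrix} A & U \\ 0 & J \end{bmatrix} \right)
\begin{bmatrix} u_0 \\ e_p \end{bmatrix}.
\]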
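The theorem can be tested in the smallest nontrivial case $n = 1$, $p = 2$, where $B$ is $3\times 3$. The sketch below is an added illustration: the truncated Taylor series used for the exponential is only a stand-in suitable for tiny matrices of modest norm, and all names and numerical values are hypothetical.

```python
import math

def matmul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(M, terms=40):
    """Truncated Taylor series for e^M (adequate only for small, modest-norm M)."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[x / m for x in row] for row in matmul(term, M)]   # M^m / m!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def phi(k, z, terms=40):
    """phi_k(z) from its truncated power series."""
    return sum(z ** j / math.factorial(j + k) for j in range(terms))

a, u1, u2, tau = -0.5, 2.0, 3.0, 0.8
B = [[a, u1, u2],        # [A  U]
     [0.0, 0.0, 1.0],    # [0  J] with J the 2x2 upward shift
     [0.0, 0.0, 0.0]]
X = expm_taylor([[tau * x for x in row] for row in B])

# Right-hand sides of the theorem for j = 1 and j = 2.
rhs_j1 = tau * phi(1, tau * a) * u1
rhs_j2 = tau * phi(1, tau * a) * u2 + tau ** 2 * phi(2, tau * a) * u1
print(X[0][1], rhs_j1, X[0][2], rhs_j2)
```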













Matrix Square Root

$X$ is a square root of $A \in \mathbb{C}^{n\times n}$ $\iff$ $X^2 = A$.

Number of square roots may be zero, finite or infinite.

Definition

For $A$ with no eigenvalues on $\mathbb{R}^- = \{x \in \mathbb{R} : x \le 0\}$ the principal square root $A^{1/2}$ is the unique square root $X$ with spectrum in the open right half-plane.




Newton’s Method for Square Root

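Apply Newton to $F(X) = X^2 - A = 0$: $X_0$ given,

\[
\left.
\begin{aligned}
&\text{Solve } X_k E_k + E_k X_k = A - X_k^2 \\
&X_{k+1} = X_k + E_k
\end{aligned}
\right\} \quad k = 0, 1, 2, \dots
\]

Modified Newton iteration: freeze Fréchet derivative at $X_0$:

\[
\left.
\begin{aligned}
&\text{Solve } X_0 E_k + E_k X_0 = A - X_k^2 \\
&X_{k+1} = X_k + E_k
\end{aligned}
\right\} \quad k = 0, 1, 2, \dots,
\]

$X_0$ diagonal ⇒ cheap to solve for $E_k$.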








Pulay Iteration

Let $A^{1/2} = D^{1/2} + B$, $D = \operatorname{diag}(d_i) > 0$. Squaring gives

\[
D^{1/2} B + B D^{1/2} = A - D - B^2.
\]

Functional iteration gives

Pulay iteration (1966)

\[
D^{1/2} B_{k+1} + B_{k+1} D^{1/2} = A - D - B_k^2, \qquad B_0 = 0.
\]

Can show Pulay ≡ modified Newton with $X_0 = D^{1/2}$: $X_k \equiv D^{1/2} + B_k$.
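Because $D^{1/2}$ is diagonal, the Pulay equation decouples into one scalar division per entry: $(B_{k+1})_{ij} = (A - D - B_k^2)_{ij} / (d_i^{1/2} + d_j^{1/2})$. The following is a minimal sketch of that observation, not the author's code. It assumes $D = \operatorname{diag}(A)$, a fixed iteration count, a Frobenius-norm residual, and a $16\times 16$ test matrix with $a_{ii} = i^2$ and $a_{ij} = 0.1$ for $i \ne j$; the function names are hypothetical.

```python
import math

def matmul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def pulay_sqrt(A, iters=30):
    """Pulay iteration with D = diag(A); returns X = D^{1/2} + B_k."""
    n = len(A)
    s = [math.sqrt(A[i][i]) for i in range(n)]       # diagonal of D^{1/2}
    B = [[0.0] * n for _ in range(n)]                # B_0 = 0
    for _ in range(iters):
        B2 = matmul(B, B)
        # Right-hand side A - D - B_k^2; D only touches the diagonal.
        R = [[A[i][j] - (A[i][i] if i == j else 0.0) - B2[i][j]
              for j in range(n)] for i in range(n)]
        # Diagonal coefficient matrix: solve entry by entry.
        B = [[R[i][j] / (s[i] + s[j]) for j in range(n)] for i in range(n)]
    return [[B[i][j] + (s[i] if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def rel_residual(A, X):
    """Frobenius-norm relative residual ||A - X^2|| / ||A||."""
    n = len(A)
    X2 = matmul(X, X)
    num = math.sqrt(sum((A[i][j] - X2[i][j]) ** 2 for i in range(n) for j in range(n)))
    den = math.sqrt(sum(A[i][j] ** 2 for i in range(n) for j in range(n)))
    return num / den

n = 16
A = [[float((i + 1) ** 2) if i == j else 0.1 for j in range(n)] for i in range(n)]
X = pulay_sqrt(A)
res = rel_residual(A, X)
print(res)
```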




Convergence of Pulay

“Although no proof of convergence will be given, the procedure converged rapidly in all cases examined by us”.

Theorem (H, 2008)

Let $A \in \mathbb{C}^{n\times n}$ with $\Lambda(A) \cap \mathbb{R}^- = \emptyset$ and let $D = \operatorname{diag}(d_i) > 0$ and $B = A^{1/2} - D^{1/2}$. If

\[
\theta = \frac{\|B\|}{\min_i d_i^{1/2}} < \frac{2}{3}
\]

then in the Pulay iteration $B_k \to A^{1/2} - D^{1/2}$ linearly.




Visser Iteration

Set $X_0 = (2\alpha)^{-1} I$ in modified Newton:

Visser iteration (1937)

\[
X_{k+1} = X_k + \alpha\,(A - X_k^2), \qquad X_0 = (2\alpha)^{-1} I.
\]

Stationary iteration.

Richardson iteration.

Linear convergence.

Choice of $\alpha$?
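The Visser iteration needs only matrix multiplication and addition, as the sketch below shows on a $2\times 2$ symmetric positive definite matrix. This is an added illustration: the matrix, the value $\alpha = 0.4$ and the iteration count are arbitrary choices, with $\alpha$ picked below $\rho(A)^{-1/2} \approx 0.465$ for this particular $A$.

```python
def matmul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def visser_sqrt(A, alpha, iters):
    """X_{k+1} = X_k + alpha*(A - X_k^2), starting from X_0 = I / (2*alpha)."""
    n = len(A)
    X = [[(0.5 / alpha if i == j else 0.0) for j in range(n)] for i in range(n)]
    for _ in range(iters):
        X2 = matmul(X, X)
        X = [[X[i][j] + alpha * (A[i][j] - X2[i][j]) for j in range(n)]
             for i in range(n)]
    return X

A = [[4.0, 1.0], [1.0, 3.0]]                 # eigenvalues (7 +/- sqrt(5)) / 2
X = visser_sqrt(A, alpha=0.4, iters=200)

X2 = matmul(X, X)
err = max(abs(A[i][j] - X2[i][j]) for i in range(2) for j in range(2))
print(X, err)
```

The large iteration count reflects the linear convergence: for this example the error shrinks by a factor of roughly 0.72 per step.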




Visser History

\[
X_{k+1} = X_k + \alpha\,(A - X_k^2), \qquad X_0 = (2\alpha)^{-1} I.
\]

Visser (1937), $\alpha = 1/2$: show positive operator on Hilbert space has a positive square root.

Likewise in functional analysis texts, e.g. Riesz & Sz.-Nagy (1956).

Enables proof of existence of $A^{1/2}$ without using spectral theorem.

Used computationally by Liebl (1965), Babuška, Práger & Vitásek (1966), Späth (1966), Duke (1969), Elsner (1970).

Elsner proves convergence for $A \in \mathbb{C}^{n\times n}$ with real, positive eigenvalues if $0 < \alpha \le \rho(A)^{-1/2}$.




Visser Transformations

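\[
X_{k+1} = X_k + \alpha\,(A - X_k^2), \qquad X_0 = (2\alpha)^{-1} I.
\]

Let $\theta = 1/(2\alpha)$, $X_k = \theta Y_k$, and $\widetilde{A} = \theta^{-2} A$. Then

\[
Y_{k+1} = Y_k + \tfrac{1}{2}\bigl(\widetilde{A} - Y_k^2\bigr), \qquad Y_0 = I.
\]

With $\widetilde{A} \equiv I - C$ and $Y_k = I - P_k$:

\[
P_{k+1} = \tfrac{1}{2}\bigl(C + P_k^2\bigr), \qquad P_0 = 0.
\]

$Q_k = P_k/2$:

\[
Q_{k+1} = Q_k^2 + \frac{C}{4}, \qquad Q_0 = 0.
\]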




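Visser Convergence

\[
X_{k+1} = X_k + \alpha\,(A - X_k^2), \qquad X_0 = (2\alpha)^{-1} I.
\]

Theorem (H, 2008)

Let $A \in \mathbb{C}^{n\times n}$ and $\alpha > 0$. If $\Lambda(I - 4\alpha^2 A)$ lies in the cardioid

\[
\mathcal{D} = \{\, 2z - z^2 : z \in \mathbb{C},\ |z| < 1 \,\}
\]

then $A^{1/2}$ exists and $X_k \to A^{1/2}$ linearly.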




Example

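$A \in \mathbb{R}^{16\times 16}$ spd with $a_{ii} = i^2$, $a_{ij} = 0.1$, $i \ne j$.

Aim for relative residual $< nu$ in IEEE double precision arithmetic, where $u$ is the unit roundoff.

Pulay iteration with $D = \operatorname{diag}(A)$: $\theta = 0.191$, 9 iterations.

Visser iteration with $\alpha = 0.058$ (hand optimized): 245 iterations.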




Future Directions

Many applications of $f(A)$, e.g. control theory, computer graphics, theoretical physics.

Better understanding of conditioning of $f(A)$.

Understanding non-primary functions.

Exploit structure, e.g. $A \in$ matrix automorphism group or Jordan or Lie algebra.

$f(A)b$ problem.

Al-Mohy & H: Computing the Action of the Matrix Exponential, with an Application to Exponential Integrators, SISC, 2011.




References

A. H. Al-Mohy and N. J. Higham. Computing the action of the matrix exponential, with an application to exponential integrators. SIAM J. Sci. Comput., 33(2):488–511, 2011.

F. L. Bauer. Decrypted Secrets: Methods and Maxims of Cryptology. Springer-Verlag, Berlin, third edition, 2002. ISBN 3-540-42674-4. xii+474 pp.

G. Boyd, C. A. Micchelli, G. Strang, and D.-X. Zhou. Binomial matrices. Adv. in Comput. Math., 14:379–391, 2001.

T. Charitos, P. R. de Waal, and L. C. van der Gaag. Computing short-interval transition matrices of a discrete-time Markov chain from partially observed data. Statistics in Medicine, 27:905–921, 2008.

T. Crilly. Arthur Cayley: Mathematician Laureate of the Victorian Age. Johns Hopkins University Press, Baltimore, MD, USA, 2006. ISBN 0-8018-8011-4. xxi+610 pp.

R. A. Frazer, W. J. Duncan, and A. R. Collar. Elementary Matrices and Some Applications to Dynamics and Differential Equations. Cambridge University Press, Cambridge, UK, 1938. xviii+416 pp. 1963 printing.

W. F. Harris. The average eye. Ophthal. Physiol. Opt., 24:580–585, 2005.

N. J. Higham. The Matrix Function Toolbox. http://www.ma.man.ac.uk/~higham/mftoolbox.

N. J. Higham. Functions of Matrices: Theory and Computation. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008. ISBN 978-0-898716-46-7. xx+425 pp.

N. J. Higham and L. Lin. On pth roots of stochastic matrices. Linear Algebra Appl., 435(3):448–463, 2011.

J. D. Lawson. Generalized Runge-Kutta processes for stable systems with large Lipschitz constants. SIAM J. Numer. Anal., 4(3):372–380, Sept. 1967.

K. H. Parshall. James Joseph Sylvester. Jewish Mathematician in a Victorian World. Johns Hopkins University Press, Baltimore, MD, USA, 2006. ISBN 0-8018-8291-5. xiii+461 pp.

P. Pulay. An iterative method for the determination of the square root of a positive definite matrix. Z. Angew. Math. Mech., 46:151, 1966.

H. W. Turnbull. The matrix square and cube roots of unity. J. London Math. Soc., 2(8):242–244, 1927.

C. Visser. Note on linear operators. Proc. Kon. Akad. Wet. Amsterdam, 40(3):270–272, 1937.


