Matrix Norms

Tom Lyche

University of Oslo

Norway

http://www.ifi.uio.no/

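Matrix Norms

We consider matrix norms on $(\mathbb{C}^{m,n}, \mathbb{C})$. All results hold for $(\mathbb{R}^{m,n}, \mathbb{R})$.

Definition 1 (Matrix Norms). A function $\|\cdot\| : \mathbb{C}^{m,n} \to \mathbb{R}$ is called a matrix norm on $\mathbb{C}^{m,n}$ if for all $A, B \in \mathbb{C}^{m,n}$ and all $\alpha \in \mathbb{C}$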

1. ‖A‖ ≥ 0 with equality if and only if A = 0. (positivity)

2. ‖αA‖ = |α| ‖A‖. (homogeneity)

3. ‖A + B‖ ≤ ‖A‖ + ‖B‖. (subadditivity)

A matrix norm is simply a vector norm on the finite-dimensional vector space $(\mathbb{C}^{m,n}, \mathbb{C})$ of $m \times n$ matrices.


Equivalent norms

Adapting some general results on vector norms to matrix norms gives

Theorem 2.

1. All matrix norms are equivalent. Thus, if $\|\cdot\|$ and $\|\cdot\|'$ are two matrix norms on $\mathbb{C}^{m,n}$ then there are positive constants $\mu$ and $M$ such that $\mu\|A\| \le \|A\|' \le M\|A\|$ holds for all $A \in \mathbb{C}^{m,n}$.

2. A matrix norm is a continuous function $\|\cdot\| : \mathbb{C}^{m,n} \to \mathbb{R}$.


Submultiplicativity

For matrix norms we usually require that the norm of a product is bounded by the product of the norms. Thus for square matrices $A, B \in \mathbb{C}^{n,n}$ and a matrix norm we most often have the additional property

4. $\|AB\| \le \|A\|\,\|B\|$ (submultiplicativity).

For a square matrix $A$ and a submultiplicative matrix norm $\|\cdot\|$ we have

$$\|A^k\| \le \|A\|^k \quad \text{for } k \in \mathbb{N}. \tag{1}$$
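Submultiplicativity and the power bound (1) can be checked numerically. The following is only a sketch: the random test matrices, the seed and the helper names are assumptions made for illustration, and the comparison allows a small rounding tolerance. It also shows that the largest-absolute-entry norm, although a matrix norm, is not submultiplicative.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Norms available in NumPy that are submultiplicative.
ORDERS = [1, 2, np.inf, "fro"]

def is_submultiplicative(A, B, order, rtol=1e-12):
    """Check ||AB|| <= ||A|| ||B|| up to rounding."""
    lhs = np.linalg.norm(A @ B, order)
    rhs = np.linalg.norm(A, order) * np.linalg.norm(B, order)
    return bool(lhs <= rhs * (1 + rtol))

def satisfies_power_bound(A, k, order, rtol=1e-12):
    """Check ||A^k|| <= ||A||^k up to rounding."""
    lhs = np.linalg.norm(np.linalg.matrix_power(A, k), order)
    rhs = np.linalg.norm(A, order) ** k
    return bool(lhs <= rhs * (1 + rtol))

product_ok = all(is_submultiplicative(A, B, o) for o in ORDERS)
power_ok = all(satisfies_power_bound(A, k, o) for o in ORDERS for k in range(1, 6))

def max_entry_norm(M):
    """Largest absolute entry: a matrix norm, but not submultiplicative."""
    return np.max(np.abs(M))

ones = np.ones((2, 2))
max_norm_of_product = max_entry_norm(ones @ ones)   # equals 2
product_of_max_norms = max_entry_norm(ones) ** 2    # equals 1
```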


Consistent Matrix norms

When $m$ and $n$ vary we have a family of norms which are formally different for each $m$ and $n$, since they are defined in different spaces. However, the most common matrix norms are defined by the same formula for all $m, n$, and we consider mainly such norms.

Definition 3 (Consistent Matrix Norms). A submultiplicative matrix norm which is defined for all $m, n \in \mathbb{N}$ is said to be a consistent matrix norm.


The Frobenius Matrix Norm

For $A \in \mathbb{C}^{m,n}$ we define the Frobenius norm by

$$\|A\|_F := \Big(\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2\Big)^{1/2}.$$

$$\|A\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_n^2} \quad (\sigma_1, \dots, \sigma_n \text{ the singular values of } A).$$

The Frobenius norm is a consistent matrix norm which is subordinate to the Euclidean vector norm.
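As an illustration, the two expressions for the Frobenius norm can be compared on a small matrix. The matrix `A`, the vector `x` and the function names below are hypothetical choices for this sketch. The last lines check subordinacy to the Euclidean vector norm, $\|Ax\|_2 \le \|A\|_F \|x\|_2$, for that one vector only.

```python
import numpy as np

def frobenius_from_entries(A):
    """Square root of the sum of squared absolute values of the entries."""
    return np.sqrt(np.sum(np.abs(A) ** 2))

def frobenius_from_singular_values(A):
    """Square root of the sum of squared singular values."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return np.sqrt(np.sum(sigma ** 2))

# Hypothetical 2 x 3 example; the squared entries sum to 31.
A = np.array([[1.0, 2.0, 0.0],
              [3.0, 4.0, -1.0]])
x = np.array([1.0, -2.0, 0.5])

norm_entries = frobenius_from_entries(A)
norm_sigma = frobenius_from_singular_values(A)

# Subordinate to the Euclidean vector norm: ||Ax||_2 <= ||A||_F ||x||_2.
lhs = np.linalg.norm(A @ x)
rhs = norm_entries * np.linalg.norm(x)
```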


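Subordinate Matrix Norm

A matrix norm $\|\cdot\|$ on $\mathbb{C}^{m,n}$ is subordinate to the vector norms $\|\cdot\|_\alpha$ on $\mathbb{C}^n$ and $\|\cdot\|_\beta$ on $\mathbb{C}^m$ if

$$\|Ax\|_\beta \le \|A\|\,\|x\|_\alpha \quad \text{for all } A \in \mathbb{C}^{m,n} \text{ and } x \in \mathbb{C}^n.$$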


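Operator Norm

Definition 4. Suppose $m, n \in \mathbb{N}$ are given and let $\|\cdot\|_\alpha$ be a vector norm on $\mathbb{C}^n$ and $\|\cdot\|_\beta$ a vector norm on $\mathbb{C}^m$. For $A \in \mathbb{C}^{m,n}$ we define

$$\|A\| := \|A\|_{\alpha,\beta} := \max_{x \ne 0} \frac{\|Ax\|_\beta}{\|x\|_\alpha}. \tag{2}$$

We call this the $(\alpha, \beta)$ operator norm, the $(\alpha, \beta)$-norm, or simply the $\alpha$-norm if $\alpha = \beta$.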


Operator norm properties

The operator norm has the following properties:

It is a matrix norm.

It is subordinate to the vector norms $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$.

It is consistent if the vector norms $\|\cdot\|_\alpha = \|\cdot\|_\beta$ and they are defined for all $m, n$.

There is some $x^* \in \mathbb{C}^n$ with $\|x^*\|_\alpha = 1$ such that

$$\|A\| = \max_{\|x\|_\alpha = 1} \|Ax\|_\beta = \|Ax^*\|_\beta.$$
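For the Euclidean vector norm on both sides, a maximizing unit vector can be read off from the singular value decomposition: the first right singular vector. The sketch below assumes a random real test matrix and a fixed seed. It compares the attained value with the largest singular value and with the best of 1000 randomly sampled unit vectors, none of which can exceed the maximum.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary fixed seed
m, n = 3, 5
A = rng.standard_normal((m, n))

# First right singular vector: a unit vector where the maximum is attained.
U, sigma, Vh = np.linalg.svd(A)
x_star = Vh[0].conj()
attained = np.linalg.norm(A @ x_star)

# Random unit vectors give lower bounds on the operator norm.
samples = rng.standard_normal((1000, n))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
sampled_max = np.max(np.linalg.norm(samples @ A.T, axis=1))  # rows are A @ x
```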


The p matrix norm

The operator norms $\|\cdot\|_p$ defined from the $p$-vector norms are of special interest.

We define

$$\|A\|_p := \max_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|y\|_p = 1} \|Ay\|_p. \tag{3}$$

The $p$-norms are consistent matrix norms which are subordinate to the $p$-vector norm.


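Explicit expressions

For $A \in \mathbb{C}^{m,n}$ we have:

$\|A\|_1 = \max_{1 \le j \le n} \sum_{k=1}^{m} |a_{k,j}|$ (the largest absolute column sum)

$\|A\|_2 = \sigma_1$, the largest singular value of $A$

$\|A\|_\infty = \max_{1 \le k \le m} \sum_{j=1}^{n} |a_{k,j}|$ (the largest absolute row sum)

If $A \in \mathbb{C}^{n,n}$ is nonsingular then $\|A^{-1}\|_2 = 1/\sigma_n$, where $\sigma_n$ is the smallest singular value of $A$.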

Proof:


Unitary Transformations

An important property of the 2-norm is that it is invariant with respect to unitary transformations.

Let $k, m, n \in \mathbb{N}$, $V \in \mathbb{C}^{k,m}$, $U \in \mathbb{C}^{n,n}$, $A \in \mathbb{C}^{m,n}$, $V^H V = I$ and $U^H U = I$. Then

1. $\|VA\|_2 = \|A\|_2$ and $\|V\|_2 = 1$,

2. $\|AU\|_2 = \|A\|_2$.

Proof:
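The following is a numerical illustration of the invariance, not a proof. It assumes random complex test matrices and builds matrices with orthonormal columns from a reduced QR factorization; the helper name `random_orthonormal_columns` and the sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary fixed seed
k, m, n = 6, 4, 3

def random_orthonormal_columns(rows, cols, rng):
    """Return a rows x cols matrix Q with Q^H Q = I (requires rows >= cols)."""
    Z = rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))
    Q, _ = np.linalg.qr(Z)  # reduced QR factorization
    return Q

V = random_orthonormal_columns(k, m, rng)   # V^H V = I_m, V is k x m
U = random_orthonormal_columns(n, n, rng)   # unitary n x n
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

norm_A = np.linalg.norm(A, 2)
norm_VA = np.linalg.norm(V @ A, 2)
norm_AU = np.linalg.norm(A @ U, 2)
norm_V = np.linalg.norm(V, 2)
```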


Example

$$A := \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$

$\|A\|_1 = 6$

$\|A\|_2 = 5.465$

$\|A\|_\infty = 7$

$\|A\|_F = 5.4772$
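The values in this example can be reproduced from the explicit expressions: largest absolute column sum, largest singular value, largest absolute row sum, and the root of the sum of squared entries. The function names below are chosen for this sketch and are not taken from the slides.

```python
import numpy as np

def norm_1(A):
    """Largest absolute column sum."""
    return np.max(np.sum(np.abs(A), axis=0))

def norm_inf(A):
    """Largest absolute row sum."""
    return np.max(np.sum(np.abs(A), axis=1))

def norm_2(A):
    """Largest singular value (svd returns them in descending order)."""
    return np.linalg.svd(A, compute_uv=False)[0]

def norm_F(A):
    """Square root of the sum of squared absolute entries."""
    return np.sqrt(np.sum(np.abs(A) ** 2))

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

values = {"1": norm_1(A), "2": norm_2(A), "inf": norm_inf(A), "F": norm_F(A)}

# For nonsingular A, the 2-norm of the inverse is 1 / (smallest singular value).
sigma = np.linalg.svd(A, compute_uv=False)
norm_2_of_inverse = np.linalg.norm(np.linalg.inv(A), 2)
```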


Perturbation of linear systems

Consider the system of two linear equations

$$\begin{aligned} x_1 + x_2 &= 20 \\ x_1 + 0.999\,x_2 &= 19.99 \end{aligned}$$

The exact solution is $x_1 = x_2 = 10$.

Suppose we replace the second equation by

$$x_1 + 1.001\,x_2 = 19.99.$$

The exact solution then changes to $x_1 = 30$, $x_2 = -10$.

A small change in one of the coefficients, from 0.999 to 1.001, changed the exact solution by a large amount.
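The two systems can be solved numerically to see the effect. This sketch uses `numpy.linalg.solve`; the variable names are illustrative, and the relative changes are measured in the 2-norm, which is an assumption made here.

```python
import numpy as np

b = np.array([20.0, 19.99])
A_original = np.array([[1.0, 1.0],
                       [1.0, 0.999]])
A_perturbed = np.array([[1.0, 1.0],
                        [1.0, 1.001]])

x_original = np.linalg.solve(A_original, b)    # close to (10, 10)
x_perturbed = np.linalg.solve(A_perturbed, b)  # close to (30, -10)

# Relative size of the change in the matrix and in the solution (2-norm).
relative_change_in_A = (np.linalg.norm(A_perturbed - A_original, 2)
                        / np.linalg.norm(A_original, 2))
relative_change_in_x = (np.linalg.norm(x_perturbed - x_original)
                        / np.linalg.norm(x_original))

# The 2-norm condition number of the original matrix is large.
condition_number = np.linalg.cond(A_original, 2)
```

A relative change of about 0.001 in the matrix produces a relative change of about 2 in the solution.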


Ill Conditioning

A mathematical problem in which the solution is very sensitive to changes in the data is called ill-conditioned or sometimes ill-posed.

Such problems are difficult to solve on a computer.

If at all possible, the mathematical model should be changed to obtain a more well-conditioned or properly-posed problem.


Perturbations

We consider what effect a small change (perturbation) in the data $A$, $b$ has on the solution $x$ of a linear system $Ax = b$.

Suppose $y$ solves $(A + E)y = b + e$, where $E$ is a (small) $n \times n$ matrix and $e$ a (small) vector.

How large can $y - x$ be?

To measure this we use vector and matrix norms.


Conditions on the norms

$\|\cdot\|$ will denote a vector norm on $\mathbb{C}^n$ and also a submultiplicative matrix norm on $\mathbb{C}^{n,n}$ which in addition is subordinate to the vector norm.

Thus for any $A, B \in \mathbb{C}^{n,n}$ and any $x \in \mathbb{C}^n$ we have

$$\|AB\| \le \|A\|\,\|B\| \quad \text{and} \quad \|Ax\| \le \|A\|\,\|x\|.$$

This is satisfied if the matrix norm is the operator norm corresponding to the given vector norm, or if the vector norm is the Euclidean norm and the matrix norm is the Frobenius norm.


Absolute and relative error

The difference $\|y - x\|$ measures the absolute error in $y$ as an approximation to $x$, while $\|y - x\|/\|x\|$ or $\|y - x\|/\|y\|$ is a measure of the relative error.


Perturbation in the right hand side

Theorem 5. Suppose $A \in \mathbb{C}^{n,n}$ is invertible, $b, e \in \mathbb{C}^n$, $b \ne 0$ and $Ax = b$, $Ay = b + e$. Then

$$\frac{1}{K(A)} \frac{\|e\|}{\|b\|} \le \frac{\|y - x\|}{\|x\|} \le K(A) \frac{\|e\|}{\|b\|}, \qquad K(A) = \|A\|\,\|A^{-1}\|. \tag{4}$$

Proof:

Consider (4). $\|e\|/\|b\|$ is a measure of the size of the perturbation $e$ relative to the size of $b$. In the worst case, $\|y - x\|/\|x\|$ can be

$$K(A) = \|A\|\,\|A^{-1}\|$$

times as large as $\|e\|/\|b\|$.
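The two-sided bound (4) can be checked on a concrete case. The sketch below reuses the nearly singular 2 x 2 matrix from the perturbation example. The perturbation `e` of the right hand side and the use of the 2-norm are assumptions made for illustration.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.999]])
b = np.array([20.0, 19.99])
e = np.array([0.0, 1e-4])          # hypothetical small perturbation of b

x = np.linalg.solve(A, b)
y = np.linalg.solve(A, b + e)

# Condition number K(A) = ||A|| ||A^{-1}|| in the 2-norm.
K = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

relative_perturbation = np.linalg.norm(e) / np.linalg.norm(b)
relative_error = np.linalg.norm(y - x) / np.linalg.norm(x)

lower_bound = relative_perturbation / K
upper_bound = K * relative_perturbation
```

Here the relative error is about 0.01 while the relative perturbation is below 1e-5, so the error is amplified by a factor in the thousands, which is still below $K(A) \approx 4000$.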


Condition number

$K(A)$ is called the condition number with respect to inversion of a matrix, or just the condition number, if it is clear from the context that we are talking about solving linear systems.

The condition number depends on the matrix $A$ and on the norm used. If $K(A)$ is large, $A$ is called ill-conditioned (with respect to inversion).

If $K(A)$ is small, $A$ is called well-conditioned (with respect to inversion).


Condition number properties

Since $\|A\|\,\|A^{-1}\| \ge \|AA^{-1}\| = \|I\| \ge 1$ we always have $K(A) \ge 1$.

Since all matrix norms are equivalent, the dependence of $K(A)$ on the norm chosen is less important than the dependence on $A$.

Usually one chooses the spectral norm when discussing properties of the condition number, and the $\ell_1$ and $\ell_\infty$ norms when one wishes to compute or estimate it.


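The 2-norm

Suppose $A$ has singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n > 0$ and, if $A$ is square, eigenvalues $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$.

$$K_2(A) = \|A\|_2\|A^{-1}\|_2 = \frac{\sigma_1}{\sigma_n}$$

$$K_2(A) = \|A\|_2\|A^{-1}\|_2 = \frac{|\lambda_1|}{|\lambda_n|}, \quad A \text{ normal}.$$

It follows that $A$ is ill-conditioned with respect to inversion if and only if $\sigma_1/\sigma_n$ is large, or $|\lambda_1|/|\lambda_n|$ is large when $A$ is normal.

$$K_2(A) = \|A\|_2\|A^{-1}\|_2 = \frac{\lambda_1}{\lambda_n}, \quad A \text{ positive definite}.$$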
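These formulas for the 2-norm condition number can be compared numerically. The sketch assumes a random real matrix `A` and a symmetric positive definite matrix `S` built as $B^T B + nI$; both constructions and the seed are illustrative choices, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary fixed seed
n = 5

# General nonsingular matrix: K_2 = sigma_1 / sigma_n.
A = rng.standard_normal((n, n))
sigma = np.linalg.svd(A, compute_uv=False)          # descending order
K2_from_singular_values = sigma[0] / sigma[-1]
K2_from_definition = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

# Symmetric positive definite matrix: K_2 = lambda_max / lambda_min.
B = rng.standard_normal((n, n))
S = B.T @ B + n * np.eye(n)
eigenvalues = np.linalg.eigvalsh(S)                 # ascending order
K2_from_eigenvalues = eigenvalues[-1] / eigenvalues[0]
K2_of_S = np.linalg.cond(S, 2)
```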


The residual

Suppose we have computed an approximate solution $y$ to $Ax = b$. The vector $r(y) := Ay - b$ is called the residual vector, or just the residual. We can bound $x - y$ in terms of $r(y)$.

Theorem 6. Suppose $A \in \mathbb{C}^{n,n}$, $b \in \mathbb{C}^n$, $A$ is nonsingular and $b \ne 0$. Let $r(y) = Ay - b$ for each $y \in \mathbb{C}^n$. If $Ax = b$ then

$$\frac{1}{K(A)} \frac{\|r(y)\|}{\|b\|} \le \frac{\|y - x\|}{\|x\|} \le K(A) \frac{\|r(y)\|}{\|b\|}. \tag{5}$$


Discussion

If $A$ is well-conditioned, (5) says that $\|y - x\|/\|x\| \approx \|r(y)\|/\|b\|$.

In other words, the accuracy in $y$ is of about the same order of magnitude as the residual, as long as $\|b\| \approx 1$.

If $A$ is ill-conditioned, anything can happen.

The solution can be inaccurate even if the residual is small.

We can have an accurate solution even if the residual is large.
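To see how little the residual says about the error when $A$ is ill-conditioned, one can perturb the exact solution along the singular directions of $A$. In this sketch the matrix, the exact solution and the unit-size perturbations are assumptions chosen for illustration. Both approximations have the same error, but their residuals differ by a factor equal to the 2-norm condition number.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.999]])
x = np.array([10.0, 10.0])          # exact solution
b = A @ x

U, sigma, Vh = np.linalg.svd(A)
v_max, v_min = Vh[0], Vh[-1]        # right singular vectors (unit length)

# Same error size, but along directions that A shrinks or stretches.
y_small_residual = x + v_min
y_large_residual = x + v_max

def relative_error(y):
    return np.linalg.norm(y - x) / np.linalg.norm(x)

def relative_residual(y):
    return np.linalg.norm(A @ y - b) / np.linalg.norm(b)

residual_ratio = relative_residual(y_large_residual) / relative_residual(y_small_residual)
K2 = sigma[0] / sigma[-1]
```

The approximation `y_small_residual` has a relative error of about 7 percent although its relative residual is below 1e-4.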


Perturbation in A

We consider next a perturbation in $A$.

Theorem 7. Suppose $A, E \in \mathbb{C}^{n,n}$, $b \in \mathbb{C}^n$ with $A$ invertible and $b \ne 0$. If $\|A^{-1}E\| < 1$ for some operator norm then $A + E$ is invertible. If $Ax = b$ and $(A + E)y = b$ then

$$\frac{\|y - x\|}{\|x\|} \le \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|} \le \frac{K(A)}{1 - \|A^{-1}E\|} \frac{\|E\|}{\|A\|}. \tag{6}$$

$\|E\|/\|A\|$ is a measure of the size of the perturbation $E$ in $A$ relative to the size of $A$.

The condition number again plays a crucial role.
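The chain of inequalities (6) can be checked on a small example. The matrices `A` and `E`, the right hand side `b` and the choice of the 2-norm as operator norm are hypothetical values picked for this sketch.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
E = 1e-2 * np.array([[1.0, -2.0],
                     [0.5, 1.0]])    # hypothetical small perturbation of A
b = np.array([1.0, 2.0])

def op_norm(M):
    """Operator 2-norm (spectral norm)."""
    return np.linalg.norm(M, 2)

A_inv = np.linalg.inv(A)
r = op_norm(A_inv @ E)               # must be < 1 for the theorem to apply

x = np.linalg.solve(A, b)
y = np.linalg.solve(A + E, b)

K = op_norm(A) * op_norm(A_inv)
relative_error = np.linalg.norm(y - x) / np.linalg.norm(x)
middle_bound = r / (1 - r)
right_bound = K / (1 - r) * op_norm(E) / op_norm(A)
```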


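The Spectral Radius

We define the spectral radius of a matrix $A \in \mathbb{C}^{n,n}$ as the maximum absolute value of its eigenvalues:

$$\rho(A) = \max_{\lambda \in \sigma(A)} |\lambda|, \tag{7}$$

where $\sigma(A)$ denotes the set of eigenvalues of $A$.

For any submultiplicative matrix norm $\|\cdot\|$ on $\mathbb{C}^{n,n}$ and any $A \in \mathbb{C}^{n,n}$ we have $\rho(A) \le \|A\|$.

Proof:

Let $A \in \mathbb{C}^{n,n}$ and $\epsilon > 0$ be given. There is a submultiplicative matrix norm $\|\cdot\|'$ on $\mathbb{C}^{n,n}$ such that $\rho(A) \le \|A\|' \le \rho(A) + \epsilon$.

Proof: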
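Both statements can be illustrated numerically; this is a sketch, not the proof. For an upper triangular matrix, one standard way to obtain a norm close to the spectral radius is $\|M\|' = \|D^{-1} M D\|_\infty$ with $D = \mathrm{diag}(1, \delta, \delta^2, \dots)$ and small $\delta > 0$. Whether the lecture's proof uses this construction is an assumption here. The example matrix and the function names are hypothetical, and a general matrix would first have to be brought to triangular form.

```python
import numpy as np

def spectral_radius(A):
    """Largest absolute value of the eigenvalues."""
    return np.max(np.abs(np.linalg.eigvals(A)))

# Hypothetical upper triangular example with rho(A) = 0.5.
A = np.array([[0.5, 2.0],
              [0.0, 0.5]])
rho = spectral_radius(A)

# rho(A) is a lower bound for each of these submultiplicative norms.
standard_norms = {name: np.linalg.norm(A, order)
                  for name, order in [("1", 1), ("2", 2), ("inf", np.inf), ("F", "fro")]}

def scaled_inf_norm(M, delta):
    """||D^{-1} M D||_inf with D = diag(1, delta, delta^2, ...)."""
    d = delta ** np.arange(M.shape[0])
    scaled = (M / d[:, None]) * d[None, :]   # entry (i, j) becomes m_ij * d_j / d_i
    return np.linalg.norm(scaled, np.inf)

# For this A the scaled norm equals 0.5 + 2 * delta, so it approaches rho(A).
close_norm = scaled_inf_norm(A, 1e-3)
```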


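Limits

For any $A \in \mathbb{C}^{n,n}$ we have

$$\lim_{k \to \infty} A^k = 0 \iff \rho(A) < 1.$$

Convergence can be slow:

$$A = \begin{bmatrix} 0.99 & 1 & 0 \\ 0 & 0.99 & 1 \\ 0 & 0 & 0.99 \end{bmatrix}, \quad A^{100} \approx \begin{bmatrix} 0.4 & 37 & 1849 \\ 0 & 0.4 & 37 \\ 0 & 0 & 0.4 \end{bmatrix},$$

$$A^{2000} \approx \begin{bmatrix} 10^{-9} & \epsilon & 0.004 \\ 0 & 10^{-9} & \epsilon \\ 0 & 0 & 10^{-9} \end{bmatrix}$$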
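The slow convergence can be reproduced with `numpy.linalg.matrix_power`. The sketch below builds the 3 x 3 matrix with 0.99 on the diagonal and 1 on the superdiagonal and computes the two powers shown. The list of infinity norms at a few hypothetical exponents shows that the powers first grow considerably before they decay.

```python
import numpy as np

# 0.99 on the diagonal, 1 on the first superdiagonal.
A = 0.99 * np.eye(3) + np.diag([1.0, 1.0], k=1)

rho = np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius, about 0.99

A100 = np.linalg.matrix_power(A, 100)
A2000 = np.linalg.matrix_power(A, 2000)

# Infinity norm of A^k for a few exponents: growth first, decay later.
exponents = [1, 10, 100, 1000, 2000]
norms = [np.linalg.norm(np.linalg.matrix_power(A, k), np.inf) for k in exponents]
```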

