Basic Matrix Theory


    1. Matrix as a Linear Transformation

Let $A$ be an $n \times m$ matrix of real numbers. Instead of defining a matrix as a rectangular array of real numbers, we view it as a linear function from $\mathbb{R}^m$ to $\mathbb{R}^n$.

First, $A$ is interpreted as a function, so that we write
$$y = Ax,$$
where $y \in \mathbb{R}^n$ is the output corresponding to the input $x \in \mathbb{R}^m$ of the function defined by $A$. Throughout, $Ax$ is defined as the usual multiplication of the vector $x$ by the matrix $A$. Note that we write the function $A$ with its argument $x$ as $Ax$, instead of $A(x)$, which is more standard in the general context. If $A$ is a square matrix, in which case we have $n = m$, the function represented by $A$ becomes a transformation on $\mathbb{R}^n$, i.e., a function from $\mathbb{R}^n$ into itself.

Second, $A$ represents a linear function. This follows directly from the usual rule for matrix and vector multiplication, i.e., we have
$$A(c_1 x_1 + c_2 x_2) = c_1 A x_1 + c_2 A x_2$$
for any scalars $c_1$ and $c_2$ and vectors $x_1$ and $x_2$ in $\mathbb{R}^m$, which implies that $A$ is linear as a function.
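As a quick numerical illustration of this linearity (a minimal NumPy sketch; the matrix and vectors are arbitrary choices):

```python
import numpy as np

# An arbitrary 3 x 2 matrix, viewed as a linear function from R^2 to R^3.
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])

x1 = np.array([1.0, -1.0])
x2 = np.array([2.0, 0.5])
c1, c2 = 3.0, -2.0

# A(c1*x1 + c2*x2) should equal c1*A(x1) + c2*A(x2).
lhs = A @ (c1 * x1 + c2 * x2)
rhs = c1 * (A @ x1) + c2 * (A @ x2)
print(np.allclose(lhs, rhs))  # True
```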

As for the usual linear function, we may define the range $\mathcal{R}(A)$ and null space (or kernel) $\mathcal{N}(A)$ of the function defined by a matrix $A$ by
$$\mathcal{R}(A) = \{y \mid y = Ax \text{ for some } x\}$$
$$\mathcal{N}(A) = \{x \mid Ax = 0\}.$$
We may easily show that $\mathcal{R}(A)$ and $\mathcal{N}(A)$ are subspaces respectively of $\mathbb{R}^n$ and $\mathbb{R}^m$, i.e., subsets of the vector spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ that are vector spaces themselves. If $y_1, y_2 \in \mathcal{R}(A)$, then there exist $x_1, x_2$ such that $y_1 = Ax_1$, $y_2 = Ax_2$, and therefore, we have


$c_1 y_1 + c_2 y_2 = A(c_1 x_1 + c_2 x_2)$, which implies that $c_1 y_1 + c_2 y_2 \in \mathcal{R}(A)$ for any linear combination of $y_1, y_2$. Moreover, if $x_1, x_2 \in \mathcal{N}(A)$, then we have $Ax_1 = Ax_2 = 0$, and therefore, $A(c_1 x_1 + c_2 x_2) = 0$, which implies that $c_1 x_1 + c_2 x_2 \in \mathcal{N}(A)$ for any linear combination of $x_1, x_2$.

It is easy to see that the range of $A$ is the space spanned by the column vectors of $A$. Note that $Ax$ yields a linear combination of the column vectors of $A$ with weights given by the components of $x$. We define the rank of $A$, denoted by $\mathrm{rank}(A)$, to be the dimension of the range of $A$. On the other hand, the dimension of the null space of $A$ is often referred to as the nullity of $A$, which is written as $\mathrm{nullity}(A)$. Recall that the dimension of a subspace of a vector space is defined to be the number of linearly independent vectors we need to span it. It is well expected that $\mathrm{rank}(A) = m - \mathrm{nullity}(A)$, or equivalently,
$$\mathrm{rank}(A) + \mathrm{nullity}(A) = m,$$
since we have $m - \mathrm{nullity}(A)$ linearly independent vectors whose span intersects the null space of $A$ only at the origin, and they are mapped into a linearly independent set of vectors due to the linearity of $A$.
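The rank–nullity identity can be confirmed numerically (a minimal sketch using NumPy and SciPy; the example matrix is an arbitrary choice):

```python
import numpy as np
from scipy.linalg import null_space

# A 3 x 4 matrix whose last two columns are linear combinations of the first two.
A = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, -1.0],
              [1.0, 1.0, 2.0, 1.0]])

rank = np.linalg.matrix_rank(A)
nullity = null_space(A).shape[1]  # number of basis vectors of N(A)
print(rank, nullity, rank + nullity == A.shape[1])  # 2 2 True
```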

We denote by $A'$ the transpose of a matrix $A$. Moreover, for a subspace $M$ of $\mathbb{R}^n$, we define
$$M^{\perp} = \{x \mid x'y = 0 \text{ for all } y \in M\},$$
which is commonly referred to as the orthogonal complement of $M$ and read as $M$-perp. For instance, the orthogonal complement of the $xy$-plane is the $z$-axis in $\mathbb{R}^3$. Note that we have $\dim M^{\perp} = n - \dim M$, where $\dim$ denotes the dimension. It is easy to deduce that
$$\mathcal{R}(A)^{\perp} = \mathcal{N}(A').$$
This just states that a vector $x$ is orthogonal to every column vector of $A$ if and only if $A'x = 0$, which is almost tautological.
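This relationship can also be seen numerically (a minimal sketch; the matrix is arbitrary): a basis of $\mathcal{N}(A')$ is orthogonal to every column of $A$.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Basis vectors of N(A') -- each should be orthogonal to every column of A.
Z = null_space(A.T)
print(np.allclose(A.T @ Z, 0.0))  # True: columns of Z lie in R(A)-perp
```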


    2. Eigenvalues and Eigenvectors

Throughout this section, we let $A$ be an $n \times n$ square matrix. We say that a scalar $\lambda$ is an eigenvalue if there exists a nonzero vector $x \in \mathbb{R}^n$ such that
$$Ax = \lambda x.$$

The vector $x$ is called an eigenvector associated with, or corresponding to, the eigenvalue $\lambda$. Clearly, $\lambda$ is an eigenvalue of $A$ if and only if there exists a nonzero $x$ such that $(A - \lambda I)x = 0$, which holds if and only if $A - \lambda I$ is singular or non-invertible. Therefore, we may find eigenvalues by solving the determinantal equation
$$\det(A - \lambda I) = 0.$$
Once an eigenvalue $\lambda$ is found, the corresponding eigenvectors may be obtained by solving the linear equations $(A - \lambda I)x = 0$.
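In practice, eigenvalues and eigenvectors are computed numerically rather than via the determinantal equation; a minimal NumPy sketch (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigvals[i] is an eigenvalue of A; eigvecs[:, i] is an associated eigenvector.
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)  # eigenvalues 3 and 1 (order not guaranteed)

# Verify Ax = lambda * x for each eigenpair.
for lam, x in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ x, lam * x))  # True
```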

The determinantal equation yields an $n$-th order polynomial equation
$$a_0 \lambda^n + a_1 \lambda^{n-1} + \cdots + a_{n-1} \lambda + a_n = 0$$
in $\lambda$ with coefficients $a_0, \ldots, a_n$ given by the entries of $A$. The equation can also be written after factorization as
$$(\lambda - \lambda_1) \cdots (\lambda - \lambda_n) = 0,$$
where $\lambda_1, \ldots, \lambda_n$ now become eigenvalues of $A$. Eigenvalues of $A$ are generally complex numbers, even if $A$ itself is a real matrix. The set of eigenvalues of $A$ is called the spectrum of $A$. Of course, we may not have $n$ distinct roots for the determinantal equation, since some roots may be repeated. In general, therefore, the number of elements in the spectrum of $A$ may be fewer than the dimension $n$ of $A$. If an eigenvalue $\lambda_i$ is repeated $m$ times as a root of the determinantal equation, then we say that it has algebraic multiplicity $m$.

For each eigenvalue $\lambda_i$, we have at least one nonzero vector $x$ such that $(A - \lambda_i I)x = 0$, which becomes an eigenvector associated with $\lambda_i$.


An eigenvector is identified only up to a constant multiplication, since if $x_i$ is an eigenvector associated with eigenvalue $\lambda_i$, then any constant multiple of $x_i$ also becomes an eigenvector associated with the same eigenvalue $\lambda_i$. Naturally, there may be multiple eigenvectors associated with a single eigenvalue $\lambda_i$. In this case, they are not individually identified even up to a constant multiplication. If, for instance, $x_{1i}$ and $x_{2i}$ are two eigenvectors associated with $\lambda_i$, then any linear combination of them is also an eigenvector associated with $\lambda_i$, since $A(c_1 x_{1i} + c_2 x_{2i}) = \lambda_i(c_1 x_{1i} + c_2 x_{2i})$ for any constants $c_1$ and $c_2$. In fact, it is easy to see that eigenvectors associated with any eigenvalue $\lambda_i$ are identified only up to the space spanned by them, which we call the eigenspace associated with eigenvalue $\lambda_i$. The null space, for instance, can be regarded as the eigenspace associated with the zero eigenvalue. The dimension of the eigenspace associated with eigenvalue $\lambda_i$ is sometimes called the geometric multiplicity of eigenvalue $\lambda_i$. It is known that the geometric multiplicity cannot exceed the algebraic multiplicity. For instance, if the eigenspace associated with eigenvalue $\lambda_i$ is 2-dimensional, then $\lambda_i$ is a root of the determinantal equation that is repeated at least twice.

There are two commonly used functionals of a matrix, the trace and the determinant, whose values are solely determined by its eigenvalues. For a matrix $A$ with eigenvalues $\lambda_1, \ldots, \lambda_n$, we define the trace and determinant of $A$ as
$$\mathrm{tr}(A) = \sum_{i=1}^n \lambda_i \quad \text{and} \quad \det(A) = \prod_{i=1}^n \lambda_i,$$
respectively. The trace and determinant can also be obtained directly from the entries of $A$ using the relationships between the coefficients $a_0, \ldots, a_n$ and the roots $\lambda_1, \ldots, \lambda_n$ of the determinantal equation. To see this more clearly, we consider a 2-dimensional square matrix $A = (a_{ij})$, $i, j = 1, 2$, whose determinantal equation is given by
$$\lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0,$$
and note that we have $\lambda_1 + \lambda_2 = a_{11} + a_{22}$ and $\lambda_1 \lambda_2 = a_{11}a_{22} - a_{12}a_{21}$ if we set $\lambda_1$ and $\lambda_2$ to be the roots of the determinantal equation.
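These identities are easy to confirm numerically (a minimal NumPy sketch; the $2 \times 2$ matrix below is an arbitrary example):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals = np.linalg.eigvals(A)  # roots of det(A - lambda*I) = 0
print(np.isclose(eigvals.sum(), np.trace(A)))        # True: tr(A) = sum of eigenvalues
print(np.isclose(eigvals.prod(), np.linalg.det(A)))  # True: det(A) = product of eigenvalues
```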


    3. Projections

Let $P$ be an $n$-dimensional square matrix, which we may view as a linear transformation on $\mathbb{R}^n$. We say that a matrix $P$ is idempotent if and only if
$$P^2 = P.$$

For an idempotent matrix $P$, we have
$$P(Px) = P^2 x = Px$$
for all $x \in \mathbb{R}^n$, which implies that $P$ is an identity map if restricted to the range $\mathcal{R}(P)$ of $P$. Note that any vector in $\mathcal{R}(P)$ can be written as $Px$ for some $x \in \mathbb{R}^n$. As for any other linear transformation on $\mathbb{R}^n$, we have $\mathrm{rank}(P) + \mathrm{nullity}(P) = n$, i.e., $\dim \mathcal{R}(P) + \dim \mathcal{N}(P) = n$. Moreover, $\mathcal{R}(P) \cap \mathcal{N}(P) = \{0\}$, since any nonzero vector in $\mathcal{R}(P)$ is mapped to itself and in particular does not belong to $\mathcal{N}(P)$. Consequently, for any vector $x \in \mathbb{R}^n$ we may write $x = y + z$ uniquely for some $y \in \mathcal{R}(P)$ and $z \in \mathcal{N}(P)$.

For an $n$-dimensional idempotent matrix $P$, we have $y = Px$ if $x \in \mathbb{R}^n$ is given by $x = y + z$ with $y \in \mathcal{R}(P)$ and $z \in \mathcal{N}(P)$. Therefore, $y$ can be obtained by projecting $x$ on $\mathcal{R}(P)$ along $\mathcal{N}(P)$. For this reason, we call the transformation given by an idempotent matrix a projection. An idempotent matrix itself is also often called a projection. If an idempotent matrix $P$ is also symmetric, i.e., $P = P'$, then we have $\mathcal{R}(P) \perp \mathcal{N}(P)$, in which case the transformation given by $P$ becomes an orthogonal projection. We also call a matrix $P$ itself an orthogonal projection if it is idempotent and symmetric. In what follows, we say that $P$ is an $m$-dimensional projection or orthogonal projection on $\mathbb{R}^n$ if $\mathcal{R}(P)$ is an $m$-dimensional subspace of $\mathbb{R}^n$. The identity matrix is the only $n$-dimensional projection on $\mathbb{R}^n$.

Obviously, any projection $P$ has at most two distinct eigenvalues, 0 and 1. For all $x \in \mathcal{N}(P)$, we have $Px = 0$, and therefore, $\mathcal{N}(P)$ is the eigenspace associated with eigenvalue 0. On the other hand, for all $x \in \mathcal{R}(P)$, we have $Px = x$, and therefore, $\mathcal{R}(P)$ is the eigenspace associated with eigenvalue 1.


Let $\dim \mathcal{R}(P) = m$. Then the geometric multiplicity of eigenvalue 1 is $m$, and 1 must be a root of the determinantal equation repeated at least $m$ times. Likewise, the geometric multiplicity of eigenvalue 0 is $n - m$, and 0 must be a root of the determinantal equation repeated at least $(n - m)$ times. However, since there cannot be more than $n$ roots of the determinantal equation, the algebraic multiplicities of eigenvalues 1 and 0 are exactly $m$ and $n - m$. It follows, in particular, that $\mathrm{tr}(P) = m$, where $m$ is the dimension of the projection $P$.
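These properties of projections can be checked numerically (a minimal sketch; the idempotent matrix below is an arbitrary, non-symmetric example):

```python
import numpy as np

# A non-symmetric idempotent matrix: a (non-orthogonal) projection.
P = np.array([[1.0, 1.0],
              [0.0, 0.0]])

print(np.allclose(P @ P, P))          # True: P is idempotent
print(np.sort(np.linalg.eigvals(P)))  # [0. 1.]: eigenvalues are 0 and 1
print(np.isclose(np.trace(P), np.linalg.matrix_rank(P)))  # True: tr(P) = dim R(P)
```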

We may easily see that if $P$ is idempotent, so is $I - P$. This implies that if $P$ is a projection, so is $I - P$. In fact, it is clear that $I - P$ is the projection on $\mathcal{N}(P)$ along $\mathcal{R}(P)$. Note that $\mathcal{R}(I - P) = \mathcal{N}(P)$ and $\mathcal{N}(I - P) = \mathcal{R}(P)$ for any projection $P$. Therefore, if $P$ is an $m$-dimensional projection in $\mathbb{R}^n$, then $I - P$ is an $(n - m)$-dimensional projection in $\mathbb{R}^n$. It follows straightforwardly that $\mathrm{tr}(I - P) = n - m$ if $P$ is $m$-dimensional. Clearly, we have $P(I - P) = (I - P)P = 0$ for any projection $P$. If, in particular, $P$ is the orthogonal projection on an $m$-dimensional subspace $M$ of $\mathbb{R}^n$, then $I - P$ is the $(n - m)$-dimensional orthogonal projection on the orthogonal complement $M^{\perp}$ of $M$.

Let $A$ be an $n \times m$ matrix of full column rank, i.e., $\mathrm{rank}(A) = m$ and $A$ has $m$ linearly independent column vectors. Now we construct the orthogonal projection $P$ on the range $\mathcal{R}(A)$ of $A$. We choose an arbitrary $x \in \mathbb{R}^n$ and orthogonally project it on $\mathcal{R}(A)$, which we denote by $Px$. We may set $Px = Ab$ for some $b \in \mathbb{R}^m$, since it is in $\mathcal{R}(A)$. To determine $b \in \mathbb{R}^m$, we note that $A'(x - Ab) = 0$, which yields $b = (A'A)^{-1}A'x$. Consequently, we have $Px = A(A'A)^{-1}A'x$, from which it follows that
$$P = A(A'A)^{-1}A'$$
since the choice of $x \in \mathbb{R}^n$ was arbitrary. As expected, $P$ is idempotent and symmetric. If, in particular, the column vectors of $A$ are orthonormal, then we have $P = AA'$, since $A'A = I$.
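The construction $P = A(A'A)^{-1}A'$ can be sketched numerically as follows (the full-column-rank matrix $A$ is an arbitrary choice):

```python
import numpy as np

# A 3 x 2 matrix of full column rank.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

# Orthogonal projection on R(A): P = A (A'A)^{-1} A'.
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))  # True: idempotent
print(np.allclose(P, P.T))    # True: symmetric
x = np.array([1.0, 2.0, 3.0])
print(np.allclose(A.T @ (x - P @ x), 0.0))  # True: residual is orthogonal to R(A)
```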


    4. Spectral Representation

Throughout this section, we assume that $A$ is an $n$-dimensional symmetric matrix of real numbers. There are three important facts known for symmetric matrices, listed below.

(a) All eigenvalues are real.

(b) Eigenvectors associated with distinct eigenvalues are orthogonal.

(c) Geometric multiplicities of all eigenvalues are identical to their algebraic multiplicities.

One immediate consequence of these facts is that there are orthogonal eigenspaces $M_1, \ldots, M_m$ associated with the real distinct eigenvalues $\lambda_1, \ldots, \lambda_m$ of $A$, the sum of whose dimensions is exactly $n$. Notice that $M_i \cap M_j = \{0\}$ for all $i \neq j$, $i, j = 1, \ldots, m$, since they are orthogonal. Of course, the number of distinct eigenvalues $m$ is generally smaller than $n$, since some roots of the determinantal equation are repeated.

For any $n$-dimensional symmetric matrix $A$, we may therefore partition $\mathbb{R}^n$ into the eigenspaces $M_1, \ldots, M_m$ of $A$, so that we may write any $x \in \mathbb{R}^n$ uniquely as $x = x_1 + \cdots + x_m$ with $x_i \in M_i$ for $i = 1, \ldots, m$. Intuitively, it is clear that $x_i = P_i x$, if we denote by $P_i$ the orthogonal projection on $M_i$, for $i = 1, \ldots, m$, since $M_1, \ldots, M_m$ are orthogonal. It follows that we have
$$x = P_1 x + \cdots + P_m x$$
for all $x \in \mathbb{R}^n$, i.e., $P_1 + \cdots + P_m = I$. Consequently, we may deduce that

$$Ax = A\left(\sum_{i=1}^m P_i x\right) = \sum_{i=1}^m A(P_i x) = \sum_{i=1}^m \lambda_i (P_i x) = \left(\sum_{i=1}^m \lambda_i P_i\right) x$$
for all $x \in \mathbb{R}^n$, and we have
$$A = \sum_{i=1}^m \lambda_i P_i,$$
which is called the spectral representation of $A$. Note that, if restricted on each of the $M_i$, the transformation given by $A$ is extremely simple and reduces to a scalar multiplication by $\lambda_i$, i.e., $A(P_i x) = \lambda_i (P_i x)$, $i = 1, \ldots, m$.


For $i = 1, \ldots, m$, let $H_i$ be a matrix whose column vectors consist of orthonormal eigenvectors associated with eigenvalue $\lambda_i$. Then we may write $P_i$ more explicitly as $P_i = H_i H_i'$. If we further let $x_{i1}, \ldots, x_{i\ell}$ be the column vectors of $H_i$, we have $P_i = H_i H_i' = \sum_{j=1}^{\ell} x_{ij} x_{ij}'$. Therefore, we may write the spectral representation of $A$ generally as
$$A = \sum_{i=1}^n \lambda_i x_i x_i'$$
with eigenvalues $\lambda_1, \ldots, \lambda_n$ and their corresponding orthonormal eigenvectors $x_1, \ldots, x_n$ of $A$, where we allow any eigenvalue $\lambda_i$ to be repeated an arbitrary number of times. As a consequence, we may represent $A$ as
$$A = U \Lambda U',$$
where $U$ is an orthogonal matrix having $x_i$ in its $i$-th column and $\Lambda$ is a diagonal matrix with $\lambda_i$ as its $i$-th diagonal entry. Note that $U$ is a nonsingular matrix such that $U'U = UU' = I$, and therefore, $U' = U^{-1}$. We call such a matrix orthogonal.
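A minimal numerical sketch of the representation $A = U\Lambda U'$ for a symmetric matrix (the example matrix is arbitrary; NumPy's eigh is designed for symmetric inputs):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is specialized for symmetric matrices: real eigenvalues, orthonormal U.
lam, U = np.linalg.eigh(A)
Lambda = np.diag(lam)

print(np.allclose(U @ Lambda @ U.T, A))  # True: A = U Lambda U'
print(np.allclose(U.T @ U, np.eye(2)))   # True: U is orthogonal
```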

The matrix version $A = U\Lambda U'$ of the spectral representation of $A$ is extremely useful in many different contexts. Note that $A^2 = (U\Lambda U')(U\Lambda U') = U\Lambda^2 U'$, which can be easily extended to $A^n = U\Lambda^n U'$ for an arbitrary nonnegative integer $n$. More generally, the spectral representation of $A$ allows us to define a wide class of functions $f(A)$ with the matrix argument $A$ by
$$f(A) = U \begin{pmatrix} f(\lambda_1) & & \\ & \ddots & \\ & & f(\lambda_n) \end{pmatrix} U'.$$
For instance, we may define $\sqrt{A}$ or $A^{1/2}$ as above with $f(\lambda) = \sqrt{\lambda}$, as long as $A \geq 0$, i.e., $\lambda_i \geq 0$ for all $i = 1, \ldots, n$. Likewise, $\log A$ is defined as above with $f(\lambda) = \log \lambda$, which of course requires $A > 0$, i.e., $\lambda_i > 0$ for all $i = 1, \ldots, n$. It is also possible to define $A^{-1}$ as above with $f(\lambda) = 1/\lambda$ if $A$ is invertible and none of $\lambda_i$, $i = 1, \ldots, n$, is zero. Other functions of $A$, such as $e^A$, may also be defined similarly with $f(\lambda) = e^{\lambda}$. Note in particular that, for $A = (a_{ij})$, $f(A)$ is in general not defined as $f(A) = (f(a_{ij}))$.
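A sketch of computing such matrix functions through the eigendecomposition (assuming a symmetric positive definite $A$; the example matrix and the helper matrix_function are illustrative choices):

```python
import numpy as np

# Symmetric with eigenvalues 3 and 1, hence positive definite.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, U = np.linalg.eigh(A)

def matrix_function(f):
    # f(A) = U diag(f(lambda_1), ..., f(lambda_n)) U'
    return U @ np.diag(f(lam)) @ U.T

sqrt_A = matrix_function(np.sqrt)
inv_A = matrix_function(lambda x: 1.0 / x)
log_A = matrix_function(np.log)

print(np.allclose(sqrt_A @ sqrt_A, A))       # True: (A^{1/2})^2 = A
print(np.allclose(inv_A, np.linalg.inv(A)))  # True: f(lambda) = 1/lambda gives A^{-1}
print(np.allclose(np.linalg.eigvalsh(log_A), np.log(lam)))  # True: log A has eigenvalues log(lambda_i)
```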


Above we introduced the matrix inequalities $A \geq 0$ and $A > 0$ for a symmetric matrix $A$, in which case we say that $A$ is positive semi-definite and positive definite respectively. It follows from
$$x'Ax = x'\left(\sum_{i=1}^m \lambda_i P_i\right) x = \sum_{i=1}^m \lambda_i (x'P_i x)$$
that $A \geq 0$ if and only if $x'Ax \geq 0$ for all $x \in \mathbb{R}^n$, and that $A > 0$ if and only if $x'Ax > 0$ for all $x \neq 0$ in $\mathbb{R}^n$. Note that for any orthogonal projection $P$ we have $x'Px = (Px)'(Px) = \|Px\|^2 \geq 0$ for all $x \in \mathbb{R}^n$. Clearly, we have $P \geq 0$ for any orthogonal projection $P$. For symmetric matrices $A$ and $B$ of the same dimension, we write $A \geq B$ and $A > B$ if and only if $A - B \geq 0$ and $A - B > 0$ respectively.
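Definiteness is likewise easy to check through eigenvalues (a minimal sketch; both example matrices are arbitrary):

```python
import numpy as np

def is_positive_semidefinite(A, tol=1e-12):
    # A symmetric matrix is >= 0 iff all its eigenvalues are nonnegative.
    return np.all(np.linalg.eigvalsh(A) >= -tol)

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues 3, 1 -> positive definite
B = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues 3, -1 -> indefinite

print(is_positive_semidefinite(A))  # True
print(is_positive_semidefinite(B))  # False
```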

    5. Exercises

1. Let $A$ and $B$ be matrices of dimensions $n \times m$ and $m \times \ell$, respectively, and define $a_i$ to be the $i$-th column of $A$ and $b_i'$ to be the $i$-th row of $B$. Show that
$$AB = \sum_{i=1}^m a_i b_i'.$$
Apply this to the case of $B = b$ being a vector with $\ell = 1$ and show that $\mathcal{R}(A)$ becomes the space spanned by the column vectors $a_1, \ldots, a_m$ of $A$.

2. Show that $\mathrm{rank}(AB) = \mathrm{rank}(B)$ if and only if $\mathcal{N}(A) \cap \mathcal{R}(B) = \{0\}$ for any matrices $A$ and $B$ of conformable dimensions, and use this result to deduce that $\mathrm{rank}(A'A) = \mathrm{rank}(A)$ for any matrix $A$.

3. Let $A$ and $B$ be $n \times m$ matrices of full column rank such that $\mathcal{R}(A) \cap \mathcal{R}(B)^{\perp} = \{0\}$. Show that the projection on $\mathcal{R}(A)$ along $\mathcal{R}(B)^{\perp}$ in $\mathbb{R}^n$ is given by
$$P = A(B'A)^{-1}B'.$$
Hint: Choose an arbitrary $x \in \mathbb{R}^n$ and write $Px = Ab$ for some $b \in \mathbb{R}^m$, and obtain $b$ from the condition $x - Ab \in \mathcal{R}(B)^{\perp} = \mathcal{N}(B')$.


4. Define
$$x = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \quad \text{and} \quad y = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

    (a) Find the orthogonal projection on the span of x.

    (b) Find the projection on the span of x along the span of y.

5. For a matrix $A$ defined as
$$A = \begin{pmatrix} 3/2 & 1/2 \\ 1/2 & 3/2 \end{pmatrix},$$
find $A^{10}$, $\sqrt{A}$, $\log A$, $A^{-1}$ and $e^A$.

6. On matrix inequality, answer the following:

(a) Show that $A \geq 0$ implies $B'AB \geq 0$ for any matrix $B$ of conformable dimension, and use this result to deduce that $A \geq B$ implies $C'AC \geq C'BC$ for any matrix $C$ of conformable dimension.

(b) Show that $A \geq I$ implies $A^{-1} \leq I$, and use this result to deduce that $A \geq B > 0$ implies $0 < A^{-1} \leq B^{-1}$.
Hint: Note that, if $A$ has the spectral representation $A = \sum_{i=1}^m \lambda_i P_i$, then we have $A - I = \sum_{i=1}^m (\lambda_i - 1) P_i$, since $\sum_{i=1}^m P_i = I$.