Lecture 15 - Face Recognition 2



PCA in Face Recognition

    Berrin Yanikoglu

    FENS

Computer Science & Engineering, Sabancı University

    Some slides from Derek Hoiem, Lana Lazebnik, Silvio Savarese, Fei-Fei Li


    Overview

    Face recognition, verification, tracking

    Feature subspaces: PCA and FLD

    Results from vendor tests

    Interesting findings about human face recognition


    Face detection and recognition

    Detection → Recognition ("Sally")


    Applications of Face Recognition

    Surveillance

    Digital photography

    Album organization


    Consumer application: iPhoto 2009

    Can be trained to recognize pets!

    http://www.maclife.com/article/news/iphotos_faces_recognizes_cats


    Consumer application: iPhoto 2009


    Error measure

    Face Detection/Verification

    False Positives (%)

    False Negatives (%)

    Face Recognition

    Top-N rates (%)

    Open/closed set problems

    Sources of variations:

    With glasses, without glasses, 3 lighting conditions, 5 expressions


    Starting idea of eigenfaces

    1. Treat pixels as a vector

    2. Recognize face by nearest neighbor

    $\mathbf{x}$: the pixel vector of the test image; $\mathbf{y}_1, \dots, \mathbf{y}_n$: the training face vectors. Classify by

    $\min_k \|\mathbf{y}_k - \mathbf{x}\|$
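    As a concrete illustration of this idea (not from the original slides), here is a minimal nearest-neighbor sketch in Python/NumPy; `train_imgs`, `train_labels`, and `test_img` are hypothetical inputs.

```python
import numpy as np

def nearest_neighbor_id(train_imgs, train_labels, test_img):
    # Flatten each H x W image into a single pixel vector.
    Y = train_imgs.reshape(len(train_imgs), -1).astype(float)
    x = test_img.reshape(-1).astype(float)
    # argmin_k || y_k - x ||  (Euclidean distance in pixel space)
    k = int(np.argmin(np.linalg.norm(Y - x, axis=1)))
    return train_labels[k]
```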


    The space of face images

    When viewed as vectors of pixel values, face images are extremely high-dimensional: a 100x100 image has 10,000 dimensions, with large memory and computational requirements.

    But very few 10,000-dimensional vectors are valid face images.

    We want to reduce dimensionality and effectively model the subspace of face images.


    Principal Component Analysis (PCA)

    Pattern recognition in high-dimensional spaces:

    Problems arise when performing recognition in a high-dimensional space (curse of dimensionality).

    Significant improvements can be achieved by first mapping the data into a lower-dimensional sub-space:

    $\mathbf{x} = [x_1, x_2, \dots, x_N]^T \;\rightarrow\; \mathbf{z} = [z_1, z_2, \dots, z_K]^T$, where $K \ll N$.


    Change of basis

    Example: in the standard basis $(\mathbf{x}_1, \mathbf{x}_2)$, the point $\mathbf{p} = \begin{bmatrix}3\\3\end{bmatrix}$ is written as $\mathbf{p} = 3\begin{bmatrix}1\\0\end{bmatrix} + 3\begin{bmatrix}0\\1\end{bmatrix} = 3\,\mathbf{x}_1 + 3\,\mathbf{x}_2$. The same point has different coordinates in a rotated basis $(\mathbf{z}_1, \mathbf{z}_2)$.


    Dimensionality reduction

    Example: keep only the $\mathbf{z}_1$ coordinate. The point $\mathbf{p}$ is approximated by its projection $\mathbf{q}$ onto the $\mathbf{z}_1$ axis (a single coefficient times the basis vector), and the approximation error is the distance $\|\mathbf{p} - \mathbf{q}\|$.


    Principal Component Analysis (PCA)

    PCA allows us to compute a linear transformation that maps data from a high-dimensional space to a lower-dimensional sub-space. In short, $\mathbf{z} = W\mathbf{x}$, where $\mathbf{x} = [x_1, x_2, \dots, x_N]^T$ and

    $z_1 = u_{11}x_1 + u_{12}x_2 + \dots + u_{1N}x_N$
    $z_2 = u_{21}x_1 + u_{22}x_2 + \dots + u_{2N}x_N$
    $\;\;\vdots$
    $z_K = u_{K1}x_1 + u_{K2}x_2 + \dots + u_{KN}x_N$

    i.e.

    $W = \begin{bmatrix} u_{11} & u_{12} & \dots & u_{1N} \\ u_{21} & u_{22} & \dots & u_{2N} \\ \vdots & \vdots & & \vdots \\ u_{K1} & u_{K2} & \dots & u_{KN} \end{bmatrix}$


    Principal Component Analysis (PCA)

    Lower dimensionality basis

    Approximate vectors by finding a basis in an appropriate lower-dimensional space.

    (1) Higher-dimensional space representation: $\mathbf{x} = x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \dots + x_N\mathbf{v}_N$, where $\mathbf{v}_1, \dots, \mathbf{v}_N$ are the basis vectors of the N-dimensional space.

    (2) Lower-dimensional space representation: $\hat{\mathbf{x}} = z_1\mathbf{u}_1 + z_2\mathbf{u}_2 + \dots + z_K\mathbf{u}_K$, where $\mathbf{u}_1, \dots, \mathbf{u}_K$ are the basis vectors of the K-dimensional sub-space.

    Note: If K = N, then $\hat{\mathbf{x}} = \mathbf{x}$.


    Dimensionality reduction

    (Same example as above: the point $\mathbf{p}$ is approximated by its projection $\mathbf{q}$ onto the $\mathbf{z}_1$ axis.)


    Illustration for projection, variance and bases

    (Figure: data points shown in the original axes x1, x2 and the principal axes z1, z2, illustrating the projection and the variance along each basis direction.)


    Principal Component Analysis (PCA)

    Dimensionality reduction implies information loss!

    How do we determine the best lower-dimensional sub-space? We want to preserve as much information as possible.


    Principal Components Analysis (PCA)

    The projection of x on the direction of u is: $z = \mathbf{u}^T\mathbf{x}$

    Find u such that Var(z) is maximized:

    $\mathrm{Var}(z) = \mathrm{Var}(\mathbf{u}^T\mathbf{x})$
    $= E[(\mathbf{u}^T\mathbf{x} - \mathbf{u}^T\boldsymbol{\mu})(\mathbf{u}^T\mathbf{x} - \mathbf{u}^T\boldsymbol{\mu})^T]$
    $= E[(\mathbf{u}^T\mathbf{x} - \mathbf{u}^T\boldsymbol{\mu})^2]$  // since $\mathbf{u}^T\mathbf{x} - \mathbf{u}^T\boldsymbol{\mu}$ is a scalar
    $= E[\mathbf{u}^T(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\mathbf{u}]$
    $= \mathbf{u}^T E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]\,\mathbf{u}$
    $= \mathbf{u}^T \Sigma\, \mathbf{u}$

    where $\Sigma = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]$ is the covariance matrix of x.


    Principal Component Analysis

    Direction that maximizes the variance of the projected data:

    Projection of a data point: $\mathbf{u}^T(\mathbf{x}_i - \boldsymbol{\mu})$

    Covariance matrix of the data: $\Sigma = \dfrac{1}{N}\sum_{i=1}^{N}(\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^T$

    Maximize $\mathbf{u}^T\Sigma\mathbf{u}$ subject to $\|\mathbf{u}\| = 1$.


    Maximize $\mathrm{Var}(z) = \mathbf{u}^T\Sigma\mathbf{u}$ subject to $\|\mathbf{u}\| = 1$:

    $\max_{\mathbf{u}_1}\; \mathbf{u}_1^T\Sigma\mathbf{u}_1 - \alpha(\mathbf{u}_1^T\mathbf{u}_1 - 1)$   ($\alpha$, $\beta$: Lagrange multipliers)

    Taking the derivative w.r.t. $\mathbf{u}_1$ and setting it equal to 0, we get $\Sigma\mathbf{u}_1 = \alpha\mathbf{u}_1$, so $\mathbf{u}_1$ is an eigenvector of $\Sigma$. Choose the eigenvector with the largest eigenvalue for Var(z) to be maximum.

    Second principal component: maximize $\mathrm{Var}(z_2)$ subject to $\|\mathbf{u}_2\| = 1$ and $\mathbf{u}_2$ orthogonal to $\mathbf{u}_1$:

    $\max_{\mathbf{u}_2}\; \mathbf{u}_2^T\Sigma\mathbf{u}_2 - \alpha(\mathbf{u}_2^T\mathbf{u}_2 - 1) - \beta(\mathbf{u}_2^T\mathbf{u}_1 - 0)$

    A similar analysis shows that $\Sigma\mathbf{u}_2 = \alpha\mathbf{u}_2$, so $\mathbf{u}_2$ is another eigenvector of $\Sigma$, and so on.


    Maximize $\mathrm{Var}(z) = \mathbf{u}^T\Sigma\mathbf{u}$:

    Consider the eigenvectors of $\Sigma$, for which $\Sigma\mathbf{u} = \lambda\mathbf{u}$, where $\mathbf{u}$ is an eigenvector of $\Sigma$ and $\lambda$ is the corresponding eigenvalue. Multiplying by $\mathbf{u}^T$:

    $\mathbf{u}^T\Sigma\mathbf{u} = \mathbf{u}^T\lambda\mathbf{u} = \lambda\,\mathbf{u}^T\mathbf{u} = \lambda$ for $\|\mathbf{u}\| = 1$. Choose the eigenvector with the largest eigenvalue.


    Principal Component Analysis (PCA)

    Given: N data points $\mathbf{x}_1, \dots, \mathbf{x}_N$ in $\mathbb{R}^d$

    We want to find a new set of features that are linear combinations of the original ones:

    $u(\mathbf{x}_i) = \mathbf{u}^T(\mathbf{x}_i - \boldsymbol{\mu})$   ($\boldsymbol{\mu}$: mean of the data points)

    Choose a unit vector $\mathbf{u}$ in $\mathbb{R}^d$ that captures the most data variance.

    Forsyth & Ponce, Sec. 22.3.1, 22.3.2


    What PCA does

    The transformation $\mathbf{z} = W^T(\mathbf{x} - \mathbf{m})$, where the columns of W are the eigenvectors of $\Sigma$ and $\mathbf{m}$ is the sample mean, centers the data at the origin and rotates the axes.
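    A minimal NumPy sketch of what this transformation computes, assuming a data matrix `X` of shape (n, d) with samples as rows; the function and variable names are illustrative, not from the lecture.

```python
import numpy as np

def pca_fit(X, K):
    m = X.mean(axis=0)                      # sample mean
    Xc = X - m                              # center the data at the origin
    Sigma = np.cov(Xc, rowvar=False)        # d x d covariance matrix
    evals, evecs = np.linalg.eigh(Sigma)    # eigh: Sigma is symmetric
    order = np.argsort(evals)[::-1]         # sort eigenvalues in descending order
    W = evecs[:, order[:K]]                 # top-K eigenvectors as columns of W
    return m, W, evals[order]

def pca_transform(X, m, W):
    # z = W^T (x - m): center the data and rotate onto the principal axes
    return (X - m) @ W
```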


    Eigenvalues of the covariance matrix

    The covariance matrix $\Sigma$ is symmetric, so it can always be diagonalized as

    $\Sigma = W\Lambda W^T$

    where $W = [\mathbf{u}_1, \mathbf{u}_2, \dots]$ is the column matrix consisting of the eigenvectors of $\Sigma$, $W^T = W^{-1}$, and $\Lambda$ is the diagonal matrix whose elements are the eigenvalues of $\Sigma$.


    Principal Component Analysis (PCA)

    Methodology

    Suppose $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_M$ are N×1 vectors.


    Principal Component Analysis (PCA)

    Methodology cont.


    Principal Component Analysis (PCA)

    Linear transformation implied by PCA

    The linear transformation $\mathbb{R}^N \rightarrow \mathbb{R}^K$ that performs the dimensionality reduction is $\mathbf{z} = W^T(\mathbf{x} - \boldsymbol{\mu})$, where the columns of W are the top K eigenvectors.


    Principal Component Analysis (PCA)

    How to choose the principal components?

    To choose K, use the following criterion:


    How to choose k ?

    Proportion of Variance (PoV) explained:

    $\mathrm{PoV}(k) = \dfrac{\lambda_1 + \lambda_2 + \dots + \lambda_k}{\lambda_1 + \lambda_2 + \dots + \lambda_k + \dots + \lambda_d}$

    where the $\lambda_i$ are sorted in descending order.

    Typically, stop at PoV > 0.9.

    A scree graph plots PoV vs. k; stop at the elbow.
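    A small sketch of this rule, assuming `evals` holds the covariance eigenvalues already sorted in descending order (a hypothetical input name).

```python
import numpy as np

def choose_k(evals, threshold=0.9):
    pov = np.cumsum(evals) / np.sum(evals)            # PoV(k) for k = 1..d
    return int(np.searchsorted(pov, threshold)) + 1   # smallest k with PoV >= threshold

# Example: a fast-decaying eigenvalue spectrum
print(choose_k(np.array([5.0, 2.0, 1.0, 0.5, 0.3, 0.2])))  # -> 4
```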


    Principal Component Analysis (PCA)

    What is the error due to dimensionality reduction?

    We saw above that an original vector x can be reconstructed using its principal components.

    It can be shown that the low-dimensional basis based on principal components minimizes the reconstruction error, and that the expected squared error equals the sum of the eigenvalues of the discarded components:

    $e = \sum_{i=K+1}^{N} \lambda_i$


    Principal Component Analysis (PCA)

    Standardization

    The principal components are dependent on the units used to measure the original variables, as well as on the range of values they assume.

    We should always standardize the data prior to using PCA.

    A common standardization method is to transform all the data to have zero mean and unit standard deviation:
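    A minimal sketch of this standardization step, assuming `X` is an (n, d) data matrix (illustrative names).

```python
import numpy as np

def standardize(X):
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0        # guard against constant features
    return (X - mu) / sigma        # zero mean, unit standard deviation per feature
```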


    Eigenface Implementation


    Computation of the eigenfaces


    Computation of the eigenfaces cont.


    Computation of the eigenfaces cont.


    Computation of the eigenfaces cont.
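    The equations for these steps are in the original slide images; as a hedged sketch, the standard Turk–Pentland style computation can be written as below, using the common trick of eigendecomposing the small M×M matrix A·Aᵀ instead of the huge N×N covariance. `faces` is a hypothetical (M, H, W) array of training images.

```python
import numpy as np

def compute_eigenfaces(faces, K):
    M = len(faces)
    X = faces.reshape(M, -1).astype(float)      # each row: one face as an N-vector
    mean_face = X.mean(axis=0)
    A = X - mean_face                           # centered faces, shape (M, N)
    # Trick: if (A A^T) v = lam v, then A^T v is an eigenvector of A^T A,
    # so we only ever decompose an M x M matrix.
    small = A @ A.T
    lam, V = np.linalg.eigh(small)
    order = np.argsort(lam)[::-1][:K]
    U = A.T @ V[:, order]                       # map back to N-dimensional eigenfaces
    U /= np.linalg.norm(U, axis=0)              # normalize each eigenface
    return mean_face, U, lam[order]             # lam: eigenvalues of A A^T (proportional to covariance eigenvalues)
```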


    Face Recognition Using Eigenfaces


    Face Recognition Using Eigenfaces cont.

    The distance $e_r$ can be computed using the common Euclidean distance; however, it has been reported that the Mahalanobis distance performs better:
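    The distance formula itself is an image in the original slides; the sketch below shows one Mahalanobis-style variant often used with eigenfaces, weighting each eigenspace coefficient by the inverse eigenvalue (names and inputs are illustrative assumptions).

```python
import numpy as np

def mahalanobis_eigenspace(w1, w2, eigvals):
    # w1, w2: length-K coefficient vectors in eigenspace; eigvals: the K eigenvalues
    d = w1 - w2
    return float(np.sum(d * d / eigvals))   # squared distance, each axis scaled by 1/lambda_i
```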


    Eigenfaces example

    Training images

    $\mathbf{x}_1, \dots, \mathbf{x}_N$


    Eigenfaces example

    Top eigenvectors: $\mathbf{u}_1, \dots, \mathbf{u}_k$

    Mean: $\boldsymbol{\mu}$


    Visualization of eigenfaces

    Principal component (eigenvector) $\mathbf{u}_k$, visualized as the mean face perturbed along it:

    $\boldsymbol{\mu} + 3\sigma_k\mathbf{u}_k$ and $\boldsymbol{\mu} - 3\sigma_k\mathbf{u}_k$


    Representation and reconstruction

    Face x in face-space coordinates:

    $\mathbf{x} \rightarrow [\mathbf{u}_1^T(\mathbf{x}-\boldsymbol{\mu}), \dots, \mathbf{u}_k^T(\mathbf{x}-\boldsymbol{\mu})] = [w_1, \dots, w_k]$


    Representation and reconstruction

    Face x in face-space coordinates:

    $\mathbf{x} \rightarrow [\mathbf{u}_1^T(\mathbf{x}-\boldsymbol{\mu}), \dots, \mathbf{u}_k^T(\mathbf{x}-\boldsymbol{\mu})] = [w_1, \dots, w_k]$

    Reconstruction:

    $\hat{\mathbf{x}} = \boldsymbol{\mu} + w_1\mathbf{u}_1 + w_2\mathbf{u}_2 + w_3\mathbf{u}_3 + w_4\mathbf{u}_4 + \dots$


    Reconstruction with P = 4, P = 200, and P = 400 eigenfaces.

    The eigenfaces are computed using the 400 face images from the ORL face database. The size of each image is 92×112 pixels, so x has ~10K dimensions.


    Recognition with eigenfaces

    Process labeled training images:
    Find the mean $\boldsymbol{\mu}$ and covariance matrix $\Sigma$.
    Find the k principal components (eigenvectors of $\Sigma$): $\mathbf{u}_1, \dots, \mathbf{u}_k$.
    Project each training image $\mathbf{x}_i$ onto the subspace spanned by the principal components: $(w_{i1}, \dots, w_{ik}) = (\mathbf{u}_1^T(\mathbf{x}_i - \boldsymbol{\mu}), \dots, \mathbf{u}_k^T(\mathbf{x}_i - \boldsymbol{\mu}))$.

    Given a novel image x:
    Project onto the subspace: $(w_1, \dots, w_k) = (\mathbf{u}_1^T(\mathbf{x} - \boldsymbol{\mu}), \dots, \mathbf{u}_k^T(\mathbf{x} - \boldsymbol{\mu}))$.
    Optional: check the reconstruction error $\|\mathbf{x} - \hat{\mathbf{x}}\|$ to determine whether the image is really a face.
    Classify as the closest training face in the k-dimensional subspace.

    M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991


    http://www.cs.ucsb.edu/~mturk/Papers/mturk-CVPR91.pdf
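    A sketch of this recognition procedure in the same hypothetical NumPy setting as the earlier sketches: `mean_face` and the eigenface matrix `U` (columns u1…uk) come from training, `train_coeffs` is the (n, k) array of projected training faces, and `train_labels` their identities.

```python
import numpy as np

def recognize(x, mean_face, U, train_coeffs, train_labels, max_recon_error=None):
    x = x.reshape(-1).astype(float)
    w = U.T @ (x - mean_face)                      # project onto the subspace
    if max_recon_error is not None:
        x_hat = mean_face + U @ w                  # reconstruction from the subspace
        if np.linalg.norm(x - x_hat) > max_recon_error:
            return None                            # large reconstruction error: likely not a face
    dists = np.linalg.norm(train_coeffs - w, axis=1)
    return train_labels[int(np.argmin(dists))]     # closest training face in the k-dim subspace
```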

    PCA

    A general dimensionality reduction technique.

    Preserves most of the variance with a much more compact representation:
    Lower storage requirements (eigenvectors + a few numbers per face)
    Faster matching

    What are the problems for face recognition?


    Limitations

    Global appearance method:

    not robust at all to misalignment

    Not very robust to background variation, scale


    Principal Component Analysis (PCA)

    Problems

    Background (de-emphasize the outside of the face, e.g., by multiplying the input image by a 2D Gaussian window centered on the face)

    Lighting conditions (performance degrades with light changes)

    Scale (performance decreases quickly with changes to head size)

    multi-scale eigenspaces

    scale input image to multiple sizes

    Orientation (performance decreases, but not as fast as with scale changes)

    plane rotations can be handled

    out-of-plane rotations are more difficult to handle


    Fisherfaces


    Limitations

    The direction of maximum variance is not always good for classification.


    A more discriminative subspace: FLD

    Fisher Linear Discriminants ("Fisherfaces")

    PCA preserves maximum variance.

    FLD preserves discrimination: find the projection that maximizes the scatter between classes and minimizes the scatter within classes.

    Reference: Eigenfaces vs. Fisherfaces, Belhumeur et al., PAMI 1997

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.10.3247&rep=rep1&type=pdf

    Illustration of the Projection

    Using two classes as an example: a poor projection direction mixes the two classes along the projection axis, while a good one separates them. (Figure: two scatter plots in the x1–x2 plane, one with a poor projection and one with a good projection.)


    Comparing with PCA


    Variables

    N sample images: $\{\mathbf{x}_1, \dots, \mathbf{x}_N\}$

    c classes: $\{\chi_1, \dots, \chi_c\}$

    Average of each class: $\mathbf{m}_i = \dfrac{1}{N_i}\sum_{\mathbf{x}_k \in \chi_i} \mathbf{x}_k$

    Average of all data: $\mathbf{m} = \dfrac{1}{N}\sum_{k=1}^{N} \mathbf{x}_k$


    Scatter Matrices

    Scatter of class i: $S_i = \sum_{\mathbf{x}_k \in \chi_i} (\mathbf{x}_k - \mathbf{m}_i)(\mathbf{x}_k - \mathbf{m}_i)^T$

    Within-class scatter: $S_W = \sum_{i=1}^{c} S_i$

    Between-class scatter: $S_B = \sum_{i=1}^{c} N_i\,(\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^T$
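    A minimal sketch of these scatter matrices, assuming `X` is an (n, d) data matrix and `y` an array of n class labels (illustrative names).

```python
import numpy as np

def scatter_matrices(X, y):
    d = X.shape[1]
    m = X.mean(axis=0)                             # overall mean
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                       # class mean m_i
        Sw += (Xc - mc).T @ (Xc - mc)              # within-class scatter S_i, accumulated
        diff = (mc - m).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)            # N_i (m_i - m)(m_i - m)^T
    return Sw, Sb
```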


    Illustration

    (Figure: two class scatters $S_1$ and $S_2$ in the x1–x2 plane; the within-class scatter is $S_W = S_1 + S_2$, while the between-class scatter $S_B$ measures the spread of the class means.)


    Mathematical Formulation

    After projection $\mathbf{z} = W^T\mathbf{x}$:

    Between-class scatter: $\tilde{S}_B = W^T S_B W$

    Within-class scatter: $\tilde{S}_W = W^T S_W W$

    Objective: maximize the ratio of between-class to within-class scatter.

    Solution: generalized eigenvectors.

    The rank of $W_{opt}$ is limited, since $\mathrm{Rank}(S_B) \le c - 1$.


    Illustration

    (Same figure as above: class scatters $S_1$ and $S_2$ in the x1–x2 plane, with $S_W = S_1 + S_2$ and $S_B$.)


    Recognition with FLD

    Compute within-class and between-class scatter matrices

    Solve generalized eigenvector problem

    Project to FLD subspace and classify by nearest neighbor

    $W_{opt} = \arg\max_W \dfrac{|W^T S_B W|}{|W^T S_W W|}$

    Generalized eigenvector problem: $S_B \mathbf{w}_i = \lambda_i S_W \mathbf{w}_i, \quad i = 1, \dots, m$

    with the scatter matrices

    $S_i = \sum_{\mathbf{x}_k \in \chi_i} (\mathbf{x}_k - \mathbf{m}_i)(\mathbf{x}_k - \mathbf{m}_i)^T, \qquad S_W = \sum_{i=1}^{c} S_i, \qquad S_B = \sum_{i=1}^{c} N_i\,(\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^T$

    Projection to the FLD subspace: $\mathbf{x}' = W_{opt}^T \mathbf{x}$
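    A sketch of solving this generalized eigenproblem with NumPy, reusing scatter matrices computed as above. The pseudo-inverse is an illustrative shortcut for the case where $S_W$ is singular, not the PCA-then-FLD procedure used in the Fisherfaces paper.

```python
import numpy as np

def fld_projection(Sw, Sb, num_components):
    # S_B w = lambda S_W w  is equivalent to an ordinary eigenproblem of pinv(S_W) S_B
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1][:num_components]
    return evecs[:, order].real                # columns are the directions of W_opt

# Usage sketch: for a data matrix X (n, d), the FLD features are X @ W
```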


    Results: Eigenface vs. Fisherface

    Variation in facial expression, eyewear, and lighting.

    Input: 160 images of 16 people. Train: 159 images; test: 1 image (leave-one-out).

    Conditions: with glasses, without glasses, 3 lighting conditions, 5 expressions.

    Reference: Eigenfaces vs. Fisherfaces, Belhumeur et al., PAMI 1997

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.10.3247&rep=rep1&type=pdf

    Eigenfaces vs. Fisherfaces

    Reference: Eigenfaces vs. Fisherfaces, Belhumeur et al., PAMI 1997

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.10.3247&rep=rep1&type=pdf


    1. Identify a Specific Instance

    Frontal faces

    Typical scenario: few examples per face; identify or verify a test example.

    What's hard: changes in expression, lighting, age, occlusion, viewpoint.

    Basic approaches (all nearest neighbor):
    1. Project into a new subspace (or kernel space) (e.g., Eigenfaces = PCA)
    2. Measure face features
    3. Make a 3D face model, compare shape + appearance (e.g., AAM)


    Large scale comparison of methods

    FRVT 2006 Report

    Not much (or any) information is available about the methods, but it gives an idea of what is doable.

    http://www.frvt.org/FRVT2006/docs/FRVT2006andICE2006LargeScaleReport.pdf

    FRVT Challenge

    Frontal faces

    FRVT 2006 evaluation: False Rejection Rate at False Acceptance Rate = 0.001


    FRVT Challenge

    Frontal faces

    FRVT 2006 evaluation: controlled illumination


    FRVT Challenge

    Frontal faces

    FRVT 2006 evaluation: uncontrolled illumination


    FRVT Challenge

    Frontal faces

    FRVT 2006 evaluation: computers win!


    Face recognition by humans

    Face recognition by humans: 20 results (2005)

    Slides by Jianchao Yang

    http://web.mit.edu/bcs/sinha/papers/20Results_2005.pdf
    http://www.cs.uiuc.edu/homes/dhoiem/courses/cs598_spring09/slides/spring09_jianchao_biologicalInspiredComputerVision.pdf


    Result 17: Vision progresses from piecemeal to holistic


    Things to remember


    PCA is a generally useful dimensionality reduction technique, but it is not ideal for discrimination.

    FLD is better for discrimination, though it is only ideal under Gaussian data assumptions.

    Computer face recognition works very well under controlled environments; there is still room for improvement in general conditions.


    Normal Distribution and Covariance Matrix


    Normal Distribution & Multivariate Normal Distribution

    Assume we have a d-dimensional input (e.g., 2D), x. We will see how we can characterize p(x), assuming it is normally distributed.

    For 1 dimension, it was the mean ($\mu$) and variance ($\sigma^2$): Mean $= E[x]$, Variance $= E[(x - \mu)^2]$.

    For d dimensions, we need the d-dimensional mean vector and the d×d covariance matrix.

    If $\mathbf{x} \sim N_d(\boldsymbol{\mu}, \Sigma)$, then each dimension of x is univariate normal (the converse is not true).


    Multivariate Normal Distribution

    For a single variable, the normal density function is:

    $p(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$

    For variables in higher dimensions, this generalizes to:

    $p(\mathbf{x}) = \dfrac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$

    where the mean $\boldsymbol{\mu}$ is now a d-dimensional vector, $\Sigma$ is a d×d covariance matrix, and $|\Sigma|$ is the determinant of $\Sigma$.

    Multivariate Parameters: Mean, Covariance


    Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \dots, \mu_d]^T$

    Covariance: $\Sigma = \mathrm{Cov}(\mathbf{x}) = E[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T] = E[\mathbf{x}\mathbf{x}^T] - \boldsymbol{\mu}\boldsymbol{\mu}^T$

    $\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \dots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \dots & \sigma_{2d} \\ \vdots & \vdots & & \vdots \\ \sigma_{d1} & \sigma_{d2} & \dots & \sigma_d^2 \end{bmatrix}$

    where $\sigma_{ij} = \mathrm{Cov}(x_i, x_j) = E[(x_i - \mu_i)(x_j - \mu_j)]$.


    Variance: how much x varies around its expected value.

    Covariance measures the strength of the linear relationship between two random variables:

    $\mathrm{Cov}(x_i, x_j) = \sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)]$

    The covariance becomes more positive for each pair of values which differ from their means in the same direction, and more negative for each pair of values which differ from their means in opposite directions. If two variables are independent, then their covariance/correlation is zero (the converse is not true).

    Correlation is a dimensionless measure of linear dependence, ranging between -1 and +1:

    $\mathrm{Corr}(x_i, x_j) = \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

    Covariance Matrices


    Contours of constant probability density for a 2D Gaussian distribution with
    a) a general covariance matrix,
    b) a diagonal covariance matrix (the covariance of x1, x2 is 0),
    c) a covariance proportional to the identity matrix (covariances are 0, and the variances of each dimension are the same).


    The shape and orientation of the hyper-ellipsoid centered at $\boldsymbol{\mu}$ are defined by the covariance matrix $\Sigma$; its principal axes lie along the eigenvectors $\mathbf{v}_1$, $\mathbf{v}_2$ of $\Sigma$.


    The projection of a d-dimensional normal distribution onto the vector w is univariate normal:

    $\mathbf{w}^T\mathbf{x} \sim N(\mathbf{w}^T\boldsymbol{\mu},\; \mathbf{w}^T\Sigma\mathbf{w})$

    More generally, when W is a d×k matrix with k < d (with rows $\mathbf{w}_1^T, \mathbf{w}_2^T, \dots$ forming $W^T$), the k-dimensional $W^T\mathbf{x}$ is k-variate normal:

    $W^T\mathbf{x} \sim N_k(W^T\boldsymbol{\mu},\; W^T\Sigma W)$
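    A small empirical check of this statement with hypothetical parameters: samples of $\mathbf{w}^T\mathbf{x}$ should have mean close to $\mathbf{w}^T\boldsymbol{\mu}$ and variance close to $\mathbf{w}^T\Sigma\mathbf{w}$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
w = np.array([0.5, 1.5])

X = rng.multivariate_normal(mu, Sigma, size=100_000)
z = X @ w                                # projections w^T x

print(z.mean(), w @ mu)                  # both approximately -2.5
print(z.var(), w @ Sigma @ w)            # both approximately 3.65
```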


    Eigenvalue Decomposition of the Covariance Matrix

    Eigenvector and Eigenvalue


    Given a linear transformation A, a non-zero vector x is defined to be an eigenvector of the transformation if it satisfies the eigenvalue equation

    $A\mathbf{x} = \lambda\mathbf{x}$ for some scalar $\lambda$.

    In this situation, the scalar $\lambda$ is called an eigenvalue of A corresponding to the eigenvector x.

    $(A - \lambda I)\mathbf{x} = 0 \;\Rightarrow\; \det(A - \lambda I)$ must be 0. This gives the characteristic polynomial, whose roots are the eigenvalues of A.

    Eigenvalues of the covariance matrix


    The covariance matrix $\Sigma$ is symmetric, so it can always be diagonalized as

    $\Sigma = \Phi\Lambda\Phi^T$

    where $\Phi = [\mathbf{u}_1, \mathbf{u}_2, \dots]$ is the column matrix consisting of the eigenvectors of $\Sigma$, $\Phi^T = \Phi^{-1}$, and $\Lambda$ is the diagonal matrix whose elements are the eigenvalues of $\Sigma$.


    If $\mathbf{x} \sim N_d(\boldsymbol{\mu}, \Sigma)$, then each dimension of x is univariate normal. The converse is not true (can you think of a counterexample?).


    (Recall:) The projection of a d-dimensional normal distribution onto the vector w is univariate normal:

    $\mathbf{w}^T\mathbf{x} \sim N(\mathbf{w}^T\boldsymbol{\mu},\; \mathbf{w}^T\Sigma\mathbf{w})$

    More generally, when W is a d×k matrix with k < d (with rows $\mathbf{w}_1^T, \mathbf{w}_2^T, \dots$ forming $W^T$), the k-dimensional $W^T\mathbf{x}$ is k-variate normal:

    $W^T\mathbf{x} \sim N_k(W^T\boldsymbol{\mu},\; W^T\Sigma W)$

    (Figure: the univariate normal distribution of the transformed variable $\mathbf{w}^T\mathbf{x}$.)


    Proof for the first statement:

    $E[\mathbf{w}^T\mathbf{x}] = \mathbf{w}^T E[\mathbf{x}] = \mathbf{w}^T\boldsymbol{\mu}$

    $\mathrm{Var}[\mathbf{w}^T\mathbf{x}] = E[(\mathbf{w}^T\mathbf{x} - \mathbf{w}^T\boldsymbol{\mu})(\mathbf{w}^T\mathbf{x} - \mathbf{w}^T\boldsymbol{\mu})^T]$
    $= E[(\mathbf{w}^T\mathbf{x} - \mathbf{w}^T\boldsymbol{\mu})^2]$  (since $\mathbf{w}^T\mathbf{x} - \mathbf{w}^T\boldsymbol{\mu}$ is a scalar)
    $= E[\mathbf{w}^T(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\mathbf{w}]$
    $= \mathbf{w}^T\Sigma\,\mathbf{w}$  (move w out of the expectation and use the definition of $\Sigma$)


    Similar to the projection onto w, $A^T\mathbf{x}$ is computed here, resulting in mean $A^T\boldsymbol{\mu}$ and covariance $A^T\Sigma A$.

    Whitening Transform


    Given the above, we can construct a transformation matrix $A_w$ to decorrelate the variables and obtain $\Sigma = I$; i.e., compute $A_w(\mathbf{x} - \boldsymbol{\mu})$ to obtain zero-mean, unit-variance, decorrelated data.

    This matrix is called the whitening transform:

    $A_w = \Lambda^{-1/2}\Phi^T$

    where $\Lambda$ is the diagonal matrix of the eigenvalues of the original distribution and $\Phi$ is the matrix composed of the eigenvectors as its columns.

    One can verify that the resulting covariance matrix, $A_w \Sigma A_w^T$, is indeed the identity matrix I (the dimensions are decorrelated).
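    A minimal sketch of the whitening transform, assuming `X` is an (n, d) data matrix with a full-rank covariance (illustrative names); after the transform the sample covariance is approximately the identity.

```python
import numpy as np

def whiten(X):
    m = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    lam, Phi = np.linalg.eigh(Sigma)            # Sigma = Phi diag(lam) Phi^T
    Aw = np.diag(1.0 / np.sqrt(lam)) @ Phi.T    # Aw = Lambda^(-1/2) Phi^T
    Z = (X - m) @ Aw.T                          # apply Aw to each centered sample
    return Z, Aw, m

# np.cov(whiten(X)[0], rowvar=False) is approximately the identity matrix
```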


    From the previous slides, $A_w\mathbf{x}$ has covariance matrix $A_w \Sigma A_w^T$:

    $A_w \Sigma A_w^T = (\Lambda^{-1/2}\Phi^T)(\Phi\Lambda\Phi^T)(\Lambda^{-1/2}\Phi^T)^T$
    $= \Lambda^{-1/2}(\Phi^T\Phi)\,\Lambda\,(\Phi^T\Phi)(\Lambda^{-1/2})^T$
    $= \Lambda^{-1/2}\,\Lambda\,(\Lambda^{-1/2})^T$  (since $\Phi^T = \Phi^{-1}$)
    $= \Lambda^{-1/2}\,\Lambda\,\Lambda^{-1/2}$  (since $\Lambda^{-1/2}$ is symmetric)
    $= I$