
  • Chapter 1: SVD, PCA & Pre-processing

    Part 1: Linear algebra and SVD

    DMM, summer 2015 · Pauli Miettinen

  • Contents

    • Linear algebra crash course

    • The singular value decomposition

    • Normalization

    • Selecting the rank

    • The principal component analysis


  • Linear Algebra Crash Course

  • Matrices and vectors

    • A vector is

    • a 1D array of numbers

    • a geometric entity with magnitude and direction

    • a matrix with exactly one row or column

    [Figure: the vectors (1.2, 0.8) and (2, –0.8) drawn in the plane]

  • Norms and angles

    • The magnitude is measured by a (vector) norm

    • The Euclidean norm: ||x|| = ||x||2 = (∑i=1..n xi²)^(1/2)

    • The general Lp norm (1 ≤ p ≤ ∞): ||x||p = (∑i=1..n |xi|^p)^(1/p)

    • The direction is measured by the angle

    [Figure: the vector (1.2, 0.8) has norm (1.2² + 0.8²)^(1/2) ≈ 1.442 and angle ∠0.5880 rad (33.69°) with the x-axis; (2, –0.8) is also shown]
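    As a quick numerical check of these definitions, a minimal NumPy sketch for the example vector (1.2, 0.8); the sketch is illustrative, not part of the slides:

        import numpy as np

        x = np.array([1.2, 0.8])

        # Euclidean norm: (sum of squared entries)^(1/2)
        norm2 = np.sqrt(np.sum(x**2))             # 1.442...
        assert np.isclose(norm2, np.linalg.norm(x))

        # General Lp norms, e.g. p = 1 and p = infinity
        norm1 = np.linalg.norm(x, ord=1)          # |1.2| + |0.8| = 2.0
        norm_inf = np.linalg.norm(x, ord=np.inf)  # max(|1.2|, |0.8|) = 1.2

        # Angle against the positive x-axis: 0.5880 rad = 33.69 degrees
        theta = np.arctan2(x[1], x[0])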

  • Basic vector operations

    • The transpose of x, xT, turns a row vector into a column vector and vice versa

    • The dot product of two vectors of the same dimension is x · y = ∑i=1..n xiyi

    • A.k.a. scalar product or inner product

    • Same as ⟨x, y⟩, xTy (for column vectors), or xyT (for row vectors)
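    A small sketch of the dot product and its equivalent forms, using the vectors from the earlier figure:

        import numpy as np

        x = np.array([1.2, 0.8])
        y = np.array([2.0, -0.8])

        # x . y = sum over i of x_i * y_i
        dot = np.sum(x * y)                   # 1.2*2.0 + 0.8*(-0.8) = 1.76
        assert np.isclose(dot, np.dot(x, y))  # same as np.inner(x, y)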

  • Orthogonality

    • Orthogonality is a generalization of perpendicularity

    • x and y are orthogonal if x · y = 0

    • in Euclidean space: x · y = ||x|| ||y|| cos θ

    • θ is the angle between x and y


  • Matrix algebra

    • Matrices in ℝn×n form a ring

    • Addition, subtraction, and multiplication

    • But usually no division

    • Multiplication is not commutative

    • AB ≠ BA in general


  • Matrix multiplication

    • The product of two matrices A ∈ ℝn×k and B ∈ ℝk×m is defined element-wise as (AB)ij = ∑l=1..k ail blj

    • The number of columns in A and the number of rows in B must agree

    • the inner dimension
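    The element-wise definition can be checked directly against NumPy's built-in product; a minimal sketch with arbitrary random test matrices:

        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.standard_normal((3, 4))       # n-by-k
        B = rng.standard_normal((4, 2))       # k-by-m: inner dimensions agree

        # (AB)_ij = sum over l of A_il * B_lj
        C = np.zeros((3, 2))
        for i in range(3):
            for j in range(2):
                C[i, j] = sum(A[i, l] * B[l, j] for l in range(4))

        assert np.allclose(C, A @ B)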

  • Intuition for Matrix Multiplication

    • Element (AB)ij is the inner product of row i of A and column j of B

    Cij = ∑l=1..k ail blj

  • Intuition for Matrix Multiplication

• Column j of AB is a linear combination of the columns of A, with the coefficients coming from column j of B

    C = [ ∑l=1..k bl1 al   ∑l=1..k bl2 al   ···   ∑l=1..k blm al ]
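    The column view, sketched in NumPy (illustrative, with random test matrices):

        import numpy as np

        rng = np.random.default_rng(1)
        A = rng.standard_normal((3, 4))
        B = rng.standard_normal((4, 2))

        # Column j of AB = sum over l of B_lj * (column l of A)
        for j in range(B.shape[1]):
            col_j = sum(B[l, j] * A[:, l] for l in range(A.shape[1]))
            assert np.allclose(col_j, (A @ B)[:, j])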

  • Intuition for Matrix Multiplication

    • Matrix AB is a sum of k matrices alblT obtained by multiplying the l-th column of A with the l-th row of B

    C = ∑l=1..k al blT
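    And the outer-product view, as a matching sketch:

        import numpy as np

        rng = np.random.default_rng(2)
        A = rng.standard_normal((3, 4))
        B = rng.standard_normal((4, 2))

        # AB = sum over l of the outer product (column l of A)(row l of B)
        C = sum(np.outer(A[:, l], B[l, :]) for l in range(A.shape[1]))
        assert np.allclose(C, A @ B)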

  • Matrix decompositions

    • A decomposition of matrix A expresses it as a product of two (or more) factor matrices

    • A = BC

• Every matrix has the decomposition A = AI (or A = IA if n < m)

    • The size of the decomposition is the inner dimension of the product


  • Matrices as linear maps

    • Matrix A ∈ ℝn×m is a linear mapping from ℝm to ℝn

    • A(x) = Ax

    • If A ∈ ℝn×k and B ∈ ℝk×m, then AB is a mapping from ℝm to ℝn

    • The transpose AT is a mapping from ℝn to ℝm

    • (AT)ij = Aji

    • (AB)T = BTAT


  • Matrix inverse

    • Square matrix A is invertible if there is a matrix B s.t. AB = BA = I

    • B is the inverse of A, denoted A–1

    • Usually the transpose is not the inverse

    • Non-square matrices don’t have general inverses

• Can have a right or left inverse: AR = I or LA = I


  • Linear independence

    • Vector u is linearly dependent on a set of vectors V = {vi} if u is a linear combination of the vi

    • u = ∑i aivi for some coefficients ai

    • If u is not linearly dependent, it is linearly independent

• A set V of vectors is linearly independent if each vi is linearly independent of V \ {vi}


  • Matrix ranks

    • The column rank of a matrix A is the number of linearly independent columns of A

    • The row rank of A is the number of linearly independent rows of A

    • The Schein rank of A is the least integer k such that A can be expressed as a sum of k rank-1 matrices

• A rank-1 matrix is an outer product of two vectors

    • Column rank, row rank, and Schein rank are always equal; this common value is called the rank of A

  • Orthogonal matrices

    • A set of vectors {vi} is orthogonal if all vi are mutually orthogonal, i.e. ⟨vi, vj⟩ = 0 for all i ≠ j

    • If ||vi||2 = 1 for all vi, the set is orthonormal

    • Square matrix A is orthogonal if its columns form a set of orthonormal vectors

    • Non-square matrices can be row- or column-orthogonal

    • If A is orthogonal, then A–1 = AT


  • Properties of orthogonal matrices

• The inverse of an orthogonal matrix is easy to compute: A–1 = AT

• Multiplying by an orthogonal matrix performs a rotation (possibly combined with a reflection)

    • Only the direction of a vector can change; its length stays the same

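    Both properties are easy to verify numerically; a sketch using the orthogonal Q factor of a QR decomposition of an arbitrary random matrix:

        import numpy as np

        rng = np.random.default_rng(3)
        # The Q factor of a QR decomposition is orthogonal
        Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

        # The inverse is just the transpose
        assert np.allclose(Q.T @ Q, np.eye(4))
        assert np.allclose(np.linalg.inv(Q), Q.T)

        # Lengths are preserved: ||Qx|| = ||x||
        x = rng.standard_normal(4)
        assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))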

  • Matrix norms

    • Matrix norms measure the magnitude of the matrix

    • either the magnitude of its values or of its image

• Operator norms: ||A||p = max{||Ax||p : ||x||p = 1} for p ≥ 1

• Frobenius norm: ||A||F = (∑i=1..n ∑j=1..m aij²)^(1/2)
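    A sketch computing both norms in NumPy (random test matrix, for illustration only):

        import numpy as np

        rng = np.random.default_rng(4)
        A = rng.standard_normal((3, 4))

        # Frobenius norm: square root of the sum of squared entries
        fro = np.sqrt(np.sum(A**2))
        assert np.isclose(fro, np.linalg.norm(A, 'fro'))

        # Operator 2-norm: max of ||Ax||_2 over unit vectors x
        # (equals the largest singular value, see below)
        op2 = np.linalg.norm(A, 2)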

  • Singular Value Decomposition

Reading: Skillicorn Chapter 3; Golub & Van Loan Chapters 2.4–2.6; Leskovec et al. Chapter 11.3

  • “The SVD is the Swiss Army knife of matrix decompositions”

    – Diane O’Leary, 2006

  • The definition

    • Theorem. For every A ∈ ℝn×m there exist an n-by-n orthogonal matrix U and an m-by-m orthogonal matrix V such that UTAV is an n-by-m diagonal matrix Σ with values σ1 ≥ σ2 ≥ … ≥ σmin{n,m} ≥ 0 on its diagonal

    • I.e. every A has decomposition A = UΣVT

    • The singular value decomposition of A

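    The theorem can be seen in action with np.linalg.svd; a minimal sketch on an arbitrary random matrix:

        import numpy as np

        rng = np.random.default_rng(5)
        A = rng.standard_normal((4, 3))           # n = 4, m = 3

        # Full SVD: U is n-by-n, Vt holds V^T and is m-by-m
        U, s, Vt = np.linalg.svd(A, full_matrices=True)

        # Singular values are non-negative and in decreasing order
        assert np.all(s >= 0) and np.all(np.diff(s) <= 0)

        # Rebuild the n-by-m diagonal Sigma and check A = U Sigma V^T
        Sigma = np.zeros((4, 3))
        np.fill_diagonal(Sigma, s)
        assert np.allclose(A, U @ Sigma @ Vt)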

  • In pictures

    [Figure: A = UΣVT shown as the product of the three factor matrices]

    • ui are the left singular vectors

    • σi are the singular values

    • vi are the right singular vectors

  • Some useful equations

    • A = UΣVT = ∑i σiuiviT

    • Expresses A as a sum of rank-1 matrices

    • A–1 = (UΣVT)–1 = VΣ–1UT (if A is invertible)

    • ATAvi = σi2vi (for any A)

    • AATui = σi2ui (for any A)

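    All of these identities can be verified numerically; a sketch (random test matrix, illustration only):

        import numpy as np

        rng = np.random.default_rng(6)
        A = rng.standard_normal((4, 3))
        U, s, Vt = np.linalg.svd(A)

        # A as a sum of rank-1 matrices sigma_i u_i v_i^T
        A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
        assert np.allclose(A, A_sum)

        # A^T A v_i = sigma_i^2 v_i  and  A A^T u_i = sigma_i^2 u_i
        for i in range(len(s)):
            assert np.allclose(A.T @ A @ Vt[i, :], s[i]**2 * Vt[i, :])
            assert np.allclose(A @ A.T @ U[:, i], s[i]**2 * U[:, i])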

  • Truncated SVD

    • The rank of a matrix is the number of its non-zero singular values (write A = ∑i σiuiviT)

    • The truncated SVD keeps the first k columns of U and V and the top-left k-by-k submatrix of Σ

    • Ak = UkΣkVkT

    • Uk and Vk are column-orthogonal

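    A sketch of the rank-k truncation in NumPy (k = 2 chosen arbitrarily):

        import numpy as np

        rng = np.random.default_rng(7)
        A = rng.standard_normal((5, 4))
        U, s, Vt = np.linalg.svd(A)

        k = 2                                     # target rank
        Uk, Sk, Vkt = U[:, :k], np.diag(s[:k]), Vt[:k, :]
        Ak = Uk @ Sk @ Vkt                        # rank-k approximation of A

        # U_k and V_k are column-orthogonal
        assert np.allclose(Uk.T @ Uk, np.eye(k))
        assert np.allclose(Vkt @ Vkt.T, np.eye(k))
        assert np.linalg.matrix_rank(Ak) == k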

  • Truncated SVD

    [Figure: full SVD A = UΣVT versus truncated SVD A ≈ UkΣkVkT]

  • Why is the SVD important?

    • It gives us the dimensions of the fundamental subspaces

    • It lets us compute various norms

• It tells us about the sensitivity of linear systems

    • It gives us optimal solutions to least-squares linear systems

    • It gives us the least-error rank-k decomposition

    • Every matrix has one


  • Fundamental theorem of linear algebra

    • Theorem. Every n-by-m matrix A induces four fundamental subspaces

    • The range of dimension rank(A) = r

    • The set of all linear combinations of columns of A

    • The kernel of dimension m – r

    • The set of all vectors x for which Ax = 0

• The coimage (row space) of dimension r

    • The set of all linear combinations of rows of A

    • The cokernel of dimension n – r

    • The set of all vectors y for which yTA = 0


  • Fundamental subspaces

    • The bases for the fundamental subspaces are:

    • Range: the first r columns of U

    • Kernel: the last (m – r) columns of V

    • Coimage: the first r columns of V

    • Cokernel: the last (n – r) columns of U

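    A sketch reading two of these bases off the SVD of a deliberately rank-deficient random matrix:

        import numpy as np

        rng = np.random.default_rng(8)
        # A 4-by-3 matrix of rank 2, so the kernel is non-trivial
        A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
        U, s, Vt = np.linalg.svd(A)
        r = int(np.sum(s > 1e-10))               # numerical rank, here 2

        range_basis = U[:, :r]                   # first r columns of U
        kernel_basis = Vt[r:, :].T               # last m - r columns of V

        # Kernel vectors are mapped to zero
        assert np.allclose(A @ kernel_basis, 0)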

  • SVD and norms

    • Let A = UΣVT be the SVD of A. Then ||A||F² = ∑i=1..min{n,m} σi² and ||A||2 = σ1

    • Therefore ||A||2 ≤ ||A||F ≤ √(min{n,m}) · ||A||2

    • For the truncated SVD, ||Ak||F² = ∑i=1..k σi²
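    These identities, sketched in NumPy:

        import numpy as np

        rng = np.random.default_rng(9)
        A = rng.standard_normal((4, 3))
        s = np.linalg.svd(A, compute_uv=False)

        # ||A||_F^2 equals the sum of squared singular values
        assert np.isclose(np.linalg.norm(A, 'fro')**2, np.sum(s**2))

        # ||A||_2 equals the largest singular value
        assert np.isclose(np.linalg.norm(A, 2), s[0])

        # ||A||_2 <= ||A||_F <= sqrt(min{n, m}) ||A||_2, here min{n, m} = 3
        assert s[0] <= np.linalg.norm(A, 'fro') <= np.sqrt(3) * s[0]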

  • Sensitivity of linear systems

    • The solution for system Ax = b is x = A–1b

    • Requires that A is invertible

• Hence x = (UΣVT)–1b = ∑i=1..n (uiTb / σi) vi

    • Small changes in A or b yield large changes in x if σn is small

    • Can we characterize this sensitivity?
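    The SVD form of the solution, as a sketch (a random 3-by-3 matrix is invertible with probability 1):

        import numpy as np

        rng = np.random.default_rng(10)
        A = rng.standard_normal((3, 3))
        b = rng.standard_normal(3)
        U, s, Vt = np.linalg.svd(A)

        # x = sum over i of (u_i^T b / sigma_i) v_i; small sigma_i amplify b
        x = sum((U[:, i] @ b / s[i]) * Vt[i, :] for i in range(3))
        assert np.allclose(A @ x, b)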

  • Condition number

    • The condition number κp(A) of a square matrix A is ||A||p ||A–1||p

    • Particularly κ2(A) = σ1(A)/σn(A)

    • κ2(A) = ∞ for singular A

    • If κ is large, the matrix is ill-conditioned

• The solution is sensitive to small perturbations

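    A sketch with a nearly singular, hence ill-conditioned, example matrix:

        import numpy as np

        A = np.array([[1.0, 1.0],
                      [1.0, 1.0001]])       # nearly singular

        s = np.linalg.svd(A, compute_uv=False)
        kappa = s[0] / s[-1]                 # kappa_2(A) = sigma_1 / sigma_n
        assert np.isclose(kappa, np.linalg.cond(A, 2))
        # kappa is about 4e4: tiny perturbations of b can change x a lot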

  • Least-squares linear systems

• Problem. Given A ∈ ℝn×m and b ∈ ℝn, find x ∈ ℝm minimizing ||Ax – b||2.

    • If A is invertible, x = A–1b is an exact solution

• For non-invertible A we have to find another solution


  • The Moore–Penrose pseudo-inverse

    • The m-by-n matrix B is the Moore–Penrose pseudo-inverse of the n-by-m matrix A if
    • ABA = A (but possibly AB ≠ I)

    • BAB = B

    • (AB)T = AB (AB is symmetric)

    • (BA)T = BA

    • Pseudo-inverse of A is denoted by A+


  • Pseudo-inverse and SVD

    • If A = UΣVT is the SVD of A, then A+ = VΣ+UT

    • Σ+ replaces each non-zero σi with 1/σi and transposes the result

    • N.B. not a real inverse

• Theorem. Setting x = A+y minimizes ||Ax – y||2

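    A sketch building A+ from the SVD and comparing against NumPy's least-squares solver; the random tall matrix has full column rank with probability 1, so all σi here are non-zero:

        import numpy as np

        rng = np.random.default_rng(11)
        A = rng.standard_normal((6, 3))      # tall: no exact inverse
        b = rng.standard_normal(6)

        # A+ = V Sigma+ U^T (thin SVD; all sigma_i non-zero here)
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
        assert np.allclose(A_pinv, np.linalg.pinv(A))

        # x = A+ b minimizes ||Ax - b||_2
        x = A_pinv @ b
        x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
        assert np.allclose(x, x_ls)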

  • The Eckart–Young theorem

• Theorem. Let Ak = UkΣkVkT be the rank-k truncated SVD of A. Then Ak is the closest rank-k matrix to A in the Frobenius sense, that is, ||A – Ak||F ≤ ||A – B||F for all rank-k matrices B

    • Holds for any unitarily invariant norm

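    A numerical illustration: the truncated SVD beats an arbitrary rank-k competitor, and its error equals the norm of the discarded singular values:

        import numpy as np

        rng = np.random.default_rng(12)
        A = rng.standard_normal((6, 5))
        U, s, Vt = np.linalg.svd(A)

        k = 2
        Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
        err = np.linalg.norm(A - Ak, 'fro')

        # The error is (sum of the discarded sigma_i^2)^(1/2)
        assert np.isclose(err, np.sqrt(np.sum(s[k:]**2)))

        # A random rank-k matrix does at least as badly
        B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))
        assert err <= np.linalg.norm(A - B, 'fro')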

  • That’s all for today

    • Next week: normalization and selecting the rank

    • Lecture starts at 12:00 sharp

    • Will end earlier as well

    • But SVD will return…
