
  • Chapter 1: SVD, PCA & Pre-processing

    Part 1: Linear algebra and SVD

    DMM, summer 2015 · Pauli Miettinen

  • Contents

    • Linear algebra crash course

    • The singular value decomposition

    • Normalization

    • Selecting the rank

    • The principal component analysis


  • Linear Algebra Crash Course

  • Matrices and vectors

    • A vector is

    • a 1D array of numbers

    • a geometric entity with magnitude and direction

    • a matrix with exactly one row or column

    [Figure: the vectors (1.2, 0.8) and (2, –0.8) drawn in the plane]

  • Norms and angles

    • The magnitude is measured by a (vector) norm

    • The Euclidean norm: ||x|| = ||x||2 = (∑i=1..n xi²)^(1/2)

    • The general Lp norm (1 ≤ p ≤ ∞): ||x||p = (∑i=1..n |xi|^p)^(1/p)

    • The direction is measured by the angle

    [Figure: the vector (1.2, 0.8) has norm (1.2² + 0.8²)^(1/2) ≈ 1.442 and angle ∠0.5880 rad (33.69°) with the x-axis; (2, –0.8) is also shown]
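    As a quick numerical check of these definitions, a minimal NumPy sketch for the example vector (1.2, 0.8); the sketch is illustrative, not part of the slides:

        import numpy as np

        x = np.array([1.2, 0.8])

        # Euclidean norm: (sum of squared entries)^(1/2)
        norm2 = np.sqrt(np.sum(x**2))             # 1.442...
        assert np.isclose(norm2, np.linalg.norm(x))

        # General Lp norms, e.g. p = 1 and p = infinity
        norm1 = np.linalg.norm(x, ord=1)          # |1.2| + |0.8| = 2.0
        norm_inf = np.linalg.norm(x, ord=np.inf)  # max(|1.2|, |0.8|) = 1.2

        # Angle against the positive x-axis: 0.5880 rad = 33.69 degrees
        theta = np.arctan2(x[1], x[0])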

  • Basic vector operations

    • The transpose of x, xT, turns a row vector into a column vector and vice versa

    • The dot product of two vectors of the same dimension is x · y = ∑i=1..n xiyi

    • A.k.a. scalar product or inner product

    • Same as ⟨x, y⟩, xTy (for column vectors), or xyT (for row vectors)
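    A small sketch of the dot product and its equivalent forms, using the vectors from the earlier figure:

        import numpy as np

        x = np.array([1.2, 0.8])
        y = np.array([2.0, -0.8])

        # x . y = sum over i of x_i * y_i
        dot = np.sum(x * y)                   # 1.2*2.0 + 0.8*(-0.8) = 1.76
        assert np.isclose(dot, np.dot(x, y))  # same as np.inner(x, y)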

  • Orthogonality

    • Orthogonality is a generalization of perpendicularity

    • x and y are orthogonal if x · y = 0

    • in Euclidean space: x · y = ||x|| ||y|| cos θ

    • θ is the angle between x and y


  • Matrix algebra

    • Matrices in ℝn×n form a ring

    • Addition, subtraction, and multiplication

    • But usually no division

    • Multiplication is not commutative

    • AB ≠ BA in general


  • Matrix multiplication

    • The product of two matrices A ∈ ℝn×k and B ∈ ℝk×m is defined element-wise as (AB)ij = ∑l=1..k ail blj

    • The number of columns in A and the number of rows in B must agree

    • the inner dimension
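    The element-wise definition can be checked directly against NumPy's built-in product; a minimal sketch with arbitrary random test matrices:

        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.standard_normal((3, 4))       # n-by-k
        B = rng.standard_normal((4, 2))       # k-by-m: inner dimensions agree

        # (AB)_ij = sum over l of A_il * B_lj
        C = np.zeros((3, 2))
        for i in range(3):
            for j in range(2):
                C[i, j] = sum(A[i, l] * B[l, j] for l in range(4))

        assert np.allclose(C, A @ B)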

  • Intuition for Matrix Multiplication

    • Element (AB)ij is the inner product of row i of A and column j of B

    Cij = ∑l=1..k ail blj

  • Intuition for Matrix Multiplication

• Column j of AB is a linear combination of the columns of A, with the coefficients coming from column j of B

    C = [ ∑l=1..k bl1 al   ∑l=1..k bl2 al   ···   ∑l=1..k blm al ]
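    The column view, sketched in NumPy (illustrative, with random test matrices):

        import numpy as np

        rng = np.random.default_rng(1)
        A = rng.standard_normal((3, 4))
        B = rng.standard_normal((4, 2))

        # Column j of AB = sum over l of B_lj * (column l of A)
        for j in range(B.shape[1]):
            col_j = sum(B[l, j] * A[:, l] for l in range(A.shape[1]))
            assert np.allclose(col_j, (A @ B)[:, j])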

  • Intuition for Matrix Multiplication

    • Matrix AB is a sum of k matrices alblT obtained by multiplying the l-th column of A with the l-th row of B

    C = ∑l=1..k al blT
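    And the outer-product view, as a matching sketch:

        import numpy as np

        rng = np.random.default_rng(2)
        A = rng.standard_normal((3, 4))
        B = rng.standard_normal((4, 2))

        # AB = sum over l of the outer product (column l of A)(row l of B)
        C = sum(np.outer(A[:, l], B[l, :]) for l in range(A.shape[1]))
        assert np.allclose(C, A @ B)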

  • Matrix decompositions

    • A decomposition of matrix A expresses it as a product of two (or more) factor matrices

    • A = BC

• Every matrix has the decomposition A = AI (or A = IA if n < m)

    • The size of the decomposition is the inner dimension of the product


  • Matrices as linear maps

    • Matrix A ∈ ℝn×m is a linear mapping from ℝm to ℝn

    • A(x) = Ax

    • If A ∈ ℝn×k and B ∈ ℝk×m, then AB is a mapping from ℝm to ℝn

    • The transpose AT is a mapping from ℝn to ℝm

    • (AT)ij = Aji

    • (AB)T = BTAT


  • Matrix inverse

    • Square matrix A is invertible if there is a matrix B s.t. AB = BA = I

    • B is the inverse of A, denoted A–1

    • Usually the transpose is not the inverse

    • Non-square matrices don’t have general inverses

• Can have a right or left inverse: AR = I or LA = I


  • Linear independence

    • Vector u is linearly dependent on a set of vectors V = {vi} if u is a linear combination of the vi

    • u = ∑i aivi for some coefficients ai

    • If u is not linearly dependent, it is linearly independent

• A set V of vectors is linearly independent if each vi is linearly independent of V \ {vi}


  • Matrix ranks

    • The column rank of a matrix A is the number of linearly independent columns of A

    • The row rank of A is the number of linearly independent rows of A

    • The Schein rank of A is the least integer k such that A can be expressed as a sum of k rank-1 matrices

• A rank-1 matrix is an outer product of two vectors

    • Column rank, row rank, and Schein rank are always equal; this common value is called the rank of A

  • Orthogonal matrices

    • A set of vectors {vi} is orthogonal if all vi are mutually orthogonal, i.e. ⟨vi, vj⟩ = 0 for all i ≠ j

    • If ||vi||2 = 1 for all vi, the set is orthonormal

    • Square matrix A is orthogonal if its columns form a set of orthonormal vectors

    • Non-square matrices can be row- or column-orthogonal

    • If A is orthogonal, then A–1 = AT


  • Properties of orthogonal matrices

• The inverse of an orthogonal matrix is easy to compute: A–1 = AT

• Multiplying by an orthogonal matrix performs a rotation (possibly combined with a reflection)

    • Only the direction of a vector can change; its length stays the same

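    Both properties are easy to verify numerically; a sketch using the orthogonal Q factor of a QR decomposition of an arbitrary random matrix:

        import numpy as np

        rng = np.random.default_rng(3)
        # The Q factor of a QR decomposition is orthogonal
        Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

        # The inverse is just the transpose
        assert np.allclose(Q.T @ Q, np.eye(4))
        assert np.allclose(np.linalg.inv(Q), Q.T)

        # Lengths are preserved: ||Qx|| = ||x||
        x = rng.standard_normal(4)
        assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))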

  • Matrix norms

    • Matrix norms measure the magnitude of the matrix

    • either the magnitude of its values or of its image

• Operator norms: ||A||p = max{||Ax||p : ||x||p = 1} for p ≥ 1

• Frobenius norm: ||A||F = (∑i=1..n ∑j=1..m aij²)^(1/2)
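    A sketch computing both norms in NumPy (random test matrix, for illustration only):

        import numpy as np

        rng = np.random.default_rng(4)
        A = rng.standard_normal((3, 4))

        # Frobenius norm: square root of the sum of squared entries
        fro = np.sqrt(np.sum(A**2))
        assert np.isclose(fro, np.linalg.norm(A, 'fro'))

        # Operator 2-norm: max of ||Ax||_2 over unit vectors x
        # (equals the largest singular value, see below)
        op2 = np.linalg.norm(A, 2)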

  • Singular Value Decomposition

Reading: Skillicorn Chapter 3; Golub & Van Loan Chapters 2.4–2.6; Leskovec et al. Chapter 11.3

  • “The SVD is the Swiss Army knife of matrix decompositions”

    – Diane O’Leary, 2006

  • The definition

    • Theorem. For every A ∈ ℝn×m there exist an n-by-n orthogonal matrix U and an m-by-m orthogonal matrix V such that UTAV is an n-by-m diagonal matrix Σ with values σ1 ≥ σ2 ≥ … ≥ σmin{n,m} ≥ 0 on its diagonal

    • I.e. every A has decomposition A = UΣVT

    • The singular value decomposition of A

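    The theorem can be seen in action with np.linalg.svd; a minimal sketch on an arbitrary random matrix:

        import numpy as np

        rng = np.random.default_rng(5)
        A = rng.standard_normal((4, 3))           # n = 4, m = 3

        # Full SVD: U is n-by-n, Vt holds V^T and is m-by-m
        U, s, Vt = np.linalg.svd(A, full_matrices=True)

        # Singular values are non-negative and in decreasing order
        assert np.all(s >= 0) and np.all(np.diff(s) <= 0)

        # Rebuild the n-by-m diagonal Sigma and check A = U Sigma V^T
        Sigma = np.zeros((4, 3))
        np.fill_diagonal(Sigma, s)
        assert np.allclose(A, U @ Sigma @ Vt)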

  • In pictures

    [Figure: A = UΣVT shown as the product of the three factor matrices]

    • ui are the left singular vectors

    • σi are the singular values

    • vi are the right singular vectors

  • Some useful equations

    • A = UΣVT = ∑i σiuiviT

    • Expresses A as a sum of rank-1 matrices

    • A–1 = (UΣVT)–1 = VΣ–1UT (if A is invertible)

    • ATAvi = σi2vi (for any A)

    • AATui = σi2ui (for any A)

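    All of these identities can be verified numerically; a sketch (random test matrix, illustration only):

        import numpy as np

        rng = np.random.default_rng(6)
        A = rng.standard_normal((4, 3))
        U, s, Vt = np.linalg.svd(A)

        # A as a sum of rank-1 matrices sigma_i u_i v_i^T
        A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
        assert np.allclose(A, A_sum)

        # A^T A v_i = sigma_i^2 v_i  and  A A^T u_i = sigma_i^2 u_i
        for i in range(len(s)):
            assert np.allclose(A.T @ A @ Vt[i, :], s[i]**2 * Vt[i, :])
            assert np.allclose(A @ A.T @ U[:, i], s[i]**2 * U[:, i])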

  • Truncated SVD

    • The rank of a matrix is the number of its non-zero singular values (write A = ∑i σiuiviT)

    • The truncated SVD keeps the first k columns of U and V and the top-left k-by-k submatrix of Σ

    • Ak = UkΣkVkT

    • Uk and Vk are column-orthogonal

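    A sketch of the rank-k truncation in NumPy (k = 2 chosen arbitrarily):

        import numpy as np

        rng = np.random.default_rng(7)
        A = rng.standard_normal((5, 4))
        U, s, Vt = np.linalg.svd(A)

        k = 2                                     # target rank
        Uk, Sk, Vkt = U[:, :k], np.diag(s[:k]), Vt[:k, :]
        Ak = Uk @ Sk @ Vkt                        # rank-k approximation of A

        # U_k and V_k are column-orthogonal
        assert np.allclose(Uk.T @ Uk, np.eye(k))
        assert np.allclose(Vkt @ Vkt.T, np.eye(k))
        assert np.linalg.matrix_rank(Ak) == k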

  • Truncated SVD

    [Figure: full SVD A = UΣVT versus truncated SVD A ≈ UkΣkVkT]

  • Why is the SVD important?

    • It gives us the dimensions of the fundamental subspaces

    • It lets us compute various norms

• It tells us about the sensitivity of linear systems

    • It gives us optimal solutions to least-squares linear systems

    • It gives us the least-error rank-k decomposition

    • Every matrix has one


  • Fundamental theorem of linear algebra

    • Theorem. Every n-by-m matrix A induces four fundamental subspaces

    • The range of dimension rank(A) = r

    • The set of all linear combinations of columns of A

    • The kernel of dimension m – r

    • The set of all vectors x for which Ax = 0

• The coimage (row space) of dimension r

    • The set of all linear combinations of rows of A

    • The cokernel of dimension n – r

    • The set of all vectors y for which yTA = 0


  • Fundamental subspaces

    • The bases for the fundamental subspaces are:

    • Range: the first r columns of U

    • Kernel: the last (m – r) columns of V

    • Coimage: the first r columns of V

    • Cokernel: the last (n – r) columns of U

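    A sketch reading two of these bases off the SVD of a deliberately rank-deficient random matrix:

        import numpy as np

        rng = np.random.default_rng(8)
        # A 4-by-3 matrix of rank 2, so the kernel is non-trivial
        A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
        U, s, Vt = np.linalg.svd(A)
        r = int(np.sum(s > 1e-10))               # numerical rank, here 2

        range_basis = U[:, :r]                   # first r columns of U
        kernel_basis = Vt[r:, :].T               # last m - r columns of V

        # Kernel vectors are mapped to zero
        assert np.allclose(A @ kernel_basis, 0)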

  • SVD and norms

    • Let A = UΣVT be the SVD of A. Then ||A||F² = ∑i=1..min{n,m} σi² and ||A||2 = σ1

    • Therefore ||A||2 ≤ ||A||F ≤ √(min{n,m}) · ||A||2

    • For the truncated SVD, ||Ak||F² = ∑i=1..k σi²
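    These identities, sketched in NumPy:

        import numpy as np

        rng = np.random.default_rng(9)
        A = rng.standard_normal((4, 3))
        s = np.linalg.svd(A, compute_uv=False)

        # ||A||_F^2 equals the sum of squared singular values
        assert np.isclose(np.linalg.norm(A, 'fro')**2, np.sum(s**2))

        # ||A||_2 equals the largest singular value
        assert np.isclose(np.linalg.norm(A, 2), s[0])

        # ||A||_2 <= ||A||_F <= sqrt(min{n, m}) ||A||_2, here min{n, m} = 3
        assert s[0] <= np.linalg.norm(A, 'fro') <= np.sqrt(3) * s[0]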

  • Sensitivity of linear systems

    • The solution for system Ax = b is x = A–1b

    • Requires that A is invertible

• Hence x = (UΣVT)–1b = ∑i=1..n (uiTb / σi) vi

    • Small changes in A or b yield large changes in x if σn is small

    • Can we characterize this sensitivity?
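    The SVD form of the solution, as a sketch (a random 3-by-3 matrix is invertible with probability 1):

        import numpy as np

        rng = np.random.default_rng(10)
        A = rng.standard_normal((3, 3))
        b = rng.standard_normal(3)
        U, s, Vt = np.linalg.svd(A)

        # x = sum over i of (u_i^T b / sigma_i) v_i; small sigma_i amplify b
        x = sum((U[:, i] @ b / s[i]) * Vt[i, :] for i in range(3))
        assert np.allclose(A @ x, b)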

  • Condition number

    • The condition number κp(A) of a square matrix A is ||A||p ||A–1||p

    • Particularly κ2(A) = σ1(A)/σn(A)

    • κ2(A) = ∞ for singular A

    • If κ is large, the matrix is ill-conditioned

• The solution is sensitive to small perturbations

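    A sketch with a nearly singular, hence ill-conditioned, example matrix:

        import numpy as np

        A = np.array([[1.0, 1.0],
                      [1.0, 1.0001]])       # nearly singular

        s = np.linalg.svd(A, compute_uv=False)
        kappa = s[0] / s[-1]                 # kappa_2(A) = sigma_1 / sigma_n
        assert np.isclose(kappa, np.linalg.cond(A, 2))
        # kappa is about 4e4: tiny perturbations of b can change x a lot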

  • Least-squares linear systems

• Problem. Given A ∈ ℝn×m and b ∈ ℝn, find x ∈ ℝm minimizing ||Ax – b||2.

    • If A is invertible, x = A–1b is an exact solution

• For non-invertible A we have to find another solution


  • The Moore–Penrose pseudo-inverse

    • The m-by-n matrix B is the Moore–Penrose pseudo-inverse of the n-by-m matrix A if
    • ABA = A (but possibly AB ≠ I)

    • BAB = B

    • (AB)T = AB (AB is symmetric)

    • (BA)T = BA

    • Pseudo-inverse of A is denoted by A+


  • Pseudo-inverse and SVD

    • If A = UΣVT is the SVD of A, then A+ = VΣ+UT

    • Σ+ replaces each non-zero σi with 1/σi and transposes the result

    • N.B. not a real inverse

• Theorem. Setting x = A+y minimizes ||Ax – y||2

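    A sketch building A+ from the SVD and comparing against NumPy's least-squares solver; the random tall matrix has full column rank with probability 1, so all σi here are non-zero:

        import numpy as np

        rng = np.random.default_rng(11)
        A = rng.standard_normal((6, 3))      # tall: no exact inverse
        b = rng.standard_normal(6)

        # A+ = V Sigma+ U^T (thin SVD; all sigma_i non-zero here)
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
        assert np.allclose(A_pinv, np.linalg.pinv(A))

        # x = A+ b minimizes ||Ax - b||_2
        x = A_pinv @ b
        x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
        assert np.allclose(x, x_ls)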

  • The Eckart–Young theorem

• Theorem. Let Ak = UkΣkVkT be the rank-k truncated SVD of A. Then Ak is the closest rank-k matrix to A in the Frobenius sense, that is, ||A – Ak||F ≤ ||A – B||F for all rank-k matrices B

    • Holds for any unitarily invariant norm

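    A numerical illustration: the truncated SVD beats an arbitrary rank-k competitor, and its error equals the norm of the discarded singular values:

        import numpy as np

        rng = np.random.default_rng(12)
        A = rng.standard_normal((6, 5))
        U, s, Vt = np.linalg.svd(A)

        k = 2
        Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
        err = np.linalg.norm(A - Ak, 'fro')

        # The error is (sum of the discarded sigma_i^2)^(1/2)
        assert np.isclose(err, np.sqrt(np.sum(s[k:]**2)))

        # A random rank-k matrix does at least as badly
        B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))
        assert err <= np.linalg.norm(A - B, 'fro')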

  • That’s all for today

    • Next week: normalization and selecting the rank

    • Lecture starts at 12:00 sharp

    • Will end earlier as well

    • But SVD will return…
