Matrix Completion IT530 Lecture Notes


Page 1: Matrix Completion

Matrix Completion

IT530 Lecture Notes

Page 2: Matrix Completion

Matrix Completion in Practice: Scenario 1

• Consider a survey of M people where each is asked Q questions.

• It may not be possible to ask each person all Q questions.

• Consider a matrix of size M by Q (each row holds one person's answers to the Q questions).

• This matrix is only partially filled (many missing entries).

• Is it possible to infer the full matrix given just the recorded entries?

Page 3: Matrix Completion

Matrix Completion in Practice: Scenario 2

• Some online shopping sites such as Amazon, Flipkart, Ebay, Netflix etc. have recommender systems.

• These websites collect product ratings from users (especially Netflix).

• Based on user ratings, these websites try to recommend other products/movies to the user that he/she will like with a high probability.

• Consider a matrix with the number of rows equal to the number of users, and the number of columns equal to the number of movies/products.

• This matrix will be HIGHLY incomplete (no user has the patience to rate too many movies!) – maybe only 5% of the entries will be filled up.

• Can the recommender system infer user preferences from just the defined entries?

Page 4: Matrix Completion

Matrix Completion in Practice: Scenario 2

• Read about the Netflix Prize, a competition to design a better recommender system: http://en.wikipedia.org/wiki/Netflix_Prize

Page 5: Matrix Completion

Matrix Completion in Practice: Scenario 3

• Consider an image or a video with several pixel values missing.

• This is not uncommon in range imagery or remote sensing applications!

• Consider a matrix whose each column is a (vectorized) patch of M pixels. Let the number of columns be K.

• This M by K matrix will have many missing entries.

• Is it possible to infer the complete matrix given just the defined pixel values?

• If the answer were yes, note the implications for image compression!

Page 6: Matrix Completion

Matrix Completion in Practice: Scenario 4

• Consider a long video sequence of F frames.

• Suppose I mark out M salient (interesting) points {Pi}, 1 <= i <= M, in the first frame.

• And try to track those points in all subsequent frames.

• Consider a matrix F of size M x 2F where row j contains the X and Y coordinates of the points on the motion trajectory of the initial point Pj.

• Unfortunately, many salient points may not be trackable due to occlusion or errors in the tracking algorithm.

• So F is highly incomplete.

• Is it possible to infer the true matrix from only the available measurements?

Page 7: Matrix Completion

A property of these matrices

• Scenario 1: Many people will tend to give very similar or identical answers to many survey questions.

• Scenario 2: Many people will have similar preferences for movies (only a few factors affect user choices).

• Scenario 3: Non-local self-similarity!

• This makes the matrices in all these scenarios low in rank!

Page 8: Matrix Completion

A property of these matrices

• Scenario 4: The true matrix underlying F in question has been PROVED to be of low rank (in fact, rank 3) under orthographic projection (ref: Tomasi and Kanade, “Shape and Motion from Image Streams Under Orthography: a Factorization Method”, IJCV 1992) and a few other more complex camera models up to rank 9 (ref: Irani, “Multiframe correspondence estimation using subspace constraints”, IJCV 2002).

• In the rank-3 case, the transpose of F (of size 2F x M) can be expressed as a product of two matrices – a rotation matrix of size 2F x 3 and a shape matrix of size 3 x M.

• F is useful for many computer vision problems such as structure from motion, motion segmentation and multi-frame point correspondences.
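The rank-3 claim is easy to check numerically. Below is an illustrative NumPy sketch (variable names are ours; the camera is a noiseless orthographic projection, and the matrix is built in the 2F x M orientation – the rank is of course the same for the transpose):

```python
import numpy as np

rng = np.random.default_rng(0)
M, F = 50, 10                       # M tracked points, F frames

S = rng.standard_normal((3, M))     # 3D "shape" matrix: one column per point

rows = []
for _ in range(F):
    # a random orthographic camera: the first two rows of a rotation matrix
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    rows.append(Q[:2])
R = np.vstack(rows)                 # "rotation" matrix of size 2F x 3

W = R @ S                           # 2F x M measurement matrix of trajectories
print(np.linalg.matrix_rank(W))     # 3, even though W is 20 x 50
```

No matter how many points or frames we add, the measurement matrix stays rank 3, because it factors as a (2F x 3) times (3 x M) product.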

Page 9: Matrix Completion

(Many) low-rank matrices are cool!

• The answer to the four questions/scenarios is a big NO in the general case.

• But it’s a big YES if we assume that the underlying matrix has low rank (and which, as we have seen, is indeed the case for all four scenarios) and obeys a few more constraints.

Page 10: Matrix Completion

Theorem 1 (Informal Statement)

• Consider an unknown matrix F of size n1 by n2 having rank r < min(n1, n2).

• Suppose we observe only a fraction of the entries of F in the form of a matrix G, where G(i,j) = F(i,j) for all (i,j) belonging to some uniformly randomly sampled set Ω, and G(i,j) is undefined elsewhere.

• If (1) F has row and column spaces that are "sufficiently incoherent" with the canonical basis (i.e. identity matrix), (2) r is "sufficiently small", and (3) Ω is "sufficiently large", then we can accurately recover F from G by solving the following rank minimization problem:

min_Φ̂ rank(Φ̂) subject to Φ̂(i,j) = G(i,j) for all (i,j) ∈ Ω

Ref: Candes and Recht, “Exact Matrix Completion via Convex Optimization”, 2008.

Page 11: Matrix Completion

Cool theorem, but …

• The aforementioned optimization problem is NP-hard (in fact, all known exact algorithms for rank minimization have doubly exponential running time in the matrix dimension!)

Page 12: Matrix Completion

Theorem 2 (Informal Statement)

• Consider an unknown matrix F of size n1 by n2 having rank r < min(n1, n2).

• Suppose we observe only a fraction of the entries of F in the form of a matrix G, where G(i,j) = F(i,j) for all (i,j) belonging to some uniformly randomly sampled set Ω, and G(i,j) is undefined elsewhere.

• If (1) F has row and column spaces that are "sufficiently incoherent" with the canonical basis (i.e. identity matrix), (2) r is "sufficiently small", and (3) Ω is "sufficiently large", then we can accurately recover F from G by solving the following "trace-norm" minimization problem:

min_Φ̂ ||Φ̂||_* subject to Φ̂(i,j) = G(i,j) for all (i,j) ∈ Ω

Page 13: Matrix Completion

What is the trace-norm of a matrix?

• The trace-norm of a matrix is the sum of its singular values.

• It is also called the nuclear norm.

• It is a softened version of the rank of a matrix, just like the L1-norm of a vector is a softened version of the L0-norm of the vector.

• Minimization of the trace-norm (even under the given constraints) is a convex optimization problem and can be solved efficiently (no local minima issues).

• This is similar to the L1-norm optimization (in compressive sensing) being efficiently solvable.
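As a quick numerical illustration (a NumPy sketch, not part of the original notes), the trace-norm can be computed directly from the SVD, and NumPy also exposes it as the 'nuclear' norm:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

s = np.linalg.svd(A, compute_uv=False)   # singular values of A
trace_norm = s.sum()                     # trace-norm = sum of singular values

# NumPy provides the same quantity via ord='nuc'
assert np.isclose(trace_norm, np.linalg.norm(A, 'nuc'))
print(trace_norm)
```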

Page 14: Matrix Completion

More about trace-norm minimization

• The efficient trace-norm minimization procedure is provably known to give the EXACT SAME result as the NP-hard rank minimization problem (under the same constraints and the same conditions on the unknown matrix F and the sampling set Ω).

• This is analogous to the case where L1-norm optimization yielded the same result as L0-norm optimization (under the same set of constraints and conditions).

• Henceforth we will concentrate only on Theorem 2 (and beyond).

Page 15: Matrix Completion

The devil is in the details

• Beware: Not all low-rank matrices can be recovered from partial measurements!

• Example: consider a matrix containing zeroes everywhere except at the top-right corner (such a matrix has rank 1).

• This matrix is low rank, but it cannot be recovered from knowledge of only a fraction of its entries – unless the sampled set happens to include the corner entry, every observed entry is zero and carries no information!

• Many other such examples exist.

• In reality, Theorems 1 and 2 work for low-rank matrices whose singular vectors are sufficiently spread out, i.e. sufficiently incoherent with the canonical basis (i.e. with the identity matrix).

Page 16: Matrix Completion

Coherence of a basis

• The coherence of a subspace U of R^n of dimension r with respect to the canonical basis {e_i} is defined as:

μ(U) = (n/r) max_{1 ≤ i ≤ n} ||P_U e_i||²

where P_U denotes the orthogonal projection onto U.
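The coherence μ(U) = (n/r) max_i ||P_U e_i||² can be sketched in a few lines of NumPy (illustrative code, not part of the notes): for an orthonormal basis matrix of U, the projection of e_i onto U has squared norm equal to the squared norm of the i-th row.

```python
import numpy as np

def coherence(U):
    """mu(U) = (n/r) * max_i ||P_U e_i||^2 for an n x r orthonormal basis U."""
    n, r = U.shape
    # ||P_U e_i||^2 equals the squared norm of the i-th row of U
    return (n / r) * np.max(np.sum(U**2, axis=1))

n, r = 100, 5
# maximally coherent subspace: spanned by r canonical basis vectors
E = np.eye(n)[:, :r]
# well spread-out subspace: orthonormalized random Gaussian vectors
Q, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((n, r)))
print(coherence(E))   # n/r = 20, the largest possible value
print(coherence(Q))   # much smaller: random subspaces are incoherent
```

The two extremes match the intuition on the next slides: canonical ("spiky") bases are maximally coherent, while random ("spread-out") bases have coherence close to the lower bound of 1.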

Page 17: Matrix Completion

Formal definition of key assumptions

• Consider an underlying matrix M of size n1 by n2. Let the SVD of M be given as follows:

M = Σ_{k=1}^{r} σ_k u_k v_k^T

• We make the following assumptions about M:

1. (A0) max(μ(U), μ(V)) ≤ μ0 for some μ0 > 0, where U and V are the column and row spaces of M.

2. (A1) The maximum entry (in absolute value) of the n1 by n2 matrix Σ_{k=1}^{r} u_k v_k^T is upper bounded by μ1 √(r / (n1 n2)) for some μ1 > 0.

Page 18: Matrix Completion

What do these assumptions mean (in English)?

• (A0) means that the singular vectors of the matrix are sufficiently incoherent with the canonical basis.

• (A1) means that the singular vectors of the matrix are not spiky (e.g. canonical basis vectors are spiky signals – the spike has magnitude 1 and the rest of the signal is 0; a vector of n elements with all values equal to 1/√n is not spiky).

Page 19: Matrix Completion

Theorem 2 (Formal Statement)

• Suppose we observe m entries of M (of size n1 by n2, rank r, satisfying (A0) and (A1)), with locations sampled uniformly at random, and let n = max(n1, n2). Then there exist constants C and c such that if

m ≥ C max(μ1², √μ0 μ1, μ0 n^(1/4)) n r β log n

for some β > 2, then the trace-norm minimizer (in the informal statement of Theorem 2) is unique and equal to M with probability at least 1 − c n^(−β).

• Moreover, if r ≤ μ0^(−1) n^(1/5), the requirement improves to m ≥ C μ0 n^(6/5) r β log n.

Page 20: Matrix Completion

Comments on Theorem 2

• Theorem 2 states that more entries of M must be known (their number is denoted by m) for accurate reconstruction if (1) M has a larger rank r, (2) μ0 in (A0) is larger, or (3) μ1 in (A1) is larger.

• Example: If μ0 = O(1) and the rank r is small, the reconstruction is accurate with high probability provided m ≥ C n^(1.2) r log n.

Page 21: Matrix Completion

Comments on Theorem 2

• It turns out that if the singular vectors of the matrix M have bounded entries, the condition (A1) almost always holds with μ1 = O(log n).

Page 22: Matrix Completion

Matrix Completion under noise

• Consider an unknown matrix F of size n1 by n2 having rank r < min(n1, n2).

• Suppose we observe only a fraction of the entries of F in the form of a matrix G, where G(i,j) = F(i,j) + Z(i,j) for all (i,j) belonging to some set Ω, and G(i,j) is undefined elsewhere.

• Here Z refers to a white noise process which obeys the constraint:

Σ_{(i,j) ∈ Ω} Z(i,j)² ≤ δ², for some δ > 0

Page 23: Matrix Completion

Matrix Completion under noise

• In such cases, the unknown matrix F can be recovered by solving the following minimization problem (a semi-definite program):

min_Φ̂ ||Φ̂||_* subject to Σ_{(i,j) ∈ Ω} (Φ̂(i,j) − G(i,j))² ≤ δ²

Page 24: Matrix Completion

Theorem 3 (informal statement)

• The reconstruction Φ̂* from the previous procedure is accurate, with an error bound given by:

||F − Φ̂*||_F ≤ 4 √((2 + p) min(n1, n2) / p) · δ + 2δ

where p = m / (n1 n2) is the fraction of known entries.

Page 25: Matrix Completion

A Minimization Algorithm

• Consider the minimization problem:

min_Φ̂ ||Φ̂||_* subject to Σ_{(i,j) ∈ Ω} (Φ̂(i,j) − G(i,j))² ≤ δ²

• There are many techniques to solve this problem (http://perception.csl.illinois.edu/matrix-rank/sample_code.html)

• Out of these, we will study one method called "singular value thresholding".

Page 26: Matrix Completion

Singular Value Thresholding (SVT)

Ref: Cai et al, “A singular value thresholding algorithm for matrix completion”, SIAM Journal on Optimization, 2010.

SVT algorithm:

Y^(0) = 0 (a matrix of size n1 x n2);
k = 1;
while (convergence criterion not met)
{
    Φ^(k) = soft_threshold(Y^(k−1); τ);
    Y^(k) = Y^(k−1) + δ_k P_Ω(G − Φ^(k));
    k = k + 1;
}

soft_threshold(Y ∈ R^(n1 x n2); τ)
{
    [U, S, V] = svd(Y);   (so that Y = U S V^T)
    for k = 1 : min(n1, n2)
        S(k,k) = max(0, S(k,k) − τ);
    return U S V^T;
}

Here P_Ω masks out the unobserved entries, i.e. P_Ω(i,j) = 1 if (i,j) ∈ Ω, else P_Ω(i,j) = 0 (applied entrywise).

The soft-thresholding procedure obeys the following property (which we state w/o proof):

soft_threshold(Y; τ) = argmin_X { τ ||X||_* + (1/2) ||X − Y||_F² }
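The SVT iteration translates almost line-for-line into NumPy. The sketch below is illustrative: the parameter choices τ = 5n and δ = 1.9, and the simple relative-residual stopping test, follow common practice with SVT rather than anything prescribed in the notes.

```python
import numpy as np

def svt_complete(G, mask, tau, delta, iters=3000, tol=1e-5):
    """Singular Value Thresholding sketch (after Cai et al., 2010).

    G    : observed matrix (entries outside the mask are ignored)
    mask : boolean array, True where the entry of G is observed
    tau  : soft-threshold applied to the singular values
    delta: step size, which should lie in (0, 2)
    """
    Y = np.zeros_like(G, dtype=float)
    Phi = Y
    for _ in range(iters):
        # Phi^(k) = soft_threshold(Y^(k-1); tau)
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        Phi = (U * np.maximum(s - tau, 0.0)) @ Vt
        # Y^(k) = Y^(k-1) + delta * P_Omega(G - Phi^(k))
        resid = np.where(mask, G - Phi, 0.0)
        if np.linalg.norm(resid) <= tol * np.linalg.norm(np.where(mask, G, 0.0)):
            break
        Y = Y + delta * resid
    return Phi

# usage: recover a random rank-2 matrix from 60% of its entries
rng = np.random.default_rng(1)
n = 40
M = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
mask = rng.random((n, n)) < 0.6
M_hat = svt_complete(np.where(mask, M, 0.0), mask, tau=5 * n, delta=1.9)
print(np.linalg.norm(M_hat - M) / np.linalg.norm(M))
```

Note that the update only ever touches the observed entries of Y; the low-rank structure of the iterates comes entirely from the soft-thresholding of the singular values.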

Page 27: Matrix Completion

Properties of SVT (stated w/o proof)

• The sequence {Φ^(k)} converges to the true solution of the main problem provided the step-sizes {δ_k} all lie between 0 and 2, and the value of τ is large.

Page 28: Matrix Completion

Results

• The SVT algorithm works very efficiently and is easily implementable in MATLAB.

• The authors report reconstruction of a 30,000 by 30,000 matrix in just 17 minutes on a 1.86 GHz dual-core desktop with 3 GB RAM and with MATLAB’s multithreading option enabled.

Page 29: Matrix Completion

Results (Data without noise)

Page 30: Matrix Completion

Results (Noisy Data)

Page 31: Matrix Completion

Results on real data

• Dataset consists of a matrix M of geodesic distances between 312 cities in the USA/Canada.

• This matrix is of approximately low-rank (in fact, the relative Frobenius error between M and its rank-3 approximation is 0.1159).

• 70% of the entries of this matrix (chosen uniformly at random) were blanked out.

Page 32: Matrix Completion

Results on real data

• The underlying matrix was estimated using SVT.

• In just a few seconds and a few iterations, the SVT produces an estimate that is as accurate as the best rank-3 approximation of M.

Page 33: Matrix Completion

Results on real data

Page 34: Matrix Completion

Applications to Video Denoising

• Consider a video I(x,y,t) corrupted by Gaussian and impulse noise.

• Impulse noise usually has high magnitude and can be spatially sparse.

• Impulse noise can be “removed” by local median filtering, but this can also attenuate edges and corners.

• Instead, the median filtering is used as an intermediate step for collecting together K patches that are “similar” to a reference patch.

Page 35: Matrix Completion

Applications to Video Denoising

• The similar patches are assembled in the form of a matrix Q of size n x K (where n = number of pixels in each patch).

• If the noise were only Gaussian, we could perform an SVD on Q, attenuate appropriate singular values and then reconstruct the patches. In fact, this is what non-local PCA does (with minor differences).
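The SVD-shrinkage step described above can be sketched as follows (illustrative code: the threshold σ(√n + √K) is one common heuristic for the largest singular value that pure noise can produce, and the sizes and noise level are our assumptions, not values from the notes):

```python
import numpy as np

def denoise_patch_matrix(Q, sigma):
    """Attenuate small singular values of the patch matrix Q (Gaussian noise only)."""
    n, K = Q.shape
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    thresh = sigma * (np.sqrt(n) + np.sqrt(K))   # heuristic noise level
    s = np.where(s > thresh, s, 0.0)             # keep only strong components
    return (U * s) @ Vt

rng = np.random.default_rng(0)
n, K = 64, 40
clean = rng.standard_normal((n, 2)) @ rng.standard_normal((2, K))  # rank-2 "patches"
noisy = clean + 0.1 * rng.standard_normal((n, K))
out = denoise_patch_matrix(noisy, 0.1)
print(np.linalg.norm(out - clean) / np.linalg.norm(clean))
```

Because the stack of similar patches is (approximately) low rank, the strong singular components carry the signal while the discarded ones are mostly noise.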

Page 36: Matrix Completion

Applications to Video Denoising

• But the entries in Q are also corrupted by impulse noise, and this adversely affects the SVD computation.

• Hence we can regard those pixels that are affected by impulse noise as “incomplete entries” (how will you identify them?) and pose this as a matrix completion under noise problem.
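One plausible way to identify the impulse-corrupted pixels (a heuristic sketch of the idea, not the detector used in the referenced work) is to flag pixels that deviate strongly from their local median:

```python
import numpy as np

def impulse_mask(img, thresh=50.0):
    """Flag pixels that deviate strongly from their 3x3 neighborhood median."""
    pad = np.pad(img, 1, mode='edge')
    win = np.lib.stride_tricks.sliding_window_view(pad, (3, 3))
    med = np.median(win, axis=(-2, -1))
    return np.abs(img - med) > thresh    # True = likely impulse-corrupted

rng = np.random.default_rng(0)
img = np.full((32, 32), 100.0)           # flat test image
corrupt = rng.random(img.shape) < 0.05   # 5% of pixels hit by impulse noise
noisy = img.copy()
noisy[corrupt] = 255.0                   # "salt" impulses
mask = impulse_mask(noisy)
print((mask == corrupt).mean())          # fraction of pixels classified correctly
```

The flagged pixels then become the "incomplete entries" Ω-complement in the matrix completion formulation.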

Page 37: Matrix Completion

Applications to Video Denoising

• The formal problem statement is:

min_P ||P||_* subject to Σ_{(i,j) ∈ Ω} (P(i,j) − Q(i,j))² ≤ σ²

where:

P = matrix of true/clean patches (the unknown),
Q = matrix of noisy patches,
Ω = set of indices of pixels in Q having well-defined values (i.e. values not corrupted by impulse noise),
σ = standard deviation of the Gaussian noise.

Page 38: Matrix Completion

Applications to Video Denoising

• We know this can be solved using singular value thresholding (though other algorithms also exist).

• This procedure is repeated throughout the image or the video.

Page 39: Matrix Completion

Result: effect of median filter

Page 40: Matrix Completion

Denoising results

Page 41: Matrix Completion

Denoising results

Page 42: Matrix Completion

Matrix Completion in Practice: Scenario 5

• Consider N points of dimension d each, denoted as {xi},1 ≤ i ≤ N.

• Consider a matrix D of size N x N, where D_ij = ||x_i − x_j||² = x_i^T x_i + x_j^T x_j − 2 x_i^T x_j.

• D can be written in the form D = 1z^T + z1^T − 2XX^T, where z is a vector of length N with z_i = x_i^T x_i, X is the N by d matrix whose i-th row is x_i^T, and 1 is a vector of length N containing all ones.

Page 43: Matrix Completion

Matrix Completion in Practice: Scenario 5

• Hence the rank of D is at most 2 + rank(XX^T) = 2 + d, since 1z^T and z1^T each have rank 1 and rank(XX^T) = rank(X) ≤ d.
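This rank bound is easy to verify numerically (an illustrative NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 30, 3
X = rng.standard_normal((N, d))              # N points in R^d, one per row

z = np.sum(X**2, axis=1)                     # z_i = x_i^T x_i
one = np.ones(N)
D = np.outer(one, z) + np.outer(z, one) - 2 * X @ X.T   # D_ij = ||x_i - x_j||^2

print(np.linalg.matrix_rank(D))              # d + 2 = 5, far below N = 30
```

So the squared-distance matrix of generic points in R^3 has rank 5 regardless of N, which is what makes completing a partially observed distance matrix feasible.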