
Page 1: CVPR09 Introduction

Confluence of Visual Computing & Sparse Representation

Yi Ma

Electrical and Computer Engineering, UIUC

&

Visual Computing Group, MSRA

CVPR, June 19th, 2009

Page 2: CVPR09 Introduction

CONTEXT - Massive High-Dimensional Data

Recognition · Surveillance · Search and Ranking · Bioinformatics

The blessing of dimensionality: … real data are highly concentrated on low-dimensional, sparse, or degenerate structures in the high-dimensional space.

The curse of dimensionality: … applications increasingly demand inference from limited samples of very high-dimensional data.

But nothing is free: Gross errors and irrelevant measurements are now ubiquitous in massive cheap data.

Page 3: CVPR09 Introduction

CONTEXT - New Phenomena with High-Dimensional Data

A sobering message: human intuition is severely limited in high-dimensional spaces:

[Figure: Gaussian samples in 2D, and what happens as dimension grows proportionally with the number of samples…]

A new regime of geometry, statistics, and computation…

KEY CHALLENGE: efficiently and reliably recover sparse or degenerate structures from high-dimensional data, despite gross observation errors.

Page 4: CVPR09 Introduction

CONTEXT - High-Dimensional Geometry, Statistics, Computation

An exciting confluence of:

Analytical Tools:

• Powerful tools from high-dimensional geometry, measure concentration, combinatorics, coding theory …

Computational Tools:

• Linear programming, convex optimization, greedy pursuit, boosting, parallel processing …

Practical Applications:

• Compressive sensing, sketching, sampling, audio, image, video, bioinformatics, classification, recognition …

Page 5: CVPR09 Introduction

THIS TALK - Outline

PART I: Face recognition as sparse representation (striking robustness to corruption)

PART II: From sparse to dense error correction (how is such good face recognition performance possible?)

PART III: A practical face recognition system (alignment, illumination, scalability)

PART IV: Extensions, other applications, and future directions

Page 6: CVPR09 Introduction

Part I: Key Ideas and Application

Robust Face Recognition via Sparse Representation

Page 7: CVPR09 Introduction

CONTEXT – Face recognition: hopes and high-profile failures

# Pentagon Makes Rush Order for Anti-Terror Technology. Washington Post, Oct. 26, 2001.
# Boston Airport to Test Face Recognition System. CNN.com, Oct. 26, 2001.
# Facial Recognition Technology Approved at Va. Beach. 13News (wvec.com), Nov. 13, 2001.

# ACLU: Face-Recognition Systems Won't Work. ZDNet, Nov. 2, 2001.
# ACLU Warns of Face Recognition Pitfalls. Newsbytes, Nov. 2, 2001.

# Identix, Visionics Double Up. CNN / Money Magazine, Feb. 22, 2002.

# 'Face testing' at Logan is found lacking. Boston Globe, July 17, 2002.
# Reliability of face scan technology in dispute. Boston Globe, August 5, 2002.

# Tampa drops face-recognition system. CNET, August 21, 2003.
# Airport anti-terror systems flub tests. USA Today, September 2, 2003.
# Anti-terror face recognition system flunks tests. The Register, September 3, 2003.
# Passport ID technology has high error rate. The Washington Post, August 6, 2004.
# Smiling Germans ruin biometric passport system. VNUNet, November 10, 2005.

# U.K. cops look into face-recognition tech. ZDNet News, January 17, 2006.
# Police build national mugshot database. Silicon.com, January 16, 2006.
# Face Recognition Algorithms Surpass Humans in Matching Faces. PAMI, 2007.
# 100% Accuracy in Automatic Face Recognition. Science, January 25, 2008.

and the drama goes on and on…

Page 8: CVPR09 Introduction

FORMULATION – Face recognition under varying illumination

[Figure: training images of each subject span a low-dimensional face subspace.]

Images of the same face under varying illumination lie approximately on a low-dimensional (roughly nine-dimensional) subspace, known as the harmonic plane [Basri & Jacobs, PAMI 2003].
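As a quick numerical illustration of this low-dimensional structure, the sketch below builds a synthetic stack of images lying near a 9-dimensional subspace and checks how much energy the top nine principal components capture. The data, dimensions, and noise level are illustrative assumptions, not the experiment from the talk.

```python
import numpy as np

# Illustrative check (synthetic data): images of one subject under varying
# lighting should concentrate near a ~9-dimensional subspace [Basri & Jacobs '03].
rng = np.random.default_rng(0)
m, n_imgs, r = 96 * 84, 29, 9                      # pixels, images, subspace dim
U = rng.standard_normal((m, r))                    # basis of the "true" subspace
images = U @ rng.standard_normal((r, n_imgs))      # samples from the subspace
images += 0.01 * rng.standard_normal((m, n_imgs))  # small imaging noise

s = np.linalg.svd(images, compute_uv=False)        # singular values
energy = (s[:9] ** 2).sum() / (s ** 2).sum()
print(f"energy captured by top 9 components: {energy:.4f}")  # close to 1
```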

Page 9: CVPR09 Introduction

FORMULATION – Face recognition as sparse representation

Assumption: the test image y can be expressed as a linear combination of k training images, say of the same subject, so that y = A x₀ for the full training dictionary A.

The solution x₀ should be a sparse vector: all of its entries should be zero, except for the ones associated with the correct subject.

Page 10: CVPR09 Introduction

ROBUST RECOGNITION – Occlusion + varying illumination

Page 11: CVPR09 Introduction

ROBUST RECOGNITION – Occlusion and Corruption

Page 12: CVPR09 Introduction

ROBUST RECOGNITION – Properties of the Occlusion

Several characteristics of the occlusion error e:

Randomly supported errors (location is unknown and unpredictable)

Gross errors (arbitrarily large in magnitude)

Sparse errors? (concentrated on relatively small part(s) of the image)

Page 13: CVPR09 Introduction

ROBUST RECOGNITION – Problem Formulation

Problem: find the correct (sparse) solution x₀ to the corrupted, over-determined system of linear equations y = A x₀ + e.

Conventionally, the minimum 2-norm (least-squares) solution x̂₂ = arg minₓ ‖y − A x‖₂ is used, but it is not robust to gross errors.

Page 14: CVPR09 Introduction

ROBUST RECOGNITION – Joint Sparsity

Thus, we are looking for a sparse solution w₀ = [x₀; e₀] to an under-determined system of linear equations: y = [A I] w.

Relaxing ℓ⁰ to ℓ¹, the problem can be solved efficiently via linear programming, and the solution is stable under moderate noise [Candes & Tao ’04, Donoho ’04].

The ℓ⁰/ℓ¹ equivalence holds iff the solution is sufficiently sparse.

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009
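Written out in the notation of the PAMI 2009 paper, the combined system and its ℓ¹ relaxation read:

```latex
y \;=\; \begin{bmatrix} A & I \end{bmatrix}
        \begin{bmatrix} x_0 \\ e_0 \end{bmatrix}
  \;\doteq\; B\,w_0,
  \qquad B \in \mathbb{R}^{m \times (n+m)},
\qquad
\hat{w} \;=\; \arg\min_w \ \|w\|_1
  \quad \text{subject to} \quad B\,w = y .
```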

Page 15: CVPR09 Introduction

ROBUST RECOGNITION – Geometric Interpretation

Face recognition as determining which facet of the polytope the test image belongs to.

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009

Page 16: CVPR09 Introduction

ROBUST RECOGNITION - L1 versus L2 Solution

[Figure: an input test image and its ℓ¹ vs. ℓ² coefficient solutions.]

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009

Page 17: CVPR09 Introduction

ROBUST RECOGNITION – Classification from Coefficients

[Figure: the sparse coefficients x̂, indexed by training sample 1, 2, 3, …, N and grouped by subject (subject 1, …, subject i, …, subject n), alongside the per-subject residuals.]

Classification criterion: assign y to the class with the smallest residual rᵢ(y) = ‖y − A δᵢ(x̂)‖₂, where δᵢ(x̂) keeps only the coefficients associated with subject i.

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009

Page 18: CVPR09 Introduction

ROBUST RECOGNITION – Algorithm Summary

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009
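A minimal sketch of the SRC pipeline summarized on this slide: solve the ℓ¹ problem over the combined dictionary [A I], then classify by the smallest class-wise residual. The function name and the use of scipy.optimize.linprog are illustrative choices, not the authors' implementation; for realistic image sizes a dedicated ℓ¹ solver (e.g., homotopy or ALM) would replace the generic LP.

```python
import numpy as np
from scipy.optimize import linprog

def src_classify(A, labels, y):
    """Sketch of sparse representation-based classification (SRC).
    A: m x n matrix of normalized training images (one column per image),
    labels: length-n array of subject ids, y: length-m test image."""
    m, n = A.shape
    B = np.hstack([A, np.eye(m)])              # combined dictionary [A I]
    N = n + m
    # min ||w||_1 s.t. B w = y, posed as an LP with w = wp - wn, wp, wn >= 0.
    res = linprog(np.ones(2 * N), A_eq=np.hstack([B, -B]), b_eq=y,
                  bounds=(0, None), method="highs")
    w = res.x[:N] - res.x[N:]
    x, e = w[:n], w[n:]                        # coefficients, sparse error
    # Classify by the smallest class-wise residual, after subtracting the
    # recovered error (the robust variant of the criterion).
    residuals = {}
    for cls in np.unique(labels):
        x_cls = np.where(labels == cls, x, 0.0)
        residuals[cls] = np.linalg.norm(y - e - A @ x_cls)
    return min(residuals, key=residuals.get), x, e
```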

Page 19: CVPR09 Introduction

EXPERIMENTS – Varying Level of Random Corruption

Extended Yale B Database (38 subjects).
Training: subsets 1 and 2 (717 images).
Testing: subset 3 (453 images).

[Figure: test images at 30%, 50%, and 70% random corruption, and the recognition rates marked on the slide: 99.3%, 90.7%, and 37.5%.]

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009

Page 20: CVPR09 Introduction

EXPERIMENTS – Varying Levels of Contiguous Occlusion

[Figure: test images at 30% contiguous occlusion, and the recognition rates marked on the slide: 98.5%, 90.3%, and 65.3%.]

Extended Yale B Database (38 subjects).
Training: subsets 1 and 2 (717 images), EBP ≈ 13.3%.
Testing: subset 3 (453 images).

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009

Page 21: CVPR09 Introduction

EXPERIMENTS – Recognition with Face Parts Occluded

Results corroborate findings in human vision: the eyebrow or eye region is most informative for recognition [Sinha’06].

However, the difference is less significant for our algorithm than for humans.

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009

Page 22: CVPR09 Introduction

EXPERIMENTS – Recognition with Disguises

The AR Database (100 subjects).
Training: 799 un-occluded images, EBP = 11.6%.
Testing: 200 images with glasses, 200 images with scarves.

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009

Page 23: CVPR09 Introduction

Part II: Theory Inspired by Face Recognition

Dense Error Correction via L1 Minimization

Page 24: CVPR09 Introduction

PRIOR WORK - Face Recognition as Sparse Representation

Represent any test image with respect to the entire training dictionary A as y = A x₀ + e₀, where x₀ holds the coefficients and e₀ the corruption or occlusion.

Seek the sparsest solution; its convex relaxation is min ‖x‖₁ + ‖e‖₁ subject to y = A x + e.

The solution is not unique, but it should be sparse: x₀ should ideally be supported only on images of the same subject, and e₀ is expected to be sparse because occlusion affects only a subset of the pixels.

Page 25: CVPR09 Introduction

PRIOR WORK - Striking Robustness to Random Corruption

Behavior under varying levels of random pixel corruption:

[Plot: recognition rate vs. corruption level, with marked rates of 99.3%, 90.7%, and 37.5%.]

Can existing theory explain this phenomenon?

Page 26: CVPR09 Introduction

PRIOR WORK - Error Correction by ℓ¹ Minimization

Candes and Tao [IT ‘05]:

• Apply a parity-check matrix F with F A = 0, yielding ỹ = F y = F e.

• Set ê = arg min ‖e‖₁ subject to F e = ỹ, an underdetermined system in the sparse e only.

• Recover x from the clean system A x = y − ê.

Succeeds whenever ℓ⁰/ℓ¹ equivalence holds in the reduced system.

This work:

• Instead solve min ‖x‖₁ + ‖e‖₁ subject to y = A x + e.

• Can be applied when A is wide (no parity check is needed).

Succeeds whenever ℓ⁰/ℓ¹ equivalence holds in the expanded system [A I].

Page 30: CVPR09 Introduction

PRIOR WORK - When Does ℓ⁰/ℓ¹ Equivalence Hold?

Algebraic sufficient conditions:

• (In)-coherence [Donoho + Elad ‘03; Gribonval + Nielsen ‘03]: sufficiently low coherence of the dictionary suffices.

• Restricted isometry [Candes + Tao ‘05; Candes + Romberg + Tao ‘06]: a sufficiently small restricted isometry constant suffices.

In both cases, “the columns of the dictionary should be uniformly well-spread.”

Page 31: CVPR09 Introduction

FACE IMAGES - Contrast with Existing Theory

[Figure: the bundle of face images occupies a vanishingly small volume of the image space; the columns of A are highly coherent.]

x₀ is very sparse: at most (# images per subject) nonzero entries, often nonnegative (illumination cone models).

e₀ is as dense as possible: we want robustness to the highest possible corruption.

Existing theory says ℓ¹ minimization should not succeed here.

Wright, and Ma. ICASSP 2009, submitted to IEEE Trans. Information Theory.

Page 32: CVPR09 Introduction

SIMULATION - Dense Error Correction?

As the dimension m grows, an even more striking phenomenon emerges:

[Plots: simulated recovery rates as the dimension m increases; the correctable error fraction keeps growing.]

Conjecture: if the matrices A are sufficiently coherent, then for any error fraction ρ < 1, as m → ∞, solving min ‖x‖₁ + ‖e‖₁ subject to y = A x + e corrects almost any error e with ‖e‖₀ ≤ ρ m.

Wright, and Ma. ICASSP 2009, submitted to IEEE Trans. Information Theory.

Page 38: CVPR09 Introduction

DATA MODEL - Cross-and-Bouquet

Our model for A should capture the fact that its columns are tightly clustered around a common mean μ (a “bouquet”), while the error can be arbitrary in the coordinate directions (a “cross”):

We call this the “Cross-and-Bouquet” (CAB) model.

The mean μ is mostly incoherent with the standard (error) basis.

The deviations of the columns about the mean are well-controlled in norm.

[Figure: the bouquet of face images inside the image space.]

Wright, and Ma. ICASSP 2009, submitted to IEEE Trans. Information Theory.
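A toy simulation of the CAB model, with all parameters chosen for illustration: a tight bouquet around a random mean μ, a very sparse x₀, and a dense error e₀. The combined ℓ¹ problem is solved with a generic LP solver; this is a sketch of the setup, not the experiments from the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Toy cross-and-bouquet simulation (all parameters illustrative).
rng = np.random.default_rng(1)
m, n, k, rho = 200, 40, 2, 0.6          # dimension, dict size, sparsity, error fraction
mu = rng.standard_normal(m)
mu /= np.linalg.norm(mu)
A = mu[:, None] + 0.05 * rng.standard_normal((m, n))   # tight bouquet around mu
A /= np.linalg.norm(A, axis=0)

x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = 1.0 + rng.random(k)
e0 = np.zeros(m)
supp = rng.choice(m, int(rho * m), replace=False)
e0[supp] = 5.0 * rng.standard_normal(supp.size)        # gross, dense errors
y = A @ x0 + e0

# Solve min ||x||_1 + ||e||_1  s.t.  A x + e = y  as a linear program.
B = np.hstack([A, np.eye(m)])
N = n + m
res = linprog(np.ones(2 * N), A_eq=np.hstack([B, -B]), b_eq=y,
              bounds=(0, None), method="highs")
w = res.x[:N] - res.x[N:]
print("x recovery error:", np.linalg.norm(w[:n] - x0))
```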

Page 39: CVPR09 Introduction

ASYMPTOTIC SETTING - Weak Proportional Growth

• Observation dimension m → ∞.

• Problem size grows proportionally: n = δ m.

• Error support grows proportionally: ‖e₀‖₀ = ρ m.

• Support size of x₀ is sublinear in m.

Sublinear growth of ‖x₀‖₀ is necessary for correcting arbitrary fractions of errors: only about (1 − ρ) m equations are “clean”, and we need at least enough clean equations to determine x₀.

Wright, and Ma. ICASSP 2009, submitted to IEEE Trans. Information Theory.
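Written out, the weak-proportional-growth regime the slide describes is (a sketch; the exact constants are in the paper):

```latex
m \to \infty, \qquad \frac{n}{m} \to \delta > 0, \qquad
\frac{\|e_0\|_0}{m} \to \rho \in (0,1), \qquad
\frac{\|x_0\|_0}{m} \to 0 \quad \text{(sublinear support)} .
```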

Page 40: CVPR09 Introduction

MAIN RESULT - Correction of Arbitrary Error Fractions

“ℓ¹ minimization recovers any sparse signal from almost any error with density less than 1.”

Wright, and Ma. ICASSP 2009, submitted to IEEE Trans. Information Theory.

Page 41: CVPR09 Introduction

SIMULATION - Comparison to Alternative Approaches

“L1 - [A I]”: solve min ‖x‖₁ + ‖e‖₁ subject to [A I][x; e] = y (this work).

“L1 - comp”: ℓ¹ error correction through a parity-check matrix [Candes + Tao ‘05].

“ROMP”: regularized orthogonal matching pursuit [Needell + Vershynin ‘08].

Page 42: CVPR09 Introduction

[Plot: fraction of correct recoveries for increasing m.]

SIMULATION - Arbitrary Errors in WPG

Wright, and Ma. ICASSP 2009, submitted to IEEE Trans. Information Theory.

Page 43: CVPR09 Introduction

For real face images, weak proportional growth corresponds to the setting where the total image resolution grows proportionally to the size of the database.

[Figure: fraction of correct recoveries vs. corruption level (the marked point is 50% probability of correct recovery). Above: corrupted images. Below: reconstructions.]

IMPLICATIONS (1) - Error Correction with Real Faces

Wright, and Ma. ICASSP 2009, submitted to IEEE Trans. Information Theory.

Page 44: CVPR09 Introduction

IMPLICATIONS (2) – Verification via Sparsity

Sparsity Concentration Index (SCI): measures how concentrated the recovered coefficients are on a single class.

Reject the test image as invalid if SCI(x̂) < τ.

[Figure: recovered coefficients for a valid subject (concentrated on one class) and an invalid subject (spread across classes).]

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009
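The SCI formula below is the one given in the PAMI 2009 paper; the helper name and the threshold τ are illustrative.

```python
import numpy as np

def sci(x, labels):
    """Sparsity Concentration Index (PAMI 2009):
    SCI(x) = (K * max_i ||delta_i(x)||_1 / ||x||_1 - 1) / (K - 1),
    where K is the number of classes and delta_i(x) keeps only the
    coefficients of class i. SCI = 1 when x is concentrated on one class,
    SCI = 0 when x is spread evenly over all classes."""
    classes = np.unique(labels)
    K = len(classes)
    per_class = [np.abs(x[labels == c]).sum() for c in classes]
    return (K * max(per_class) / np.abs(x).sum() - 1) / (K - 1)

# Verification rule from the slide: reject the test image as invalid
# if sci(x_hat, labels) < tau for some threshold tau in (0, 1).
```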

Page 45: CVPR09 Introduction

IMPLICATIONS (2) – Receiver Operating Characteristic (ROC)

Extended Yale B, 19 valid subjects, 19 invalid, under different levels of occlusion:

[ROC curves at occlusion levels 0%, 10%, 20%, 30%, and 50%.]

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009

Page 46: CVPR09 Introduction

IMPLICATIONS (3) - Communications through Bad Channels

Transmitter encodes the message x as y = A x.

Receiver observes a corrupted version ỹ = A x + e through an extremely corrupting channel, and recovers x by linear programming.

Wright, and Ma. ICASSP 2009, submitted to IEEE Trans. Information Theory.

Page 47: CVPR09 Introduction

IMPLICATIONS (4) - Application to Information Hiding

Alice intentionally corrupts the messages she sends to Bob. Bob knows the dictionary A and can recover the message by linear programming; the eavesdropper cannot.

Code breaking becomes a dictionary-learning problem…

Wright, and Ma. ICASSP 2009, submitted to IEEE Trans. Information Theory.

Page 48: CVPR09 Introduction

Part III: A Practical Automatic Face Recognition System

Page 49: CVPR09 Introduction

FACE RECOGNITION – Toward a Robust, Real-World System

So far: surprisingly good laboratory results, strong theoretical foundations.

Remaining obstacles to truly practical automatic face recognition:

• Pose and misalignment - real face detector imprecision!

• Obtaining sufficient training data - which illuminations are truly needed?

• Scalability to large databases - both in speed and accuracy.

All three difficulties can be addressed within the same unified framework of sparse representation.


Page 51: CVPR09 Introduction

FACE RECOGNITION – Coupled Problems of Pose and Illumination

Sufficient training illuminations, but no explicit alignment:

Alignment corrected, but insufficient training illuminations:

Robust alignment and training set selection:

Recognition succeeds

Page 52: CVPR09 Introduction

ROBUST POSE AND ALIGNMENT – Problem Formulation

What if the input image is misaligned or exhibits pose variation?

If the transformation τ were known, we would still have a sparse representation: y ∘ τ = A x + e.

Seek the τ that gives the sparsest representation: minimize ‖x‖₁ + ‖e‖₁ over x, e, and τ, subject to y ∘ τ = A x + e.

Wagner, Wright, Ganesh, Zhou and Ma. To appear in CVPR 09

Page 53: CVPR09 Introduction

POSE AND ALIGNMENT – Iterative Linear Programming

Robust alignment as sparse representation: the constraint y ∘ τ = A x + e is nonconvex in τ.

Linearize about the current estimate of τ; the resulting problem is a linear program.

Solve it, set τ ← τ + Δτ, and repeat.

Wagner, Wright, Ganesh, Zhou and Ma. To appear in CVPR 09
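One linearize-and-solve step, written out under the slide's notation (a sketch; here J denotes the Jacobian of the warped test image with respect to τ):

```latex
\min_{x,\,e,\,\Delta\tau}\ \|x\|_1 + \|e\|_1
\quad \text{subject to} \quad
y \circ \tau_k \;+\; J_k\,\Delta\tau \;=\; A\,x + e,
\qquad
J_k = \left.\frac{\partial\,(y \circ \tau)}{\partial \tau}\right|_{\tau_k},
\qquad
\tau_{k+1} = \tau_k + \Delta\tau .
```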

Page 54: CVPR09 Introduction

POSE AND ALIGNMENT – How well does it work?

Succeeds at up to 45° of pose variation.

Succeeds for translations of up to 20% of the face width, and up to 30° of in-plane rotation.

[Plot: recognition rate under synthetic misalignments (Multi-PIE).]

Wagner, Wright, Ganesh, Zhou and Ma. To appear in CVPR 09

Page 55: CVPR09 Introduction

POSE AND ALIGNMENT – L1 vs L2 solutions

Crucial role of sparsity in robust alignment:

Minimum ℓ¹-norm solution

Least-squares solution

Wagner, Wright, Ganesh, Zhou and Ma. To appear in CVPR 09

Page 56: CVPR09 Introduction

POSE AND ALIGNMENT – Algorithm details

Excellent classification, validation, and robustness with a linear-time algorithm that is efficient in practice and highly parallelizable:

• First align the test image to each subject separately.

• Select the k subjects with the smallest alignment residuals, then classify based on a global sparse representation.

• Efficient multi-scale implementation.

Wagner, Wright, Ganesh, Zhou and Ma. To appear in CVPR 09

Page 57: CVPR09 Introduction

LARGE-SCALE EXPERIMENTS – Multi-PIE Database

Training: 249 subjects appearing in Session 1, 9 illuminations per subject.

Testing: 336 subjects appearing in Sessions 2,3,4. All 18 illuminations.

Examples of failures: Drastic changes in personal appearance over time

Wagner, Wright, Ganesh, Zhou and Ma. To appear in CVPR 09

Page 58: CVPR09 Introduction

LARGE-SCALE EXPERIMENTS – Multi-PIE Database

Training: 249 subjects appearing in Session 1, 9 illuminations per subject.

Testing: 336 subjects appearing in Sessions 2,3,4. All 18 illuminations.

Validation performance:

Is the subject in the database of 249 people?

NN, NS, LDA not much better than chance.

Our method achieves an equal error rate of < 10%.

Receiver Operating Characteristic (ROC)

Wagner, Wright, Ganesh, Zhou and Ma. To appear in CVPR 09


Page 60: CVPR09 Introduction

ACQUISITION SYSTEM – Efficient training collection

Generate different illuminations by reflecting light from DLP projectors off walls, onto subject:

Fast: hundreds of images in a matter of seconds, flexible and easy to assemble.

Wagner, Wright, Ganesh, Zhou and Ma. To appear in CVPR 09

Page 61: CVPR09 Introduction

WHICH ILLUMINATIONS ARE NEEDED?

Real-data representation error as a function of coverage of the sphere of illumination directions and granularity of its partition:

[Plots: representation error vs. illumination coverage and vs. number of illumination cells (32 illumination cells shown); rear illuminations stand out.]

• Rear illuminations are critical for representing real-world variability, yet they are missing from standard data sets such as AR, PIE, and Multi-PIE!

• 30-40 distinct illumination patterns suffice.

Wagner, Wright, Ganesh, Zhou and Ma. To appear in CVPR 09

Page 62: CVPR09 Introduction

REAL-WORLD EXPERIMENTS – Our Dataset

Sufficient set of 38 training illuminations, grouped into subsets 1-5.

Recognition rates over 74 subjects for the different training subsets: 95.9%, 91.5%, 62.3%, 73.7%, and 53.5%.

Wagner, Wright, Ganesh, Zhou and Ma. To appear in CVPR 09

Page 63: CVPR09 Introduction

Part IV: Extensions, Other Applications, and Future Directions

Page 64: CVPR09 Introduction

EXTENSIONS (1) – Topological Sparse Solutions

[Recap: recognition rates under random corruption (99.3%, 90.7%, 37.5%) and under contiguous occlusion (98.5%, 65.3%, 90.3%).]

Page 65: CVPR09 Introduction

EXTENSIONS (1) – Topological Sparse Solutions

How to better exploit the spatial characteristics of the error e in face recognition?

Simple solution: a Markov random field on the error support, combined with L1 minimization.

[Figure: query image under 60% occlusion; the recovered error support, recovered error, and recovered image.]

Longer-term direction: sparse representation on structured domains (à la [Baraniuk ’08, Do ’07]).

Z. Zhou, A. Wagner, J. Wright, and Ma. Submitted to ICCV 09.

Page 66: CVPR09 Introduction

EXTENSIONS (2) – Does Feature Selection Matter?

[Figure: faces downsampled to 12×10 pixels and other 120-dimensional feature sets.]

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009

Page 67: CVPR09 Introduction

EXTENSIONS (2) – Does Feature Selection Matter?

Compressed sensing:

– The number of linear measurements matters more than the specific details of how those measurements are taken.

– d > 2k log(N/d) random measurements suffice to efficiently reconstruct any k-sparse signal [Donoho and Tanner ’07].

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009
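A back-of-envelope check of the d > 2k log(N/d) rule for feature dimensions like those in the experiments that follow; the values of N and k are illustrative assumptions, not figures from the talk.

```python
import numpy as np

N, k = 1207, 38   # assumed dictionary size and target sparsity (illustrative)
for d in (30, 56, 120, 504):
    bound = 2 * k * np.log(N / d)   # required number of measurements
    print(f"d = {d:4d}: need d > {bound:6.1f} -> "
          f"{'ok' if d > bound else 'too few'}")
```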

Page 68: CVPR09 Introduction

Extended Yale B: 38 subjects, 2,414 images of size 192×168.
Training: 1,207 random images. Testing: remaining 1,207 images.

EXTENSIONS (2) – Does Feature Selection Matter?

Wright, Yang, Ganesh, Sastry, and Ma. Robust Face Recognition via Sparse Representation, PAMI 2009

Page 69: CVPR09 Introduction

Enhance images by sparse representation in coupled dictionaries (high- and low-resolution) of image patches:

OTHER APPLICATIONS (1) - Image Super-resolution

J. Yang, Wright, Huang, and Ma. CVPR 2008

[Figure comparison: original; MRF/BP [Freeman, IJCV ‘00]; soft edge prior [Dai, ICCV ‘07]; our method.]

Page 70: CVPR09 Introduction

OTHER APPLICATIONS (2) - Face Hallucination

J. Yang, H. Tang, Huang, and Ma. ICIP 2008

Page 71: CVPR09 Introduction

OTHER APPLICATIONS (3) - Activity Detection & Recognition

A. Yang et al. (UC Berkeley). CVPR 2008

Precision of 98.8% and recall of 94.2%, far better than existing detectors and classifiers.

Page 72: CVPR09 Introduction

OTHER APPLICATIONS (4) - Robust Motion Segmentation

S. Rao, R. Tron, R. Vidal, and Ma. CVPR 2008

Deals with incomplete or mistracked features, even with 80% of the dataset corrupted!

Page 73: CVPR09 Introduction

OTHER APPLICATIONS (5) - Data Imputation in Speech

91% accuracy at −5 dB SNR on AURORA-2, compared to 61% with conventional methods…

J.F. Gemmeke and G. Cranen, EUSIPCO’08

Page 74: CVPR09 Introduction

FUTURE WORK (1) – High-Dimensional Pattern Recognition

Toward an understanding of high-dimensional pattern classification…

Data tasks beyond error correction:

• Excellent classification performance even with a highly coherent dictionary.

• Excellent validation behavior based on the sparsity of the solution.

Understanding either behavior requires a much more expressive model for “what happens inside the bouquet.”

Page 75: CVPR09 Introduction

FUTURE WORK (2) – From Sparse Vectors to Low-Rank Matrices

Robust PCA Problem: given the observation D = A + E, where A is low-rank and E is a sparse error, recover A.

Convex relaxation: replace rank(A) with the nuclear norm ‖A‖* (the sum of the singular values) and ‖E‖₀ with ‖E‖₁.

Wright, Ganesh, Rao and Ma, submitted to the Journal of the ACM.
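Written out, the problem and its convex relaxation in standard form:

```latex
\min_{A,\,E}\ \operatorname{rank}(A) + \gamma\,\|E\|_0
\ \ \text{s.t.}\ \ D = A + E
\qquad \longrightarrow \qquad
\min_{A,\,E}\ \|A\|_* + \lambda\,\|E\|_1
\ \ \text{s.t.}\ \ D = A + E ,
\qquad \|A\|_* = \sum_i \sigma_i(A).
```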

Page 76: CVPR09 Introduction

ROBUST PCA – Which matrices and which errors?

Random orthogonal model (of rank r) [Candes & Recht ‘08]: the singular subspaces are independent samples from the invariant measure on the Stiefel manifold of rank-r orthobases; the singular values are arbitrary.

Bernoulli error signs-and-support (with parameter ρ): each entry is corrupted independently with probability ρ; the magnitude of the corruption is arbitrary.

Wright, Ganesh, Rao and Ma, submitted to the Journal of the ACM.

Page 77: CVPR09 Introduction

MAIN RESULT – Exact Solution of Robust PCA

“Convex optimization recovers almost any matrix of rank O(m/log m) from errors affecting O(m²) of the observations!”

Wright, Ganesh, Rao and Ma, submitted to the Journal of the ACM.

Page 78: CVPR09 Introduction

ROBUST PCA – Contrast with literature

• [Chandrasekaran et al. 2009]:

Correct recovery w.h.p. for …

Only guarantees recovery from vanishing fractions of errors, even when r = O(1).

• This work:

Correct recovery w.h.p. for …, even with …

Key technique: Iterative surgery for producing a certifying dual vector (extends [Wright and Ma ’08]).

Wright, Ganesh, Rao and Ma, submitted to the Journal of the ACM.

Page 79: CVPR09 Introduction

BONUS RESULT – Matrix completion in proportional growth

“Convex optimization exactly recovers matrices of rank O(m), even when O(m²) entries are missing!”

Wright, Ganesh, Rao and Ma, submitted to the Journal of the ACM.
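For reference, the nuclear-norm completion program this result concerns (standard formulation, cf. [Candes & Recht ‘08]):

```latex
\min_{A}\ \|A\|_{*}
\quad \text{subject to} \quad
A_{ij} = D_{ij}, \quad (i,j) \in \Omega
\qquad (\Omega = \text{set of observed entries}).
```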

Page 80: CVPR09 Introduction

MATRIX COMPLETION – Contrast with literature

• [Candes and Tao 2009]:

Correct completion whp for

Empty for

• This work:

Correct completion whp for , even with

Exploits rich regularity and independence in random orthogonal model.

Caveats:

- [C-T ‘09] is tighter for small r.
- [C-T ‘09] generalizes better to other matrix ensembles.

Wright, Ganesh, Rao and Ma, submitted to the Journal of the ACM.

Page 81: CVPR09 Introduction

FUTURE WORK (2) – Robust PCA via Iterative Thresholding

Efficient solutions? The convex relaxation is a semidefinite program in millions of unknowns!

Iterative thresholding: repeat { shrink singular values; shrink absolute values }, with provable (and efficient) convergence to the global optimum.

Future direction: sampling approximations to the singular value thresholding operator [Rudelson and Vershynin ’08]?

Wright, Ganesh, Rao and Ma, submitted to the Journal of the ACM.
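A minimal sketch of the two shrinkage operators the slide names and a simple alternation between them. This illustrates the idea only; it is not the paper's exact algorithm (step sizes, continuation, and stopping rules are omitted, and λ and τ are illustrative choices).

```python
import numpy as np

def shrink(X, tau):
    """Soft-threshold the entries of X (shrink absolute values)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Shrink the singular values of X (singular value thresholding)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def rpca_sketch(D, lam=None, tau=1.0, n_iter=200):
    """Alternate the two shrinkage steps to split D into low-rank A
    plus sparse E. Illustrative scheme, not the paper's algorithm."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(D.shape))   # a common default scaling
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(n_iter):
        A = svt(D - E, tau)                 # shrink singular values
        E = shrink(D - A, lam * tau)        # shrink absolute values
    return A, E
```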

Page 82: CVPR09 Introduction

Videos are highly coherent data. Errors correspond to pixels that cannot be well interpolated from the rest of the video.

FUTURE WORK (2) - Video Coding and Anomaly Detection

[Panels: video | low-rank approximation | sparse error.]

Background variation

Anomalous activity

Wright, Ganesh, Rao and Ma, submitted to the Journal of the ACM.

550 frames, 64 × 80 pixels, significant illumination variation.

Page 83: CVPR09 Introduction

FUTURE WORK (2) - Background modeling

Wright, Ganesh, Rao and Ma, submitted to the Journal of the ACM.

Static-camera surveillance video. [Panels: video | low-rank approximation | sparse error.]

200 frames, 72 × 88 pixels, significant foreground motion.

Page 84: CVPR09 Introduction

FUTURE WORK (2) - Face under different illuminations

Wright, Ganesh, Rao and Ma, submitted to the Journal of the ACM.

[Panels: original images | low-rank approximation | sparse error.]

Extended Yale B database, 29 images of one subject. Images are 96 × 84 pixels.

Page 85: CVPR09 Introduction

CONCLUSIONS

Analytical and algorithmic tools from sparse representation lead to a new approach to face recognition:

• Robustness to corruption and occlusion

• Performance that exceeds expectations and human ability

Face recognition reveals new phenomena in high-dimensional statistics and geometry:

• Dense error correction with a coherent dictionary

• Recovery of corrupted low-rank matrices

Theoretical insights into the mathematical models lead back to practical gains:

• Robustness to misalignment, illumination, and occlusion

• Scalability in both computation and performance in realistic scenarios

MANY NEW APPLICATIONS BEYOND FACE RECOGNITION…

Page 86: CVPR09 Introduction

- Robust Face Recognition via Sparse Representation. IEEE Trans. on Pattern Analysis and Machine Intelligence, February 2009.

- Dense Error Correction via L1-Minimization. ICASSP 2009; submitted to IEEE Trans. Information Theory, September 2008.

- Towards a Practical Face Recognition System: Robust Alignment and Illumination via Sparse Representation. IEEE Conference on Computer Vision and Pattern Recognition, June 2009.

- Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices by Convex Optimization. Submitted to the Journal of the ACM, May 2009.

REFERENCES + ACKNOWLEDGEMENT

This work was funded by NSF, ONR, and MSR

John Wright, Allen Yang, Andrew Wagner, Arvind Ganesh, Zihan Zhou


Page 87: CVPR09 Introduction

Questions, please?

THANK YOU



Page 89: CVPR09 Introduction

EXPERIMENTS – Design of Robust Training Sets

The Equivalence Breakdown Point

Extended Yale B

AR Database

Sharon, Wright, and Ma. Bounding EBP. Submitted to ACC ‘09.

Page 90: CVPR09 Introduction

FEATURE SELECTION – Extended Yale B Database

38 subjects, 2,414 images of size 192×168.
Training: 1,207 random images. Testing: remaining 1,207 images.

L1 (SRC):

Dimension (d)   30    56    120   504
Eigen [%]       80.0  89.6  94.0  97.0
Laplacian [%]   80.6  91.7  93.9  96.5
Random [%]      81.9  90.8  95.0  96.8
Downsample [%]  76.2  87.6  92.7  96.9
Fisher [%]      85.9  N/A   N/A   N/A

Nearest Neighbor:

Dimension (d)   30    56    120   504
Eigen [%]       89.9  91.1  92.5  93.2
Laplacian [%]   89.0  90.4  91.9  93.4
Random [%]      87.4  91.5  93.9  94.1
Downsample [%]  80.8  88.2  91.1  93.4
Fisher [%]      81.9  N/A   N/A   N/A

Nearest Subspace:

Dimension (d)   30    56    120   504
Eigen [%]       72.0  79.8  83.9  85.8
Laplacian [%]   75.6  81.3  85.2  87.7
Random [%]      60.1  66.5  67.8  66.4
Downsample [%]  46.7  54.7  61.8  65.4
Fisher [%]      87.7  N/A   N/A   N/A


Page 92: CVPR09 Introduction

FEATURE SELECTION – AR Database

100 subjects, 1,400 images of size 165×120.
Training: 700 images, varying lighting and expression. Testing: 700 images from the second session.

L1 (SRC):

Dimension (d)   30    56    120   504
Eigen [%]       71.1  80.0  85.7  92.0
Laplacian [%]   73.7  84.7  91.0  94.3
Random [%]      57.8  75.5  87.5  94.7
Downsample [%]  46.8  67.0  84.6  93.9
Fisher [%]      87.0  92.3  N/A   N/A

Nearest Neighbor:

Dimension (d)   30    56    120   504
Eigen [%]       64.1  77.1  82.0  85.1
Laplacian [%]   66.0  77.5  84.3  90.3
Random [%]      59.2  68.2  80.0  83.3
Downsample [%]  56.2  67.7  77.0  82.1
Fisher [%]      80.3  85.8  N/A   N/A

Nearest Subspace:

Dimension (d)   30    56    120   504
Eigen [%]       68.1  74.8  79.3  80.5
Laplacian [%]   73.1  77.1  83.8  89.7
Random [%]      56.7  63.7  71.4  75.0
Downsample [%]  51.7  60.9  69.2  73.7
Fisher [%]      83.4  86.8  N/A   N/A

Page 93: CVPR09 Introduction

FEATURE SELECTION – Recognition with Face Parts

[Figure: feature masks and examples of test features.]

Features     nose     right eye   mouth & chin
Dimension    4,270    5,050       12,936
L1           87.3%    93.7%       98.3%
NN           49.2%    68.8%       72.7%
NS           83.7%    78.6%       94.4%
SVM          70.8%    85.8%       95.3%

Page 94: CVPR09 Introduction

NOTATION - Correct Recovery of Solutions

Whether (x₀, e₀) is recovered depends only on its signs and support.

Call a sign-and-support pattern ℓ¹-recoverable if the solution with these signs and support is recovered and the minimizer is unique.

Page 95: CVPR09 Introduction

PROOF (1) - Problem Geometry

Consider a fixed sign-and-support pattern; w.l.o.g., let …

Restrict to the support and write …

With some manipulation, the optimality condition becomes …

Success iff …


Page 97: CVPR09 Introduction

PROOF (1) - Problem Geometry

Introduce the hyperplane … and the unit ball of …

The NSC: the two sets are disjoint.


Page 100: CVPR09 Introduction

PROOF (1) - Problem Geometry

… is a complicated polytope. Instead, look for a hyperplane separating … and … in the higher-dimensional space.

Page 101: CVPR09 Introduction

PROOF (2) - When Does the Iteration Succeed?

Consider the three statements: … (we want to show …)

Lemma: success if …

Proof: the base case is trivial (use that …); the inductive step bounds the magnitude of …
