John Wright
Electrical Engineering
Columbia University
In Search of Provably Effective Visual Data Representations
Images
Videos
Web data
> 1M pixels
> 1B voxels
> 10B websites
Datasets are massive, high-dimensional…
… intrinsic structures are low-dimensional
How can we exploit low-dimensional structure in high-dimensional data?
Recognition Inpainting Denoising
Compression Transmission Stabilization Repair
Indexing Ranking Search Collaborative filtering…
Good solutions enable many applications
Real application data often contain missing observations, corruption or even malicious errors and noise. Classical algorithms (e.g., least squares, PCA) break down …
But … it is not easy…
This talk
How do we develop provably correct and efficient algorithms for recovering low-dimensional structure from corrupted high-dimensional observations?
Given D = L0 + S0 with L0 low-rank, S0 sparse, recover L0 (and S0).
Numerous approaches to robust PCA in the literature:
• Multivariate trimming [Gnanadesikan + Kettenring ’72] • Random sampling [Fischler + Bolles ’81] • Alternating minimization [Ke + Kanade ’03] • Influence functions [de la Torre + Black ’03]
Can we give an efficient, provably correct algorithm?
Formulation – Robust PCA?
Why is the problem hard?
Some very sparse matrices are also low-rank:
Certain sparse error patterns make recovering L0 impossible:
Can we recover matrices L0 that are incoherent with the standard basis?
Can we correct errors S0 whose support is not adversarial?
Singular vectors of L0 = U Σ V* not too spiky: max_i ‖U* e_i‖² ≤ μr/n₁, max_i ‖V* e_i‖² ≤ μr/n₂
Uniform model on error support; signs and magnitudes arbitrary
U and V not too cross-correlated: ‖U V*‖∞ ≤ √(μr/(n₁ n₂))
Can we recover L0 that is incoherent with the standard basis, from almost all errors S0?
Incoherence condition on singular vectors; singular values arbitrary
Incoherence condition: [Candès + Recht ‘08]
When is there hope? (In) coherence
Naïve optimization approach
Look for a low-rank L that agrees with the data up to some sparse error S: minimize rank(L) + λ‖S‖₀ subject to L + S = D.
… and how should we solve it?
Convex relaxation: minimize ‖L‖* + λ‖S‖₁ subject to L + S = D
Nuclear norm heuristic: [Fazel, Hindi, Boyd ‘01], [Recht, Fazel, Parrilo ‘08]. Convex surrogate for LR+S: [Chandrasekaran, Sanghavi, Parrilo, Willsky ‘11].
“Convex optimization (with λ = 1/√n) exactly recovers matrices of rank O(n / (μ log² n)) from errors corrupting a constant fraction of the entries”
[Candès, Li, Ma, and W., ’09].
Theory – Correct recovery
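A minimal numerical sketch of the convex program above (Principal Component Pursuit via a basic ADMM scheme). The choice of μ, the iteration count, and all variable names are illustrative assumptions, not from the talk:

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def pcp(D, lam=None, mu=None, iters=500):
    """Principal Component Pursuit: min ||L||_* + lam ||S||_1  s.t.  L + S = D."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))          # lambda suggested by the theory
    if mu is None:
        mu = 0.25 * m * n / (np.abs(D).sum() + 1e-12)  # common heuristic
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                        # scaled dual variable
    for _ in range(iters):
        L = svt(D - S + Y / mu, 1.0 / mu)       # L-update
        S = shrink(D - L + Y / mu, lam / mu)    # S-update
        Y = Y + mu * (D - L - S)                # dual ascent on the constraint
    return L, S

# Synthetic test: rank-2 L0 plus 5% sparse gross errors
rng = np.random.default_rng(0)
L0 = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
S0 = np.zeros((60, 60))
idx = rng.random((60, 60)) < 0.05
S0[idx] = 10 * rng.standard_normal(idx.sum())
L, S = pcp(L0 + S0)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))   # small relative error
```

Standard implementations additionally monitor the constraint residual ‖D − L − S‖_F and stop early once it is small.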
Video = Low-rank appx. + Sparse error
Static camera surveillance video
200 frames, 144 × 172 pixels.
Significant foreground motion
RPCA
Applications – Background modeling from video
Real-time (recursive) versions: Vaswani et al., Balzano et al. ‘12.
Peng, Ganesh, W., Ma. CVPR 2010
Application – Faces from the internet
Input: faces detected by a face detector.
Average
Application – Faces detected
Output: aligned faces.
Average
Application – Faces aligned
Output: clean, low-rank faces.
Average
Application – Faces repaired and cleaned
Gloria Macapagal Arroyo
Jennifer Capriati
Laura Bush
Serena Williams
Barack Obama
Ariel Sharon
Arnold Schwarzenegger
Colin Powell
Donald Rumsfeld
George W Bush
Gerhard Schroeder
Hugo Chavez
Jacques Chirac
Jean Chretien
John Ashcroft
Junichiro Koizumi
Lleyton Hewitt
Luiz Inacio Lula da Silva
Tony Blair
Vladimir Putin
Average face before alignment & repair
Application – Celebrities from the internet
Average face after alignment & repair
Application – Celebrities from the internet
BIG PICTURE – Parallelism of Sparsity and Low-Rank
                     Sparse Vector        Low-Rank Matrix
Degeneracy of        individual signal    correlated signals
Measure              L0 norm              rank
Convex Surrogate     L1 norm              Nuclear norm
Common settings for both: Compressed Sensing, Error Correction, Domain Transform, Mixed Structures
A SUITE OF POWERFUL REGULARIZERS
• [Zhou et al. ‘09] – spatially contiguous sparse errors via MRF
• [Bach ’10] – structured relaxations from submodular functions
• [Negahban+Yu+Wainwright ’10] – geometric analysis of recovery
• [Becker+Candès+Grant ’10] – algorithmic templates
• [Xu+Caramanis+Sanghavi ‘11] – column-sparse errors via the L2,1 norm
• [Recht+Parrilo+Chandrasekaran+Willsky ’11] – compressive sensing of various structures
• [Candès+Recht ’11] – compressive sensing of decomposable structures
• [McCoy+Tropp ’11] – decomposition of sparse and low-rank structures
• [Wright+Ganesh+Min+Ma, ISIT ’12] – superposition of decomposable structures
For robust recovery of a family of low-dimensional structures:
Take home message: Let the data and application tell you the structure…
Can we provably detect object instances under varying lighting?
Conceptual Problem:
Given (information about) an object, can we provably detect or recognize it in a new image?
What information do we need?
What computational complexity?
What algorithmic performance guarantee?
The goal – practical?
Many engineering approaches to mitigate illumination: Quotient images [Riklin-Raviv, Shashua ‘99], [Wang, Li, Wang ‘04]
Nonlinear features: SIFT [Lowe ‘99], HOG [Dalal, Triggs ‘05], LBP [Ojala, Pietikäinen, Harwood ’94], [Ahonen, Hadid, Pietikäinen ‘06]
Total variation minimization [Yin et al.]
Or use heuristic / intuitive physics … [Murase, Nayar ‘96], [Belhumeur, Kriegman ‘98], [Basri, Jacobs ‘03], [Ramamoorthi ‘01], [Basri, Frolova ‘04], [W. et al. ‘09], many others…
Wang, Li, Wang ‘04
Ahonen, Hadid, Pietikainen ‘06
Labeled Faces in the Wild results http://vis-www.cs.umass.edu/lfw/results.html
Ex: Face recognition with ℓ¹-regression:
Linear subspace model for images of the same face under varying illumination: stack the training images of subject i as columns of A_i.
If the test image y is also of subject i, then y ≈ A_i x_i for some coefficients x_i.
Can represent any test image w.r.t. the entire training set as y = A x + e:
y – test image, A = [A_1, …, A_k] – combined training dictionary, x – sparse coefficients, e – sparse error.
[W., Yang, Ganesh, Sastry, Ma, ‘09]
Underdetermined system of linear equations, y = A x + e = [A I][x; e]: more unknowns than equations.
Seek the sparsest solution: minimize ‖x‖₀ + ‖e‖₀ subject to y = A x + e
Solution is not unique … but
x should be sparse: ideally, only supported on images of the same subject
e expected to be sparse: occlusion only affects a subset of the pixels
Convex relaxation: minimize ‖x‖₁ + ‖e‖₁ subject to y = A x + e
Ex: Face recognition with ℓ¹-regression:
If A is “nice” and the model fits, we can make strong statements about the performance of ℓ¹-regression.
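The ℓ¹-regression pipeline above can be sketched end to end on synthetic subspace data. This is a hypothetical toy setup: ISTA stands in for a generic ℓ¹ solver, and all sizes and parameter choices are illustrative assumptions:

```python
import numpy as np

def ista(B, y, alpha=0.05, iters=2000):
    """Proximal gradient (ISTA) for min 0.5||y - Bw||^2 + alpha ||w||_1."""
    step = 1.0 / np.linalg.norm(B, 2) ** 2      # 1/L, L = Lipschitz const of gradient
    w = np.zeros(B.shape[1])
    for _ in range(iters):
        w = w - step * (B.T @ (B @ w - y))      # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - step * alpha, 0.0)  # shrinkage
    return w

rng = np.random.default_rng(1)
m, k, per = 100, 3, 8                    # pixels, subjects, images per subject
bases = [rng.standard_normal((m, 3)) for _ in range(k)]   # 3-dim subspace each
A = np.hstack([Q @ rng.standard_normal((3, per)) for Q in bases])
A /= np.linalg.norm(A, axis=0)           # unit-norm training images
y = bases[0] @ rng.standard_normal(3)    # new image of subject 0 ...
y[rng.permutation(m)[:10]] += 5.0        # ... with 10 grossly corrupted pixels

B = np.hstack([A, np.eye(m)])            # [training | identity] dictionary
w = ista(B, y)
x, e = w[:k * per], w[k * per:]          # sparse coefficients and sparse error
resid = [np.linalg.norm(y - e - A[:, i*per:(i+1)*per] @ x[i*per:(i+1)*per])
         for i in range(k)]
print(int(np.argmin(resid)))             # index of the identified subject
```

Classification assigns the label whose training block, together with the recovered sparse error, best explains the test image.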
Geometry of illumination variations
Distant illumination identified with a Riemann integrable, nonnegative function on the sphere of directions.
Assume a linear sensor response; the image is then linear in the illumination.
What is the set of possible images of the object?
The set of all such images is a convex cone. [Belhumeur + Kriegman ’98]
Ambient cone model. Illumination is the sum of ambient and directional (arbitrary) components: and corresponding cone of possible images:
Extreme rays
The extreme rays can be interpreted as images under point illuminations, together with the ambient image.
We can build an ε-approximation to the illumination cone using these quantities:
Lemma. For Riemann integrable illumination, images under finitely many point sources (plus the ambient image) suffice to approximate any image in the cone.
Approximating a convex cone
Is there any special structure we can use here?
Approximation of general convex bodies in high dimensions is a disaster:
Extreme rays lie on a low-dimensional submanifold!
Theorem 2. [Bronstein, Ivanov ‘76] Let K ⊂ ℝᵈ be a convex body. There exists an ε-approximation to K in Hausdorff distance with C(d) ε^(−(d−1)/2) vertices. For the unit sphere, this is optimal to within a constant.
Nine? [Lee + Kriegman ‘05]. Thirty-eight? [Wagner, W., Zhou, Ganesh, Ma ‘12]
Sufficient? Good recognition rates in moderate datasets.
How many point illuminations?
Normal:     99.4% rec. rate
Glasses:    98.3% rec. rate
Sunglasses: 81.0% rec. rate
Others:     53.5% rec. rate
(116 subjects)
Low-light scenarios are harder.
Ambient level for cone
Non-convex objects are harder. Two notions of convexity defect: Local – Maximum fraction of viewing directions that are obstructed. Global – Maximum length of a shadow-casting edge boundary.
What is the difficulty here?
Covering numbers
If we define an illumination covering number,
we obtain approximation rates in general, and sharper rates for convex objects.
A different norm? Better rates in ℓ¹:
Suggests using ℓ¹ regression [W., Yang, Ganesh, Sastry, Ma ‘09]
Prune? Reduce the number of samples … but we need enough samples to get started: in general, the count cannot be too small, due to moving shadow boundaries.
How can we build efficient detectors?
Θ(r^(2/3)) vertices on the convex hull of a radius-r digital circle.
Reduce dimensionality? Seek a model much simpler than the full cone, but still guaranteed to suffice for the task. What sense of approximation? For guaranteed detection, control the Hausdorff discrepancy. Simpler in what sense?
• Efficient storage: specify with only a few real numbers
• Efficient evaluation: compute projections onto the model quickly.
Computing products with the model matrix is the dominant cost in state-of-the-art nonnegative least squares algorithms (e.g., [Dhillon et al. ‘10]).
LR+S captures intuitive physics
Calculations with the Lambertian sphere suggest that sets of images of smooth, near-convex objects should be approximately low-rank [Basri + Jacobs ’03], [Ramamoorthi ‘04].
Cast shadows are often sparse [W., Yang, … Ma ‘09], [Candès, Li, Ma, W. ‘11]:
Observations used in photometric stereo [Wu et al. ‘11]:
How can we exploit this physical intuition, while guaranteeing quality of approximation?
Cone-preserving dimensionality reduction
A low-rank and sparse decomposition that provably preserves detection performance.
Would like to solve a combined low-rank + sparse approximation subject to the cone constraint
… but the constraint is very complicated. Relax to:
Results: Complexity Reduction
Hausdorff distance bound
Relative complexity
ε – covering radius. Smaller ε, greater redundancy.
Summary so far …
• Sufficient sample densities: some first results for nonconvex objects.
• Cone-preserving complexity reduction. Both exploit low-dimensional structure …
Learning simple signal models?
Signal model y = A x with x sparse: most of the entries of x are zero.
Good model for many types of imagery data, especially if we can learn the dictionary A:
Y = A X (Data = Dictionary × sparse coefficients)
Ambiguities: (A, X) or (A Π Λ, Λ⁻¹ Πᵀ X)?
Learning the basis of sparsity?
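The scaling/permutation ambiguity above is easy to verify numerically; a minimal sketch with synthetic data (dictionary A, coefficients X; all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 30))            # dictionary: 30 atoms in R^20
X = np.where(rng.random((30, 50)) < 0.1,     # sparse coefficient matrix
             rng.standard_normal((30, 50)), 0.0)
Y = A @ X                                    # observed data

# Relabel the atoms (permutation P) and rescale them (diagonal Lam):
P = np.eye(30)[rng.permutation(30)]
Lam = np.diag(rng.uniform(0.5, 2.0, 30))
A2 = A @ P @ Lam
X2 = np.linalg.inv(Lam) @ P.T @ X            # equally sparse coefficients

print(np.allclose(Y, A2 @ X2))               # prints True: same data, different factors
```

So the factorization can only ever be identified up to permutation and scaling of the atoms; "uniqueness" results are stated modulo this equivalence.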
D = L + S
Peculiar geometry: the data lie on a union of subspaces, each spanned by k columns of the dictionary.
When is dictionary learning well-posed?
With k+1 generic points per k-column subspace, the solution is unique (up to permutation and scaling):
How can we learn a good dictionary?
Recently: Supervised variants [Mairal et al. ‘08], structured dictionaries [Rubinstein et al. ‘10], highly scalable variants [Mairal et al. ‘10] … and many, many more…
Alternating directions to minimize a sparsity surrogate [Engan et al. ‘99], [Aharon et al. ’05], [Yaghoobi ‘10]
Provable solution in the complete case
Rows of X are sparse vectors in a known subspace: the row space of Y.
If Y = A X with A square and invertible, then w.h.p. the rows of X are the sparsest vectors in row-space(Y).
Uniqueness – square dictionaries
[Spielman, Wang, W. ‘11]: Decomposition essentially unique from random observations.
[Aharon, Elad, Bruckstein ‘05]: Decomposition is essentially unique from strategically located observations.
Overcomplete:
Square:
Provable solution in the complete case
If the expected number of nonzeros per column of X is smaller than O(√n), the algorithm succeeds w.h.p. Sample requirement: p = Ω(n² log² n) columns.
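The “sparsest vectors in the row space” picture above can be checked numerically; a toy sketch with a synthetic square dictionary (sizes and sparsity level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 400
A = rng.standard_normal((n, n))              # square dictionary, invertible w.h.p.
X = np.where(rng.random((n, p)) < 0.1,       # ~10% nonzeros per entry
             rng.standard_normal((n, p)), 0.0)
Y = A @ X                                    # row-space(Y) = row-space(X)

# Any vector in row-space(Y) has the form w^T Y = (A^T w)^T X, i.e. a linear
# combination of the rows of X; a generic combination is far denser than a row:
row_nnz = (X != 0).sum(axis=1)               # sparsity of the individual rows
w = rng.standard_normal(n)
combo_nnz = (np.abs(w @ Y) > 1e-10).sum()    # sparsity of a generic combination
print(row_nnz.max(), combo_nnz)
```

Recovering a row then amounts to searching the row space for its sparsest elements, which [Spielman, Wang, W. ’12] show can be done by solving a sequence of linear programs.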
Summary and Open Questions
First results on quality of approximation for varying illumination, nonconvex objects. Dedicated cone-preserving dimensionality reduction. Many promising tools for exploiting low-dimensional structure in high-dimensional data:
More general decompositions Convexifications of the dictionary learning problem Many open questions: optimality, practicality, operational constants.
Guaranteed Illumination Models for Nonconvex Objects, Zhang, Mu, Kuo, W., arXiv ’13
Compressive Principal Component Pursuit, W., Ganesh, Min, Ma, I&I ‘13
Local correctness of ℓ¹-minimization for dictionary learning, Geng, W., arXiv ‘11
Exact Recovery of Sparsely-Used Dictionaries, Spielman, Wang, W., COLT ’12
Robust Principal Component Analysis? Candès, Li, Ma, W., JACM ’11
Robust Face Recognition via Sparse Representation, W., Yang, Ganesh, Sastry, Ma, PAMI ‘09
Towards a Practical Automatic Face Recognition … Wagner, W., Ganesh, Zhou, Mobahi, Ma, PAMI ‘12
Provable Representations for Visual Data
Thanks to …