John Wright
Electrical Engineering
Columbia University
In Search of Provably Effective Visual Data Representations
Images
Videos
Web data
> 1M pixels
> 1B voxels
> 10B websites
Datasets are massive, high-dimensional…
… intrinsic structures are low-dimensional
How can we exploit low-dimensional structure in high-dimensional data?
Recognition Inpainting Denoising
Compression Transmission Stabilization Repair
Indexing Ranking Search Collaborative filtering…
Good solutions enable many applications
Real application data often contain missing observations, corruption or even malicious errors and noise. Classical algorithms (e.g., least squares, PCA) break down …
But … it is not easy…
This talk
How do we develop provably correct and efficient algorithms for recovering low-dimensional structure from corrupted high-dimensional observations?
Given D = L0 + S0 with L0 low-rank, S0 sparse, recover L0 (and S0).
Numerous approaches to robust PCA in the literature:
• Multivariate trimming [Gnanadesikan + Kettenring ’72] • Random sampling [Fischler + Bolles ’81] • Alternating minimization [Ke + Kanade ’03] • Influence functions [de la Torre + Black ’03]
Can we give an efficient, provably correct algorithm?
Formulation – Robust PCA?
Why is the problem hard?
Some very sparse matrices are also low-rank:
Certain sparse error patterns make recovering L0 impossible:
Can we recover matrices L0 that are incoherent with the standard basis?
Can we correct errors S0 whose support is not adversarial?
Singular vectors of L0 = U Σ V* not too spiky: max_i ‖U* e_i‖² ≤ μr/n₁, max_i ‖V* e_i‖² ≤ μr/n₂
Uniform model on error support; signs and magnitudes arbitrary
U and V not too cross-correlated: ‖U V*‖∞ ≤ √(μr/(n₁ n₂))
Can we recover L0 that is incoherent with the standard basis, from almost all errors S0?
Incoherence condition on singular vectors; singular values arbitrary
Incoherence condition: [Candès + Recht ‘08]
When is there hope? (In) coherence
Naïve optimization approach
Look for a low-rank L that agrees with the data up to some sparse error S: minimize rank(L) + λ‖S‖₀ subject to L + S = D.
… and how should we solve it?
Convex relaxation: minimize ‖L‖* + λ‖S‖₁ subject to L + S = D
Nuclear norm heuristic: [Fazel, Hindi, Boyd ‘01], [Recht, Fazel, Parrilo ‘08]. Convex surrogate for LR+S: [Chandrasekaran, Sanghavi, Parrilo, Willsky ‘11].
“Convex optimization (with λ = 1/√n) exactly recovers matrices of rank O(n / (μ log² n)) from errors corrupting a constant fraction of the entries”
[Candès, Li, Ma, and W., ’09].
Theory – Correct recovery
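A minimal numerical sketch of the convex program above (Principal Component Pursuit via a basic ADMM scheme). The choice of μ, the iteration count, and all variable names are illustrative assumptions, not from the talk:

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def pcp(D, lam=None, mu=None, iters=500):
    """Principal Component Pursuit: min ||L||_* + lam ||S||_1  s.t.  L + S = D."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))          # lambda suggested by the theory
    if mu is None:
        mu = 0.25 * m * n / (np.abs(D).sum() + 1e-12)  # common heuristic
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                        # scaled dual variable
    for _ in range(iters):
        L = svt(D - S + Y / mu, 1.0 / mu)       # L-update
        S = shrink(D - L + Y / mu, lam / mu)    # S-update
        Y = Y + mu * (D - L - S)                # dual ascent on the constraint
    return L, S

# Synthetic test: rank-2 L0 plus 5% sparse gross errors
rng = np.random.default_rng(0)
L0 = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
S0 = np.zeros((60, 60))
idx = rng.random((60, 60)) < 0.05
S0[idx] = 10 * rng.standard_normal(idx.sum())
L, S = pcp(L0 + S0)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))   # small relative error
```

Standard implementations additionally monitor the constraint residual ‖D − L − S‖_F and stop early once it is small.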
Video = Low-rank appx. + Sparse error
Static camera surveillance video
200 frames, 144 × 172 pixels.
Significant foreground motion
RPCA
Applications – Background modeling from video
Real-time (recursive) versions: Vaswani et al., Balzano et al. ‘12.
Peng, Ganesh, W., Ma. CVPR 2010
Application – Faces from the internet
Input: faces detected by a face detector.
Average
Application – Faces detected
Output: aligned faces.
Average
Application – Faces aligned
Output: clean, low-rank faces.
Average
Application – Faces repaired and cleaned
Gloria Macapagal Arroyo
Jennifer Capriati
Laura Bush
Serena Williams
Barack Obama
Ariel Sharon
Arnold Schwarzenegger
Colin Powell
Donald Rumsfeld
George W Bush
Gerhard Schroeder
Hugo Chavez
Jacques Chirac
Jean Chretien
John Ashcroft
Junichiro Koizumi
Lleyton Hewitt
Luiz Inacio Lula da Silva
Tony Blair
Vladimir Putin
Average face before alignment & repair
Application – Celebrities from the internet
Average face after alignment & repair
Application – Celebrities from the internet
BIG PICTURE – Parallelism of Sparsity and Low-Rank
                     Sparse Vector        Low-Rank Matrix
Degeneracy of        individual signal    correlated signals
Measure              L0 norm              rank
Convex Surrogate     L1 norm              Nuclear norm
Common settings for both: Compressed Sensing, Error Correction, Domain Transform, Mixed Structures
A SUITE OF POWERFUL REGULARIZERS
• [Zhou et al. ‘09] – spatially contiguous sparse errors via MRF
• [Bach ’10] – structured relaxations from submodular functions
• [Negahban+Yu+Wainwright ’10] – geometric analysis of recovery
• [Becker+Candès+Grant ’10] – algorithmic templates
• [Xu+Caramanis+Sanghavi ‘11] – column-sparse errors via the L2,1 norm
• [Recht+Parrilo+Chandrasekaran+Willsky ’11] – compressive sensing of various structures
• [Candès+Recht ’11] – compressive sensing of decomposable structures
• [McCoy+Tropp ’11] – decomposition of sparse and low-rank structures
• [Wright+Ganesh+Min+Ma, ISIT ’12] – superposition of decomposable structures
For robust recovery of a family of low-dimensional structures:
Take home message: Let the data and application tell you the structure…
Can we provably detect object instances under varying lighting?
Conceptual Problem:
Given (information about) an object, can we provably detect or recognize it in a new image?
What information do we need?
What computational complexity?
What algorithmic performance guarantee?
The goal – practical?
Many engineering approaches to mitigate illumination: Quotient images [Riklin-Raviv, Shashua ‘99], [Wang, Li, Wang ‘04]
Nonlinear features: SIFT [Lowe ‘99], HOG [Dalal, Triggs ‘05], LBP [Ojala, Pietikäinen, Harwood ’94], [Ahonen, Hadid, Pietikäinen ‘06]
Total variation minimization [Yin et al.]
Or use heuristic / intuitive physics … [Murase, Nayar ‘96], [Belhumeur, Kriegman ‘98], [Basri, Jacobs ‘03], [Ramamoorthi ‘01], [Basri, Frolova ‘04], [W. et al. ‘09], many others…
Wang, Li, Wang ‘04
Ahonen, Hadid, Pietikainen ‘06
Labeled Faces in the Wild results http://vis-www.cs.umass.edu/lfw/results.html
Ex: Face recognition with ℓ¹-regression:
Linear subspace model for images of the same face under varying illumination: stack the training images of subject i as columns of A_i.
If the test image y is also of subject i, then y ≈ A_i x_i for some coefficients x_i.
Can represent any test image w.r.t. the entire training set as y = A x + e:
y – test image, A = [A_1, …, A_k] – combined training dictionary, x – sparse coefficients, e – sparse error.
[W., Yang, Ganesh, Sastry, Ma, ‘09]
Underdetermined system of linear equations, y = A x + e = [A I][x; e]: more unknowns than equations.
Seek the sparsest solution: minimize ‖x‖₀ + ‖e‖₀ subject to y = A x + e
Solution is not unique … but
x should be sparse: ideally, only supported on images of the same subject
e expected to be sparse: occlusion only affects a subset of the pixels
Convex relaxation: minimize ‖x‖₁ + ‖e‖₁ subject to y = A x + e
Ex: Face recognition with ℓ¹-regression:
If A is “nice” and the model fits, we can make strong statements about the performance of ℓ¹-regression.
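The ℓ¹-regression pipeline above can be sketched end to end on synthetic subspace data. This is a hypothetical toy setup: ISTA stands in for a generic ℓ¹ solver, and all sizes and parameter choices are illustrative assumptions:

```python
import numpy as np

def ista(B, y, alpha=0.05, iters=2000):
    """Proximal gradient (ISTA) for min 0.5||y - Bw||^2 + alpha ||w||_1."""
    step = 1.0 / np.linalg.norm(B, 2) ** 2      # 1/L, L = Lipschitz const of gradient
    w = np.zeros(B.shape[1])
    for _ in range(iters):
        w = w - step * (B.T @ (B @ w - y))      # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - step * alpha, 0.0)  # shrinkage
    return w

rng = np.random.default_rng(1)
m, k, per = 100, 3, 8                    # pixels, subjects, images per subject
bases = [rng.standard_normal((m, 3)) for _ in range(k)]   # 3-dim subspace each
A = np.hstack([Q @ rng.standard_normal((3, per)) for Q in bases])
A /= np.linalg.norm(A, axis=0)           # unit-norm training images
y = bases[0] @ rng.standard_normal(3)    # new image of subject 0 ...
y[rng.permutation(m)[:10]] += 5.0        # ... with 10 grossly corrupted pixels

B = np.hstack([A, np.eye(m)])            # [training | identity] dictionary
w = ista(B, y)
x, e = w[:k * per], w[k * per:]          # sparse coefficients and sparse error
resid = [np.linalg.norm(y - e - A[:, i*per:(i+1)*per] @ x[i*per:(i+1)*per])
         for i in range(k)]
print(int(np.argmin(resid)))             # index of the identified subject
```

Classification assigns the label whose training block, together with the recovered sparse error, best explains the test image.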
Geometry of illumination variations
Distant illumination identified with a Riemann integrable, nonnegative function on the sphere of directions.
Assume a linear sensor response; the image is then linear in the illumination.
What is the set of possible images of the object?
The set of all such images is a convex cone. [Belhumeur + Kriegman ’98]
Ambient cone model. Illumination is the sum of ambient and directional (arbitrary) components: and corresponding cone of possible images:
Extreme rays
The extreme rays can be interpreted as images under point illuminations, together with the ambient image.
We can build an ε-approximation to the illumination cone using these quantities:
Lemma. For Riemann integrable illumination, images under finitely many point sources (plus the ambient image) suffice to approximate any image in the cone.
Approximating a convex cone
Is there any special structure we can use here?
Approximation of general convex bodies in high dimensions is a disaster:
Extreme rays lie on a low-dimensional submanifold!
Theorem 2. [Bronstein, Ivanov ‘76] Let K ⊂ ℝᵈ be a convex body. There exists an ε-approximation to K in Hausdorff distance with C(d) ε^(−(d−1)/2) vertices. For the unit sphere, this is optimal to within a constant.
Nine? [Lee + Kriegman ‘05]. Thirty-eight? [Wagner, W., Zhou, Ganesh, Ma ‘12]
Sufficient? Good recognition rates in moderate datasets.
How many point illuminations?
Normal:     99.4% rec. rate
Glasses:    98.3% rec. rate
Sunglasses: 81.0% rec. rate
Others:     53.5% rec. rate
(116 subjects)
Low-light scenarios are harder.
Ambient level for cone
Non-convex objects are harder. Two notions of convexity defect: Local – Maximum fraction of viewing directions that are obstructed. Global – Maximum length of a shadow-casting edge boundary.
What is the difficulty here?
Covering numbers
If we define an illumination covering number,
we obtain approximation rates in general, and sharper rates for convex objects.
A different norm? Better rates in ℓ¹:
Suggests using ℓ¹ regression [W., Yang, Ganesh, Sastry, Ma ‘09]
Prune? Reduce the number of samples … but we need enough samples to get started: in general, the count cannot be too small, due to moving shadow boundaries.
How can we build efficient detectors?
Θ(r^(2/3)) vertices on the convex hull of a radius-r digital circle.
Reduce dimensionality? Seek a model much simpler than the full cone, but still guaranteed to suffice for the task. What sense of approximation? For guaranteed detection, control the Hausdorff discrepancy. Simpler in what sense?
• Efficient storage: specify with only a few real numbers
• Efficient evaluation: compute projections onto the model quickly.
Computing products with the model matrix is the dominant cost in state-of-the-art nonnegative least squares algorithms (e.g., [Dhillon et al. ‘10]).
LR+S captures intuitive physics
Calculations with the Lambertian sphere suggest that sets of images of smooth, near-convex objects should be approximately low-rank [Basri + Jacobs ’03], [Ramamoorthi ‘04].
Cast shadows are often sparse [W., Yang, … Ma ‘09], [Candès, Li, Ma, W. ‘11]:
Observations used in photometric stereo [Wu et al. ‘11]:
How can we exploit this physical intuition, while guaranteeing quality of approximation?
Cone-preserving dimensionality reduction
A low-rank and sparse decomposition that provably preserves detection performance.
Would like to solve a combined low-rank + sparse approximation subject to the cone constraint
… but the constraint is very complicated. Relax to:
Results: Complexity Reduction
Hausdorff distance bound
Relative complexity
ε – covering radius. Smaller ε, greater redundancy.
Summary so far …
• Sufficient sample densities: some first results for nonconvex objects.
• Cone-preserving complexity reduction. Both exploit low-dimensional structure …
Learning simple signal models?
Signal model y = A x with x sparse: most of the entries of x are zero.
Good model for many types of imagery data, especially if we can learn the dictionary A:
Y = A X (Data = Dictionary × sparse coefficients)
Ambiguities: (A, X) or (A Π Λ, Λ⁻¹ Πᵀ X)?
Learning the basis of sparsity?
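The scaling/permutation ambiguity above is easy to verify numerically; a minimal sketch with synthetic data (dictionary A, coefficients X; all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 30))            # dictionary: 30 atoms in R^20
X = np.where(rng.random((30, 50)) < 0.1,     # sparse coefficient matrix
             rng.standard_normal((30, 50)), 0.0)
Y = A @ X                                    # observed data

# Relabel the atoms (permutation P) and rescale them (diagonal Lam):
P = np.eye(30)[rng.permutation(30)]
Lam = np.diag(rng.uniform(0.5, 2.0, 30))
A2 = A @ P @ Lam
X2 = np.linalg.inv(Lam) @ P.T @ X            # equally sparse coefficients

print(np.allclose(Y, A2 @ X2))               # prints True: same data, different factors
```

So the factorization can only ever be identified up to permutation and scaling of the atoms; "uniqueness" results are stated modulo this equivalence.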
D = L + S
Peculiar geometry: the data lie on a union of subspaces, each spanned by k columns of the dictionary.
When is dictionary learning well-posed?
With k+1 generic points per k-column subspace, the solution is unique (up to permutation and scaling):
How can we learn a good dictionary?
Recently: Supervised variants [Mairal et al. ‘08], structured dictionaries [Rubinstein et al. ‘10], highly scalable variants [Mairal et al. ‘10] … and many, many more…
Alternating directions to minimize a sparsity surrogate [Engan et al. ‘99], [Aharon et al. ’05], [Yaghoobi ‘10]
Provable solution in the complete case
Rows of X are sparse vectors in a known subspace: the row space of Y.
If Y = A X with A square and invertible, then w.h.p. the rows of X are the sparsest vectors in row-space(Y).
Uniqueness – square dictionaries
[Spielman, Wang, W. ‘11]: Decomposition essentially unique from random observations.
[Aharon, Elad, Bruckstein ‘05]: Decomposition is essentially unique from strategically located observations.
Overcomplete:
Square:
Provable solution in the complete case
If the expected number of nonzeros per column of X is smaller than O(√n), the algorithm succeeds w.h.p. Sample requirement: p = Ω(n² log² n) columns.
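The “sparsest vectors in the row space” picture above can be checked numerically; a toy sketch with a synthetic square dictionary (sizes and sparsity level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 400
A = rng.standard_normal((n, n))              # square dictionary, invertible w.h.p.
X = np.where(rng.random((n, p)) < 0.1,       # ~10% nonzeros per entry
             rng.standard_normal((n, p)), 0.0)
Y = A @ X                                    # row-space(Y) = row-space(X)

# Any vector in row-space(Y) has the form w^T Y = (A^T w)^T X, i.e. a linear
# combination of the rows of X; a generic combination is far denser than a row:
row_nnz = (X != 0).sum(axis=1)               # sparsity of the individual rows
w = rng.standard_normal(n)
combo_nnz = (np.abs(w @ Y) > 1e-10).sum()    # sparsity of a generic combination
print(row_nnz.max(), combo_nnz)
```

Recovering a row then amounts to searching the row space for its sparsest elements, which [Spielman, Wang, W. ’12] show can be done by solving a sequence of linear programs.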
Summary and Open Questions
First results on quality of approximation for varying illumination, nonconvex objects. Dedicated cone-preserving dimensionality reduction. Many promising tools for exploiting low-dimensional structure in high-dimensional data:
More general decompositions Convexifications of the dictionary learning problem Many open questions: optimality, practicality, operational constants.
Guaranteed Illumination Models for Nonconvex Objects, Zhang, Mu, Kuo, W., arXiv ’13
Compressive Principal Component Pursuit, W., Ganesh, Min, Ma, I&I ‘13
Local correctness of ℓ¹-minimization for dictionary learning, Geng, W., arXiv ‘11
Exact Recovery of Sparsely-Used Dictionaries, Spielman, Wang, W., COLT ’12
Robust Principal Component Analysis? Candès, Li, Ma, W., JACM ’11
Robust Face Recognition via Sparse Representation, W., Yang, Ganesh, Sastry, Ma, PAMI ‘09
Towards a Practical Automatic Face Recognition … Wagner, W., Ganesh, Zhou, Mobahi, Ma, PAMI ‘12
Provable Representations for Visual Data
Thanks to …