Bilevel Sparse Coding
Jianchao Yang
Adobe Research, 345 Park Ave, San Jose, CA
Mar 15, 2013
Jianchao Yang Bilevel Sparse Coding
Outline
1 Introduction
2 Bilevel Sparse Coding
    The learning model
    The learning algorithm
3 Sparse Modeling Applications
    Single image super-resolution
    Supervised dictionary learning
    Adaptive compressive sensing
4 Conclusion
Sparse Modeling
Many types of sensory data, e.g., images and audio, live in high-dimensional spaces but have low intrinsic dimension: they admit a sparse representation in some domain, a simple model that serves as an effective prior.
Sparse representation: represent data in the most parsimonious terms,

    x = Dz,

where x ∈ R^d, D ∈ R^{d×K}, and ‖z‖₀ ≪ d.
Sparsity is a driving factor for broad applications:
    Compressive sensing, low-rank matrices, etc.
    Compression, denoising, deblurring, super-resolution, etc.
    Recognition, subspace clustering, deep learning, etc.
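The model x = Dz with ‖z‖₀ ≪ d can be sketched in a few lines of numpy; all dimensions and names here are illustrative, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

d, K = 64, 256                          # signal dimension d, dictionary size K (d < K)
D = rng.standard_normal((d, K))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms

# A signal with low intrinsic dimension: only 5 of the 256 atoms are active.
z = np.zeros(K)
support = rng.choice(K, size=5, replace=False)
z[support] = rng.standard_normal(5)

x = D @ z                               # x = Dz lives in R^64 but has a 5-sparse code
```

Here ‖z‖₀ = 5 even though the ambient dimension is d = 64, which is exactly the regime the slide describes.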
Sparse Coding – Quest for Dictionary
Signals are normally mixtures of diverse phenomena; how can we wisely choose D to perform well on the given signals?
A data-driven solution: train adaptive dictionaries from the given signal instances for sparse representation.
Given training data {x_i}_{i=1}^N, the dictionary learning problem, in its most popular form, can be formulated as

    min_{D, {α_i}_{i=1}^N}  Σ_{i=1}^N ‖x_i − D α_i‖₂² + λ‖α_i‖₁,   s.t. ‖D(:, j)‖₂ ≤ 1,

where D ∈ R^{d×K} (d < K) is an over-complete dictionary.
Problem: it only cares about low-level “sparse reconstruction”, not the high-level task!
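The standard way to attack this nonconvex problem is to alternate between the two convex subproblems: sparse coding with D fixed, then a dictionary update with the codes fixed. A minimal sketch under illustrative dimensions; the `ista` solver and the regularized least-squares dictionary step are our choices, not necessarily what the talk uses.

```python
import numpy as np

def ista(X, D, lam, n_iter=100):
    """Sparse-coding step: minimize ||x - D a||_2^2 + lam ||a||_1 per column of X."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth part
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        G = A - (2.0 / L) * D.T @ (D @ A - X)      # gradient step
        A = np.sign(G) * np.maximum(np.abs(G) - lam / L, 0.0)  # soft threshold
    return A

def dict_update(X, A):
    """Dictionary step: least squares in D, then project atoms onto the unit ball."""
    D = X @ A.T @ np.linalg.pinv(A @ A.T + 1e-8 * np.eye(A.shape[0]))
    norms = np.maximum(np.linalg.norm(D, axis=0), 1.0)   # enforce ||D(:, j)||_2 <= 1
    return D / norms

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 200))       # 200 training signals in R^16
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)

for _ in range(5):                        # alternate the two convex subproblems
    A = ista(X, D, lam=0.1)
    D = dict_update(X, A)
```

Note that nothing in this loop sees the downstream task: it optimizes reconstruction only, which is precisely the criticism on this slide.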
Bilevel Sparse Coding – Quest for Dictionary
Many vision and learning tasks can be formulated based on sparse representations:
    image feature learning, image super-resolution, compressive sensing, image classification, etc.
We relate the low-level dictionary learning to the high-level task naturally with a bilevel formulation.
Goal: learn a more meaningful sparse representation for the given task.
Advantage: the training procedure is fully consistent with the testing objective.
Bilevel optimization
Mathematical programs with optimization problems in the constraints:
    min_{x∈X, y}  F(x, y)
    s.t.  G(x, y) ≤ 0,
          y = arg min_{y} f(x, y),  s.t. g(x, y) ≤ 0.

F and f are the upper-level and lower-level objective functions, respectively.
G and g are the upper-level and lower-level constraints, respectively.
Bilevel optimization
Simple example: the toll-setting problem on a transportation network.
    The network manager maximizes the revenue raised from tolls.
    Network users minimize their travel costs.

    max_{T, f, x}  Σ_{a∈A} T_a x_a
    s.t.  l_a ≤ T_a ≤ u_a, ∀a ∈ A,
          (f, x) ∈ arg min_{f′, x′}  Σ_{a∈A} c_a x′_a + Σ_{a∈A} T_a x′_a
          s.t. ...
The Learning Model
A generic bilevel learning model:
    min_{D, Θ}  (1/N) Σ_{i=1}^N L(D, z_i, Θ)
    s.t.  z_i = arg min_α ‖α‖₁,  s.t. ‖x_i − Dα‖₂² ≤ ε, ∀i,
          G(Θ) ≤ 0,
          ‖D(:, k)‖₂ ≤ 1, ∀k.

L is a smooth cost function defined by the specific task.
Θ is the parameter set of a specific model.
{x_i}_{i=1}^N are training samples from the input space X; the model may involve more than one feature space.
A Simple Example
Coupled sparse coding: relate two feature spaces by their common sparse representations.

    min_{D_x, D_y}  (1/N) Σ_{i=1}^N ‖z_i^x − z_i^y‖₂²
    s.t.  z_i^x = arg min_α ‖α‖₁,  s.t. ‖x_i − D_x α‖₂² ≤ ε_x, ∀i,
          z_i^y = arg min_α ‖α‖₁,  s.t. ‖y_i − D_y α‖₂² ≤ ε_y, ∀i,
          ‖D_x(:, k)‖₂ ≤ 1, ∀k,
          ‖D_y(:, k)‖₂ ≤ 1, ∀k,

where {x_i, y_i}_{i=1}^N are randomly sampled from the joint space X × Y.
A Difficult Problem
Bilevel optimization: mathematical programs with optimization problems in the constraints.

    min_{D, Θ}  (1/N) Σ_{i=1}^N L(D, z_i, Θ)
    s.t.  z_i = arg min_α ‖α‖₁,  s.t. ‖x_i − Dα‖₂² ≤ ε, ∀i,
          G(Θ) ≤ 0,
          ‖D(:, k)‖₂ ≤ 1, ∀k.

Optimization over D is a bilevel program: L is the upper-level objective and the ℓ₁-norm minimization is the lower-level optimization.
The problem is highly nonconvex and highly nonlinear.
Descent Method?
Regarding z as an implicit function of D through the lower-level problem, the bilevel program can be viewed solely in terms of the upper-level variable D.
Applying the chain rule, whenever ∇_D z(D) is well defined, we have

    ∇_D L(D, z(D), Θ) = ∇_D L(D, z, Θ) + ∇_z L(D, z, Θ) ∇_D z(D).

Problem: is the gradient ∇_D z(D) available for

    z = arg min_α ‖α‖₁,  s.t. ‖x − Dα‖₂² ≤ ε ?
Differentiability
Lasso
The ℓ₁-norm minimization problem can be reformulated as the Lasso problem

    z = arg min_α ‖x − Dα‖₂² + λ‖α‖₁.

Transition point (Efron et al. 2004)
For a given response vector x, there is a finite sequence of λ’s, λ₀ > λ₁ > · · · > λ_K = 0, such that if λ is in the interval (λ_m, λ_{m+1}), the active set Λ = {k : z(k) ≠ 0} and the sign vector sign(z_Λ) are constant with respect to λ.
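The transition points λ₀ > λ₁ > · · · > λ_K are exactly the breakpoints of the LARS/homotopy path, so they can be computed explicitly. A small sketch, assuming scikit-learn is available (the random dictionary and sizes are illustrative):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
D = rng.standard_normal((30, 10))   # dictionary plays the role of the design matrix
x = rng.standard_normal(30)         # response vector

# alphas are the transition points lambda_0 > lambda_1 > ... down to 0;
# between consecutive alphas the active set and sign pattern are constant.
alphas, active, coefs = lars_path(D, x, method="lasso")
```

`coefs[:, m]` gives the Lasso solution at breakpoint `alphas[m]`, so one can inspect directly how the support only changes at the transition points.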
Differentiability
Theorem
Fix any λ > 0 such that λ is not a transition point for x; then the active set Λ and the sign vector sign(z_Λ) are locally constant with respect to both x and D.
Differentiability
If λ is not a transition point of x, we have the equiangular conditions

    (a)  ∂‖x − Dz‖₂² / ∂z(k) + λ sign(z(k)) = 0,   for k ∈ Λ,
    (b)  | ∂‖x − Dz‖₂² / ∂z(k) | < λ,              for k ∉ Λ.

Applying implicit differentiation to Eqn. (a), we have

    ∂z_Λ/∂D_Λ = (D_Λ^T D_Λ)^{−1} ( ∂(D_Λ^T x)/∂D_Λ − ∂(D_Λ^T D_Λ)/∂D_Λ · z_Λ ).
Differentiability
Let Ω denote the nonactive set. We observe that:
    Since z_Λ depends only on D_Λ, a perturbation of D_Ω does not change its value; therefore

        ∂z_Λ/∂D_Ω = 0.  (1)

    Since Λ and sign(z_Λ) are constant under a small perturbation of D, z_Ω stays zero, so

        ∂z_Ω/∂D = 0.  (2)

Therefore, the nonzero part of ∇_D z(D) is defined by ∂z_Λ/∂D_Λ.
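Condition (a) also gives z_Λ in closed form on the active set, which is the fact the implicit differentiation rests on; it can be checked numerically with an off-the-shelf Lasso solver. A sketch assuming scikit-learn, whose Lasso minimizes (1/(2n))‖x − Dz‖₂² + α‖z‖₁, so condition (a) picks up a factor of n relative to the slide’s scaling:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
D = rng.standard_normal((50, 20))
x = rng.standard_normal(50)

# With scikit-learn's scaling, condition (a) on the active set reads:
#   D_L^T (D_L z_L - x) + n * alpha * sign(z_L) = 0
n, alpha = D.shape[0], 0.05
z = Lasso(alpha=alpha, fit_intercept=False, tol=1e-12, max_iter=100000).fit(D, x).coef_

L = np.flatnonzero(z)                       # active set Lambda
# Solving (a) for z_L: z_L = (D_L^T D_L)^{-1} (D_L^T x - n * alpha * sign(z_L))
z_closed = np.linalg.solve(D[:, L].T @ D[:, L],
                           D[:, L].T @ x - n * alpha * np.sign(z[L]))
```

The solver’s solution and the closed form agree up to the optimizer’s tolerance, confirming that z is an explicit (hence differentiable) function of (x, D) while the active set and signs stay fixed.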
Stochastic Gradient Descent
Given ∇_D z(D), ∇_D L can be evaluated. Applying stochastic gradient descent, we have

    D^{n+1} = D^n − r_n (∂L_n/∂D) / ‖∂L_n/∂D‖₂,   r_n = r₀ / (n/N + 1)^p,

where p controls the shrinkage rate of the step size.
The updated dictionary is then projected onto the unit ball.
The complete optimization procedure alternately optimizes over D and Θ.
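The projected update above is straightforward to implement; a minimal numpy sketch with dummy gradients standing in for ∂L_n/∂D (all sizes and hyperparameters here are illustrative):

```python
import numpy as np

def sgd_step(D, grad, n, N, r0=1.0, p=0.9):
    """One projected SGD step: normalized gradient, shrinking rate, unit-ball atoms."""
    r_n = r0 / (n / N + 1) ** p                    # r_n = r0 / (n/N + 1)^p
    D = D - r_n * grad / np.linalg.norm(grad)      # normalized gradient step
    norms = np.maximum(np.linalg.norm(D, axis=0), 1.0)
    return D / norms                               # project atoms onto the unit ball

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 16))
for n in range(100):                               # dummy gradients just to exercise it
    D = sgd_step(D, rng.standard_normal(D.shape), n, N=100)
```

Normalizing the gradient and shrinking r_n keeps the update scale-free early on while guaranteeing the steps eventually become small.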
Single Frame Super-resolution
Problem: given a single low-resolution input and a set of paired (high- and low-resolution) training patches sampled from similar images, reconstruct a high-resolution version of the input.
Applications: photo zooming (e.g., Photoshop, Genuine Fractal), photo printing, video standard conversion, etc.
Difficulty: single-image super-resolution is an extremely ill-posed problem.
Super-resolution via Sparse Recovery
High-resolution patches have sparse representations in terms of some over-complete dictionary:

    x = D_h z₀,

where x ∈ R^m, D_h ∈ R^{m×K}, and ‖z₀‖₀ ≪ m.
We do not observe the high-resolution patch x, but its low-resolution version y ∈ R^n:

    y = Lx = L D_h z₀ = D_l z₀,

where L is the sampling matrix (blurring and downsampling), so y gives n linear measurements of the sparse coefficients z₀.
Sparse recovery: if we can obtain z₀ from y = D_l z₀ (an underdetermined linear system), we can recover x as D_h z₀.
Super-resolution via Sparse Recovery
Assume we have the coupled dictionaries D_h and D_l.
Input: low-resolution image Y.
Find the sparse solution for each patch y_p of Y by

    z₀ = arg min_z ‖D_l z − y_p‖₂² + λ‖z‖₁.

Recover the corresponding high-resolution image patch as x_p = D_h z₀.
How to train D_l and D_h for good recovery?
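The patch-level recovery step can be sketched as follows. The dictionaries below are random stand-ins (real coupled D_h, D_l come from training), and scikit-learn’s Lasso scaling differs from the slide’s objective only by a constant absorbed into λ:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
m, n_lo, K = 25, 9, 128              # 5x5 high-res patches, 3x3 low-res patches
Dh = rng.standard_normal((m, K))     # stand-in coupled dictionaries
Dl = rng.standard_normal((n_lo, K))
Dl /= np.linalg.norm(Dl, axis=0)

def super_resolve_patch(y_p, Dl, Dh, lam=0.1):
    """Sparse recovery: code the low-res patch with Dl, decode with Dh."""
    z0 = Lasso(alpha=lam, fit_intercept=False, max_iter=50000).fit(Dl, y_p).coef_
    return Dh @ z0                   # x_p = Dh z0

y_p = rng.standard_normal(n_lo)
x_p = super_resolve_patch(y_p, Dl, Dh)
```

The key assumption is that the same code z₀ explains both resolutions; whether that holds depends entirely on how D_h and D_l were trained, which is the question this slide poses.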
Joint Dictionary Training – Previous Approach
Our previous solution: randomly sample high- and low-resolution image patch pairs {x_i, y_i}_{i=1}^N from the training data, and learn D_h, D_l jointly:

    min_{D_h, D_l, {z_i}}  Σ_{i=1}^N ‖x_i − D_h z_i‖₂² + ‖y_i − D_l z_i‖₂² + λ‖z_i‖₁,
    s.t.  ‖D_h(:, k)‖₂ ≤ 1, ‖D_l(:, k)‖₂ ≤ 1.

However, ...
Joint Dictionary Training – Problem
In training, we have

    min_{D_h, D_l, {z_i}}  Σ_{i=1}^N ‖x_i − D_h z_i‖₂² + ‖y_i − D_l z_i‖₂² + λ‖z_i‖₁.

In testing, we only have the low-resolution patch y_i, so the code is found by

    min_{z_i}  ‖y_i − D_l z_i‖₂² + λ‖z_i‖₁,

without the high-resolution term ‖x_i − D_h z_i‖₂²; therefore, good reconstruction of x_i is not guaranteed.
Bilevel Formulation
Goal: learn D_h and D_l such that the sparse representation z of y in terms of D_l can well reconstruct x with D_h.
Given high- and low-resolution training patch pairs {x_i, y_i}_{i=1}^N, the learning model is formulated as

    min_{D_h, D_l}  (1/N) Σ_{i=1}^N ‖D_h z_i − x_i‖₂²
    s.t.  z_i = arg min_α ‖α‖₁,  s.t. ‖y_i − D_l α‖₂² ≤ ε,
          ‖D_l(:, k)‖₂ ≤ 1,
          ‖D_h(:, k)‖₂ ≤ 1.

The training process is completely consistent with testing.
Results
Setting: 100,000 high- and low-resolution 5×5 image patch pairs are sampled for training and 100,000 for testing. D_h and D_l are initialized from joint dictionary training. The learning algorithm converges in 5 iterations.

Pixel-wise MSE reduction compared with joint dictionary training (rows/columns index the 5×5 patch position):

           col 1    col 2    col 3    col 4    col 5
    row 1  21.61%   19.60%   21.89%   18.91%   20.55%
    row 2  17.43%   15.75%   17.92%   15.69%   14.70%
    row 3  17.15%   16.96%   19.95%   17.57%   15.99%
    row 4  16.41%   17.78%   18.30%   16.80%   15.82%
    row 5  20.48%   14.68%   15.52%   14.64%   20.51%
SR Results
Visual comparison: top, joint dictionary training; bottom, bilevel sparse coding.
Practical Implementation
Learn fast sparse coding approximations with a neural network.
Selective patch processing.
Takes about 5 s to upscale an image from 200×200 to 800×800 on a single 3 GHz core with 4 GB RAM.
One of the fastest SR algorithms.
Visual comparison: Input, Bicubic, Ours.
Feature Representation by Pooling Sparse Codes
Fig. The image feature extraction diagram.
Feature Representation by Pooling Sparse Codes
A simple two-layer network:
    Coding: VQ, soft assignment, LLC, sparse coding, linear filtering.
    Pooling: average, energy, max, log, ℓ_p.
Works well on diverse recognition benchmarks: object, scene, action, face, digit, gender, expression, age estimation, and so on.
Key component of the winning system for PASCAL09 on image recognition. (Fig.: image feature extraction diagram.)
The Feature Extraction Algorithm
1 Represent image X as sets of local descriptors in a spatial pyramid:

    X = [Y^0_11, Y^1_11, Y^1_12, ..., Y^2_44].

2 Given dictionary D, encode the local descriptors into sparse codes by

    Z^s_ij = arg min_A ‖Y^s_ij − DA‖₂² + λ‖A‖₁,

and we obtain

    S = [Z^0_11, Z^1_11, Z^1_12, ..., Z^2_44].

3 Max-pool over each set of sparse codes and concatenate:

    β = ⋃_{s=0}^{2} ⋃_{i,j=1}^{2^s} [β^s_ij],   where β^s_ij = max(|Z^s_ij|).
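The three steps above can be sketched in numpy. The codes and descriptor locations below are random stand-ins (in practice Z comes from the sparse-coding step), and the 1×1 / 2×2 / 4×4 pyramid matches s = 0, 1, 2:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_desc = 64, 500
Z = np.abs(rng.standard_normal((K, n_desc)))    # |sparse codes|, one column per descriptor
xy = rng.random((n_desc, 2))                    # descriptor locations in [0, 1)^2

def pyramid_max_pool(Z, xy, levels=(1, 2, 4)):
    """Max-pool |codes| over each cell of a 1x1, 2x2, 4x4 spatial pyramid."""
    feats = []
    for g in levels:
        cell = np.minimum((xy * g).astype(int), g - 1)   # cell index (i, j) per descriptor
        for i in range(g):
            for j in range(g):
                mask = (cell[:, 0] == i) & (cell[:, 1] == j)
                pooled = Z[:, mask].max(axis=1) if mask.any() else np.zeros(Z.shape[0])
                feats.append(pooled)
    return np.concatenate(feats)                # length K * (1 + 4 + 16) = 21K

beta = pyramid_max_pool(Z, xy)
```

The final feature β has fixed length 21K regardless of how many descriptors the image produced, which is what makes it usable with a linear classifier.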
Unsupervised Dictionary Learning
Randomly sample a set of local descriptors {x_i}_{i=1}^N from the training set, and use standard sparse coding to learn a dictionary D that can sparsely represent the data:

    min_{D, {α_i}_{i=1}^N}  Σ_{i=1}^N ‖x_i − Dα_i‖₂² + λ‖α_i‖₁,
    s.t.  ‖D(:, k)‖₂ ≤ 1.

Optimization is performed in an alternating fashion: fix D and optimize {α_i}_{i=1}^N; then fix {α_i}_{i=1}^N and optimize D.
Supervised Dictionary Learning
The unsupervised dictionary learning is good for reconstruction, but not necessarily effective for classification.
Training data with image labels: {(X_i, y_i)}_{i=1}^N.
Train the dictionary together with the classifier:

    min_{D, w}  Σ_{i=1}^N ℓ(y_i, f(β_i, w)) + γ‖w‖₂²,
    s.t.  β_i = pooling(Z_i),
          Z_i = arg min_A ‖X_i − DA‖₂² + λ‖A‖₁,
          ‖D(:, k)‖₂ ≤ 1, ∀k,

where ℓ(·) is a loss function and f(·, w) is the linear prediction model.
Optimization over w trains the classifier; optimization over D is a bilevel program.
Face recognition
CMU Multi-PIE database: 337 subjects across simultaneous variations in pose, expression, and illumination. We use session 1 for training and sessions 2-4 for testing. The dataset is challenging due to the large number of subjects and the natural variation in subject appearance over time.
Face recognition
Face recognition error (%) on large-scale Multi-PIE.
Error (%)      Session 2   Session 3   Session 4
LDA            50.6        55.7        52.1
NN             32.7        33.8        37.2
NS             22.4        25.7        26.6
SR             8.6         9.7         9.8
U-SC           5.4         9.0         7.5
S-SC           4.8         6.6         4.9
Improvement    11.1%       26.7%       34.7%
Gender Recognition
FRGC 2.0: the dataset contains 568 individuals and 14,714 face images in total, under various lighting conditions and backgrounds. 11,700 images from 451 randomly chosen individuals serve as the training set, and 3,014 images from the remaining 114 persons form the testing set.
Classification Error (%)
Algorithm    SVM (RBF)   CNN   U-SC   S-SC   Improvement
Error Rate   8.6         5.9   6.8    5.3    22.1%
Handwritten Digit Recognition
MNIST: the dataset consists of 70,000 handwritten digits, of which 60,000 are used for training and the remaining 10,000 for testing.
Algorithm                  Error Rate
SVM (RBF)                  1.41
L1 sparse coding           2.02
Local coordinate coding    1.90
Deep Belief Network        1.20
CNN                        0.82
U-SC                       0.98
S-SC                       0.84
Improvement                14.3%
Outline

1 Introduction
2 Bilevel Sparse Coding
    The learning model
    The learning algorithm
3 Sparse Modeling Applications
    Single image super-resolution
    Supervised dictionary learning
    Adaptive compressive sensing
4 Conclusion
Formulation
Let x be the original signal, Φ the sampling matrix, and y = Φx the linear measurements. Compressive sensing recovery is done by

z = arg min_α ‖α‖_1,  s.t. y = ΦD_x α
x̂ = D_x z
D_x is important for the recovery quality. With training samples {x_i}_{i=1}^N, learn D_x by directly minimizing the compressive sensing recovery error:

min_{D_x} (1/N) Σ_{i=1}^N ‖x_i − D_x z_i‖_2^2
s.t. y_i = Φx_i,  D_y = ΦD_x
     z_i = arg min_α ‖α‖_1,  s.t. ‖y_i − D_y α‖_2^2 ≤ ε
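A minimal numpy sketch of the recovery step: the ε-constrained problem is relaxed to the penalized lasso form z = argmin_α ‖y − ΦD_x α‖_2^2 + λ‖α‖_1, solved by ISTA, and the signal is reconstructed as x̂ = D_x z. The toy dimensions, λ, and the ±1/√m Bernoulli normalization are illustrative assumptions, not values from the slides.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def cs_recover(Phi, Dx, y, lam=0.01, n_iter=1000):
    """Recover x from y = Phi x via the lasso relaxation of
    min ||alpha||_1 s.t. ||y - Phi Dx alpha||_2^2 <= eps; return x_hat = Dx z."""
    Dy = Phi @ Dx                                   # effective dictionary
    step = 1.0 / (2.0 * np.linalg.norm(Dy, 2) ** 2) # ISTA step size 1/L
    z = np.zeros(Dx.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z + 2.0 * step * Dy.T @ (y - Dy @ z), lam * step)
    return Dx @ z

# Toy usage: a 3-sparse signal in a stand-in identity dictionary,
# observed through 32 Bernoulli measurements of a 64-dim signal.
rng = np.random.default_rng(0)
d, m = 64, 32
Dx = np.eye(d)
Phi = rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(m)
x = np.zeros(d)
x[[3, 17, 40]] = [1.0, -2.0, 1.5]
x_hat = cs_recover(Phi, Dx, Phi @ x)
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))  # small relative error
```

Replacing the fixed D_x here with a learned one is exactly what the bilevel formulation above optimizes.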
CS Results
Settings: 10,000 image patches of 16×16 are randomly sampled from medical images for training, and 5,000 for testing. The Haar wavelet basis is used as the baseline and as the initialization. A Bernoulli random matrix is used as the sampling matrix.
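The two ingredients of this setup can be constructed as follows: an orthonormal 2-D Haar basis for vectorized 16×16 patches (via a Kronecker product of 1-D Haar transforms) and a random ±1 Bernoulli sampling matrix. A minimal sketch; the 1/√m scaling and the orthonormal Haar normalization are assumptions, since the slides do not specify them.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal 1-D Haar transform matrix of size n x n (n a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                  # scaling (low-pass) rows
    bot = np.kron(np.eye(n // 2), [1.0, -1.0])    # detail (high-pass) rows
    return np.vstack([top, bot]) / np.sqrt(2.0)

p = 16                                            # 16 x 16 patches
H = haar_matrix(p)
Dx = np.kron(H, H).T                              # 256 x 256 2-D Haar basis;
                                                  # columns are vectorized atoms
m = int(0.10 * p * p)                             # 10% sampling rate -> m = 25
rng = np.random.default_rng(0)
Phi = rng.choice([-1.0, 1.0], size=(m, p * p)) / np.sqrt(m)  # Bernoulli matrix
Dy = Phi @ Dx                                     # effective sampled dictionary
print(Dx.shape, Phi.shape, Dy.shape)              # (256, 256) (25, 256) (25, 256)
```

This Dx is the initialization that the bilevel learning then refines against the recovery error.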
[Figure: objective value vs. iteration number at a 10% sampling rate.]
[Figure: recovery accuracy (PSNR) on the test image patches vs. sampling rate, learned dictionary vs. wavelet basis.]
CS Results
Image recovery on the "bone" image with 20% measurements.

[Figure: ground truth / Wavelet (22.8 dB) / Ours (27.6 dB)]
Conclusion
Learning a meaningful representation is critical for many applications.
Many sparse coding based applications can be formulated as a bilevel program.
Bilevel programs are extremely useful in many hierarchical models.
More applications in computer vision and machine learning? E.g., model selection.