Bilevel Sparse Coding

Jianchao Yang
Adobe Research, 345 Park Ave, San Jose, CA

Mar 15, 2013

Bilevel Sparse Coding - University of Illinois at Urbana-Champaign




Outline

1 Introduction

2 Bilevel Sparse Coding
    The learning model
    The learning algorithm

3 Sparse Modeling Applications
    Single image super-resolution
    Supervised dictionary learning
    Adaptive compressive sensing

4 Conclusion



Sparse Modeling

Many types of sensory data, e.g., images and audio, live in high-dimensional spaces but have low intrinsic dimension.

Sparse representation in some domain: a simple model and an effective prior.

Sparse representation: represent data in the most parsimonious terms,

x = Dz,

where x ∈ R^d, D ∈ R^{d×K}, and ‖z‖₀ ≪ d.

Sparsity: a driving factor for broad applications
    Compressive sensing, low-rank matrices, etc.
    Compression, denoising, deblurring, super-resolution, etc.
    Recognition, subspace clustering, deep learning, etc.



Sparse Coding – Quest for Dictionary

Signals are normally mixtures of diverse phenomena; how can we wisely choose D to perform well on the given signals?

A data-driven solution: train adaptive dictionaries from the given signal instances for sparse representations.

Given training data {x_i}_{i=1}^N, the dictionary learning problem, in its most popular form, can be formulated as

min_{D, {α_i}_{i=1}^N} ∑_{i=1}^N ‖x_i − Dα_i‖₂² + λ‖α_i‖₁, s.t. ‖D(:, j)‖₂ ≤ 1,

where D ∈ R^{d×K} (d < K) is an over-complete dictionary.

Problem: it only cares about the low-level "sparse reconstruction", not the high-level task!
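The alternating scheme commonly used for this problem can be sketched in a few lines of numpy. This is a minimal illustration of our own, not the talk's solver: ISTA solves the ℓ₁ coding step, and a regularized least-squares update followed by column-norm projection handles D.

```python
import numpy as np

def ista(x, D, lam, n_iter=200):
    """Solve min_a ||x - D a||_2^2 + lam * ||a||_1 by iterative soft thresholding."""
    L = 2 * np.linalg.norm(D, 2) ** 2 + 1e-12  # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = a - 2 * D.T @ (D @ a - x) / L                       # gradient step
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)   # soft threshold
    return a

def learn_dictionary(X, K, lam=0.1, n_outer=5, seed=0):
    """Alternate between sparse coding and a dictionary update with ||D(:,k)||_2 <= 1."""
    rng = np.random.default_rng(seed)
    d, N = X.shape
    D = rng.standard_normal((d, K))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):
        A = np.stack([ista(X[:, i], D, lam) for i in range(N)], axis=1)  # codes
        # ridge-regularized least-squares update of D given the codes
        D = np.linalg.solve(A @ A.T + 1e-8 * np.eye(K), A @ X.T).T
        D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)  # project columns onto unit ball
    return D
```

The per-sample λ and the number of alternations are placeholder choices; any Lasso solver could replace the ISTA loop.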



Bilevel Sparse Coding – Quest for Dictionary

Many vision and learning tasks can be formulated based on sparse representations:
    Image feature learning
    Image super-resolution
    Compressive sensing
    Image classification, etc.

We relate the low-level dictionary learning to the high-level task naturally with a bilevel formulation.
Goal: learn a more meaningful sparse representation for the given task.
Advantage: the training procedure is fully consistent with the testing objective.



Bilevel optimization

Mathematical programs with optimization problems in the constraints:

min_{x∈X, y} F(x, y)
s.t. G(x, y) ≤ 0,
     y = arg min_y f(x, y), s.t. g(x, y) ≤ 0.

F and f are the upper-level and lower-level objective functions, respectively.
G and g are the upper-level and lower-level constraints, respectively.
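A toy numerical illustration (our own, not from the talk) makes the structure concrete: take upper level F(x, y) = (x − 1)² + y² and lower level y(x) = arg min_y (y − x)², whose solution is y = x. Substituting the lower-level solution reduces the bilevel program to min_x (x − 1)² + x², minimized at x = 0.5; a brute-force search over the upper-level variable recovers it.

```python
import numpy as np

def lower_level(x):
    # arg min_y (y - x)^2, solved here by grid search (closed form: y = x)
    ys = np.linspace(-2.0, 2.0, 4001)
    return ys[int(np.argmin((ys - x) ** 2))]

def upper_level(x, y):
    return (x - 1.0) ** 2 + y ** 2

# Each candidate x is evaluated at the *solution* of the lower-level problem.
xs = np.linspace(-2.0, 2.0, 4001)
vals = [upper_level(x, lower_level(x)) for x in xs]
x_best = xs[int(np.argmin(vals))]
print(x_best)  # close to 0.5
```

Grid search only works in this one-dimensional toy; the slides' point is precisely that for sparse coding the lower-level problem is a nonsmooth ℓ₁ program, so descent directions must come from implicit differentiation instead.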


Bilevel optimization

Simple example: the toll-setting problem on a transportation network.
    The network manager maximizes the revenue raised from tolls.
    Network users minimize their travel costs.

max_{T, f, x} ∑_{a∈A} T_a x_a
s.t. l_a ≤ T_a ≤ u_a, ∀a ∈ A,
     (f, x) ∈ arg min_{f', x'} ∑_{a∈A} c_a x'_a + ∑_{a∈A} T_a x'_a
     s.t. ...




The Learning Model

A generic bilevel learning model:

min_{D, Θ} (1/N) ∑_{i=1}^N L(D, z_i, Θ)
s.t. z_i = arg min_α ‖α‖₁, s.t. ‖x_i − Dα‖₂² ≤ ε, ∀i,
     G(Θ) ≤ 0,
     ‖D(:, k)‖₂ ≤ 1, ∀k.

L is some smooth cost function defined by the specific task.
Θ is the parameter set of a specific model.
{x_i}_{i=1}^N are training samples from the input space X.
May involve more than one feature space.


A Simple Example

Coupled sparse coding: relate two feature spaces by their common sparse representations.

min_{D_x, D_y} (1/N) ∑_{i=1}^N ‖z_i^x − z_i^y‖₂²
s.t. z_i^x = arg min_α ‖α‖₁, s.t. ‖x_i − D_x α‖₂² ≤ ε_x, ∀i,
     z_i^y = arg min_α ‖α‖₁, s.t. ‖y_i − D_y α‖₂² ≤ ε_y, ∀i,
     ‖D_x(:, k)‖₂ ≤ 1, ∀k,
     ‖D_y(:, k)‖₂ ≤ 1, ∀k,

where {x_i, y_i}_{i=1}^N are randomly sampled from the joint space X × Y.


A Difficult Problem

Bilevel optimization: mathematical programs with optimization problems in the constraints.

min_{D, Θ} (1/N) ∑_{i=1}^N L(D, z_i, Θ)
s.t. z_i = arg min_α ‖α‖₁, s.t. ‖x_i − Dα‖₂² ≤ ε, ∀i,
     G(Θ) ≤ 0,
     ‖D(:, k)‖₂ ≤ 1, ∀k.

Optimization over D is a bilevel optimization.
L is the upper-level objective and the ℓ₁-norm minimization is the lower-level optimization.
Highly nonconvex and highly nonlinear.


Descent Method?

Regarding z as an implicit function of D through the lower-level problem, the bilevel program can be viewed solely in terms of the upper-level variable D.

Applying the chain rule, whenever ∇_D z(D) is well defined, we have

∇_D L(D, z(D), Θ) = ∇_D L(D, z, Θ) + ∇_z L(D, z, Θ) ∇_D z(D).

Problem: is the gradient ∇_D z(D) available?

z = arg min_α ‖α‖₁, s.t. ‖x − Dα‖₂² ≤ ε.


Differentiability

Lasso

The ℓ₁-norm minimization problem can be reformulated as the Lasso problem

z = arg min_α ‖x − Dα‖₂² + λ‖α‖₁.

Transition Point (Efron et al. 2004)

For a given response vector x, there is a finite sequence of λ's, λ₀ > λ₁ > ... > λ_K = 0, such that if λ is in the interval (λ_m, λ_{m+1}), the active set Λ = {k : z(k) ≠ 0} and the sign vector sign(z_Λ) are constant with respect to λ.
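Numerically, the Lasso solution can be obtained with a simple iterative soft-thresholding (ISTA) loop. The sketch below is our own illustration, not code from the talk: it solves the problem at two nearby values of λ; if no transition point of x lies between them, the two solutions share the same active set and sign pattern.

```python
import numpy as np

def lasso_ista(x, D, lam, n_iter=5000):
    """z = arg min_a ||x - D a||_2^2 + lam * ||a||_1 via iterative soft thresholding."""
    L = 2 * np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the quadratic term
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = z - 2 * D.T @ (D @ z - x) / L                       # gradient step
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return z

rng = np.random.default_rng(0)
D = rng.standard_normal((10, 20))
D /= np.linalg.norm(D, axis=0)
x = rng.standard_normal(10)

# Two nearby regularization levels: supports agree unless a
# transition point of x lies between 0.30 and 0.31.
z1 = lasso_ista(x, D, 0.30)
z2 = lasso_ista(x, D, 0.31)
print(np.flatnonzero(np.abs(z1) > 1e-6), np.flatnonzero(np.abs(z2) > 1e-6))
```

At the solution, the Lasso optimality conditions hold: the gradient of the quadratic term equals −λ sign(z(k)) on the active set and is bounded by λ in magnitude elsewhere, which is what makes the active set well defined between transition points.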


Differentiability

Theorem
Fix any λ > 0 that is not a transition point for x. Then the active set Λ and the sign vector sign(z_Λ) are locally constant with respect to both x and D.


Differentiability

If λ is not a transition point of x, we have the equiangular conditions

(a) ∂‖x − Dz‖₂² / ∂z(k) + λ sign(z(k)) = 0, for k ∈ Λ,
(b) |∂‖x − Dz‖₂² / ∂z(k)| < λ, for k ∉ Λ.

Applying implicit differentiation to Eqn. (a), we have

∂z_Λ / ∂D_Λ = (D_Λ^T D_Λ)^{-1} ( ∂(D_Λ^T x) / ∂D_Λ − (∂(D_Λ^T D_Λ) / ∂D_Λ) z_Λ ).
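This derivative can be sanity-checked numerically. The sketch below is our own construction, not code from the talk: it solves the Lasso once to fix the active set Λ and sign vector s, uses the closed form z_Λ(D_Λ) = (D_Λ^T D_Λ)^{-1}(D_Λ^T x − (λ/2) s) implied by condition (a), and compares the implicit-differentiation gradient of a toy upper-level loss against central finite differences.

```python
import numpy as np

def lasso_ista(x, D, lam, n_iter=5000):
    L = 2 * np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = z - 2 * D.T @ (D @ z - x) / L
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return z

rng = np.random.default_rng(0)
d, K, lam = 8, 15, 0.4
D = rng.standard_normal((d, K))
D /= np.linalg.norm(D, axis=0)
x = rng.standard_normal(d)

z = lasso_ista(x, D, lam)
Lam = np.flatnonzero(np.abs(z) > 1e-6)   # active set
s = np.sign(z[Lam])                      # sign vector, held fixed below

def z_active(D_act):
    # closed form from condition (a), with support and signs fixed
    return np.linalg.solve(D_act.T @ D_act, D_act.T @ x - 0.5 * lam * s)

t = rng.standard_normal(len(Lam))        # toy upper-level target
def loss(D_act):
    return 0.5 * np.sum((z_active(D_act) - t) ** 2)

# analytic gradient via implicit differentiation:
# beta = (D^T D)^{-1} dL/dz;  grad = (x - D z) beta^T - D beta z^T
D_act = D[:, Lam]
zA = z_active(D_act)
beta = np.linalg.solve(D_act.T @ D_act, zA - t)
grad = np.outer(x - D_act @ zA, beta) - D_act @ np.outer(beta, zA)

# central finite-difference check, entry by entry
eps = 1e-6
num = np.zeros_like(grad)
for i in range(d):
    for j in range(len(Lam)):
        E = np.zeros_like(D_act)
        E[i, j] = eps
        num[i, j] = (loss(D_act + E) - loss(D_act - E)) / (2 * eps)
print(np.max(np.abs(grad - num)))  # small discrepancy
```

The factor λ/2 appears because the Lasso here uses ‖x − Dα‖₂² without a 1/2; the quadratic loss and target t are placeholders standing in for the task-specific upper-level objective L.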


Differentiability

Let Ω denote the nonactive set. We observe that:

As z_Λ is connected only with D_Λ, a perturbation on D_Ω will not change its value. Therefore, we have

∂z_Λ / ∂D_Ω = 0. (1)

As Λ and sign(z_Λ) are constant for a small perturbation of D, z_Ω stays zero, so we have

∂z_Ω / ∂D = 0. (2)

Therefore, the nonzero part of ∇_D z(D) is defined by ∂z_Λ / ∂D_Λ.


Stochastic Gradient Descent

Given ∇_D z(D), ∇_D L can be evaluated. Applying stochastic gradient descent, we have

D^{n+1} = D^n − r_n (∂L_n/∂D) / ‖∂L_n/∂D‖₂,
r_n = r₀ / (n/N + 1)^p,

where p controls the shrinkage rate of the step size.

Project the updated dictionary onto the unit ball.

The complete optimization procedure alternately optimizes over D and Θ.
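One update of this procedure is compact to write down. The sketch below is illustrative: r₀ = 1 and p = 0.5 are placeholder values, the gradient is a stand-in, and the normalization uses the Frobenius norm.

```python
import numpy as np

def sgd_step(D, grad, n, N, r0=1.0, p=0.5):
    """One normalized SGD step on D, then project each column onto the unit ball."""
    r_n = r0 / (n / N + 1.0) ** p                  # shrinking step size r_n
    D = D - r_n * grad / np.linalg.norm(grad)      # normalized gradient step
    col_norms = np.linalg.norm(D, axis=0)          # enforce ||D(:, k)||_2 <= 1
    return D / np.maximum(col_norms, 1.0)

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D = sgd_step(D, rng.standard_normal((16, 32)), n=1, N=100)
print(np.max(np.linalg.norm(D, axis=0)))  # <= 1 after projection
```

Columns already inside the unit ball are left untouched by the projection; only columns with norm above one are rescaled.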


Single Frame Super-resolution

Problem: given a single low-resolution input and a set of pairs (high- and low-resolution) of training patches sampled from similar images, reconstruct a high-resolution version of the input.

Applications:
    Photo zooming (e.g., Photoshop, Genuine Fractal)
    Photo printing
    Video standard conversion, etc.

Difficulty: single-image super-resolution is an extremely ill-posed problem.


Super-resolution via Sparse Recovery

High-resolution patches have sparse representations in terms of some over-complete dictionary:

x = D_h z₀,

where x ∈ R^m, D_h ∈ R^{m×K}, and ‖z₀‖₀ ≪ m.

We do not observe the high-resolution patch x, but its low-resolution version y ∈ R^n:

y = Lx = L D_h z₀ = D_l z₀.

L is the sampling matrix (blurring and downsampling).
y gives n linear measurements of the sparse coefficients z₀.

Sparse recovery? If we can obtain z₀ from y = D_l z₀ (an underdetermined linear system), we can recover x as D_h z₀.


Super-resolution via Sparse Recovery

Assume we have the coupled dictionaries D_h and D_l.

Input: low-resolution image Y.

Find the sparse solution for each patch y_p of Y by

z₀ = arg min_z ‖D_l z − y_p‖₂² + λ‖z‖₁.

Recover the corresponding high-resolution image patch as x_p = D_h z₀.

How to train D_l and D_h for good recovery?
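The test-time pipeline is then a Lasso solve per low-resolution patch followed by high-resolution synthesis. A minimal sketch of our own, with stand-in dictionaries; a real system would also subtract patch means and average overlapping patches.

```python
import numpy as np

def lasso_ista(y, D, lam, n_iter=2000):
    """Sparse code of y in dictionary D via iterative soft thresholding."""
    L = 2 * np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = z - 2 * D.T @ (D @ z - y) / L
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return z

def super_resolve_patches(Y_patches, D_l, D_h, lam=0.1):
    """For each low-res patch y_p: z0 = sparse code of y_p under D_l; emit D_h z0."""
    out = []
    for y_p in Y_patches:
        z0 = lasso_ista(y_p, D_l, lam)   # code in the low-resolution dictionary
        out.append(D_h @ z0)             # synthesize with the high-resolution dictionary
    return np.stack(out)
```

The key point the next slides address is precisely where D_l and D_h come from: the coding step only ever sees D_l, so the two dictionaries must be trained so that codes of y recover x.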


Joint Dictionary Training – Previous Approach

Our previous solution:
Randomly sample high- and low-resolution image patch pairs {x_i, y_i}_{i=1}^N from the training data.
Learn D_h and D_l jointly:

min_{D_h, D_l, {z_i}} ∑_{i=1}^N ‖x_i − D_h z_i‖₂² + ‖y_i − D_l z_i‖₂² + λ‖z_i‖₁,
s.t. ‖D_h(:, k)‖₂ ≤ 1, ‖D_l(:, k)‖₂ ≤ 1.

However, ...


Joint Dictionary Training – Problem

In training, we have

min_{D_h, D_l, {z_i}} ∑_{i=1}^N ‖x_i − D_h z_i‖₂² + ‖y_i − D_l z_i‖₂² + λ‖z_i‖₁.

In testing, we only have the low-resolution patch y_i, so the term ‖x_i − D_h z_i‖₂² drops out:

min_{z_i} ‖y_i − D_l z_i‖₂² + λ‖z_i‖₁,

and therefore good reconstruction of x_i is not guaranteed.


Bilevel Formulation

Goal: learn D_h and D_l such that the sparse representation z of y in terms of D_l can well reconstruct x with D_h.

Given high- and low-resolution training patch pairs {x_i, y_i}_{i=1}^N, the learning model is formulated as

min_{D_h, D_l} (1/N) ∑_{i=1}^N ‖D_h z_i − x_i‖₂²
s.t. z_i = arg min_α ‖α‖₁, s.t. ‖y_i − D_l α‖₂² ≤ ε,
     ‖D_l(:, k)‖₂ ≤ 1, ∀k,
     ‖D_h(:, k)‖₂ ≤ 1, ∀k.

The training process is completely consistent with testing.


Results

Setting: 100,000 high- and low-resolution 5×5 image patch pairs are sampled for training and 100,000 for testing. D_h and D_l are initialized from joint dictionary training. The learning algorithm converges in 5 iterations.

Pixel-wise MSE reduction compared with joint dictionary training:

         1        2        3        4        5
1    21.61%   19.60%   21.89%   18.91%   20.55%
2    17.43%   15.75%   17.92%   15.69%   14.70%
3    17.15%   16.96%   19.95%   17.57%   15.99%
4    16.41%   17.78%   18.30%   16.80%   15.82%
5    20.48%   14.68%   15.52%   14.64%   20.51%


SR Results

Visual comparison: top: joint dictionary training; bottom: bilevel sparse coding.


Practical Implementation

Learn fast sparse coding approximations with a neural network.
Selective patch processing.
Takes 5s to upscale an image from 200×200 to 800×800 on a single 3 GHz core with 4 GB RAM.
One of the fastest SR algorithms.

[Figures: Input, Bicubic, Ours]


Feature Representation by Pooling Sparse Codes

Fig. The image feature extraction diagram.


Feature Representation by Pooling Sparse Codes

A simple two-layer network.

Coding: VQ, soft assignment, LLC, sparse coding, linear filtering.

Pooling: average, energy, max, log, ℓ_p.

Works well on diverse recognition benchmarks: object, scene, action, face, digit, gender, expression, age estimation, and so on.

Key component of the winning system for PASCAL09 on image recognition.

[Figure: image feature extraction diagram]


The Feature Extraction Algorithm

1 Represent image X as sets of local descriptors in a spatial pyramid:

X = [Y^0_{11}, Y^1_{11}, Y^1_{12}, ..., Y^2_{44}].

2 Given dictionary D, encode the local descriptors into sparse codes by

Z^s_{ij} = arg min_A ‖Y^s_{ij} − DA‖₂² + λ‖A‖₁,

and we obtain

S = [Z^0_{11}, Z^1_{11}, Z^1_{12}, ..., Z^2_{44}].

3 Max-pool over each set of sparse codes and concatenate:

β = ⋃_{s=0}^{2} ⋃_{i,j=1}^{2^s} [β^s_{ij}], where β^s_{ij} = max(|Z^s_{ij}|).
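The three steps map directly to code. The sketch below is our own illustration: descriptor codes and locations are random stand-ins, and the pyramid uses grids of 1×1, 2×2, and 4×4 cells over the unit square.

```python
import numpy as np

def pyramid_max_pool(codes, locs, levels=(1, 2, 4)):
    """codes: (n, K) sparse codes; locs: (n, 2) descriptor positions in [0, 1)^2.
    Max-pool |codes| within each cell of each pyramid level and concatenate."""
    n, K = codes.shape
    feats = []
    for g in levels:                               # g x g grid at this level
        cell = np.minimum((locs * g).astype(int), g - 1)
        for i in range(g):
            for j in range(g):
                mask = (cell[:, 0] == i) & (cell[:, 1] == j)
                pooled = np.max(np.abs(codes[mask]), axis=0) if mask.any() else np.zeros(K)
                feats.append(pooled)
    return np.concatenate(feats)                   # length K * (1 + 4 + 16)

rng = np.random.default_rng(0)
codes = rng.standard_normal((100, 8))   # stand-in sparse codes, K = 8
locs = rng.random((100, 2))
beta = pyramid_max_pool(codes, locs)
print(beta.shape)  # (168,)
```

The first K entries come from the 1×1 level and equal the element-wise max of |Z| over all descriptors; finer levels append per-cell maxima, preserving coarse spatial layout.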


Unsupervised Dictionary Learning

Randomly sample a set of local descriptors {x_i}_{i=1}^N from the training set, and use existing sparse coding techniques to learn a dictionary D that can sparsely represent the data:

min_{D, {α_i}_{i=1}^N} ∑_{i=1}^N ‖x_i − Dα_i‖₂² + λ‖α_i‖₁,
s.t. ‖D(:, k)‖₂ ≤ 1.

Optimization is performed in an alternating fashion: fix D and optimize {α_i}_{i=1}^N; then fix {α_i}_{i=1}^N and optimize D.


Supervised Dictionary Learning

Unsupervised dictionary learning is good for reconstruction, but not necessarily effective for classification.

Training data with image labels: {(X_i, y_i)}_{i=1}^N.

Train the dictionary together with the classifier:

min_{D, w} ∑_{i=1}^N ℓ(y_i, f(β_i, w)) + γ‖w‖₂²,
s.t. β_i = pooling(Z_i),
     Z_i = arg min_A ‖X_i − DA‖₂² + λ‖A‖₁,
     ‖D(:, k)‖₂ ≤ 1, ∀k,

where ℓ(·) is a loss function and f(·, w) is the linear prediction model.

Optimization over w is training the classifier.
Optimization over D is a bilevel program.


Face recognition

CMU Multi-PIE Database: the dataset contains 337 subjects across simultaneous variations in pose, expression, and illumination. We use session 1 for training and sessions 2–4 for testing. The dataset is challenging due to the large number of subjects and the natural variations in subject appearance over time.


Face recognition error (%) on large-scale Multi-PIE.

Error Rate (%)   Session 2   Session 3   Session 4
LDA              50.6        55.7        52.1
NN               32.7        33.8        37.2
NS               22.4        25.7        26.6
SR               8.6         9.7         9.8
U-SC             5.4         9.0         7.5
S-SC             4.8         6.6         4.9
Improvement      11.1%       26.7%       34.7%


Gender Recognition

FRGC 2.0: the dataset contains 568 individuals and 14,714 face images in total, under various lighting conditions and backgrounds. 11,700 images from 451 randomly chosen individuals serve as the training set, and 3,014 images from the remaining 114 individuals form the testing set.

Classification Error (%)

Algorithm     SVM (RBF)   CNN   U-SC   S-SC   Improvement
Error Rate    8.6         5.9   6.8    5.3    22.1%


Hand Written Digit Recognition

MNIST: the dataset consists of 70,000 handwritten digits, of which 60,000 are selected for training and the remaining 10,000 for testing.

Algorithm                  Error Rate
SVM (RBF)                  1.41
L1 sparse coding           2.02
Local coordinate coding    1.90
Deep Belief Network        1.20
CNN                        0.82
U-SC                       0.98
S-SC                       0.84
Improvement                14.3%


Bilevel Sparse Coding: Outline

1 Introduction
2 Bilevel Sparse Coding
    The learning model
    The learning algorithm
3 Sparse Modeling Applications
    Single image super-resolution
    Supervised dictionary learning
    Adaptive compressive sensing
4 Conclusion


Formulation

Let x be the original signal, Φ the sampling matrix, and y = Φx the linear measurements. Compressive sensing recovery is done by

z = arg min_α ‖α‖₁,  s.t. y = ΦD_x α,
x = D_x z.

D_x is important for the recovery quality. With the training samples {x_i}_{i=1}^N, learn D_x by directly minimizing the compressive sensing recovery error:

min_{D_x}  (1/N) Σ_{i=1}^N ‖x_i − D_x z_i‖₂²
s.t.  y_i = Φx_i,  D_y = ΦD_x,
      z_i = arg min_α ‖α‖₁,  s.t. ‖y_i − D_y α‖₂² ≤ ε.
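The recovery step can be sketched as follows. As an assumption, the constrained ℓ₁ program is replaced by its Lagrangian (lasso) form and solved with ISTA; `lam` and the solver are illustrative choices, not part of the original formulation.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the prox of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def cs_recover(y, Phi, Dx, lam=0.05, n_iter=300):
    """Recover x from measurements y = Phi x by solving the lasso
    min_z ||y - D_y z||_2^2 + lam ||z||_1 with D_y = Phi D_x,
    then reconstructing x_hat = D_x z."""
    Dy = Phi @ Dx                          # measurement-domain dictionary
    L = np.linalg.norm(Dy, 2) ** 2         # step-size scale for ISTA
    z = np.zeros(Dx.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z - Dy.T @ (Dy @ z - y) / L, lam / (2 * L))
    return Dx @ z                          # x_hat = D_x z
```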


CS Results

Settings: 10,000 image patches of 16×16 are randomly sampled from medical images for training, and 5,000 for testing. The Haar wavelet basis is used as our baseline and initialization. A Bernoulli random matrix is used as the sampling matrix.

[Figure: Objective value vs. iteration number for 10% sampling rate.]

[Figure: Recovery accuracy comparison on the test image patches in PSNR, learned dictionary vs. Haar wavelet, for sampling rates 0.10–0.30.]


CS Results

Image recovery on the “bone” image with 20% measurements:

[Figure: Ground truth | Wavelet (22.8 dB) | Ours (27.6 dB)]


Conclusion

Learning a meaningful representation is critical for many applications.
Many sparse coding based applications can be formulated as a bilevel program.
Bilevel programs are extremely useful in many hierarchical models.
More applications in computer vision and machine learning? E.g., model selection.
