On the sparse estimation

ATR Computational Neuroscience Laboratories

Masa-aki Sato

On the sparse estimation - CNS 1. Why sparse estimation is necessary? Generalization ability for ill posed problem 2. Why sparse estimation can be achieved? Role of Bayesian estimation




Contents

1. Why is sparse estimation necessary? Generalization ability for ill-posed problems

2. Why can sparse estimation be achieved? Role of Bayesian estimation

3. Example of sparse estimation for a real problem


Ill-posed problem

A large number of parameters is estimated from a small number of data points.

• Maximum likelihood (minimum squared error)
  – Estimate the optimal parameters, i.e., those that maximize the likelihood
  – Overfitting: complex models with many adjustable parameters tend to fit the noise in the training data, which degrades generalization ability

• Sparse estimation
  – Extract relevant features and discard irrelevant ones to attain good generalization ability


Function approximation by polynomial

50 data points are randomly generated from a quadratic function:

$$y = x^2 - 2x + \text{noise}$$

Linear parameter model:

$$y = w_0 + w_1 x + w_2 x^2 + \cdots + w_N x^N = W \cdot X$$

$X = \left[1, x, x^2, \ldots, x^N\right]$: input (feature) variable

$W = \left[w_0, w_1, w_2, \ldots, w_N\right]$: weight parameter vector
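The linear parameter model can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the slides: the helper names `poly_features` and `predict` are hypothetical, and the example weights are arbitrary. The model is nonlinear in x but linear in W, which is what keeps the least-squares and Bayesian treatments on the following slides tractable.

```python
def poly_features(x, degree):
    """Feature vector X = [1, x, x^2, ..., x^degree] for a scalar input x."""
    return [x ** k for k in range(degree + 1)]


def predict(weights, x):
    """Linear parameter model y = W . X, where X = poly_features(x, N) and N = len(W) - 1."""
    features = poly_features(x, len(weights) - 1)
    return sum(w * f for w, f in zip(weights, features))


print(poly_features(2.0, 3))           # [1.0, 2.0, 4.0, 8.0]
print(predict([0.0, -2.0, 1.0], 3.0))  # 3.0, i.e. 3^2 - 2*3
```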


Maximum likelihood (Quadratic function)

Find the optimal W that minimizes the squared error:

$$\text{error} = \sum_{t=1}^{T} \left(y(t) - W \cdot X(t)\right)^2$$
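As a sketch of maximum-likelihood fitting (assuming Gaussian noise, so that maximizing the likelihood is the same as minimizing the squared error), the normal equations $(X'X)\,W = X'y$ can be solved directly. The helpers below are hypothetical plain-Python stand-ins written for illustration; a library least-squares routine would normally be preferred for numerical conditioning.

```python
def poly_features(x, degree):
    """X = [1, x, x^2, ..., x^degree]."""
    return [x ** k for k in range(degree + 1)]


def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_least_squares(xs, ys, degree):
    """Minimize sum_t (y(t) - W.X(t))^2 via the normal equations (X'X) W = X'y."""
    rows = [poly_features(x, degree) for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


xs = [i / 10 for i in range(-10, 11)]
ys = [x * x - 2 * x for x in xs]     # noise-free quadratic, used only as a check
print(fit_least_squares(xs, ys, 2))  # close to [0.0, -2.0, 1.0]
```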


Maximum likelihood (20 degree polynomial)

Overfitting: the optimal W fits the noise in the training data, so generalization is poor.
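The overfitting effect can be reproduced with a deliberately tiny example. Everything here is an illustrative assumption: 6 input points, a fixed list of numbers standing in for random noise, and $x^2 - 2x$ as the true function. With as many polynomial coefficients as data points (degree 5), the least-squares fit passes through every noisy point, so its training error drops to numerically zero; the printed error on a dense, noise-free grid lets the two degrees be compared.

```python
def poly_features(x, degree):
    return [x ** k for k in range(degree + 1)]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a @ w = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_least_squares(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations."""
    rows = [poly_features(x, degree) for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def sum_squared_error(w, xs, ys):
    return sum((y - sum(wk * x ** k for k, wk in enumerate(w))) ** 2 for x, y in zip(xs, ys))


xs = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
noise = [0.3, -0.2, 0.25, -0.3, 0.1, -0.15]    # fixed stand-in for random noise
ys = [x * x - 2 * x + e for x, e in zip(xs, noise)]
grid = [i / 20 for i in range(-20, 21)]        # dense points to probe generalization
truth = [x * x - 2 * x for x in grid]          # noise-free targets on the grid
for degree in (2, 5):
    w = fit_least_squares(xs, ys, degree)
    print(degree, "train:", sum_squared_error(w, xs, ys),
          "grid:", sum_squared_error(w, grid, truth))
```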


Regularization method (20 degree polynomial)

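Minimization:

$$\text{error} = \sum_{t=1}^{T} \left(y(t) - W \cdot X(t)\right)^2 + \alpha \cdot W^2$$

Prior:

$$P(W) \propto \exp\left(-\frac{1}{2}\alpha \cdot W^2\right)$$

With a Gaussian likelihood of unit noise variance, minimizing this error is the same as maximizing the posterior under this prior.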
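Setting the gradient of the regularized error to zero gives $(X'X + \alpha I)\,W = X'y$. The sketch below solves this in plain Python and shows the weight norm shrinking as α grows. The data (a quadratic plus a deterministic sine term standing in for noise), the degree 9, and the helper names are assumptions chosen for illustration.

```python
import math


def poly_features(x, degree):
    return [x ** k for k in range(degree + 1)]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a @ w = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_ridge(xs, ys, degree, alpha):
    """Minimize sum_t (y(t) - W.X(t))^2 + alpha * W.W, i.e. solve (X'X + alpha I) W = X'y."""
    rows = [poly_features(x, degree) for x in xs]
    k = degree + 1
    a = [[sum(r[i] * r[j] for r in rows) + (alpha if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(a, b)


xs = [i / 10 for i in range(-10, 11)]
ys = [x * x - 2 * x + 0.1 * math.sin(7 * x) for x in xs]  # sine term stands in for noise
for alpha in (0.01, 0.1, 1.0, 10.0):
    w = fit_ridge(xs, ys, 9, alpha)
    print(alpha, math.sqrt(sum(wk * wk for wk in w)))     # |W| shrinks as alpha grows
```

A single global α shrinks all weights together; it does not decide which terms are irrelevant, which is the gap that per-weight precisions address later.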


Model selection
• Search for the model that gives the best generalization error by changing the number of parameters (polynomial degree)
• Combinatorial search is almost impossible for a large number of degrees of freedom

[Figure: training error and generalization error vs. number of parameters (polynomial degree)]
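To see why exhaustive search breaks down, suppose (an assumption; the slide itself only varies the degree) that any subset of the polynomial terms may be switched on or off. A 20-degree polynomial has 21 terms, so there are $2^{21}$ candidate models, each of which would need its own fit:

```python
from itertools import combinations


def candidate_models(num_terms):
    """Yield every subset of term indices {0, ..., num_terms - 1}; each subset is one model."""
    for k in range(num_terms + 1):
        yield from combinations(range(num_terms), k)


def count_models(degree):
    """Number of subsets of the degree + 1 terms of a degree-N polynomial."""
    return 2 ** (degree + 1)


print(sum(1 for _ in candidate_models(4)))  # 16: all on/off patterns of 1, x, x^2, x^3
print(count_models(20))                     # 2097152 candidate models for degree 20
```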


Sparse estimation by Bayesian method
• Parameters are treated as random variables
• The posterior probability is calculated for possible parameter values
• Estimation is done by integrating over possible values according to the posterior probability distribution


Sparse estimation (20 degree polynomial)
• A precision parameter $\alpha_n$ is introduced for each weight component $w_n$ and estimated from the observed data

Prior:

$$P(w_n \mid \alpha_n) \propto \exp\left(-\frac{1}{2}\alpha_n \cdot w_n^2\right), \qquad \alpha_n: \text{undetermined}$$

[Figure: posterior for the 2nd-order weight; posterior for the 3rd-order weight]


Sparse Estimation

Prunes irrelevant features in the model and increases generalization ability


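MAP / ML Estimation (Maximum a Posteriori / Maximum Likelihood)

• Posterior:

$$P(\theta \mid X) = \frac{P(X \mid \theta)\, P_0(\theta)}{P(X)}$$

with likelihood $P(X \mid \theta)$, prior $P_0(\theta)$ and marginal likelihood $P(X)$

• Estimate the optimal parameter:

$$\theta_{\mathrm{MAP}} = \arg\max_{\theta} \log\left(P(X \mid \theta)\, P_0(\theta)\right)$$

[Figure: posterior $P(\theta \mid X)$ of a regular model over $(\theta_1, \theta_2)$]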


Full Bayesian Estimation

• Estimate the posterior parameter distribution and integrate over the parameters according to the posterior.

• Posterior:

$$P(\theta \mid X) = \frac{P(X \mid \theta)\, P_0(\theta)}{P(X)}$$

with likelihood $P(X \mid \theta)$ and prior $P_0(\theta)$

• Marginal likelihood:

$$P(X) = \int d\theta\, P(X \mid \theta)\, P_0(\theta)$$

• Estimated parameter:

$$\bar{\theta} = \int d\theta\, \theta\, P(\theta \mid X)$$
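For a single parameter, the MAP estimate, the marginal likelihood and the posterior-mean estimate can all be computed by brute force on a grid. This sketch assumes a Gaussian likelihood with unit variance for an unknown mean θ and a flat prior restricted to $0 \le \theta \le 5$; both are hypothetical choices made so the two estimators visibly differ. Because the prior truncates the posterior, the posterior mean moves away from the MAP value: integrating over the posterior and picking its peak are different estimators.

```python
import math


def log_likelihood(theta, data):
    """log P(X | theta) for i.i.d. data from N(theta, 1)."""
    return sum(-0.5 * (x - theta) ** 2 - 0.5 * math.log(2 * math.pi) for x in data)


def grid_posterior(data, lo=0.0, hi=5.0, steps=5000):
    """MAP, posterior mean and marginal likelihood on a grid, flat prior on [lo, hi]."""
    d = (hi - lo) / steps
    grid = [lo + k * d for k in range(steps + 1)]
    prior = 1.0 / (hi - lo)                                  # P0(theta)
    joint = [math.exp(log_likelihood(t, data)) * prior for t in grid]
    evidence = sum(joint) * d                                # P(X): Riemann sum over theta
    posterior = [j / evidence for j in joint]                # P(theta | X)
    theta_map = grid[max(range(len(grid)), key=lambda k: joint[k])]
    theta_mean = sum(t * p for t, p in zip(grid, posterior)) * d
    return theta_map, theta_mean, evidence


data = [0.5, -0.4, 0.3, -0.2, 0.3]  # made-up observations with sample mean 0.1
theta_map, theta_mean, evidence = grid_posterior(data)
print(theta_map, theta_mean, evidence)
```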


Model reduction by Bayesian method

Mixture-of-Gaussians example


Redundant model (ill-posed problem)

The estimation model is a mixture of two Gaussian units:

$$P(x \mid \theta) = g_0 \cdot N(x \mid \theta_1) + (1 - g_0) \cdot N(x \mid \theta_2)$$

Assume the data are generated by a single Gaussian. The single-unit model corresponds to three cases:

$$\begin{aligned}
P(x \mid \theta) &= N(x \mid \theta_1), \quad g_0 = 1, \quad \theta_2 \text{ arbitrary}\\
P(x \mid \theta) &= N(x \mid \theta_2), \quad g_0 = 0, \quad \theta_1 \text{ arbitrary}\\
P(x \mid \theta) &= N(x \mid \theta_1), \quad \theta_1 = \theta_2, \quad g_0 \text{ arbitrary}
\end{aligned}$$
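The three degenerate cases can be checked numerically: every parameter setting in these sets gives exactly the same likelihood as the single Gaussian. The sketch assumes unit-variance Gaussian units and a few made-up data points.

```python
import math


def normal_pdf(x, mean):
    """Unit-variance Gaussian density N(x | mean)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)


def mixture_pdf(x, g0, theta1, theta2):
    """P(x | theta) = g0 * N(x | theta1) + (1 - g0) * N(x | theta2)."""
    return g0 * normal_pdf(x, theta1) + (1 - g0) * normal_pdf(x, theta2)


def log_likelihood(data, g0, theta1, theta2):
    return sum(math.log(mixture_pdf(x, g0, theta1, theta2)) for x in data)


data = [-0.5, 0.1, 0.7, 1.2]                              # made-up sample
single = sum(math.log(normal_pdf(x, 0.3)) for x in data)  # single Gaussian with mean 0.3
cases = {
    "g0 = 1, theta2 arbitrary": [log_likelihood(data, 1.0, 0.3, t) for t in (-3.0, 0.0, 5.0)],
    "g0 = 0, theta1 arbitrary": [log_likelihood(data, 0.0, t, 0.3) for t in (-3.0, 0.0, 5.0)],
    "theta1 = theta2, g0 arbitrary": [log_likelihood(data, g, 0.3, 0.3) for g in (0.1, 0.5, 0.9)],
}
for name, values in cases.items():
    print(name, [v - single for v in values])  # differences vanish up to rounding
```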


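Posterior distribution for redundant model

[Figure: posterior $P(\theta \mid X)$ over $(\theta_1, \theta_2)$, with the regions $g_0 = 1$ and $g_0 = 0$ marked]

The Fisher information matrix becomes singular: on these sets the likelihood does not change along some parameter direction ($\theta_2$ when $g_0 = 1$, $\theta_1$ when $g_0 = 0$, $g_0$ when $\theta_1 = \theta_2$).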
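The singularity can be made concrete by computing the Fisher information $I_{ij} = \int (\partial_i p)(\partial_j p)/p\, dx$ for $\theta = (g_0, \theta_1, \theta_2)$ with a Riemann sum over x, again assuming unit-variance units; the integration range and step count are arbitrary choices. At $g_0 = 1$ the derivative with respect to $\theta_2$ vanishes identically, and at $\theta_1 = \theta_2$ the derivative with respect to $g_0$ vanishes, so the matrix has a zero row and zero determinant. Away from these sets it is nonsingular.

```python
import math


def normal_pdf(x, mean):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)


def fisher_information(g0, theta1, theta2, lo=-10.0, hi=10.0, steps=4000):
    """I_ij = integral of (dp/d_i)(dp/d_j) / p over x, for parameters (g0, theta1, theta2)."""
    dx = (hi - lo) / steps
    info = [[0.0] * 3 for _ in range(3)]
    for k in range(steps + 1):
        x = lo + k * dx
        n1, n2 = normal_pdf(x, theta1), normal_pdf(x, theta2)
        p = g0 * n1 + (1 - g0) * n2
        grad = [n1 - n2,                       # dp/dg0
                g0 * n1 * (x - theta1),        # dp/dtheta1 (unit variance)
                (1 - g0) * n2 * (x - theta2)]  # dp/dtheta2 (unit variance)
        for i in range(3):
            for j in range(3):
                info[i][j] += grad[i] * grad[j] / p * dx
    return info


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


print(det3(fisher_information(1.0, 0.0, 2.0)))  # g0 = 1: singular
print(det3(fisher_information(0.5, 0.0, 0.0)))  # theta1 = theta2: singular
print(det3(fisher_information(0.5, 0.0, 2.0)))  # generic point: nonzero
```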


Pruning of redundant parameters

MAP
• Complex models explain the given data better than simpler models and give a higher posterior value
• Hence all parameters are used for prediction

Full Bayesian
• The reduced, simpler model dominates through integration over the parameters

[Figure: posterior distribution $P(\theta \mid X)$ over $(\theta_1, \theta_2)$ for 100 samples generated by a single Gaussian model; the MAP optimum and the reduced models ($g_0 = 0$, $g_0 = 1$) are marked]


Parameter pruning in Sparse Estimation


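Prior in Sparse Estimation Model

$$P(w_n \mid \alpha_n) \propto \sqrt{\alpha_n} \cdot \exp\left(-\frac{1}{2}\alpha_n \cdot w_n^2\right)$$

$$P(\log \alpha_n) = \text{const.} \quad \text{(non-informative prior)}$$

$\alpha_n$ controls the precision (width) of the weight parameter distribution.

[Figure: prior over $(w_n, \alpha_n)$; along $w_n = 0$, $\alpha_n$ is arbitrary]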


Posterior parameter distribution for $(w_n, \alpha_n)$

[Figure: posterior for a relevant parameter; posterior for an irrelevant parameter, concentrated on $w_n = 0$ with $\alpha_n$ arbitrary]

Other parameters are integrated out, and their effects are taken into account.


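Calculation of posterior distribution

Posterior calculation as free energy maximization ($J$: current, $B$: MEG data):

$$F[Q] = \ln P(B) - KL\left[Q(J, \alpha) \,\|\, P(J, \alpha \mid B)\right]$$

Here $\ln P(B)$ is the log marginal likelihood (evidence), and the KL term is the distance between the trial posterior $Q(J, \alpha)$ and the true posterior $P(J, \alpha \mid B)$. Since KL ≥ 0, $F[Q] \le \ln P(B)$, with equality exactly when $Q$ equals the true posterior.

• Posterior:

$$P(J, \alpha \mid B) = \frac{P(B \mid J)\, P_0(J \mid \alpha)\, P_0(\alpha)}{P(B)}$$

with likelihood $P(B \mid J)$ and (hierarchical) prior $P_0(J \mid \alpha)\, P_0(\alpha)$

• Marginal likelihood:

$$P(B) = \int dJ\, d\alpha\, P(B \mid J)\, P_0(J \mid \alpha)\, P_0(\alpha)$$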
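The identity $F[Q] = \ln P(B) - KL[Q \,\|\, P(\cdot \mid B)]$ can be checked on a toy discrete model in which a single hidden variable z with three hypothetical states stands in for $(J, \alpha)$. The prior, the likelihood values and the trial distribution are made up. The check shows that F never exceeds the log evidence and reaches it exactly when Q is the true posterior, which is why maximizing F over Q amounts to computing the posterior.

```python
import math


def free_energy(q, joint):
    """F(Q) = sum_z Q(z) log(P(B, z) / Q(z))."""
    return sum(qz * math.log(pz / qz) for qz, pz in zip(q, joint) if qz > 0)


def kl(q, p):
    """KL[Q || P] for discrete distributions."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0)


prior = [0.5, 0.3, 0.2]       # hypothetical P0(z)
likelihood = [0.1, 0.6, 0.3]  # hypothetical P(B | z) for the observed B
joint = [lk * p for lk, p in zip(likelihood, prior)]  # P(B, z)
evidence = sum(joint)                                 # P(B)
posterior = [j / evidence for j in joint]             # P(z | B)

q = [0.2, 0.5, 0.3]           # an arbitrary trial posterior
print(free_energy(q, joint), math.log(evidence) - kl(q, posterior))  # equal up to rounding
print(free_energy(posterior, joint), math.log(evidence))             # equal: KL = 0
```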


Variational Bayesian (VB) method

Factorization assumption: $Q(J, \alpha) = Q_J(J)\, Q_\alpha(\alpha)$

• Maximization of F(Q) w.r.t. $Q_J(J)$
• Maximization of F(Q) w.r.t. $Q_\alpha(\alpha)$
• Repeated until convergence

At the maximum:
• Posterior distribution: $P(J \mid B) \approx Q_J(J)$
• Log marginal likelihood: $\log P(B) \approx F(Q)$ (maximized)
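The alternating updates can be sketched for the sparse linear model of the earlier slides. This is a minimal illustration, not the MEG implementation these slides refer to. Assumptions: $y = W \cdot X + \text{noise}$ with known unit noise variance, the prior $P(w_n \mid \alpha_n) \propto \sqrt{\alpha_n}\exp(-\frac{1}{2}\alpha_n w_n^2)$ with $P(\log \alpha_n) = \text{const}$, and the factorization $Q(W, \alpha) = Q_W(W)\,Q_\alpha(\alpha)$. Under these assumptions $Q_W$ is Gaussian with precision $\mathrm{diag}(\bar\alpha) + X'X$, and $Q_\alpha$ yields $\bar\alpha_n = 1/\langle w_n^2 \rangle = 1/(m_n^2 + \Sigma_{nn})$.

```python
def invert(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n:] for row in m]


def vb_ard(rows, ys, iterations=300, alpha0=1.0):
    """rows[t] is the feature vector X(t). Returns posterior means, variances, alpha_bar."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    alpha = [alpha0] * k
    for _ in range(iterations):
        # Q_W step: Gaussian with precision diag(alpha) + X'X and mean Sigma X'y
        precision = [[xtx[i][j] + (alpha[i] if i == j else 0.0) for j in range(k)]
                     for i in range(k)]
        cov = invert(precision)
        mean = [sum(cov[i][j] * xty[j] for j in range(k)) for i in range(k)]
        # Q_alpha step: alpha_n = 1 / <w_n^2> = 1 / (m_n^2 + Sigma_nn)
        alpha = [1.0 / (mean[i] ** 2 + cov[i][i]) for i in range(k)]
    return mean, [cov[i][i] for i in range(k)], alpha


# Orthogonal feature columns; only the first carries signal (y = 3 * column 0).
rows = [[1.0, 1.0, 1.0], [1.0, -1.0, 1.0], [1.0, 1.0, -1.0], [1.0, -1.0, -1.0]]
ys = [3.0, 3.0, 3.0, 3.0]
mean, var, alpha = vb_ard(rows, ys)
print(mean)
print(alpha)
```

In this example the relevant weight's α settles at a finite value, while the α of each irrelevant weight grows by about $X_n \cdot X_n = 4$ per iteration without bound, so those weights are driven to zero. Because that growth is slow, implementations commonly prune a weight once its α exceeds a threshold.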


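Free energy after integration over the current distribution:

$$F = -\frac{1}{2}\left[\mathrm{Tr}\left(B \cdot B' \cdot \Sigma_{VB}^{-1}\right) + \log\left|\Sigma_{VB}\right|\right]$$

where $B \cdot B'$ is the MEG covariance matrix and $\Sigma_{VB}$ is the estimated MEG covariance:

$$\Sigma_{VB} = G \cdot W \cdot \alpha^{-1} \cdot W' \cdot G' + I$$

Optimal condition (free energy maximum):

$$\frac{B \cdot B'}{\sigma_0} = G \cdot \frac{J \cdot J'}{\sigma_0} \cdot G' + I$$

Estimation gain for each current source $n$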


Variational Bayesian method: posterior calculation is converted to free energy maximization

Free energy = (Likelihood) + (Model complexity)

Likelihood (error term), evaluated at the optimal W:

$$L = -\frac{1}{2}\left[(Y - W \cdot X)^2 + \alpha \cdot W \cdot W'\right] = -\frac{1}{2}\left[Y \cdot Y' - (Y \cdot X')^2 \cdot (X \cdot X' + \alpha)^{-1}\right]$$

Decreasing function of $\alpha$

Model complexity:

$$H = -\frac{1}{2}\log\left(\alpha^{-1} \cdot X \cdot X' + 1\right) \to -\frac{N}{2}\log T \quad \text{(BIC)}$$

for finite $\alpha$ and $T \gg 1$

Increasing function of $\alpha$


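One-dimensional case:

$$H = -\frac{1}{2}\log\left(\alpha^{-1} \cdot X \cdot X' + 1\right) \to 0 \quad \text{as } \alpha \to \infty$$

General dimension:

$$H = -\frac{1}{2}\log\left|\alpha^{-1} \cdot X \cdot X' + I\right| \to -\frac{N_{\mathrm{eff}}}{2}\log T$$

$N_{\mathrm{eff}}$: number of finite $\alpha_n$ (for $T \gg 1$). A weight whose $\alpha_n \to \infty$ is pruned and no longer contributes to the complexity penalty.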
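For the one-dimensional case, the two terms can be evaluated directly as functions of α. Assuming unit noise variance (not stated on the slides), $F(\alpha) = L + H$ is stationary where $(X\cdot X')(\alpha + X\cdot X') = \alpha\,(Y\cdot X')^2$, i.e. $\alpha^* = (X\cdot X')^2 / ((Y\cdot X')^2 - X\cdot X')$. If $(Y\cdot X')^2 \le X\cdot X'$, F increases monotonically in α, the optimum is $\alpha \to \infty$, and the weight is pruned. The sketch below encodes this; the statistics in the example are arbitrary.

```python
import math


def likelihood_term(alpha, yy, yx, xx):
    """L = -1/2 [Y.Y' - (Y.X')^2 / (X.X' + alpha)]: the error term at the optimal W."""
    return -0.5 * (yy - yx ** 2 / (xx + alpha))


def complexity_term(alpha, xx):
    """H = -1/2 log(X.X' / alpha + 1): approaches 0 as alpha -> infinity."""
    return -0.5 * math.log(xx / alpha + 1.0)


def free_energy(alpha, yy, yx, xx):
    return likelihood_term(alpha, yy, yx, xx) + complexity_term(alpha, xx)


def optimal_alpha(yx, xx):
    """Maximizer of free_energy over alpha; math.inf (weight pruned) if (Y.X')^2 <= X.X'."""
    excess = yx ** 2 - xx
    return xx ** 2 / excess if excess > 0 else math.inf


# Arbitrary example statistics: Y.Y' = 36, Y.X' = 12, X.X' = 4
a_star = optimal_alpha(12.0, 4.0)
print(a_star, free_energy(a_star, 36.0, 12.0, 4.0))
print(optimal_alpha(1.0, 4.0))  # inf: the weight is pruned
```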


Example of sparse estimation for a real problem

I. Nambu, R. Osu, M. Sato, S. Ando, M. Kawato, E. Naito, "Single-trial reconstruction of finger-pinch forces from human motor-cortical activation measured by near-infrared spectroscopy (NIRS)," NeuroImage 47 (2009) 628–637.


Sparse estimation (Nambu et al.)

Estimate pinching force from 24 ch × 21 (s) NIRS data

[Figure: estimated weights by brute-force model search vs. estimated weights by sparse estimation; force vs. time]


[Figure: estimated force from NIRS data]