18
Informatics and Mathematical Modelling / Intelligent Signal Processing 1 Sparse’09 8 April 2009 Sparse Coding and Automatic Relevance Determination for Multi- way models Morten Mørup DTU Informatics Intelligent Signal Processing Technical University of Denmark Joint work with Lars Kai Hansen DTU Informatics Intelligent Signal Processing Technical University of Denmark

Sparse Coding and Automatic Relevance Determination for Multi-way models

  • Upload
    hayden

  • View
    45

  • Download
    0

Embed Size (px)

DESCRIPTION

Sparse Coding and Automatic Relevance Determination for Multi-way models. Morten Mørup DTU Informatics Intelligent Signal Processing Technical University of Denmark. Joint work with Lars Kai Hansen DTU Informatics Intelligent Signal Processing Technical University of Denmark. - PowerPoint PPT Presentation

Citation preview

Page 1: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

1Sparse’09 8 April 2009

Sparse Coding and Automatic Relevance Determination for

Multi-way models

Morten Mørup DTU Informatics

Intelligent Signal ProcessingTechnical University of Denmark

Joint work with Lars Kai HansenDTU Informatics

Intelligent Signal ProcessingTechnical University of Denmark

Page 2: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

2Sparse’09 8 April 2009

Multi-way modeling has become an important tool for Unsupervised Learning and are in frequent use today in a variety of fields including Psychometrics (Subject x Task x Time) Chemometrics (Sample x Emission x Absorption) NeuroImaging(Channel x Time x Trial) Textmining (Term x Document x Time/User) Signal Processing (ICA, i.e. diagonalization of Cummulants)

Vector: 1-way array/ 1st order tensor,

Matrix: 2-way array/ 2nd order tensor,

3-way array/3rd order tensor

Page 3: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

3Sparse’09 8 April 2009

Some Tensor algebra

Matricizing X X(n)

N-mode multiplication

i.e. for matrix multiplication CA=A x1C , ABT=A x2B

I x JK Matrix

K x IJ Matrix

J x KI Matrix

XI x J x K

X(1)

X(2)

X(3)

Page 4: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

4Sparse’09 8 April 2009

Tensor DecompositionThe two most commonly used tensor decomposition models are the CandeComp/PARAFAC (CP) model and the Tucker model

Page 5: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

5Sparse’09 8 April 2009

CP-model unique Tucker model not unique What constitutes an adequate number of

components, i.e. determining J for CP and J1,J2,…,JN for Tucker

is an open problem, particularly difficult for the Tucker model as the number of components are specified for each modality separately

Page 6: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

6Sparse’09 8 April 2009

Agenda To use sparse coding to Simplify the Tucker core forming

a unique representation as well as enable interpolation between the Tucker (full core) and CP (diagonal core) model

To use sparse coding to turn off excess components in the CP and Tucker model and thereby select the model order.

To tune the pruning strength from data by Automatic Relevance Determination (ARD).

Page 7: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

7Sparse’09 8 April 2009

Sparse Coding

Bruno A. Olshausen David J. Field

Nature, 1996

Preserve Information Preserve Sparsity (Simplicity)

Tradeoff parameter

Page 8: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

8Sparse’09 8 April 2009

Sparse Coding (cont.)

Patch image Vectorize patches

Image patch 1 to N

A cor

resp

onds t

o Gab

or

like

feat

ures r

esem

bling

simple

cell b

ehav

ior i

n

V1 of t

he bra

in!

Sparse coding attempts to both learn dictionary and encoding at the same time(This problem is not convex!)

Page 9: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

9Sparse’09 8 April 2009

Automatic Relevance Determination (ARD)

Automatic Relevance Determination (ARD) is a hierarchical Bayesian approach widely used for model selection

In ARD hyper-parameters explicitly represents the relevance of different features by defining the range of variation. (i.e., Range of variation0 Feature removed)

Page 10: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

10Sparse’09 8 April 2009

A Bayesian formulation of the Lasso / BPD problem

Page 11: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

11Sparse’09 8 April 2009

ARD in reality a L0-norm optimization scheme. As such ARD based on Laplace prior corresponds to L0-norm optimization by re-weighted L1-norm

L0 norm by re-weighted L2 follows by imposing Gaussian prior instead of Laplace

Page 12: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

12Sparse’09 8 April 2009

Sparse Tucker decomposition by ARD

Brakes into standard Lasso/BPD sub-problems of the form

Page 13: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

13Sparse’09 8 April 2009

Solving efficiently the Lasso/BPD sub-problems

Notice, in general the alternating sub-problems for Tucker estimation have J<I!

Page 14: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

14Sparse’09 8 April 2009

The Tucker ARD Algorithm

Page 15: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

15Sparse’09 8 April 2009

Results

Tucker(10,10,10) models were fitted to the data, given are below the extracted cores

Fluorescene data: emission x excitation x samples

Synthetic Tucker(3,4,5) Tucker(3,6,4) Tucker(3,3,3) Tucker(?,4,4) Tucker(4,4,4)

Page 16: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

16Sparse’09 8 April 2009

Sparse Coding in fact 2-way case of Tucker model The presented ARD framework extracts a 20

component model when trained on the natural image data of (Olshausen and Field, Nature 1996)

Patch image Vectorize patches

Image patch 1 to N

Page 17: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

17Sparse’09 8 April 2009

Conclusion Imposing sparseness on the core

enable to interpolate between Tucker and CP model

ARD based on MAP estimation form a simple framework to tune the pruning in sparse coding models giving the model order.

ARD framework especially useful for the Tucker model where the order is specified for each mode separately which makes exhaustive evaluation of all potential models expensive.

ARD-estimation based on MAP is closely related to L0 norm estimation based on reweighted L1 and L2 norm. Thus, ARD form a principled framework for learning the sparsity pattern in CS.

Page 18: Sparse Coding and Automatic Relevance Determination for Multi-way models

Informatics and Mathematical Modelling / Intelligent Signal Processing

18Sparse’09 8 April 2009

http://mmds.imm.dtu.dk