
An Alternating Direction Algorithm for Structure-enforced Matrix Factorization

Lijun Xu (Dalian University of Technology)

Supervised by

Bo Yu (DUT) Yin Zhang (Rice University)

March 27, 2013

Outline

• Introduction

• Alternating Direction Method (ADM)

• ADM Extension to SeMF

• Numerical experiments

• Conclusion

Introduction

• Matrix Factorization

$$\min_{X,Y}\ \frac{1}{2}\|M - XY\|_F^2, \qquad M \in \mathbb{R}^{m \times n},\ X \in \mathbb{R}^{m \times k},\ Y \in \mathbb{R}^{k \times n}$$

• Various factorizations require different constraints on $X$ and $Y$:

a) Exact factorizations: LU, QR, SVD, eigendecomposition, etc.

b) Recent approximate factorizations: NMF, K-means, sparse PCA, matrix completion, dictionary learning, etc.

• In practice, many constraints on $X$ and $Y$ impose structural properties such as non-negativity, sparsity, orthogonality, normalization, etc., which allow easy ‘projections’.

• Structure-enforced Matrix Factorization (SeMF):

$$\min_{X,Y}\ \frac{1}{2}\|M - XY\|_F^2 \quad \text{s.t.}\quad X \in \mathcal{X},\ Y \in \mathcal{Y},$$

where $\mathcal{X}$ and $\mathcal{Y}$ are easily projectable sets.

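• Some examples of easily projectable sets $\mathcal{X}$, each with its projection $\mathcal{P}$:

Non-negativity: $\mathcal{X} = \{X : X_{ij} \ge 0\}$,

$$\mathcal{P}(X)_{ij} = \begin{cases} X_{ij}, & X_{ij} \ge 0 \\ 0, & X_{ij} < 0 \end{cases}$$

Sparsity: $\mathcal{X} = \{X : \|X_i\|_0 \le k,\ i = 1, 2, \dots\}$,

$$\mathcal{P}(X)_{ij} = \begin{cases} X_{ij}, & |X_{ij}| \text{ is among the } k \text{ largest absolute values of } X_i \\ 0, & \text{otherwise} \end{cases}$$

Orthogonality: $\mathcal{X} = \{X : X_i \perp X_J,\ i \in I\}$,

$$\mathcal{P}(X_i) = \left(\mathbf{I} - X_J (X_J^T X_J)^{-1} X_J^T\right) X_i, \quad i \in I$$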
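The non-negativity, sparsity and orthogonality projections can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and it assumes that $X_i$ denotes the $i$-th column of $X$ and that $I$ and $J$ are given as lists of column indices.

```python
import numpy as np

def project_nonnegative(X):
    """Non-negativity: set every negative entry to zero."""
    return np.maximum(X, 0.0)

def project_column_sparsity(X, k):
    """Sparsity: keep the k largest-magnitude entries of each column (assumed X_i)."""
    X = np.asarray(X, dtype=float)
    if k >= X.shape[0]:
        return X.copy()
    P = np.zeros_like(X)
    # Row indices of the k largest |entries| in every column.
    idx = np.argpartition(-np.abs(X), k - 1, axis=0)[:k, :]
    cols = np.arange(X.shape[1])
    P[idx, cols] = X[idx, cols]
    return P

def project_orthogonal(X, I, J):
    """Orthogonality: make columns I orthogonal to the span of columns J."""
    X = np.array(X, dtype=float)
    XJ = X[:, J]
    # Least squares yields X_J (X_J^T X_J)^{-1} X_J^T X_I without an explicit inverse.
    coef, *_ = np.linalg.lstsq(XJ, X[:, I], rcond=None)
    X[:, I] = X[:, I] - XJ @ coef
    return X
```

Each function costs no more than a sort or a small least-squares solve, which is the sense in which these sets are "easily projectable".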

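Normalization: $\mathcal{X} = \{X : \|X_i\| \le 1,\ i = 1, 2, \dots\}$,

$$\mathcal{P}(X_i) = \begin{cases} X_i / \|X_i\|, & \|X_i\| > 1 \\ X_i, & \|X_i\| \le 1 \end{cases}$$

Combinatorial structure: $\mathcal{X} = \{X = [X_{I_1}\ X_{I_2}\ \cdots\ X_{I_r}] : X_{I_i} \in \mathcal{X}_i,\ i = 1, 2, \dots, r\}$,

$$\mathcal{P}(X) = [\mathcal{P}_1(X_{I_1})\ \mathcal{P}_2(X_{I_2})\ \cdots\ \mathcal{P}_r(X_{I_r})]$$

E.g. 3 groups, each group is sparse.

[Figure: 3 groups, labelled "1 zero", "2 zeros", "1 zero"]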
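A short sketch of the normalization projection and of a group-wise (combinatorial) projection is given below. The helper names, the use of the Euclidean norm, and the representation of the groups as index slices of a single vector are assumptions made for illustration.

```python
import numpy as np

def project_column_norm_ball(X):
    """Normalization: rescale columns with Euclidean norm > 1 to unit norm."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=0)
    # Columns with norm <= 1 are divided by 1, i.e. left unchanged.
    return X / np.maximum(norms, 1.0)

def keep_largest(v, k):
    """Keep the k largest-magnitude entries of a vector, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(-np.abs(v))[:k]
    out[idx] = v[idx]
    return out

def project_groupwise(x, groups, projections):
    """Combinatorial structure: apply projections[i] to the block x[groups[i]]."""
    out = np.array(x, dtype=float)
    for idx, proj in zip(groups, projections):
        out[idx] = proj(out[idx])
    return out
```

For example, with three groups `[slice(0, 3), slice(3, 6), slice(6, 8)]` and per-group sparsity levels 1, 2 and 1, `project_groupwise` keeps one, two and one entries in the respective groups.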

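• Problems with specific structural patterns

$$\min_{X,Y}\ \frac{1}{2}\|M - XY\|_F^2 \quad \text{s.t.}\quad X \in \mathcal{X},\ Y \in \mathcal{Y}$$

a) Sparse NMF: $\mathcal{X}$: non-negative (+ sparse); $\mathcal{Y}$: non-negative (+ sparse)

b) Sparse PCA: $\mathcal{X}$: sparse; $\mathcal{Y}$: column normalized

c) Dictionary learning for sparse representation: $\mathcal{X}$: column normalized; $\mathcal{Y}$: sparse

etc.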

Alternating Direction Method

• Classic ADM: the objective functions are convex and the constraint sets are closed and convex.

• ADM works with the augmented Lagrangian of this problem.
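For orientation, the classic two-block ADM setting is commonly written as below. The symbols $f$, $g$, $A$, $B$, $b$, $\lambda$ and $\beta$ follow common usage in the ADM literature and are assumptions chosen for illustration; they are not taken from the slides.

```latex
\min_{x,\,y}\ f(x) + g(y)
\quad \text{s.t.} \quad Ax + By = b,\ x \in \mathcal{X},\ y \in \mathcal{Y},

\mathcal{L}_A(x, y, \lambda) = f(x) + g(y)
  + \lambda^{T}(Ax + By - b) + \frac{\beta}{2}\,\|Ax + By - b\|_2^2 .
```

Classic ADM minimizes $\mathcal{L}_A$ over $x$, then over $y$, and then updates the multiplier $\lambda$.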

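ADM Extension to SeMF

• Original model:

$$\min_{X,Y}\ \frac{1}{2}\|M - XY\|_F^2 \quad \text{s.t.}\quad X \in \mathcal{X},\ Y \in \mathcal{Y}$$

• Model with splitting variables:

$$\min_{X,Y,U,V}\ \frac{1}{2}\|M - XY\|_F^2 \quad \text{s.t.}\quad X - U = 0,\ Y - V = 0,\ U \in \mathcal{X},\ V \in \mathcal{Y}$$

The splitting variables separate the product $XY$ from the constraint set $\mathcal{X}$ (similarly for $\mathcal{Y}$). These separations facilitate alternating direction methods.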

ADM framework to SeMF

• Augmented Lagrangian:

$$\mathcal{L}_A(X, Y, U, V, \Lambda, \Pi) = \frac{1}{2}\|M - XY\|_F^2 + \frac{\alpha}{2}\|X - U\|_F^2 + \frac{\beta}{2}\|Y - V\|_F^2 + \Lambda \bullet (X - U) + \Pi \bullet (Y - V),$$

where $\Lambda, \Pi$ are Lagrangian multipliers, $(\alpha, \beta) > 0$ are penalty parameters, and the product $A \bullet B = \sum_{i,j} a_{ij} b_{ij}$.

ADM minimizes $\mathcal{L}_A$ with respect to one of $X$, $Y$, $U$ and $V$ at a time while fixing the others, and then updates $(\Lambda, \Pi)$ after each sweep of such alternating minimization.

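• Framework:

$$\begin{aligned}
X^{k+1} &\leftarrow \arg\min_{X}\ \mathcal{L}_A(X, Y^k, U^k, V^k, \Lambda^k, \Pi^k), \\
Y^{k+1} &\leftarrow \arg\min_{Y}\ \mathcal{L}_A(X^{k+1}, Y, U^k, V^k, \Lambda^k, \Pi^k), \\
U^{k+1} &\leftarrow \mathcal{P}_{\mathcal{X}}(X^{k+1} + \Lambda^k / \alpha), \\
V^{k+1} &\leftarrow \mathcal{P}_{\mathcal{Y}}(Y^{k+1} + \Pi^k / \beta), \\
\Lambda^{k+1} &\leftarrow \Lambda^k + \gamma\alpha\,(X^{k+1} - U^{k+1}), \\
\Pi^{k+1} &\leftarrow \Pi^k + \gamma\beta\,(Y^{k+1} - V^{k+1}).
\end{aligned}$$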
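The $X$- and $Y$-subproblems are unconstrained and quadratic, so they admit closed-form solutions. The derivation below is a worked sketch obtained by setting the gradient of the augmented Lagrangian to zero; it is not stated on the slides, and $\mathbf{I}$ denotes the $k \times k$ identity.

```latex
\nabla_X \mathcal{L}_A = -(M - XY)Y^{T} + \alpha (X - U) + \Lambda = 0
\;\Longrightarrow\;
X = (M Y^{T} + \alpha U - \Lambda)\,(Y Y^{T} + \alpha \mathbf{I})^{-1},

\nabla_Y \mathcal{L}_A = -X^{T}(M - XY) + \beta (Y - V) + \Pi = 0
\;\Longrightarrow\;
Y = (X^{T} X + \beta \mathbf{I})^{-1}\,(X^{T} M + \beta V - \Pi),

\frac{\alpha}{2}\|X - U\|_F^2 + \Lambda \bullet (X - U)
= \frac{\alpha}{2}\,\bigl\|U - (X + \Lambda/\alpha)\bigr\|_F^2 + \text{const}
\;\Longrightarrow\;
U = \mathcal{P}_{\mathcal{X}}(X + \Lambda/\alpha).
```

The $V$-update follows in the same way with $(Y, \Pi, \beta)$ in place of $(X, \Lambda, \alpha)$. Because $\alpha, \beta > 0$, both linear systems are positive definite and only of size $k \times k$.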

Implementation

• Choice of parameters $\alpha, \beta, \gamma$:

Step length $\gamma \in (0, 1.618)$; we set $\gamma = 1$.

Penalty parameters $(\alpha, \beta)$: adaptive updating.

Motivation: fixed values often cause slow convergence and getting trapped in local minima.

Intuition: balance the changes of the 3 terms $\|M - XY\|$, $\|X - U\|$ and $\|Y - V\|$.

• Stopping criterion: $|f_{k+1} - f_k| / |f_k| \le tol$, where $f_k = \|M - X^k Y^k\|_F$.
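The framework, step length and stopping criterion can be combined into one loop, sketched below in NumPy. This is an illustration under stated assumptions, not the authors' implementation: the function name `semf_adm` and its arguments are hypothetical, the $X$- and $Y$-steps use the closed-form minimizers obtained by setting the gradient of the augmented Lagrangian to zero, the starting point is random, and $\alpha, \beta$ are kept fixed because the adaptive updating rule is not reproduced here.

```python
import numpy as np

def semf_adm(M, k, proj_X, proj_Y, alpha=1.0, beta=1.0, gamma=1.0,
             tol=1e-6, max_iter=500, seed=0):
    """ADM sketch for min 0.5*||M - XY||_F^2 with X, Y in projectable sets.

    proj_X / proj_Y are callables that project a matrix onto the sets.
    alpha and beta stay fixed here (an assumption; the slides adapt them).
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X = rng.standard_normal((m, k))
    Y = rng.standard_normal((k, n))
    U, V = proj_X(X), proj_Y(Y)
    Lam, Pi = np.zeros_like(X), np.zeros_like(Y)
    I_k = np.eye(k)
    f_old = np.linalg.norm(M - X @ Y)  # Frobenius norm
    for it in range(max_iter):
        # X-step: solve X (Y Y^T + alpha I) = M Y^T + alpha U - Lam.
        X = np.linalg.solve(Y @ Y.T + alpha * I_k,
                            (M @ Y.T + alpha * U - Lam).T).T
        # Y-step: solve (X^T X + beta I) Y = X^T M + beta V - Pi.
        Y = np.linalg.solve(X.T @ X + beta * I_k, X.T @ M + beta * V - Pi)
        # Projection steps for the splitting variables.
        U = proj_X(X + Lam / alpha)
        V = proj_Y(Y + Pi / beta)
        # Multiplier updates with step length gamma.
        Lam = Lam + gamma * alpha * (X - U)
        Pi = Pi + gamma * beta * (Y - V)
        # Relative change of f_k = ||M - X_k Y_k||_F.
        f_new = np.linalg.norm(M - X @ Y)
        if abs(f_new - f_old) <= tol * abs(f_old):
            break
        f_old = f_new
    return X, Y, U, V, it + 1
```

Passing `lambda Z: np.maximum(Z, 0.0)` for both `proj_X` and `proj_Y` gives an NMF-style run; the returned `U` and `V` are the iterates that satisfy the constraints exactly.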

Implementation • An updating strategy:

• A simple example:

$A = XY$, where $X$ is a random $40 \times 60$ matrix with $\|x_i\| = 1$ and $Y$ is a sparse $60 \times 1500$ matrix in which each column has 3 non-zeros with random locations and values,

using different initial penalty parameters $(\alpha, \beta)$:

2[1 0.1] 10 , 1, 5.kA k−× × =

$$\min_{X,Y}\ \frac{1}{2}\|A - XY\|_F^2 \quad \text{s.t.}\quad \|x_i\| = 1,\ \|y_i\|_0 \le 3$$

Numerical Experiments: Dictionary Learning

$$\min_{X,Y}\ \frac{1}{2}\|M - XY\|_F^2 \quad \text{s.t.}\quad \|x_i\| \le 1,\ \|y_j\|_0 \le k,\ \forall\, i, j$$

$M$: samples of data, $X$: overcomplete dictionary matrix, $Y$: sparse representation of $M$.

Synthetic experiments (compared with K-SVD): $X^*$: random $20 \times 50$, columns normalized; $Y^*$: 3 random non-zeros in each column; $M$: $X^* Y^*$ + white Gaussian noise.
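The synthetic setup can be generated along the following lines. The function name, the Gaussian distribution of the dictionary entries and coefficients, the default sample count, and the SNR convention $20 \log_{10}(\|X^* Y^*\|_F / \|\text{noise}\|_F)$ are assumptions of this sketch, since the slides only state the sizes, the sparsity and the noise level in dB.

```python
import numpy as np

def make_synthetic(m=20, k=50, n=1500, sparsity=3, snr_db=20.0, seed=0):
    """Return (M, X_true, Y_true) with M = X_true @ Y_true + Gaussian noise."""
    rng = np.random.default_rng(seed)
    # Random dictionary with unit-norm columns.
    X_true = rng.standard_normal((m, k))
    X_true /= np.linalg.norm(X_true, axis=0)
    # Each column of Y_true has `sparsity` non-zeros at random locations.
    Y_true = np.zeros((k, n))
    for j in range(n):
        support = rng.choice(k, size=sparsity, replace=False)
        Y_true[support, j] = rng.standard_normal(sparsity)
    clean = X_true @ Y_true
    noise = rng.standard_normal(clean.shape)
    # Scale the noise so that 20*log10(||clean|| / ||noise||) equals snr_db.
    noise *= np.linalg.norm(clean) / (np.linalg.norm(noise) * 10 ** (snr_db / 20.0))
    return clean + noise, X_true, Y_true
```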

Denote $X$ as the learned dictionary. Measure distance:

$$dist(x_j^*, X) = \min_i \left(1 - x_i^T x_j^*\right)$$

Test:

a) Solve with different numbers of samples and compute the percentage of recovered columns. A column $x_j^*$ is considered recovered if

$$dist(x_j^*, X) \le 0.01,$$

and define

$$dist(X^*, X) = mean_j\left(dist(x_j^*, X)\right).$$

Dictionary size: $20 \times 50$; sparsity: 3; noise: 20 dB.

In this case (sparsity = 3), SeMF recovers better when the number of samples is small (< 500).
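The recovery measure can be computed in a few lines. The sketch below assumes unit-norm columns in both dictionaries; the function names and the optional `ignore_sign` flag (which takes $|x_i^T x_j^*|$ so that an atom and its negative count as the same) are additions for illustration and are not part of the slides.

```python
import numpy as np

def atom_distances(X_true, X_learned, ignore_sign=False):
    """dist(x*_j, X) = min_i (1 - x_i^T x*_j) for every true atom x*_j."""
    G = X_learned.T @ X_true  # G[i, j] = x_i^T x*_j
    if ignore_sign:
        G = np.abs(G)
    # min_i (1 - G[i, j]) equals 1 - max_i G[i, j].
    return 1.0 - G.max(axis=0)

def recovery_rate(X_true, X_learned, threshold=0.01, ignore_sign=False):
    """Return (fraction of recovered atoms, mean distance dist(X*, X))."""
    d = atom_distances(X_true, X_learned, ignore_sign)
    return float(np.mean(d <= threshold)), float(np.mean(d))
```

Multiplying the first returned value by 100 gives the percentage of recovered columns.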

b) The smallest number of samples needed to reach 95% recovery of the dictionary, for different sparsity levels.

Number of samples: [200:50:2000]; sparsity: [1 2 3 4 5 6]; results averaged over 10 experiments.

Dictionary size: $20 \times 50$; noise: 20 dB.

c) Recovery with respect to different noise levels.

For each SNR, compute the number of recovered atoms, repeat 100 tests, sort the results and average in groups of 20. SNR = [10 20 30] dB.

Numerical Experiments: Test on Swimmer Dataset

• Swimmer consists of 256 images of size $32 \times 32$. Each image is composed of 5 parts from the 17 distinct non-overlapping basis images, i.e., a centered invariant part called the torso and four limbs, each in one of 4 positions.

• Goal: extract the non-negative basis images $\{X_1, \dots, X_{17}\}$, with $M \in \mathbb{R}^{1024 \times 256}$, $X \in \mathbb{R}^{1024 \times 17}$, $Y \in \mathbb{R}^{17 \times 256}$.

Different structure enforcing

1. Sparse NMF

$$\min_{X \ge 0,\, Y \ge 0}\ \frac{1}{2}\|M - XY\|_F^2 \quad \text{s.t.}\quad \|y_j\|_0 \le 5,\ j = 1, \dots, 256$$

2. Sparse NMF with equal non-zero coefficients

Latent property: the 5 parts of a swimmer image have the same coefficient, which means there are 5 equal non-zeros in each column of the sparse representation $Y$.

$$\min_{X \ge 0,\, Y \ge 0}\ \frac{1}{2}\|M - XY\|_F^2 \quad \text{s.t.}\quad \|y_j\|_0 \le 5,\ y_j^{nnz} = mean(y_j^{nnz}),\ \forall j$$

Results on different structure enforcing

[Figure: results for Sparse NMF and for Sparse NMF with equal coefficients]

Improved but no sequence

3. Sparse NMF with orthogonal property

Sparse NMF alone does not clearly extract the central torso, so we exploit the torso's potential sparsity and its orthogonality to the 4 limbs. (In fact, all 5 parts are independent and their non-zero parts do not overlap.)

$$\min_{X \ge 0,\, Y \ge 0}\ \frac{1}{2}\|M - XY\|_F^2 \quad \text{s.t.}\quad x_{17} \perp x_{1,\dots,16},\ \|y_j\|_0 \le 5,$$

together with an $\ell_0$ bound on the torso column $x_{17}$.

[Figure: results for Sparse NMF and for Sparse NMF with orthogonal structure]

The torso is classified.

4. Sparse NMF with combinatorial patterns

Divide the rows of $Y$ into 5 groups $G_1, \dots, G_5$ (4 limbs and 1 torso); each group has only 1 non-zero and the 5 non-zeros are equal.

$$\min_{X \ge 0,\, Y \ge 0}\ \frac{1}{2}\|M - XY\|_F^2 \quad \text{s.t.}\quad \|y_j^{G_i}\|_0 = 1,\ i = 1, \dots, 5,\ y_j^{nnz} = mean(y_j^{nnz})$$

[Figure: results for Sparse NMF enforcing combinatorial patterns]

Quite well classified parts.

Numerical Experiments: Test on Face Images

• Goal: return a part-based representation. The basis elements extract facial features such as eyes, nose and lips.

• Structure property: $Y$ is non-negative, $X$ is sparse and non-negative.

a) L1 sparse NMF (relaxation of L0 sparsity, convex): penalize or constrain the L1 norm of $X$ or $Y$ (Hoyer 2004).

b) L0 sparse NMF (more intuitive, non-convex): constrain the L0 norm of $X$ or $Y$.

Few works address L0 sparse NMF: Non-negative K-SVD (NNK-SVD, 2005), Probabilistic sparse matrix factorization (PSMF, 2004), NMFL0 (2012).

• Model: sparsity enforced on matrix $X$

$$\min_{X \ge 0,\, Y \ge 0}\ \frac{1}{2}\|M - XY\|_F^2 \quad \text{s.t.}\quad \|x_i\|_0 \le K$$

• Compare to the NMFL0 algorithm (R. Peharz, F. Pernkopf, 2012):

a) with $Y$ fixed, calculate $X$ using non-negative least squares (NNLS),

b) update $Y$ while maintaining the sparse structure of $X$ (ANLS or multiplicative update).

Difference in subproblems a) and b): SeMF minimizes the augmented Lagrangian function; NMFL0 minimizes the original objective.

• Apply to the ORL dataset ($10304 \times 400$, 25 basis parts)

[Figure: results at nnz: 33%, nnz: 25% and nnz: 10%; row label "NMFL0:"]

• Comparison of reconstruction quality and running time:

SeMF reaches similar quality but is faster than NMFL0 in the less sparse cases (more non-zeros).

Note: NMFL0 performs better than Hoyer’s method in both SNR and time in the paper “Sparse nonnegative matrix factorization with L0-constraints” by R. Peharz and F. Pernkopf.

Conclusions

• SeMF can handle many different structures provided they have easy projections.

• ADM approach for the augmented Lagrangian of a split model.

• Dynamically updating penalty parameters empirically performs well.

• Potential applications to many problems with latent structure properties to improve solution quality.

• Further work on experiments and comparisons, non-convex complication, parameter choices, etc.

Thank you!
