1
A Projection free method for Generalized Eigenvalue Problem with a nonsmooth Regularizer Seong Jae Hwang Maxwell D. Collins Sathya N. Ravi Vamsi K. Ithapu Nagesh Adluru Sterling C. Johnson Vikas Singh University of Wisconsin - Madison O BJECTIVE Multi-view statistical analysis of brain imaging data. M OTIVATION :M ULTI - VIEW DATA I Common to see datasets with multiple views (e.g., photos with tags). I In many cases, there is additional side or even partially available information which may be expensive. I Need a general purpose tool for multi-view datasets where some views are partial. H IGH L EVEL S ETUP I Use Regularized Generalized Eigenvalue Problem (R-GEP) as a black-box to incorporate the following information: 1. Primary view (e.g., 3D fractional anisotropy) of the dataset of interest (3D Diffusion Tensor Imaging). 2. Secondary view (e.g., brain connectivity network) of the same dataset of interest. 3. Supplementary data (e.g., cerebrospinal fluid) partially available from more expensive sources such as genotype. (i, j ) indicates the connectivity strength of i th and j th brain regions Figure : DTI image (top left) showing tensor directionality, followed by the FA image (top right) and the connectivity matrix (bottom). P RELIMINARY:O PTIMIZATION ON M ANIFOLDS I Spectral analysis of the similarity/kernel matrix M by via eigendecomposition. I Ordinary eigenvalue problem formulation where tr(·) denotes the trace functional: min V R N ×p tr(V T MV ) s.t. V T V = I (1) I Finds p eigenvectors V R N ×p corresponding to the first p smallest eigenvalues of M . I V T V = I imposes an implicit optimization over the Grassmann manifold. G ENERALIZED E IGENVALUE P ROBLEM (GEP) I The Generalized Eigenvalue Problem (GEP) involves a matrix pencil {M , D } where D is often referred to as a mass matrix : min V R N ×p f (V ) := tr(V T MV ) s.t. V T DV = I (2) I Finds p eigenvectors V R N ×p corresponding to the first p smallest eigenvalues of M with respect to D . I D is derived from a secondary view as a generalized Stiefel manifold constraint. R EGULARIZED G ENERALIZED E IGENVALUE P ROBLEM (R-GEP) I n 0 : a subset of the original subjects with supplementary information. I S : similarity matrix of the subjects in n 0 . I α: leading eigenvector S , or the ”weights” on the subjects in n 0 . I V n 0 ,1 : first eigenvector of M restricted to the set n 0 . I R-GEP formulation using 1 norm: min V R N ×p tr(V T MV )+ λ||V n 0 ,1 - α|| 1 s.t. V T DV = I (3) A LGORITHM :S TOCHASTIC B LOCK C OORDINATE D ESCENT Algorithm 1 Stochastic block coordinate descent on GF N ,p Require: f : GF N ,p R, D R N ×N , V 0 GF N ,p (D ) 1: for t = 1, ..., T do 2: Select rows I⊆{1, ... , N } 3: U 0 V - D -1 II D ¯ II V ¯ 4: [Q QR ] U 0 for Q nonsingular 5: Take G V f (V ) 6: G 0 G IJ + G I ¯ J R T 7: Construct a descent curve Y on GF i ,p (D II ) through U 0 in the direction of -G 0 8: Pick step size τ t satisfying Armijo-Wolfe condition 9: V t +1 Y (τ t ) 10: end for S UBPROBLEM F EASIBILITY AND S INGULARITY C ORRECTION I I , ¯ I : a subset of selected i and non-selected (fixed) row indices of V respectively. I Given any orthogonal U R i ×r , a new iterate satisfying the subproblem constraint is V = D - 1 2 II UP 1 2 - D -1 II D T ¯ II V ¯ . (4) I Assume w.l.o.g., for a i × r nonsingular matrix Q and a r × (p - r ) matrix R , V + D -1 II D T ¯ II V ¯ = Q QR . (5) I For J column indices of Q , and U such that U T D II U = P JJ , the following are feasible: V IJ = U - D -1 II D T ¯ II V ¯ IJ , V I ¯ J = UR - D -1 II D T ¯ II V ¯ I ¯ J . (6) D ESCENT C URVE C ONSTRUCTION I Differentiate f V (U ) w.r.t. U , where V is related to U by (4): U f V (U )= 2(M II V + M I ¯ I V ¯ )+ λg 0 (V (U )). (7) I Pick a subgradient G V f V (U ) and then perform the singularity correction on G: G 0 = G IJ + G I ¯ J R T . (8) I For A = G 0 U T 0 - U 0 G 0T , curve Y as a function of τ by the Crank-Nicolson-like design is Y (τ )= I + τ 2 AD II -1 I - τ 2 AD II U 0 . (9) M Y (τ ) V -∇f (V ) P Ω (V - γ t f (V )) E XPERIMENTS :B RAIN I MAGING DATA I Wisconsin Alzheimer’s Disease Research Center (ADRC) Dataset 102 middle-aged and older adults (58 healthy, 44 diseased). Primary view: 3D volumetric Fractional Anisotropy (FA) imaging data summarizing (possibly losing information) the degree of diffusion of water in each voxel. Secondary view: anatomical connectivity of regions derived from 3D Diffusion Tensor Images (DTI) using fiber counting procedures. Supplementary view: 7 cerebrospinal fluid (CSF) scores measuring disease related proteins. Informative but partial (60 of 102 subjects) and cannot infer brain regions. E XPERIMENTS :S PECTRAL A NALYSIS ON B RAIN I MAGING DATA I R-GEP Setup M : similarity matrix from FA in subject space. D : CC T where C N ×r is the subject space factor matrix of rank r of connectivity tensor. α: leading eigenvector of the covariance matrix of CSF in subject space. I Comparison of classification/regression powers against three methods 1. Baseline model with primary view (FA) only. 2. Intermediate models with PCA on FA and a GEP (2) without a regularizer. 3. PCA-avg model where the primary and secondary source kernels are averaged. I Experimental Result Introducing additional sources of information always increases the performance (63.4% accuracy for baseline to > 91% for R-GEP). Same is the case with regression results in Table 1 (0.68 correlation coefficient from baseline to > 0.78 for R-GEP). I Analysis Based on the Incorporating secondary and incomplete priors increase performance, and our R-GEP model combines these information sources in a meaningful way offering good improvements. Figure : Feature sensitivity. First column shows the FA image. Second column shows overlays of the weights assigned by baseline linear kernel on this FA image. Last column shows overlays from the base R-GEP case in Table. Green (Red) corresponds to smaller (larger) weights. p Baseline PCA PCA-avg GEP (2) R-GEP (3) r = 1 5 10 r = 1 5 10 3 63.41 85.71 85.60 86.62 85.71 85.71 90.34 88.53 86.53 5 63.41 82.60 81.69 85.41 85.41 86.32 89.55 89.34 87.34 7 63.41 84.30 80.69 84.30 84.30 84.30 86.43 88.43 87.53 10 63.41 82.49 83.49 82.39 84.21 86.23 86.53 86.43 89.32 13 63.41 83.32 85.62 86.14 84.23 88.14 89.23 90.05 88.23 p Baseline PCA PCA-avg GEP (2) R-GEP (3) r = 1 5 10 r = 1 5 10 3 0.679 0.718 0.647 0.719 0.718 0.718 0.745 0.771 0.758 5 0.679 0.719 0.614 0.726 0.737 0.735 0.769 0.746 0.749 7 0.679 0.707 0.610 0.707 0.707 0.713 0.763 0.785 0.734 10 0.679 0.656 0.622 0.656 0.742 0.719 0.741 0.762 0.754 13 0.679 0.717 0.654 0.730 0.765 0.745 0.737 0.757 0.754 Table : Healthy vs. diseased classification accuracy (top) and regression correlation coefficient (bottom) (10-fold cross validated) using GEP and R-GEP, compared to the baseline linear classifier and the PCA setup. p is the PCA rank and r is tensor rank. ACKNOWLEDGMENT SJH was supported by a University of Wisconsin CIBM fellowship (5T15LM007359-14). We acknowledge support from NIH grants AG040396 and AG021155, NSF RI 1116584 and NSF CAREER award 1252725, as well as UW ADRC (AG033514), UW ICTR (1UL1RR025011), Waisman Core grant (P30 HD003352-45), UW CPCP (AI117924) and NIH grant AG027161. International Conference on Computer Vision (ICCV) 2015

Seong Jae Hwang Maxwell D. Collins Sathya N. Ravi Vamsi K ...pitt.edu/~sjh95/publication/iccv2015_poster.pdf · Seong Jae Hwang Maxwell D. Collins Sathya N. Ravi Vamsi K. Ithapu Nagesh

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Seong Jae Hwang Maxwell D. Collins Sathya N. Ravi Vamsi K ...pitt.edu/~sjh95/publication/iccv2015_poster.pdf · Seong Jae Hwang Maxwell D. Collins Sathya N. Ravi Vamsi K. Ithapu Nagesh

A Projection free method for Generalized Eigenvalue Problem with a nonsmooth RegularizerSeong Jae Hwang Maxwell D. Collins Sathya N. Ravi Vamsi K. Ithapu Nagesh Adluru Sterling C. Johnson Vikas Singh

University of Wisconsin - Madison

OBJECTIVE

Multi-view statistical analysis of brain imaging data.

MOTIVATION: MULTI-VIEW DATA

I Common to see datasets with multiple views (e.g., photos with tags).I In many cases, there is additional side or even partially available

information which may be expensive.I Need a general purpose tool for multi-view datasets where some

views are partial.

HIGH LEVEL SETUP

I Use Regularized Generalized Eigenvalue Problem (R-GEP) as a black-box toincorporate the following information:1. Primary view (e.g., 3D fractional anisotropy) of the dataset of interest (3D DiffusionTensor Imaging).2. Secondary view (e.g., brain connectivity network) of the same dataset of interest.3. Supplementary data (e.g., cerebrospinal fluid) partially available from moreexpensive sources such as genotype.

(i, j) indicates theconnectivitystrength of ith andjth brain regions

Figure : DTI image (top left) showing tensor directionality, followed by the FA image (top right) and theconnectivity matrix (bottom).

PRELIMINARY: OPTIMIZATION ON MANIFOLDS

I Spectral analysis of the similarity/kernel matrix M by via eigendecomposition.I Ordinary eigenvalue problem formulation where tr(·) denotes the trace functional:

minV∈RN×p

tr(V TMV ) s.t. V TV = I (1)

I Finds p eigenvectors V ∈ RN×p corresponding to the first p smallest eigenvalues of M.I V TV = I imposes an implicit optimization over the Grassmann manifold.

GENERALIZED EIGENVALUE PROBLEM (GEP)I The Generalized Eigenvalue Problem (GEP) involves a matrix pencil M,D where D

is often referred to as a mass matrix :min

V∈RN×pf (V ) := tr(V TMV ) s.t. V TDV = I (2)

I Finds p eigenvectors V ∈ RN×p corresponding to the first p smallest eigenvalues of Mwith respect to D.

I D is derived from a secondary view as a generalized Stiefel manifold constraint.

REGULARIZED GENERALIZED EIGENVALUE PROBLEM (R-GEP)I n′: a subset of the original subjects with supplementary information.I S: similarity matrix of the subjects in n′.I α: leading eigenvector S, or the ”weights” on the subjects in n′.I Vn′,1: first eigenvector of M restricted to the set n′.I R-GEP formulation using `1 norm:

minV∈RN×p

tr(V TMV ) + λ||Vn′,1 − α||1 s.t. V TDV = I (3)

ALGORITHM: STOCHASTIC BLOCK COORDINATE DESCENT

Algorithm 1 Stochastic block coordinate descent on GFN,p

Require: f : GFN,p → R, D ∈ RN×N, V0 ∈GFN,p(D)1: for t = 1, ..., T do2: Select rows I ⊆ 1, ... ,N3: U0← VI· − D−1

IIDIIVI·4: [Q QR]← U0 for Q nonsingular5: Take G ∈ ∂VI·f (V )6: G′← GIJ + GIJRT

7: Construct a descent curve Y on GFi ,p(DII) through U0 in the direction of −G′

8: Pick step size τt satisfying Armijo-Wolfe condition9: Vt+1← Y (τt)

10: end for

SUBPROBLEM FEASIBILITY AND SINGULARITY CORRECTIONI I, I: a subset of selected i and non-selected (fixed) row indices of V respectively.I Given any orthogonal U ∈ Ri×r , a new iterate satisfying the subproblem constraint is

VI· = D−12IIUP

12 − D−1

IIDTIIVI·. (4)

I Assume w.l.o.g., for a i × r nonsingular matrix Q and a r × (p − r ) matrix R,VI· + D−1

IIDTIIVI· =

[Q QR

]. (5)

I For J column indices of Q, and U such that UTDIIU = PJJ , the following are feasible:VIJ = U − D−1

IIDTIIVIJ , VIJ = UR − D−1

IIDTIIVIJ . (6)

DESCENT CURVE CONSTRUCTIONI Differentiate f V (U) w.r.t. U, where V is related to U by (4):

∂Uf V (U) = 2(MIIVI· + MIIVI·) + λg′(V (U)). (7)

I Pick a subgradient G ∈ ∂VI·f V (U) and then perform the singularity correction on G:G′ = GIJ + GIJRT . (8)

I For A = G′UT0 − U0G′T , curve Y as a function of τ by the Crank-Nicolson-like design is

Y (τ ) =(

I +τ

2ADII

)−1 (I − τ

2ADII

)U0. (9)

M

Y (τ )

V

−∇f (V )

PΩ(V − γt∇f (V ))

EXPERIMENTS: BRAIN IMAGING DATAI Wisconsin Alzheimer’s Disease Research Center (ADRC) Dataset• 102 middle-aged and older adults (58 healthy, 44 diseased).• Primary view: 3D volumetric Fractional Anisotropy (FA) imaging data summarizing(possibly losing information) the degree of diffusion of water in each voxel.• Secondary view: anatomical connectivity of regions derived from 3D DiffusionTensor Images (DTI) using fiber counting procedures.• Supplementary view: 7 cerebrospinal fluid (CSF) scores measuring disease relatedproteins. Informative but partial (60 of 102 subjects) and cannot infer brain regions.

EXPERIMENTS: SPECTRAL ANALYSIS ON BRAIN IMAGING DATA

I R-GEP Setup• M: similarity matrix from FA in subject space.• D: CCT where CN×r is the subject space factor matrix of rank r of connectivity tensor.• α: leading eigenvector of the covariance matrix of CSF in subject space.

I Comparison of classification/regression powers against three methods1. Baseline model with primary view (FA) only.2. Intermediate models with PCA on FA and a GEP (2) without a regularizer.3. PCA-avg model where the primary and secondary source kernels are averaged.

I Experimental ResultIntroducing additional sources of information always increases the performance (63.4%accuracy for baseline to > 91% for R-GEP). Same is the case with regression results inTable 1 (0.68 correlation coefficient from baseline to > 0.78 for R-GEP).

I AnalysisBased on the Incorporating secondary and incomplete priors increase performance,and our R-GEP model combines these information sources in a meaningful wayoffering good improvements.

Figure : Feature sensitivity. First column shows the FA image. Second column shows overlays of the weightsassigned by baseline linear kernel on this FA image. Last column shows overlays from the base R-GEP casein Table. Green (Red) corresponds to smaller (larger) weights.

p Baseline PCA PCA-avgGEP (2) R-GEP (3)

r = 1 5 10 r = 1 5 10

3 63.41 85.71 85.60 86.62 85.71 85.71 90.34 88.53 86.535 63.41 82.60 81.69 85.41 85.41 86.32 89.55 89.34 87.347 63.41 84.30 80.69 84.30 84.30 84.30 86.43 88.43 87.5310 63.41 82.49 83.49 82.39 84.21 86.23 86.53 86.43 89.3213 63.41 83.32 85.62 86.14 84.23 88.14 89.23 90.05 88.23

p Baseline PCA PCA-avgGEP (2) R-GEP (3)

r = 1 5 10 r = 1 5 10

3 0.679 0.718 0.647 0.719 0.718 0.718 0.745 0.771 0.7585 0.679 0.719 0.614 0.726 0.737 0.735 0.769 0.746 0.7497 0.679 0.707 0.610 0.707 0.707 0.713 0.763 0.785 0.73410 0.679 0.656 0.622 0.656 0.742 0.719 0.741 0.762 0.75413 0.679 0.717 0.654 0.730 0.765 0.745 0.737 0.757 0.754

Table : Healthy vs. diseased classification accuracy (top) and regression correlation coefficient (bottom)(10-fold cross validated) using GEP and R-GEP, compared to the baseline linear classifier and the PCAsetup. p is the PCA rank and r is tensor rank.

ACKNOWLEDGMENTSJH was supported by a University of Wisconsin CIBM fellowship (5T15LM007359-14). We acknowledgesupport from NIH grants AG040396 and AG021155, NSF RI 1116584 and NSF CAREER award 1252725,as well as UW ADRC (AG033514), UW ICTR (1UL1RR025011), Waisman Core grant (P30 HD003352-45),UW CPCP (AI117924) and NIH grant AG027161.

International Conference on Computer Vision (ICCV) 2015