CONSTRAINED TENSOR DECOMPOSITION OPTIMIZATION WITH ...people.ece.umn.edu/users/parhi/PAPERS/Sen_Parhi... · CONSTRAINED TENSOR DECOMPOSITION OPTIMIZATION WITH APPLICATIONS TO FMRI

CONSTRAINED TENSOR DECOMPOSITION OPTIMIZATION WITH APPLICATIONS TO FMRI DATA ANALYSIS

Bhaskar Sen, Student Member, IEEE, and Keshab K. Parhi, Fellow, IEEE

Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, USA

ABSTRACT

Signal estimation from functional magnetic resonance imaging (fMRI) data is a difficult and challenging task that involves carefully chosen models that can be validated by domain experts. This paper explores constrained tensor decomposition methods for model-free estimation of signals from task fMRI. Using a number of constrained tensor decompositions, the signals are estimated as Rank-1 tensors. The multi-subject fMRI data are stored as a three-way tensor (voxel × time × subject). First, the signal is decomposed using traditional PARAFAC modeling. Second, the spatio-temporal maps in the PARAFAC formulation are constrained to be non-negative. Third, using domain knowledge of brain activation patterns in the spatial domain and of the loading of the spatio-temporal maps for each individual, the paper proposes an optimization model for solving the signal estimation problem from task fMRI data. Three different optimization techniques are used to solve the resulting optimization problems. The decomposed signal includes the spatial brain activation maps and the corresponding time courses of each individual during the task. The solutions are evaluated based on the similarity of the task signal (the ground truth) to the time courses of the decomposed signal, as well as by visual inspection of the spatial maps.

Index Terms: fMRI, task fMRI, tensor decomposition, PARAFAC, spatial map, task signal

1 Introduction

Functional magnetic resonance imaging (fMRI) provides a non-invasive way to measure brain activity during the resting state (r-fMRI) or during a task (t-fMRI). When a subject is scanned, the change over time of the blood-oxygen-level-dependent (BOLD) signal in the brain is measured. The resulting scan is a 4-D image in which the first three dimensions are spatial and the fourth dimension is temporal. This provides an indirect way to measure the activity of brain regions: when a subject performs a task, the regions involved in the task receive a high inflow of oxygenated blood. This in turn gives rise to the so-called hemodynamic response, i.e., the response of the fMRI time series to an impulse-like neural excitation.

During t-fMRI, the subject is asked to perform a repetitive task so that the parts of the brain associated with that particular task can be identified. Previous studies have investigated the brain regions involved and their time evolution using seed-based analysis [1]. A more advanced technique is blind source separation using independent component analysis (ICA). Although ICA can extract the signals reasonably well, the separated signals are not unique when the sources are Gaussian. PARAFAC tensor decomposition promises to alleviate some of the problems posed by ICA, as the decomposition is generally unique under very mild assumptions [2]. Preliminary studies using tensor decompositions for t-fMRI have been reported in [3, 4]. This paper extends the tensor models to include a number of constraints and investigates the applicability of these models for extracting spatio-temporal hemodynamic signals from t-fMRI. Specifically, the paper explores one unconstrained and two constrained tensor decomposition models, solved with three optimization algorithms.

The contribution of this paper is three-fold. First, it explores in detail the applicability of PARAFAC models to fMRI data analysis using three models and three optimization techniques. Second, it shows that fMRI data can be decomposed as a sum of Rank-1 tensors, where a Rank-1 tensor is the outer product of three vectors. Lastly, it shows that PARAFAC with non-negativity and orthogonality constraints can separate fMRI signals with high accuracy, using exact, gradient-based and ADMM approaches.

2 PARAFAC Decomposition Model

Each fMRI scan is reformatted as a matrix (dim1 = spatial, dim2 = temporal). The scans from a group of subjects are concatenated to form a three-way tensor with dim3 = subject. We denote the three-way tensor of size $I \times J \times K$ as $\mathcal{X}_{I,J,K}$.

2.1 Model 1

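The first technique [1] involves the use of straightforward PARAFAC [2] modeling to decompose the tensor $\mathcal{X}_{I,J,K}$ into $F$ rank-one tensors

$$\mathcal{X}_{I,J,K} = \sum_{f=1}^{F} \mathbf{a}_f \circ \mathbf{b}_f \circ \mathbf{c}_f, \qquad \mathcal{X} = (A, B, C), \tag{1}$$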

where $\mathbf{a}_f = A(:, f)$, $\mathbf{b}_f = B(:, f)$ and $\mathbf{c}_f = C(:, f)$. Our hypothesis is that, for a unique PARAFAC decomposition, each rank-one tensor will consist of one time signal for a task (a column of $B$), the corresponding spatial map (the corresponding column of $A$) and the loading of that spatio-temporal map for each subject (the corresponding column of $C$). In this case we minimize the following function:

$$\min_{A,B,C} \; \|X_1 - (C \odot B) A^T\|_F^2 \tag{2}$$

where $X_1 = X_{JK,I}$. Likewise, we define $X_2 = X_{KI,J}$ and $X_3 = X_{IJ,K}$; $X_1$, $X_2$ and $X_3$ are matrices obtained by reformatting (unfolding) the tensor $\mathcal{X}$, and $\odot$ denotes the Khatri-Rao (column-wise Kronecker) product. We denote $f(A,B,C) = \|X_1 - (C \odot B)A^T\|_F^2$. This is the same as writing $f(A,B,C) = \|X_2 - (C \odot A)B^T\|_F^2$ or $f(A,B,C) = \|X_3 - (B \odot A)C^T\|_F^2$. A good review of tensor decomposition for signal processing applications can be found in [5].

2.2 Model 2

Second, the separation problem is approached from the perspective of fMRI data values. The voxel values of fMRI data are in general non-negative. Hence, by enforcing $A \ge 0$, $B \ge 0$ and $C \ge 0$ (element-wise), the optimization problem in this case becomes

$$\min_{A,B,C} \; \|X_3 - (B \odot A)C^T\|_F^2 \quad \text{s.t. } A \ge 0,\; B \ge 0,\; C \ge 0. \tag{3}$$
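To make the tensor layout and the three unfoldings concrete, the following is a minimal NumPy sketch, assuming each subject's scan is already a 4-D array on a common voxel grid (registration and brain masking are not shown). It stacks the scans into the voxel × time × subject tensor, builds $X_1 = X_{JK,I}$, $X_2 = X_{KI,J}$ and $X_3 = X_{IJ,K}$ with a row ordering that matches the column-wise Kronecker product, and checks numerically that the three matricized forms of $f(A,B,C)$ give the same value. The helper names (`build_group_tensor`, `khatri_rao`, `unfoldings`, `loss_three_ways`) are illustrative only.

```python
import numpy as np

def build_group_tensor(scans):
    """Stack 4-D scans (x, y, z, t) into an I x J x K (voxel x time x subject) tensor."""
    # dim1 = all voxels flattened, dim2 = time, dim3 = subject
    return np.stack([s.reshape(-1, s.shape[-1]) for s in scans], axis=2)

def khatri_rao(P, Q):
    """Column-wise Kronecker product: column f is kron(P[:, f], Q[:, f])."""
    return (P[:, None, :] * Q[None, :, :]).reshape(P.shape[0] * Q.shape[0], P.shape[1])

def unfoldings(X):
    """Return X1 = X_{JK,I}, X2 = X_{KI,J}, X3 = X_{IJ,K}."""
    I, J, K = X.shape
    X1 = X.transpose(2, 1, 0).reshape(K * J, I)  # row index k*J + j
    X2 = X.transpose(2, 0, 1).reshape(K * I, J)  # row index k*I + i
    X3 = X.transpose(1, 0, 2).reshape(J * I, K)  # row index j*I + i
    return X1, X2, X3

def loss_three_ways(X, A, B, C):
    """f(A, B, C) evaluated with each of the three unfoldings."""
    X1, X2, X3 = unfoldings(X)
    return (np.linalg.norm(X1 - khatri_rao(C, B) @ A.T) ** 2,
            np.linalg.norm(X2 - khatri_rao(C, A) @ B.T) ** 2,
            np.linalg.norm(X3 - khatri_rao(B, A) @ C.T) ** 2)

rng = np.random.default_rng(0)
scans = [rng.random((4, 4, 3, 20)) for _ in range(3)]   # three toy "subjects"
X = build_group_tensor(scans)                            # shape (48, 20, 3)
F = 2
A, B, C = (rng.random((n, F)) for n in X.shape)
print(X.shape, loss_three_ways(X, A, B, C))
```

Because all three unfoldings list the same tensor entries, only in a different order, the three residual norms coincide, which is why (2) and (3) describe the same objective.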

2.3 Model 3

Third, the tensor decompositions in the previous two cases may introduce components that are hard to interpret from a biological perspective. For example, one common empirical hypothesis is that different regions of the brain are responsible for different tasks [6]. Hence, in a healthy brain, the cross-talk between spatial components should be as small as possible. We can incorporate this by requiring $A^TA = \Sigma$, where $\Sigma$ is diagonal. One can also assume the weighting of the spatio-temporal maps for each subject to be non-negative ($C \ge 0$). The corresponding optimization problem becomes [4]:

$$\min_{A,B,C} \; \|X_3 - (B \odot A)C^T\|_F^2 \quad \text{s.t. } A^TA = \Sigma,\; C \ge 0. \tag{4}$$

This can be solved with an alternating optimization technique that keeps two matrices fixed and minimizes over the third. Typically, the tensor decomposition solution obtained in MATLAB satisfies $A^TA = I$, which is the special case $\Sigma = I$.
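One way to quantify the cross-talk that Model 3 suppresses is to measure how far $A^TA$ is from diagonal. The sketch below is a hypothetical diagnostic (the name `crosstalk` and its normalization are our own choices, not part of the method): it returns the fraction of the squared Frobenius norm of $A^TA$ that lies off the diagonal, which is 0 exactly when $A^TA = \Sigma$ is diagonal.

```python
import numpy as np

def crosstalk(A):
    """Fraction of ||A^T A||_F^2 that is off-diagonal; 0 when A^T A is diagonal."""
    G = A.T @ A
    off_diag = G - np.diag(np.diag(G))
    return np.linalg.norm(off_diag) ** 2 / np.linalg.norm(G) ** 2

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((100, 5)))   # orthonormal columns
A_orth = Q * np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # A^T A = diag(1, 4, 9, 16, 25)
A_rand = rng.random((100, 5))                         # non-negative, strongly overlapping columns
print(crosstalk(A_orth), crosstalk(A_rand))
```

Scaling orthonormal columns keeps $A^TA$ diagonal, so the first value is zero up to rounding, while spatial maps with overlapping support give a large off-diagonal share.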

3 Optimization Methods

In this paper, our goal is to examine a number of optimization techniques for solving the three models above. Some of the gradients and calculations can be found in [5, 7]. More precisely, we use the following optimization techniques.

3.1 Cyclic Block Coordinate Descent or Alternating Exact Optimization:

This method applies coordinate descent with exact minimization. During each iteration, the update equations with exact minimization can be expressed as follows.

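1. Model 1 (Unconstrained Problem): Each block update solves an ordinary least-squares problem:

$$\begin{aligned}
(A^{r+1})^T &= \left[(C^r \odot B^r)^T (C^r \odot B^r)\right]^{-1} (C^r \odot B^r)^T X_1 \\
(B^{r+1})^T &= \left[(C^r \odot A^{r+1})^T (C^r \odot A^{r+1})\right]^{-1} (C^r \odot A^{r+1})^T X_2 \\
(C^{r+1})^T &= \left[(B^{r+1} \odot A^{r+1})^T (B^{r+1} \odot A^{r+1})\right]^{-1} (B^{r+1} \odot A^{r+1})^T X_3
\end{aligned} \tag{5}$$

where $r$ denotes the iteration number.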

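2. Model 2 (Non-negative Constraints): This is a straightforward extension of the Model 1 iteration that adds a projection onto the non-negativity constraints:

$$\begin{aligned}
(A_n^{r+1})^T &= \left[(C^r \odot B^r)^T (C^r \odot B^r)\right]^{-1} (C^r \odot B^r)^T X_1, & A^{r+1} &= [A_n^{r+1}]_+ \\
(B_n^{r+1})^T &= \left[(C^r \odot A^{r+1})^T (C^r \odot A^{r+1})\right]^{-1} (C^r \odot A^{r+1})^T X_2, & B^{r+1} &= [B_n^{r+1}]_+ \\
(C_n^{r+1})^T &= \left[(B^{r+1} \odot A^{r+1})^T (B^{r+1} \odot A^{r+1})\right]^{-1} (B^{r+1} \odot A^{r+1})^T X_3, & C^{r+1} &= [C_n^{r+1}]_+
\end{aligned} \tag{6}$$

Here $[M]_+$ denotes element-wise projection onto the non-negative orthant: $[M]_+(i,j) = 0$ if $M(i,j) \le 0$, and $[M]_+(i,j) = M(i,j)$ otherwise.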

3. Model 3 (Orthogonality Constraint): For Model 3, $B$ and $C$ can be updated as above. To update $A$, however, we use the orthogonal Procrustes solution [8]:

$$\begin{aligned}
(A_n^{r+1})^T &= \left[(C^r \odot B^r)^T (C^r \odot B^r)\right]^{-1} (C^r \odot B^r)^T X_1 \\
A_n^{r+1} &= U \Sigma V^T \\
A^{r+1} &= U V^T
\end{aligned} \tag{7}$$

Here $U \Sigma V^T$ is the (thin) singular value decomposition of $A_n^{r+1}$.
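The three exact update rules above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the least-squares step uses `numpy.linalg.lstsq` instead of the explicit inverse in (5) (the same solution when the Gram matrix is invertible, but numerically safer), the factors are initialized with uniform random numbers, the data and $F = 5$ in the usage lines are hypothetical, and, following the constraints of (4), Model 3 leaves $B$ unconstrained and projects only $C$ onto the non-negative orthant.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: column f is kron(P[:, f], Q[:, f])."""
    return (P[:, None, :] * Q[None, :, :]).reshape(P.shape[0] * Q.shape[0], P.shape[1])

def unfoldings(X):
    """X1 = X_{JK,I}, X2 = X_{KI,J}, X3 = X_{IJ,K}, row order matching khatri_rao."""
    I, J, K = X.shape
    return (X.transpose(2, 1, 0).reshape(K * J, I),
            X.transpose(2, 0, 1).reshape(K * I, J),
            X.transpose(1, 0, 2).reshape(J * I, K))

def ls_factor(M, Xn):
    """Y minimizing ||Xn - M Y^T||_F, i.e. ([M^T M]^{-1} M^T Xn)^T for full-rank M."""
    return np.linalg.lstsq(M, Xn, rcond=None)[0].T

def nonneg(Y):
    """Element-wise projection [Y]_+ onto the non-negative orthant."""
    return np.maximum(Y, 0.0)

def exact_bcd(X, F, model=1, n_iter=100, seed=0):
    """Cyclic block coordinate descent with exact block updates, eqs. (5)-(7)."""
    X1, X2, X3 = unfoldings(X)
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((n, F)) for n in X.shape)
    losses = []
    for _ in range(n_iter):
        A = ls_factor(khatri_rao(C, B), X1)
        if model == 2:
            A = nonneg(A)                                    # eq. (6)
        elif model == 3:
            U, _, Vt = np.linalg.svd(A, full_matrices=False)
            A = U @ Vt                                       # eq. (7): A^T A = I
        B = ls_factor(khatri_rao(C, A), X2)
        if model == 2:
            B = nonneg(B)
        C = ls_factor(khatri_rao(B, A), X3)
        if model in (2, 3):
            C = nonneg(C)                                    # C >= 0 in Models 2 and 3
        losses.append(np.linalg.norm(X3 - khatri_rao(B, A) @ C.T) ** 2)
    return A, B, C, losses

# 50 x 50 x 10 as in Section 5.1.1; uniform entries and F = 5 are assumptions
X = np.random.default_rng(1).random((50, 50, 10))
for model in (1, 2, 3):
    _, _, _, losses = exact_bcd(X, F=5, model=model)
    print(f"Model {model}: loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

For Model 1 every block update is an exact minimization, so the recorded loss cannot increase from one sweep to the next. For Models 2 and 3 the projection step removes this guarantee, so a monotone curve there is an empirical observation rather than a property of the method.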

3.2 Cyclic Block Coordinate Descent with Gradient Step:

The cyclic block coordinate descent with gradient step has the following form at each iteration:

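$$\begin{aligned}
A^{r+1} &= A^r - \alpha \nabla^r_A f(A,B,C) \\
B^{r+1} &= B^r - \alpha \nabla^r_B f(A,B,C) \\
C^{r+1} &= C^r - \alpha \nabla^r_C f(A,B,C)
\end{aligned} \tag{8}$$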

After these updates, the variables are projected onto the constraint set of the respective model. We now discuss the gradient step for each model.

1. Model 1 (Unconstrained Problem): Here there are no constraints, so no projection is needed. The gradients are given by [9]:

$$\begin{aligned}
\nabla_A f(A,B,C) &= -2\left(X_1^T - A (C \odot B)^T\right)(C \odot B) \\
\nabla_B f(A,B,C) &= -2\left(X_2^T - B (C \odot A)^T\right)(C \odot A) \\
\nabla_C f(A,B,C) &= -2\left(X_3^T - C (B \odot A)^T\right)(B \odot A)
\end{aligned} \tag{9}$$

2. Model 2 (Non-negative Constraints): This is a straightforward extension of the Model 1 iteration that adds a projection onto the non-negativity constraints [10]:

$$\begin{aligned}
A_n^{r+1} &= A^r - \alpha \nabla^r_A f(A,B,C), & A^{r+1} &= [A_n^{r+1}]_+ \\
B_n^{r+1} &= B^r - \alpha \nabla^r_B f(A,B,C), & B^{r+1} &= [B_n^{r+1}]_+ \\
C_n^{r+1} &= C^r - \alpha \nabla^r_C f(A,B,C), & C^{r+1} &= [C_n^{r+1}]_+
\end{aligned} \tag{10}$$

Here $[M]_+(i,j) = 0$ if $M(i,j) \le 0$, and $[M]_+(i,j) = M(i,j)$ otherwise.

3. Model 3 (Orthogonality Constraint): For Model 3, $B$ and $C$ can be updated as above. To project $A$, however, we use the orthogonal Procrustes solution [8]:

$$\begin{aligned}
A_n^{r+1} &= A^r - \alpha \nabla^r_A f(A,B,C) \\
A_n^{r+1} &= U \Sigma V^T \\
A^{r+1} &= U V^T
\end{aligned} \tag{11}$$

Here $U \Sigma V^T$ is the (thin) singular value decomposition of $A_n^{r+1}$.
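A corresponding sketch of the gradient-step variant (8)-(11), again assuming NumPy, uniform random initialization and hypothetical data. The step size follows the rule of Section 3.3, $\alpha = 0.75/\|M^TM\|_F$ for the Khatri-Rao factor $M$ multiplying each block; the Gram matrix is computed through the identity $(C \odot B)^T(C \odot B) = (C^TC) * (B^TB)$, where $*$ is the element-wise product, so the large Khatri-Rao product is not needed for the step size. Helper names are illustrative.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: column f is kron(P[:, f], Q[:, f])."""
    return (P[:, None, :] * Q[None, :, :]).reshape(P.shape[0] * Q.shape[0], P.shape[1])

def unfoldings(X):
    """X1 = X_{JK,I}, X2 = X_{KI,J}, X3 = X_{IJ,K}, row order matching khatri_rao."""
    I, J, K = X.shape
    return (X.transpose(2, 1, 0).reshape(K * J, I),
            X.transpose(2, 0, 1).reshape(K * I, J),
            X.transpose(1, 0, 2).reshape(J * I, K))

def gram(P, Q):
    """(P kr Q)^T (P kr Q), computed as (P^T P) * (Q^T Q) without forming P kr Q."""
    return (P.T @ P) * (Q.T @ Q)

def grad_step(Xn, Y, P, Q, c=0.75):
    """Y - alpha * grad for f = ||Xn - (P kr Q) Y^T||_F^2, eqs. (8)-(9)."""
    M = khatri_rao(P, Q)
    grad = -2.0 * (Xn.T - Y @ M.T) @ M           # eq. (9)
    alpha = c / np.linalg.norm(gram(P, Q))       # Section 3.3: 0.75 / ||M^T M||_F
    return Y - alpha * grad

def gradient_bcd(X, F, model=1, n_iter=200, seed=0):
    """Cyclic block coordinate descent with one gradient step per block, eqs. (8)-(11)."""
    X1, X2, X3 = unfoldings(X)
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((n, F)) for n in X.shape)
    losses = []
    for _ in range(n_iter):
        A = grad_step(X1, A, C, B)
        if model == 2:
            A = np.maximum(A, 0.0)                   # eq. (10)
        elif model == 3:
            U, _, Vt = np.linalg.svd(A, full_matrices=False)
            A = U @ Vt                               # eq. (11)
        B = grad_step(X2, B, C, A)
        if model == 2:
            B = np.maximum(B, 0.0)
        C = grad_step(X3, C, B, A)
        if model in (2, 3):
            C = np.maximum(C, 0.0)
        losses.append(np.linalg.norm(X3 - khatri_rao(B, A) @ C.T) ** 2)
    return A, B, C, losses

X = np.random.default_rng(1).random((50, 50, 10))   # hypothetical data, F = 5 assumed
for model in (1, 2, 3):
    _, _, _, losses = gradient_bcd(X, F=5, model=model)
    print(f"Model {model}: loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because $\|M^TM\|_F$ upper-bounds the largest eigenvalue of $M^TM$, this step size keeps each gradient step (and each non-negative projected gradient step) from increasing the loss for Models 1 and 2. The Procrustes projection in Model 3 is onto a non-convex set, so no such guarantee applies there.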

3.3 Note on Step Size α:

To calculate the step size, ideally one should use the Hessian of the function with respect to the variables. However, calculating the full Hessian involves a Kronecker (tensor) product that leads to a very large matrix. Fortunately, in this application, the gradient is a linear function of each block variable, so the Hessian with respect to each block is constant. For the blocks $A$, $B$ and $C$, it is determined (up to a factor of 2 and a Kronecker product with an identity matrix) by the $F \times F$ matrices $(C \odot B)^T (C \odot B)$, $(C \odot A)^T (C \odot A)$ and $(B \odot A)^T (B \odot A)$, respectively. Let $L_A = \|(C \odot B)^T (C \odot B)\|_F$, $L_B = \|(C \odot A)^T (C \odot A)\|_F$ and $L_C = \|(B \odot A)^T (B \odot A)\|_F$. We use $\alpha_A = 0.75/L_A$, $\alpha_B = 0.75/L_B$ and $\alpha_C = 0.75/L_C$ as the step sizes. In the case of the matrix decomposition problem, the corresponding quantities are $L_A = \|B^TB\|_F$ and $L_B = \|A^TA\|_F$.

3.4 Optimization via Alternating Direction Method of Multipliers (ADMM)

In this case, we follow [7] and extend it to tensors. The ADMM formulation for constrained tensor decomposition is

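$$\begin{aligned}
\min_{A,B,C,U,V,W} \; & f(A,B,C) + \langle \Lambda, A - U \rangle + \langle \Pi, B - V \rangle + \langle \Gamma, C - W \rangle \\
& + \frac{\alpha}{2}\|A - U\|_F^2 + \frac{\beta}{2}\|B - V\|_F^2 + \frac{\gamma}{2}\|C - W\|_F^2
\end{aligned} \tag{12}$$

where $\langle P, Q \rangle = \operatorname{tr}(P^T Q)$, $U \in \mathcal{A}$, $V \in \mathcal{B}$, $W \in \mathcal{C}$, and $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$ are the constraint sets on $A$, $B$, $C$, respectively; $\Lambda$, $\Pi$, $\Gamma$ are the Lagrange multipliers and $\alpha$, $\beta$, $\gamma$ are penalty parameters. The iteration steps are as follows: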

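1. Update $(A, B, C)$:

$$\begin{aligned}
(A^{r+1}, B^{r+1}, C^{r+1}) = \arg\min_{A,B,C} \; & f(A,B,C) + \langle \Lambda^r, A - U^r \rangle + \langle \Pi^r, B - V^r \rangle + \langle \Gamma^r, C - W^r \rangle \\
& + \frac{\alpha}{2}\|A - U^r\|_F^2 + \frac{\beta}{2}\|B - V^r\|_F^2 + \frac{\gamma}{2}\|C - W^r\|_F^2
\end{aligned} \tag{13}$$

This can be performed in three steps [9]: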

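Step 1:

$$A^{r+1} = \left(X_1^T (C^r \odot B^r) + \alpha U^r - \Lambda^r\right)\left[(C^r \odot B^r)^T (C^r \odot B^r) + \alpha I_F\right]^{-1} \tag{14}$$

Step 2:

$$B^{r+1} = \left(X_2^T (C^r \odot A^{r+1}) + \beta V^r - \Pi^r\right)\left[(C^r \odot A^{r+1})^T (C^r \odot A^{r+1}) + \beta I_F\right]^{-1} \tag{15}$$

Step 3:

$$C^{r+1} = \left(X_3^T (B^{r+1} \odot A^{r+1}) + \gamma W^r - \Gamma^r\right)\left[(B^{r+1} \odot A^{r+1})^T (B^{r+1} \odot A^{r+1}) + \gamma I_F\right]^{-1} \tag{16}$$

These closed forms follow from setting the gradient of (13) with respect to each block to zero, with the data-fit term scaled as $\frac{1}{2} f(A,B,C)$.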

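2. Update $(U, V, W)$:

$$\begin{aligned}
U^{r+1} &= \left[A^{r+1} + \tfrac{1}{\alpha}\Lambda^r\right]_{\mathcal{A}} \\
V^{r+1} &= \left[B^{r+1} + \tfrac{1}{\beta}\Pi^r\right]_{\mathcal{B}} \\
W^{r+1} &= \left[C^{r+1} + \tfrac{1}{\gamma}\Gamma^r\right]_{\mathcal{C}}
\end{aligned} \tag{17}$$

where $[M]_{\mathcal{S}}$ denotes the projection of $M$ onto the set $\mathcal{S}$; for non-negativity constraints this is the element-wise projection $[M]_+$.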

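3. Update $(\Lambda, \Pi, \Gamma)$:

$$\begin{aligned}
\Lambda^{r+1} &= \Lambda^r + \alpha (A^{r+1} - U^{r+1}) \\
\Pi^{r+1} &= \Pi^r + \beta (B^{r+1} - V^{r+1}) \\
\Gamma^{r+1} &= \Gamma^r + \gamma (C^{r+1} - W^{r+1})
\end{aligned} \tag{18}$$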
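The ADMM iteration (14)-(18) for the non-negative case can be sketched in NumPy as below. The assumptions are ours: a single penalty `rho` stands in for $\alpha = \beta = \gamma$ (the value 1.0 is illustrative only), the data-fit term is scaled by 1/2 so that the closed forms (14)-(16) hold, every constraint set is the non-negative orthant, the toy data are synthetic, and the recorded loss is evaluated at the projected (feasible) factors $U$, $V$, $W$.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: column f is kron(P[:, f], Q[:, f])."""
    return (P[:, None, :] * Q[None, :, :]).reshape(P.shape[0] * Q.shape[0], P.shape[1])

def unfoldings(X):
    """X1 = X_{JK,I}, X2 = X_{KI,J}, X3 = X_{IJ,K}, row order matching khatri_rao."""
    I, J, K = X.shape
    return (X.transpose(2, 1, 0).reshape(K * J, I),
            X.transpose(2, 0, 1).reshape(K * I, J),
            X.transpose(1, 0, 2).reshape(J * I, K))

def primal_update(Xn, M, aux, dual, rho):
    """(Xn^T M + rho*aux - dual) [M^T M + rho*I]^{-1}, as in eqs. (14)-(16)."""
    G = M.T @ M + rho * np.eye(M.shape[1])
    rhs = Xn.T @ M + rho * aux - dual
    return np.linalg.solve(G, rhs.T).T          # G is symmetric, so this equals rhs @ inv(G)

def admm_nonneg(X, F, rho=1.0, n_iter=400, seed=0):
    """ADMM for min 0.5*f(A,B,C) s.t. A, B, C >= 0, following eqs. (14)-(18)."""
    X1, X2, X3 = unfoldings(X)
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((n, F)) for n in X.shape)
    U, V, W = A.copy(), B.copy(), C.copy()                          # auxiliary variables
    Lam, Pi, Gam = np.zeros_like(A), np.zeros_like(B), np.zeros_like(C)  # multipliers
    losses = []
    for _ in range(n_iter):
        # Step 1: eqs. (14)-(16), updated cyclically A -> B -> C
        A = primal_update(X1, khatri_rao(C, B), U, Lam, rho)
        B = primal_update(X2, khatri_rao(C, A), V, Pi, rho)
        C = primal_update(X3, khatri_rao(B, A), W, Gam, rho)
        # Step 2: eq. (17), projection onto the non-negative orthant
        U = np.maximum(A + Lam / rho, 0.0)
        V = np.maximum(B + Pi / rho, 0.0)
        W = np.maximum(C + Gam / rho, 0.0)
        # Step 3: eq. (18), multiplier updates
        Lam = Lam + rho * (A - U)
        Pi = Pi + rho * (B - V)
        Gam = Gam + rho * (C - W)
        losses.append(np.linalg.norm(X3 - khatri_rao(V, U) @ W.T) ** 2)
    return U, V, W, losses

rng = np.random.default_rng(1)
A0, B0, C0 = (rng.random((n, 3)) for n in (30, 20, 5))
X = np.einsum("if,jf,kf->ijk", A0, B0, C0)          # synthetic non-negative rank-3 data
U, V, W, losses = admm_nonneg(X, F=3)
print(f"loss {losses[0]:.4f} -> {losses[-1]:.6f}")
```

Using `numpy.linalg.solve` on the regularized $F \times F$ Gram matrix avoids forming the inverse in (14)-(16) explicitly; the $\rho I_F$ term also keeps that system well posed when a Khatri-Rao factor is rank deficient.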

We chose $\alpha$, $\beta$ and $\gamma$ to be small numbers.

This separation problem has been solved using independent component analysis (ICA) at both the single-subject and the group level [11]. A semi-tensorial extension that involves ICA estimation at each iteration has also been proposed [3]. However, it relies on interpreting the tensor in just one way ($X_1$) and does not optimize for $X_2$ and $X_3$.

4 fMRI Dataset

4.1 Visuomotor Task

The dataset used is the visuomotor task data [12] available from the fMRI GIFT toolbox¹. The dataset consists of three subjects. Details of the dataset are given in [12, 4].

¹ http://mialab.mrn.org/software/gift/


Fig. 1: Convergence of the three optimization algorithms for a random tensor.

4.2 Emotion Task from the Human Connectome Project (HCP)

This task was adapted from the one developed in [13]. Participants are presented with blocks of trials that ask them to decide either which of the two faces presented at the bottom of the screen matches the face at the top of the screen, or which of the two shapes presented at the bottom of the screen matches the shape at the top of the screen. The faces have either an angry or a fearful expression. Trials are presented in blocks of 6 trials of the same task (face or shape), with the stimulus presented for 2000 ms and a 1000 ms inter-trial interval (ITI). Each block is preceded by a 3000 ms task cue (shape or face), so that each block lasts 21 seconds including the cue (3 s + 6 × 3 s). Each of the two runs includes three face blocks and three shape blocks, with 8 seconds of fixation at the end of each run. Preprocessed data for 475 subjects from the HCP website² are used for the experiment. Using FreeSurfer³, 85 anatomical regions and their time courses are used for each subject.
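The stated timing can be turned into a model task regressor of the kind used as ground truth for the hemodynamic response. The sketch below is hypothetical in several respects: the block order (`order`), the absence of gaps between blocks, the 0.1 s sampling grid and the SPM-style double-gamma HRF shape (peak parameter 6 s, undershoot parameter 16 s, ratio 1/6) are assumptions for illustration, not the HCP protocol. Only the 3 s trials, the 3 s cue, the 6 trials per block and the 8 s final fixation come from the text above.

```python
import math
import numpy as np

TRIAL_S = 2.0 + 1.0                            # 2000 ms stimulus + 1000 ms ITI
CUE_S = 3.0                                    # 3000 ms task cue before each block
TRIALS_PER_BLOCK = 6
BLOCK_S = CUE_S + TRIALS_PER_BLOCK * TRIAL_S   # 21 s per block, cue included
FIXATION_END_S = 8.0                           # fixation at the end of the run

def double_gamma_hrf(t, peak=6.0, under=16.0, ratio=1.0 / 6.0):
    """SPM-style HRF shape: gamma(peak) density minus ratio * gamma(under) density."""
    t = np.asarray(t, dtype=float)
    g_peak = t ** (peak - 1) * np.exp(-t) / math.gamma(peak)
    g_under = t ** (under - 1) * np.exp(-t) / math.gamma(under)
    return g_peak - ratio * g_under

def face_regressor(order="SFSFSF", dt=0.1):
    """Face-block boxcar convolved with the HRF, sampled every dt seconds.

    `order` is a hypothetical block sequence (S = shape, F = face); blocks are
    assumed to follow each other directly, then the final fixation period.
    """
    n = int(round((len(order) * BLOCK_S + FIXATION_END_S) / dt))
    block_len = int(round(BLOCK_S / dt))
    box = np.zeros(n)
    for b, kind in enumerate(order):
        if kind == "F":
            start = b * block_len
            box[start:start + block_len] = 1.0
    hrf = double_gamma_hrf(np.arange(0.0, 32.0, dt))
    return np.convolve(box, hrf)[:n] * dt      # Riemann-sum approximation of the convolution

reg = face_regressor()
print(BLOCK_S, reg.shape)
```

A regressor built this way is what an extracted column of $B$ would be correlated against; with the real HCP onsets and repetition time, only `order`, the block start times and `dt` would need to change.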

5 Results

5.1 Convergence Curves

5.1.1 Randomly Generated Data

The convergence of all algorithms is compared on a randomly generated tensor of size 50 × 50 × 10. The convergence curves are shown in Fig. 1.

5.1.2 Visuomotor Task fMRI Data

We analyze the convergence of the tensor decomposition for the three models to check whether there is any significant difference. This serves as a sanity check of the validity of these algorithms. Convergence is plotted in terms of the change in the loss function over the iterations. The convergence of the loss function for block-wise exact minimization is shown in Fig. 2. The convergence of block-wise coordinate descent with a gradient step follows a similar pattern. The solutions converge monotonically across the spatial, temporal and subject dimensions.

² https://www.humanconnectome.org/
³ https://surfer.nmr.mgh.harvard.edu

Fig. 2: Convergence of the three models for coordinate descent with exact minimization. Left: Model 1, middle: Model 2, right: Model 3.

Fig. 3: Evolution of the ADMM iterations. Left (a): loss function over the iterations; right (b): evolution of the auxiliary variable over the iterations.

For ADMM applied to t-fMRI, the evolution of the loss function is shown in Fig. 3(a). The evolution over the iterations of the difference between $A$ and its auxiliary variable is illustrated in Fig. 3(b). After 400 iterations, $A$ becomes non-negative.

5.2 Visuomotor Data

The number of components for decomposition is fixed at 5 in all models.

5.2.1 Time Courses

This section presents the extracted time courses for the different models. The correlation coefficients of the extracted time courses with the ground truth are also compared.

The extracted time signals for one task for each of the models, obtained with exact minimization, are shown in Fig. 4. The correlation coefficients of the extracted time courses with the ground truth are shown in Table 1.
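Since PARAFAC components are recovered in arbitrary order and, for the unconstrained Model 1, with arbitrary sign, an extracted time course has to be matched to the task signal before its correlation is reported. The sketch below shows one plausible matching rule, an assumption rather than necessarily the rule used for Table 1: compute the Pearson correlation of every column of $B$ with the ground truth and keep the column with the largest absolute value. The function name and the toy signals are hypothetical.

```python
import numpy as np

def best_match_correlation(B, ground_truth):
    """Index and |Pearson r| of the column of B that best matches the task signal."""
    r = np.array([np.corrcoef(B[:, f], ground_truth)[0, 1] for f in range(B.shape[1])])
    best = int(np.argmax(np.abs(r)))
    return best, float(abs(r[best]))

# Toy example: column 1 is a sign-flipped, rescaled and shifted copy of the task signal.
t = np.linspace(0.0, 10.0, 200)
truth = np.sin(t)
B = np.column_stack([np.cos(3 * t), -2.0 * truth + 0.1, t])
print(best_match_correlation(B, truth))
```

Pearson correlation is invariant to positive scaling and offset, and taking the absolute value also removes the sign ambiguity, so only the shape of the time course affects the score.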

Fig. 4: Extracted time course of signal 1 for the different models for coordinate descent with exact minimization. Left: Model 1, middle: Model 2, right: Model 3. Red: ground truth, blue: extracted signal.



Table 1: Correlation coefficients of the extracted time courses with the ground truth for coordinate descent with exact minimization, coordinate descent with gradient step, and ADMM

Models     Exact step            Gradient step         ADMM
           Signal 1  Signal 2    Signal 1  Signal 2    Signal 1  Signal 2
Model 1    0.8655    0.9421      0.8384    0.9426      0.8788    0.8986
Model 2    0.9389    0.9867      0.9443    0.9869      0.8695    0.9459
Model 3    0.8820    0.8511      0.8439    0.8907      0.7679    0.8165

Fig. 5: Extracted spatial maps for coordinate descent. (a) Model 1, (b) Model 2, (c) Model 3.

ADMM is implemented for both Model 2 and Model 3; Model 2 is found to work well with ADMM, with correlation coefficients of 0.8695 and 0.9459 for Signal 1 and Signal 2, respectively.

5.2.2 Spatial Maps

We show the spatial maps as z-scores (normalized to mean 0 and standard deviation 1) in Fig. 5. The more yellow or blue a voxel appears, the farther its value is from the mean voxel activation. Even though the z-scored maps may not match the extracted spatial maps exactly, the relative colors give an idea of which regions are more involved in the decomposed factors. Fig. 5 shows the spatial maps extracted by coordinate descent with a gradient step for each of the models; coordinate descent with exact minimization leads to similar maps. From the figure, Model 1 has relatively high activations in the background areas of the brain map. Model 2 forces many of the voxel values to 0. Model 3 has relatively better contrast in its spatial maps, as it reduces the activation level of some of the background voxels.

5.3 HCP Data

The tensor decomposition models are applied to the emotion task data from HCP. The number of components is fixed at 5. Fig. 6 shows the resulting signals as time courses and activated regions (shown as small red balls). The time courses are also compared with the results of group-ICA and with the ground truth of the hemodynamic response. Here, the constrained tensor model extracts the time courses associated with the emotion task with higher correlation than the ICA-based methods (Table 2).

Fig. 6: Extracted important regions and time courses from HCP data using tensor decomposition. The time course is also compared with the ground truth and ICA.

Table 2: Comparison of the results of four models in terms of time courses (correlation with the ground truth)

Method                  Visuomotor task    Emotion task
Group-ICA [12]          0.91               0.33
Tensor-ICA [3]          0.86               0.29
orthogonal-PARAFAC      0.88               0.51
constrained-PARAFAC     0.98               0.58

6 Discussion

For both the random tensor and the fMRI tensor, coordinate descent with exact minimization works best and converges fastest, followed by coordinate descent with gradient step and ADMM. This is expected, as the ADMM and gradient methods approximate the solution that the exact method achieves. However, that does not mean that the latter two methods are necessarily worse, because coordinate descent with exact minimization requires matrix inversions at every block update, which the gradient-step method does not need.

To assess the quality of the extracted signals, we look at the extracted time courses and compare their correlation with the ground truth for each of the algorithms. Model 2 gives the highest correlation in almost all cases (Table 1). This may be because non-negative decomposition favors parts-based decompositions in which the spatial maps tend to be contiguous. As nearby regions of the brain are more likely to be activated together, this may explain why Model 2 performs better in terms of the extracted signals.

To assess the effectiveness of fMRI source separation, the quality of the extracted spatial maps should also be checked. Although the spatial maps look quite similar, there are some differences between them, as shown in Fig. 5. For example, Model 1 makes almost all areas (even the background) quite active (of high value), which should not be the case. Both Model 2 and Model 3, however, capture the spatial areas of interest with relatively low activation in the background. Which of these is better in general? Answering this will likely require applying the models to other fMRI tensors. However, combining time courses and spatial maps, Model 2 works better in our experiments (for both the visuomotor and the HCP data). The experiments are encouraging regarding the applicability of ADMM to fMRI, as the time courses and spatial maps were successfully extracted using ADMM. This provides a future opportunity to apply block-based parallel ADMM decomposition to fMRI tensors from larger datasets.

7 Conclusion

This paper addresses an application of commonly used blind source separation in the fMRI domain. Tensor decomposition is a natural extension of fMRI source separation to a group of subjects. The basic idea is to decompose a three-way t-fMRI tensor into three matrices that provide information about the spatial maps, the time signals and the subject involvement during the tasks, respectively. We implemented several approaches to solving the optimization problem and compared their results. The task signals (the hemodynamic response captured in MRI), modeled as Rank-1 tensors, were evaluated using Pearson's correlation coefficient as the metric. We also compared the spatial maps with the ground truth. This approach could lead to useful insights for fMRI source separation. Accurately inferred signals and spatial maps from groups can be used to compare two types of groups (e.g., patients vs. controls, male vs. female). In addition, to exploit both the interpretability of Model 3 and the accuracy of Model 2, future work will be directed toward combining these two models into a better source separation algorithm for fMRI.

8 Acknowledgment

Support from the MnDRIVE-Informatics PhD Graduate Assistantship, University of Minnesota, is gratefully acknowledged.

9 References

[1] K. J. Friston, A. P. Holmes, K. J. Worsley, J. P. Poline, C. D. Frith, and R. S. J. Frackowiak, “Statistical parametric maps in functional imaging: a general linear approach,” Human Brain Mapping, vol. 2, no. 4, pp. 189–210, 1994.

[2] R. Bro, “PARAFAC: tutorial and applications,” Chemometrics and Intelligent Laboratory Systems, vol. 38, no. 2, pp. 149–171, 1997.

[3] C. F. Beckmann and S. M. Smith, “Tensorial extensions of independent component analysis for multisubject fMRI analysis,” NeuroImage, vol. 25, no. 1, pp. 294–311, 2005.

[4] B. Sen and K. K. Parhi, “Extraction of common task signals and spatial maps from group fMRI using a PARAFAC-based tensor decomposition technique,” Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1113–1117, 2017.

[5] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and C. Faloutsos, “Tensor decomposition for signal processing and machine learning,” IEEE Transactions on Signal Processing, vol. 65, no. 13, pp. 3551–3582, 2017.

[6] D. S. Bassett, N. F. Wymbs, M. P. Rombach, M. A. Porter, P. J. Mucha, and S. T. Grafton, “Task-based core-periphery organization of human brain dynamics,” PLoS Computational Biology, vol. 9, no. 9, pp. e1003171, 2013.

[7] L. Xu, B. Yu, and Y. Zhang, “An alternating direction and projection algorithm for structure-enforced matrix factorization,” Computational Optimization and Applications, vol. 68, no. 2, pp. 333–362, 2017.

[8] P. H. Schonemann, “A generalized solution of the orthogonal procrustes problem,” Psychometrika, vol. 31, no. 1, pp. 1–10, 1966.

[9] A. P. Liavas and N. D. Sidiropoulos, “Parallel algorithms for constrained tensor factorization via alternating direction method of multipliers,” IEEE Transactions on Signal Processing, vol. 63, no. 20, pp. 5450–5463, 2015.

[10] C. L. Lawson and R. J. Hanson, “Solving least squares problems,” Prentice-Hall, Chapter 23, p. 161, 1974.

[11] V. D. Calhoun, T. Adali, G. D. Pearlson, and J. J. Pekar, “A method for making group inferences from functional MRI data using independent component analysis,” Human Brain Mapping, vol. 14, no. 3, pp. 140–151, 2001.

[12] V. Calhoun, T. Adali, V. B. McGinty, J. J. Pekar, T. D. Watson, and G. D. Pearlson, “fMRI activation in a visual-perception task: network of areas detected using the general linear model and independent components analysis,” NeuroImage, vol. 14, no. 5, pp. 1080–1088, 2001.

[13] A. R. Hariri, A. Tessitore, V. S. Mattay, F. Fera, and D. R. Weinberger, “The amygdala response to emotional stimuli: a comparison of faces and scenes,” NeuroImage, vol. 17, no. 1, pp. 317–323, 2002.
