
Classification of task related fMRI:

a complex network analysis

Projects in Machine Learning and Artificial Intelligence

Name: Patrik Bey
Matriculation no.: 352274
Department: Software Engineering and Theoretical Computer Science
Chair: Methods for Artificial Intelligence
Supervisor: Prof. Dr. M. Opper

Contents

1 Introduction

2 Data
  2.1 Experiment
  2.2 Data preprocessing

3 Methods
  3.1 Complex network computation
    3.1.1 Correlation measure
    3.1.2 Threshold definition
  3.2 Network parameters
    3.2.1 Node degree
    3.2.2 Node strength
    3.2.3 Clustering Coefficient
    3.2.4 Shortest path length
    3.2.5 Closeness centrality
    3.2.6 Connection cost
  3.3 Classification
    3.3.1 Support Vector Machine
    3.3.2 Performance measure

4 Results
  4.1 Single parameter analysis
  4.2 Joint parameter framework

5 Discussion

List of Figures

1 The standard n-Back paradigm for neuropsychological assessment. The participant is asked to state whether or not the shown image is identical to the image n instances before in the image sequence.

2 The resulting data matrix for the twelve Brodmann areas and the corresponding voxels. These resulted in 180 vectors of voxels representing the cross-ROI patterns of activity on the x-axis.

3 The symmetric correlation matrix R containing the correlation of each voxel activity pattern with every other. The clusters of high correlation between neighbouring pattern vectors indicate the identity of activity patterns for cross-ROI voxel vectors.

4 OSH, margin and support vectors for the linearly separable case (modified from [6])

5 Classification accuracy for all network parameters and the three contrasts.

6 Classification accuracy in the whole parameter space for the range of thresholds. The green / red line shows the ratio of true positives / negatives.

7 Ratios for true positives (blue) / negatives (red) for each single parameter show the biased discriminative information of the given network parameters.

8 p-values for between-group t-tests for each of the 18 parameter vectors (6 network parameters per contrast image) for all thresholds

List of Tables

1 List of the twelve Brodmann areas with the corresponding number of voxels as given in the original data

Abstract

Schizophrenia as a neurological disease causes the formation of new associations and influences existing functional connections (Jensen et al. (2008)). We analyzed functional magnetic resonance imaging (fMRI) data from an n-Back task with three different conditions (2-Back vs. 0-Back, 0-Back vs. rest, 2-Back vs. rest) using a correlation-based graph theoretical approach. The analysis of distributed functional brain networks via graph theory has gained broad attention in recent literature (e.g. Bullmore and Sporns (2009), Brier et al. (2014)). In most studies, network parameters describing the "small worldness" of a given graph are used for discriminating between different brain states (induced e.g. by certain tasks; Minati et al. (2012)) or different diseases (Tijms et al. (2013)). While analyses of resting-state fMRI report disease-specific alterations of brain network topology, dysfunctional neurological diseases such as schizophrenia may show a significant impact during cognitive activation.

Here we investigate the discriminative power of a classification framework based on graph theory for task-related fMRI data and disease status prediction. For this, we calculated a wide variety of network parameters (such as node strength, clustering coefficient, path length etc.) for each cognitive task and used a support vector machine (SVM) algorithm for classification in the resulting parameter space.

Our results underline the ability of graph theoretical measures to capture discriminating topological information on a task-related level and show an improvement in performance due to additional network parameters as compared to only "small-world" properties.

Keywords: SVM, fMRI, complex network analysis, graph theory


1 Introduction

The study of task-free, or resting-state, fMRI data using graph theory has gained broad interest in the neuroscience community in recent years [2] [3]. While many studies have reported significant differences in network topology between healthy controls and patients with neurodegenerative and neurological diseases such as Alzheimer's disease or epilepsy, only a few studies have investigated the descriptive power of network parameters for task-related fMRI [7], although it has been demonstrated that cognitive and psychiatric disturbances are correlated with functional network architecture [12].

This project therefore focuses on the power of a classification framework that uses the network parameters obtained for an n-Back task as input features to distinguish patients with schizophrenia from normal controls.

We hypothesize that parameters describing the complex network based on correlation between local patterns of voxel activity over specific regions of interest (ROI) may contain discriminative information and therefore, to some extent, enable classification in the resulting parameter space. In this paper we give a short introduction to the theoretical concepts of complex network analysis and the applied classification technique. We illustrate the latter on the given data, show the discriminative information content of different network measures, and investigate the classification accuracy using leave-one-out cross validation. We further discuss our findings with respect to future applications.

2 Data

The present data was obtained with a 1.5 T MRI scanner while the subjects performed three different n-Back tasks. The study comprised 199 subjects: 100 schizophrenia patients and 99 healthy controls. Due to varying data quality, several subjects had to be excluded, resulting in balanced group sizes of 56 subjects per group. This way the classification cannot be biased towards one of the groups by unequal class sizes.

2.1 Experiment

Subjects were shown a series of images and at certain time points were asked to state whether the image they saw was the same as the image n images before.

Figure 1: The standard n-Back paradigm for neuropsychological assessment. The participant is asked to state whether or not the shown image is identical to the image n instances before in the image sequence.

In the given data the contrast images for three different n-Back tasks were created. For this, the difference between the averaged BOLD signals present during the 2-Back task and the 0-Back task was computed, as well as for 2-Back vs. resting state and 0-Back vs. resting state.

2.2 Data preprocessing

The data from the resulting contrast images was reduced to the average voxel values belonging to twelve anatomically a priori defined regions of interest. These ROIs were defined as the Brodmann areas (BA) listed in Table 1.

Since these BAs differ largely in size, the ROI vectors were reduced in dimensionality by averaging neighbouring voxel values such that all vectors were of dimension 180. This way the relative position of active voxels within each ROI was preserved, which was needed to identify the respective local patterns of activity across ROIs. The resulting data matrix containing the averaged values for each BA is shown in Figure 2 for one exemplary subject.


Brodmann area   Name                                  Number of voxels
BA L8           Frontal eye fields                    890
BA L9           Dorsolateral prefrontal cortex        1080
BA L44          Pars opercularis                      202
BA L45          Pars triangularis (Broca's area)      190
BA L46          Dorsolateral prefrontal cortex        180
BA L47          Pars orbitalis                        450
BA R8           Frontal eye fields                    820
BA R9           Dorsolateral prefrontal cortex        1120
BA R44          Pars opercularis                      220
BA R45          Pars triangularis (Broca's area)      220
BA R46          Dorsolateral prefrontal cortex        220
BA R47          Pars orbitalis                        460

Table 1: List of the twelve Brodmann areas with the corresponding number of voxels as given in the original data
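The dimensionality reduction described in section 2.2 can be illustrated with a minimal sketch. The text does not specify how neighbouring voxels were grouped, so the evenly spaced binning below, the function name reduce_roi and the toy input are assumptions made for illustration only.

```python
def reduce_roi(voxels, target_dim=180):
    """Average neighbouring voxel values so the ROI vector has target_dim entries.

    Assumption: bins are spread as evenly as possible over the ROI; the
    original grouping scheme is not stated in the text.
    """
    n = len(voxels)
    if n < target_dim:
        raise ValueError("ROI has fewer voxels than the target dimension")
    reduced = []
    for b in range(target_dim):
        # Integer bin boundaries; every bin holds at least one voxel.
        start = b * n // target_dim
        stop = (b + 1) * n // target_dim
        chunk = voxels[start:stop]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


# Toy ROI with 360 voxels: every bin averages two neighbouring values.
roi = [float(v) for v in range(360)]
pattern = reduce_roi(roi)
print(len(pattern), pattern[:3])
```

Because the bins follow the original voxel order, the relative position of active voxels within the ROI is kept, which is the property the preprocessing relies on.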

3 Methods

In the following we describe the methods used for the computation of correlation, define the network measures used, and give a short introduction to support vector machines (SVM).

3.1 Complex network computation

Brain activity as a result of a cognitive task is not limited to a certain brain region, but rather consists of a widespread network of active areas in which even indirect interactions can account for additional functional linkages [3]. These linkages may become detectable when investigating the interconnectedness of activity patterns present over the relevant regions for each task. In a network theoretical approach, interconnectedness may be expressed as the correlation between the voxels during cognitive activation. We therefore chose the Pearson correlation coefficient as a measure of the functional connectivity of the pattern vectors.

3.1.1 Correlation measure

Figure 2: The resulting data matrix for the twelve Brodmann areas and the corresponding voxels. These resulted in 180 vectors of voxels representing the cross-ROI patterns of activity on the x-axis.

The standard Pearson correlation coefficient is defined as:

$$ r := \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2}\,\sqrt{\sum_i (Y_i - \bar{Y})^2}} \tag{1} $$

where $X$ and $Y$ are the pattern vectors under investigation and $\bar{X}$, $\bar{Y}$ denote their mean values. We computed $r$ for each pattern vector combination (as defined in section 2.2 Data preprocessing), which resulted in a symmetric matrix $R$ (see Figure 3).

This matrix was used to create the corresponding weighted undirected graph by creating the adjacency matrix $G$.
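As a minimal sketch of equation 1 and the resulting matrix R, the following standard-library Python computes the pairwise Pearson correlation of a list of pattern vectors. It assumes that each pattern vector holds one value per ROI (one column of the data matrix in Figure 2); the function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors (equation 1).

    Note: a constant vector has zero variance and would cause a division by zero.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_matrix(patterns):
    """Symmetric matrix R with R[i][j] = r(patterns[i], patterns[j])."""
    n = len(patterns)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            R[i][j] = R[j][i] = pearson(patterns[i], patterns[j])
    return R


# Three toy pattern vectors with four entries each (assumed: one per ROI).
patterns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with the first
    [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with the first
]
R = correlation_matrix(patterns)
```

In practice an array library would typically be used for 180 pattern vectors; the explicit loops are kept here only to mirror the formula.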

3.1.2 Threshold definition

Figure 3: The symmetric correlation matrix R containing the correlation of each voxel activity pattern with every other. The clusters of high correlation between neighbouring pattern vectors indicate the identity of activity patterns for cross-ROI voxel vectors.

To get from $R$ to the adjacency matrix $G$ we defined a range of thresholds $t$ for $r_{i,j}$. Two aspects are very important here to ensure the comparability between brain networks. The first is to set $r_{i,i}$ equal to zero to get rid of self-connecting nodes, which would influence the threshold definition described below. The second is to set the threshold greater than or equal to the minimal maximum correlation of each node for each subject:

$$ t \ge \min(\max(R)) \tag{2} $$

This way we ensure that the graphs are fully connected; disconnected nodes would dramatically influence the network parameters describing the topology and make the between-subject comparison impossible [2]. We therefore used the following threshold selection:

$$ G_{i,j} = R_{i,j} \quad \forall\, R_{i,j} \ge t \tag{3} $$

One may also create a binary adjacency matrix by setting $G_{i,j} = 1$ if the threshold is surpassed, but this would result in an unweighted graph. While that might be sufficient for the computation of most network measures (see section 3.2 Network parameters), it may discard additional information about the intensity of certain connections.
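The thresholding step can be sketched as follows. The helper names and the toy matrix are hypothetical, and entries below the threshold are assumed to be set to zero, which is what the later degree definition implies. Under the edge rule of equation 3, a node keeps at least one edge as long as t does not exceed its own largest off-diagonal correlation, so the sketch evaluates the graph at the bound min(max(R)) itself.

```python
def threshold_bound(R):
    """Minimum over all nodes of the node's maximum off-diagonal correlation."""
    n = len(R)
    return min(max(R[i][j] for j in range(n) if j != i) for i in range(n))


def adjacency(R, t):
    """Weighted adjacency matrix: keep R[i][j] where it reaches t (equation 3).

    Assumptions: the diagonal is zeroed (no self-connections) and entries
    below the threshold become 0.
    """
    n = len(R)
    return [
        [R[i][j] if i != j and R[i][j] >= t else 0.0 for j in range(n)]
        for i in range(n)
    ]


R = [
    [1.0, 0.8, 0.3, 0.1],
    [0.8, 1.0, 0.5, 0.2],
    [0.3, 0.5, 1.0, 0.6],
    [0.1, 0.2, 0.6, 1.0],
]
t = threshold_bound(R)   # node maxima are 0.8, 0.8, 0.6, 0.6
G = adjacency(R, t)
```

In this toy example every node retains one edge at t = 0.6, but the graph still splits into two pairs of nodes, so checking for isolated nodes alone does not guarantee a single connected component.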


3.2 Network parameters

To fully capture the given network topology we computed a wide range of network parameters for each subject and each task. Most studies performing network analysis based on fMRI data focus only on the 'small worldness' of a given network. This may already show significant differences between patients with neurological diseases and controls (see e.g. [2][10]), but the reported influence on single parameters differs between studies [2] and may therefore not be sufficient in the context of a classification framework.

3.2.1 Node degree

The most basic network metric, the node degree, describes the connectedness of a given node within the whole network, as defined in [9]:

$$ k_i := \sum_{j=1}^{N} a_{i,j} \tag{4} $$

where $a_{i,j} = 1$ if $G_{i,j} \neq 0$ and 0 otherwise, and $N$ is the total number of nodes in the network. This was averaged over all nodes to get an average value for the respective graph:

$$ \bar{k} := \frac{1}{N} \sum_{i=1}^{N} k_i $$

3.2.2 Node strength

While node degree captures the average connectedness of a node in the graph, node strength is a measure of the tightness of these connections [12]. It is the sum of the connection weights as defined in the matrix $G$:

$$ S_i := \sum_{j=1}^{N} G_{i,j} \tag{5} $$
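Equations 4 and 5 translate almost directly into code. The sketch below uses hypothetical function names and a small hand-made weighted adjacency matrix with a zero diagonal; the graph-level values are obtained by averaging over nodes, as described in the text.

```python
def node_degree(G):
    """k_i: number of non-zero entries in row i (equation 4)."""
    return [sum(1 for w in row if w != 0) for row in G]


def node_strength(G):
    """S_i: sum of the connection weights in row i (equation 5)."""
    return [sum(row) for row in G]


def mean(values):
    """Average over all nodes, giving one value per graph."""
    return sum(values) / len(values)


# Toy weighted, undirected graph: a triangle (0, 1, 2) plus node 3 attached to node 2.
G = [
    [0.0, 0.8, 0.5, 0.0],
    [0.8, 0.0, 0.6, 0.0],
    [0.5, 0.6, 0.0, 0.7],
    [0.0, 0.0, 0.7, 0.0],
]
k = node_degree(G)      # [2, 2, 3, 1]
S = node_strength(G)
print(mean(k), mean(S))
```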

3.2.3 Clustering Coefficient

The clustering coefficient $C$ is a widely used parameter in network analysis since it represents the efficiency of information transfer across the graph. Several definitions of $C$ are in use. Here we follow the interpretation as transitivity as described e.g. in [8]. This defines the clustering coefficient as the ratio of the number of triangles at a given node versus the possible number of connections:

$$ C := \frac{\sum_i 2\, t_i}{\sum_i k_i (k_i - 1)} \tag{6} $$

where $t_i$ is the number of triangle motifs for node $i$, given by:

$$ t_i := \frac{1}{2} \sum_{j,k} G_{i,j}\, G_{i,k}\, G_{j,k} \tag{7} $$

This parameter was again averaged over all nodes to represent the respective topological property for the whole graph.
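A minimal sketch of equations 6 and 7 is given below, with hypothetical function names. It is demonstrated on a binary graph, for which the formula reduces to the usual transitivity; for a weighted G the products in equation 7 weight each triangle by its edge weights. A zero diagonal is assumed.

```python
def triangles(G):
    """t_i = 1/2 * sum_{j,k} G[i][j] * G[i][k] * G[j][k] (equation 7)."""
    n = len(G)
    return [
        0.5 * sum(G[i][j] * G[i][k] * G[j][k] for j in range(n) for k in range(n))
        for i in range(n)
    ]


def transitivity(G):
    """C = sum_i 2 t_i / sum_i k_i (k_i - 1) (equation 6)."""
    t = triangles(G)
    k = [sum(1 for w in row if w != 0) for row in G]
    denom = sum(ki * (ki - 1) for ki in k)
    # Graphs without any pair of neighbours have no defined ratio; return 0.
    return sum(2 * ti for ti in t) / denom if denom else 0.0


# Binary toy graph: a triangle (0, 1, 2) plus node 3 attached to node 2.
G = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
t = triangles(G)      # one triangle at nodes 0, 1, 2 and none at node 3
C = transitivity(G)   # 6 / 10
```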

3.2.4 Shortest path length

While a high clustering coefficient is a measure of functional segregation [2], the averaged shortest path length measures the integration within the network [9]:

$$ d_{i,j} := \sum_{a_{uv} \in g_{i \leftrightarrow j}} a_{uv} \tag{8} $$

where $g_{i \leftrightarrow j}$ is the shortest path from node $i$ to node $j$, and

$$ d_i := \frac{1}{N} \sum_j d_{i,j} \tag{9} $$

is the average path length for node $i$.
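Since equation 8 sums the binary entries a_uv along the shortest path, the distance amounts to the number of edges on that path, which a breadth-first search can compute. The sketch below rests on that reading; the function names are hypothetical, and unreachable nodes are simply left at infinity rather than handled in any particular way.

```python
from collections import deque


def hop_distances(G, source):
    """Shortest path lengths from source, counting edges (equation 8 with a_uv = 1)."""
    n = len(G)
    dist = [float("inf")] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            # Visit each neighbour once; the first visit gives the shortest hop count.
            if G[u][v] != 0 and dist[v] == float("inf"):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(G):
    """d_i = (1/N) * sum_j d_ij for every node i (equation 9)."""
    n = len(G)
    return [sum(hop_distances(G, i)) / n for i in range(n)]


# Toy graph: a triangle (0, 1, 2) plus node 3 attached to node 2.
G = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
d = average_path_length(G)
```

Weighted analyses often convert correlations into distances before searching for shortest paths; that variant is not described in the text and is therefore not shown.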

3.2.5 Closeness centrality

As an indicator of the spatial closeness of a node in the connectivity space, the parameter $Cl$ is defined as the ratio of reachable nodes over the summed distance to these nodes [9]:

$$ Cl_i := \frac{k_i - 1}{\sum_{j=1}^{k_i} G_{i,j}} \tag{10} $$
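Read literally, equation 10 divides k_i - 1 by the summed weights of the k_i connections of node i. The short sketch below follows that reading, which is an assumption about the intended formula; the function name and the value 0 returned for isolated nodes are illustrative choices.

```python
def closeness(G):
    """Cl_i = (k_i - 1) / sum of the weights of node i's connections (equation 10)."""
    values = []
    for row in G:
        weights = [w for w in row if w != 0]   # the k_i connections of this node
        total = sum(weights)
        # Assumption: isolated nodes (no connections) get closeness 0.
        values.append((len(weights) - 1) / total if total else 0.0)
    return values


# Toy weighted graph with a zero diagonal.
G = [
    [0.0, 0.8, 0.5, 0.0],
    [0.8, 0.0, 0.6, 0.0],
    [0.5, 0.6, 0.0, 0.7],
    [0.0, 0.0, 0.7, 0.0],
]
Cl = closeness(G)
```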


3.2.6 Connection cost

The most basic measure to estimate the cost of connectivity in the given graph is the connection cost. It is defined as the ratio of the number of connections present over the number of possible connections [3]:

$$ \mathrm{Cost} := \frac{K}{N (N - 1)} \tag{11} $$

where $K = \sum_i k_i$.
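Equation 11 can be checked on a small example. In the sketch below (hypothetical function name, zero diagonal assumed), K counts every undirected edge twice, once per endpoint, which matches the N(N - 1) ordered node pairs in the denominator.

```python
def connection_cost(G):
    """Cost = K / (N * (N - 1)) with K = sum_i k_i (equation 11)."""
    n = len(G)
    K = sum(1 for row in G for w in row if w != 0)   # sum of all node degrees
    return K / (n * (n - 1))


# Toy graph: a triangle (0, 1, 2) plus node 3 attached to node 2 (4 of 6 possible edges).
G = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
cost = connection_cost(G)
```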

3.3 Classification

For the classification task in the resulting parameter space we trained a linear support vector machine (SVM) and assessed performance using leave-one-out cross validation (LOOCV), as briefly discussed in the following.

3.3.1 Support Vector Machine

Support vector machines belong to the class of maximum margin classifiers. This technique is based on the idea of dividing a data set into subsets by constructing a separating hyperplane in such a way that it is furthest away from the nearest points of the opposite classes [5]. To illustrate the idea, Figure 4 shows the simplest case, the linearly separable case in two dimensions. It shows the optimal separating hyperplane (OSH) as a line in the feature space having the maximum distance to the nearest neighbors of the different classes. The space on both sides of the OSH up to the support vectors (data points lying on the margin boundaries) is the margin. A data point $x$ is classified by the decision function

$$ f(x) := \mathrm{sign}(w^T x + b) \tag{12} $$

where $w$ and $b$ are chosen to maximize the margin with respect to the inequality constraint:

$$ y_i (w^T x_i + b) - 1 \ge 0, \quad \forall i. \tag{13} $$
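For reference, the constraint in equation 13 belongs to the standard hard-margin formulation found in the SVM literature (e.g. [1]). With class labels $y_i \in \{-1, +1\}$, the margin has width $2 / \lVert w \rVert$, so maximizing it is equivalent to the following minimization. This is the textbook form and is given here as background, not as a description of the specific solver used in this project.

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w^{T} x_i + b) - 1 \ge 0 \quad \forall i,
\qquad y_i \in \{-1, +1\}
```

The support vectors are exactly the points for which the constraint holds with equality.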

3.3.2 Performance measure

Figure 4: OSH, margin and support vectors for the linearly separable case (modified from [6])

LOOCV is a technique to investigate the ability of the trained classifier to handle previously unseen data (test set) after having been trained on the training data. In each iteration one subject is excluded from the data set to serve as test data. The label given by the classifier for this test subject is compared with the real label, and classification performance is defined as the ratio of correctly assigned labels (true positives + true negatives) versus the actual number of subjects in the classes:

$$ \mathrm{Performance} := \left( \frac{\sum \text{true positives}}{\text{class size}} + \frac{\sum \text{true negatives}}{\text{class size}} \right) / 2 \tag{14} $$

The algorithm for LOOCV in a classification framework for the total number of subjects S is as follows:

1. for i in 1 : S

2. TRAIN := DATA(-i,:)

3. TEST := DATA(i,:)

4. train SVM on TRAIN

5. use trained machine to classify TEST

6. if given label == true label : true positive / negative +1

7. compute performance accuracy using equation 14
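The steps above can be sketched in runnable form. To stay self-contained, the classifier is passed in as a function, and the nearest-mean rule used below is only a placeholder for the linear SVM of the text, not an SVM itself. All names and the toy data are hypothetical; the score follows equation 14.

```python
def nearest_mean_fit(X, y):
    """Placeholder classifier (NOT an SVM): predict the class with the closer feature mean."""
    means = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(
            means,
            key=lambda lab: sum((a - m) ** 2 for a, m in zip(x, means[lab])),
        )

    return predict


def loocv_performance(X, y, fit, positive=1, negative=0):
    """Leave-one-out cross validation scored with equation 14."""
    correct = {positive: 0, negative: 0}
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]      # TRAIN := DATA(-i,:)
        train_y = y[:i] + y[i + 1:]
        predict = fit(train_X, train_y)  # train on all subjects except i
        if predict(X[i]) == y[i]:        # classify TEST and compare labels
            correct[y[i]] += 1
    tp_rate = correct[positive] / y.count(positive)
    tn_rate = correct[negative] / y.count(negative)
    return (tp_rate + tn_rate) / 2, tp_rate, tn_rate


# Toy, well separated data: two features per subject, 1 = patient, 0 = control.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.9, 1.1], [1.2, 1.0]]
y = [0, 0, 0, 1, 1, 1]
performance, tp_rate, tn_rate = loocv_performance(X, y, nearest_mean_fit)
```

Returning the true positive and true negative ratios separately makes it possible to inspect a class bias of the kind reported in the results section.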


4 Results

Classification of the resulting parameter spaces using SVMs is shown below: first the performance based only on the respective network parameters for each of the three contrast images, and second the performance over the range of thresholds in the whole feature space containing the values for all contrast images and all network parameters.

4.1 Single parameter analysis

To assess the information content of the different parameters, classification was performed in the parameter space of every single network measure as defined in section 3.2 Network parameters.

Figure 5: Classification accuracy for all network parameters and the three contrasts.

As a first result, Figure 5 shows that the classification accuracy is only slightly above chance for most of the settings.


4.2 Joint parameter framework

We further performed classification in the whole feature space of all network parameters and observed a slightly higher accuracy, as shown in Figure 6. The overall poor performance may be due to the large difference between the ratios of true positives and true negatives. This discrepancy appears although the data set was balanced, so a bias towards one class was not expected, but it seems to be present here. Looking at the ratio of true positives / negatives for each parameter individually may reveal one of the possible reasons for the overall poor performance of the classifier.

Figure 6: Classification accuracy in the whole parameter space for the range of thresholds. The green / red line shows the ratio of true positives / negatives.

Figure 7: Ratios for true positives (blue) / negatives (red) for each single parameter show the biased discriminative information of the given network parameters.

It is not clear why the network parameters capture the topological properties describing only one of the classes, and why this changes in an arbitrary-looking manner between parameters and contrast images. This is displayed in Figure 7, where the ratios of true positives / negatives are shown for each parameter. These results were not due to a failure of the classifier to distinguish between groups, in which case it would simply label all subjects according to one class; this would result in a symmetric image of the two lines around the y-axis value of 0.5. The ratio values were computed as the number of correctly labeled subjects per class over the class size, which was equal for both groups. Further analysis of the parameter space showed significant differences in p-values for between-group t-tests for each parameter (see Figure 8). Since the goal of the study is to investigate the framework's power to infer the probability of a subject being a patient, the ratio of true positives over the actual number of patients may be most informative in this regard. The analysis shows that the framework is far more capable of detecting the disease state "patient" than "control".


Figure 8: p-values for between-group t-tests for each of the 18 parameter vectors (6 network parameters per contrast image) for all thresholds

5 Discussion

While the overall classification accuracy was below the standard reported in recent literature for classification of fMRI data, we successfully demonstrated the ability of complex network analysis to infer topological properties of task-related fMRI data that can serve as classification features. Since this is a relatively new approach, comparable recent literature for this context is sparse. We further demonstrated the improvement of classification due to additional network parameters, which is a valid approach for future research considering that most studies today focus on the ratio of shortest path length and clustering coefficient as representatives of the 'small worldness' of a network. Classification accuracy could be improved by further investigating the different information content of the parameters in combination with different cognitive tasks, as they tend to vary significantly, as shown by the p-values in Figure 8. A selective combination of relatively more informative parameters with regard to disease state prediction may improve overall performance. Furthermore, an adaptation of the classifier may increase classification performance as well, since we only applied a linear kernel SVM and did not explore a range of values for the cost parameter C; a higher penalty cost may decrease the number of false negatives and therefore improve overall performance. Finally, the choice of the correlation measure may influence the discriminative topological information extractable from the adjacency matrices as well. Measures like Granger causality might be more powerful when considering the spatial dimensionality of the input vector of activity patterns as opposed to the usual temporal correlation.
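For context, the cost parameter C referred to in the discussion enters the standard soft-margin SVM formulation through slack variables $\xi_i$ that allow individual training points to violate the margin. The textbook form is sketched below as background (see e.g. [1]); note that this C is unrelated to the clustering coefficient of section 3.2.3.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i} \xi_i
\quad \text{subject to} \quad
y_i\,(w^{T} x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 \quad \forall i
```

A larger C penalizes margin violations more strongly, which is the mechanism behind the suggested tuning.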


References

[1] Burges, C., A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, Vol. 2, p. 121-167, 1998

[2] Brier, M.R., Thomas, J.B., Fagan, A.M., Hassenstab, J., Holtzman, D.M., Benzinger, T.L., Morris, J.C., Ances, B.M., Functional connectivity and graph theory in preclinical Alzheimer's disease, Neurobiology of Aging, Vol. 35, p. 757-768, 2014

[3] Bullmore, E.T., Sporns, O., Complex brain networks: graph theoretical analysis of structural and functional systems, Nature Reviews Neuroscience, Vol. 10(3), p. 186-198, 2009

[4] Jensen, J., Willeit, M., Zipursky, R.B., Savina, I., Smith, A.J., Menon, M., Crawley, A.P., Kapur, S., The Formation of Abnormal Associations in Schizophrenia: Neural and Behavioral Evidence, Neuropsychopharmacology, Vol. 33, p. 473-479, 2008

[5] Kuncheva, L., Rodríguez, J., Classifier ensembles for fMRI data analysis: an experiment, Magnetic Resonance Imaging, Vol. 28, p. 583-593, 2010

[6] Li, H., Liang, Y., Xu, Q., Support vector machines and its applications in chemistry, Chemometrics and Intelligent Laboratory Systems, Vol. 95, Issue 2, p. 188-198, 2009

[7] Minati, L., Grisoli, M., Seth, A.K., Critchley, H.D., Decision-making under risk: A graph-based network analysis using functional MRI, NeuroImage, Vol. 60(4), p. 2191-2205, 2012

[8] Newman, M.E., The structure and function of complex networks, SIAM Rev., Vol. 45, p. 167-256, 2003

[9] Rubinov, M., Sporns, O., Complex network measures of brain connectivity: Uses and interpretations, NeuroImage, Vol. 52, p. 1059-1069, 2009

[10] Tijms, B.M., Wink, A.M., de Haan, W., van der Flier, W.M., Stam, C.J., Scheltens, P., Barkhof, F., Alzheimer's disease: connecting findings from graph theoretical studies of brain networks, Neurobiology of Aging, Vol. 34, p. 2023-2036, 2013

[11] Yan, X., Kelley, S., Goldberg, M., Biswal, B., Detecting overlapped functional clusters in resting state fMRI with Connected Iterative Scan: A graph theory based clustering algorithm, Journal of Neuroscience Methods, Vol. 199, p. 108-118, 2011

[12] Zhang, X., Tokoglu, F., Negishi, M., Arora, J., Winstanley, S., Spencer, D.D., Constable, R.T., Social network theory applied to resting-state fMRI connectivity data in the identification of epilepsy networks with iterative feature selection, Journal of Neuroscience Methods, Vol. 199, p. 129-139, 2011

