Multivoxel Pattern Analysis (MVPA) in fMRI settings : Fundamentals & Case of study

  • 8/12/2019 Multivoxel Pattern Analysis (MVPA) in fMRI settings : Fundamentals & Case of study

    1/20

    MSC IN NEUROIMAGING STATE OF THE ART ESSAY

    Multivoxel Pattern Analysis (MVPA) in fMRI

    settings : Fundamentals & Case of study

    Mario B. Perez

    12/12/2013

    Multi-voxel Pattern Analysis (MVPA) in fMRI settings:

    Fundamentals & Case of study.

    by Mario B. Perez

    Introduction

    The rise of MVPA as an analysis technique for fMRI BOLD data is yet to come. Authors like
    Haxby (2012) have pointed out the initial complexity and uniqueness of the MVPA perspective
    on brain responses, and the slow adaptation of the research community to this new way of
    thinking, which draws on knowledge from machine learning.

    Unlike univariate techniques, which normally address where a cognitive process is
    localized, MVPA (which has many synonyms: multivariate pattern analysis, information-pattern
    analysis...) can additionally address how it is coded. A further interesting feature
    is its ability to clarify the common situation in which two different processes overlap in their
    use of brain areas, sharing the same resources for divergent purposes (Peelen and Downing, 2006).

    The main aspect that makes MVPA a qualitative leap in fMRI BOLD processing is that it
    accounts for the interactions between individual voxels and, as its name announces, detects
    and refines these interactions into patterns of activation. These activation patterns can be
    elicited by any given process in the brain, and can then be labeled and recognised when they
    appear again (Tong and Pratte, 2012). Although, as we will see, there are many possible flaws in
    this process, many impressive and eccentric applications have gradually flourished on this
    basis, from the famous "brain reading" or "brain decoding" (Reddy et al., 2010) to lie
    detection (Davatzikos et al., 2005) or even natural scenes (Nishimoto et al., 2011).

    Given that the use of MVPA normally entails the selection of regions of
    interest (ROIs), the visual system has been the main target of the studies undertaken to date,
    due to its relatively well-known functional structure.

    In the literature on this topic, the pioneering work of Haxby (2001) on
    visual category recognition and Kamitani and Tong's (2005) prediction study on grating
    orientation are cited very often, and are considered responsible for the spread of
    MVPA. The remarkable work of Kay et al. (2008) on image identification is also influential.

    Although it has been applied to other sources of data such as EEG (Rosenberg et al., 2012),
    MVPA has mainly been applied to fMRI data. Its novelty and distinction from
    conventional ways of analysis, together with the promising prospects of its use and its striking
    preliminary applications, make MVPA the main topic of this essay. In the following sections we
    try to set out the basis that underlies MVPA, take a glance at the most common flaws in its use,
    and finally review a case study in which this methodology was used.

    Characterizing Multi-voxel Pattern Analysis

    As with everything else, probably the best way of introducing a new technique is to
    compare it with the preexisting ones. Traditionally, when a researcher wants to know which
    areas are involved in a particular task or setting, the analysis of the fMRI data considers each
    voxel individually, although the signal is acquired at once from the whole brain (Haynes and
    Rees, 2006). Thus, the activity time course of each voxel is conceptualized as unrelated to the
    others, and its analysis is carried out without considering the possibility that other active or
    inactive voxels may be relevant to it. Yet the behaviour of other voxels may give meaning to that
    voxel's activation, and therefore contribute to drawing the whole picture (Haxby, 2012). Although
    praised and recognised as extremely useful (Norman et al., 2006), this mass univariate analysis
    has shown its limitations (O'Toole et al., 2007), as there is a limit to what can be learned by
    examining voxels in isolation.

    By contrast, MVPA performs a multivariate analysis that takes into account the
    relations and differences of activity between voxels that arise from complex stimulation
    settings. MVPA is thus not exclusively aimed at determining which voxels are active, but at how
    the activation of different voxels is related: the so-called activity patterns. These activity
    patterns carry valuable information about how a stimulus or percept is coded, and give fMRI
    analysis an enhanced sensitivity to cognitive processes (Tong and Pratte, 2012). MVPA is
    dedicated to this pattern-recognition activity.

    This enhanced sensitivity is obtained by avoiding certain steps that form part of
    conventional fMRI analysis settings (e.g. block designs), such as spatial smoothing and
    averaging to intensify the differences between experimental conditions (Norman et al., 2006).
    Standard studies try to show that the average activity during one condition is significantly
    different from that of another condition across all time points, and therefore the information
    about the brain activity at a specific time point is lost (Haynes and Rees, 2006). This is
    especially relevant in experiments that use complex stimulation, such as natural scenes, because
    averaging discards the fine-grained activity that, along with a certain amount of noise, might
    carry valuable information about how the stimuli are processed (Speirs and Maguire, 2007). Other
    processing steps such as spatial smoothing are also responsible for this blurring effect, making
    fine-grained activity unavailable (Mur et al., 2009). However, as Etzel et al. (2009) point out,
    spatial smoothing can be useful in between-subjects analysis, where MVPA has shown difficulties
    due to the great degree of specificity of the within-subject signal (Cox and Savoy, 2003), which
    entails generalization issues.

    Haynes and Rees (2006) argued that the signal lost through these MVPA-unfriendly processing
    steps could account for important features, because voxels that show a weak or inconsistent
    response might carry vital information when analyzed together rather than separately. This
    suggests an interesting idea about how some cognitive processes could work: the weak choral
    activation of many voxels might potentially be as useful as a strong (significant) activation of
    an individual voxel.

    As an example of this, Kamitani and Tong (2005), in their famous study on orientation
    detection, revealed that many voxels show patterns of weak activation on a consistent
    basis across repetitions of the same condition, providing a reasonable basis for this
    argument. Likewise, the remarkable work on category recognition carried out by
    Haxby (2001) showed that weaker or submaximal voxels are representative of each category
    (in this case shoes or bottles). More importantly, this study showed that even if the voxels with
    the highest consistent activation are removed, the categorical fingerprint can be identified
    above chance. This accuracy in recognition when category-key voxels were set aside implies that
    areas showing lower levels of activation can be used to discriminate between two categories.
    Because these categories share some high-activity areas, the overlap of two functions could be
    resolved by selecting low-level features (Peelen and Downing, 2006). Downing et al. (2005) also
    provide a list of overlapping category areas in which MVPA could be useful. Although
    distinguishing between two activity patterns based on activation intensity is possible (Hanson
    et al., 2004), research on overlap issues needs to continue.

    MVPA has also been described as a major advance in information extraction from the
    fMRI signal (Norman et al., 2006) and a necessary tool to avoid wasting neuroimaging
    data, which are normally expensive and difficult to acquire (O'Toole, 2007). Thus, as we said
    previously, pattern analysis does not use processing steps that rule out potentially crucial
    information. Instead of using those strategies, MVPA tries to make the most of that fine-grained
    activity1 by defining the activation pattern of a voxel ensemble in a given
    example. Examples are presentations of our stimuli that will provide activation patterns to our
    classifier algorithm. Once our classifier has been trained with several examples, it will be
    theoretically ready to recognise which example has been presented to it. In a sense, the
    classifier holds a weighted model of the activation pattern characteristic of, for example, an
    object category.

    All this information might be a little unclear when compressed into such a brief
    description, but in the next sections we will address what a classifier is and how the process
    takes place.

    MVPA basic procedure

    The graph below summarizes the procedure for carrying out an MVPA experiment.
    There are a few remarks to be made before getting into detail, first and
    foremost about training data, testing data and feature selection. Preprocessing steps, as well as
    scanning details, are not taken into account in this essay2.

    1 The exploitation of this fine-grained activity represents one of the central features of MVPA,
    as low-level activity, or activity that does not achieve significance, would be lost if it
    were disregarded.

    2 Brief note on data preprocessing: as Etzel et al. (2009) indicate, many steps that are used in
    univariate analysis are used in MVPA as well. Correction for motion and normalization are
    typically used, while voxel-wise detrending (to correct scanner drift, for example) might be
    controversial due to the delicate nature of the data needed for MVPA. The reader will find a great
    review of how to undertake a classification analysis on fMRI data in the cited article.

    The vast majority of the reviews and articles consulted on technical aspects of MVPA
    make particularly clear the necessity of splitting training data and testing data as the first
    step (Pereira et al., 2009; Kriegeskorte et al., 2009; O'Toole et al., 2007; Mur et al., 2009).
    It is also important not to use the testing data as part of the feature selection. The reasons
    behind these precautions will be addressed later on; however, they must be stated here because
    the illustration fails to make this particular aspect evident.

    [Illustration of the MVPA procedure]3

    3 Illustration from Norman et al., 2006. It includes a fourth step (b, or pattern assembly) that
    is rarely mentioned. It does not appear in other reviews such as Mur et al., 2009, and involves
    the labelling of activity patterns. Since patterns are caused by discrete stimuli and we have set
    those stimuli, there is no need to label the patterns. It may be pertinent in exploratory
    analysis, where the source of the pattern may be unknown. Believe it or not, this is the best
    illustration of the process available, and it is repeatedly used.

    Overview

    The process outlined in the graph above includes three main steps that will be described
    next (remember that data splitting has already taken place):

    1. Feature selection (or voxel selection): This first step tries to delimit the set of
    voxels that will be used further. It can be done following different techniques. In
    this case, bottles and shoes were shown to select the pertinent voxels.

    2. Classifier training: In this stage the training data are used to train our classifier
    algorithm, so that it establishes a successful function between our examples (or
    stimuli) and their characteristic activity patterns. Once training has finished the
    classifier generates a decision plane.

    3. Classifier or generalization testing: The classifier is exposed to new data (the testing
    data set) that belong to the same categories. In our example, these will be images of
    shoes and bottles not presented previously. The activation patterns are
    submitted to the classifier, which assigns them a position on the decision
    plane. Depending on where they fall and on the true identity of the example, the
    classification is successful or not.
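
    The three steps above, preceded by the data split, can be sketched in a few lines of plain
    Python. This is only an illustrative sketch under stated assumptions: the simulated "shoe" and
    "bottle" patterns, the helper names (make_example, select_voxels, train_centroids, classify)
    and the nearest-centroid decision rule are hypothetical choices made for this example, not the
    procedure of any of the cited studies.

```python
import random

random.seed(0)
N_VOXELS = 20
INFORMATIVE = {0, 1, 2, 3}  # assumption: only these voxels carry category signal


def make_example(label):
    """Simulate one activation pattern (one value per voxel) for a stimulus."""
    shift = 1.0 if label == "shoe" else -1.0
    return [random.gauss(shift if v in INFORMATIVE else 0.0, 0.5)
            for v in range(N_VOXELS)]


def mean(values):
    return sum(values) / len(values)


def select_voxels(examples, labels, n_keep):
    """Step 1: rank voxels by the absolute difference of class means (a filter)."""
    scores = []
    for v in range(N_VOXELS):
        shoe = [x[v] for x, y in zip(examples, labels) if y == "shoe"]
        bottle = [x[v] for x, y in zip(examples, labels) if y == "bottle"]
        scores.append((abs(mean(shoe) - mean(bottle)), v))
    return sorted(v for _, v in sorted(scores, reverse=True)[:n_keep])


def train_centroids(examples, labels, voxels):
    """Step 2: store the mean pattern of each category over the selected voxels."""
    centroids = {}
    for label in set(labels):
        rows = [x for x, y in zip(examples, labels) if y == label]
        centroids[label] = [mean([r[v] for r in rows]) for v in voxels]
    return centroids


def classify(pattern, centroids, voxels):
    """Step 3: assign the category whose mean pattern is closest."""
    def dist(c):
        return sum((pattern[v] - c[i]) ** 2 for i, v in enumerate(voxels))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Split first: training and testing examples never mix.
labels = ["shoe", "bottle"] * 10
train_y, test_y = labels[:12], labels[12:]
train_x = [make_example(y) for y in train_y]
test_x = [make_example(y) for y in test_y]

voxels = select_voxels(train_x, train_y, n_keep=4)    # training data only
centroids = train_centroids(train_x, train_y, voxels)
predictions = [classify(x, centroids, voxels) for x in test_x]
accuracy = mean([p == y for p, y in zip(predictions, test_y)])
print(voxels, accuracy)
```

    Note that the testing examples are touched only in the last three lines, after both voxel
    selection and training have been completed on the training examples.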

    Feature Selection

    Also called voxel selection, this is a crucial step in MVPA, because it defines the
    framework and extent of the analysis. It is also one of the steps that harbours many pitfalls,
    as we will see in the section devoted to them.

    First of all, why is feature selection necessary?

    Many articles disregard this fundamental question, which arises easily. The mere presence
    of voxel selection appears to contradict the foundation of MVPA. If taking into account
    the interactions between voxels is the goal, narrowing down the number of voxels that
    we account for in our analysis seems nonsensical. Nonetheless, as Cox and Savoy
    (2003) point out, many classifiers experience an inherent loss of accuracy when the number of
    voxels included in the analysis is very high. While MVPA's power mainly resides in taking into
    account voxels whose activity is not necessarily significant, adding irrelevant voxels whose
    activity mainly reflects noise or is very low significantly affects the performance of the
    classifier. In spite of this, methods and applications that allow the use of whole-brain
    activity have been described in Tong and Pratte (2012).

    Normally, these whole-brain studies deal with high-level cognition processes, which are

    not easy to narrow down to a specific set of ROIs. Therefore, these researchers use

    independent component analysis (ICA) or Principal Component Analysis (PCA) to narrow down

    the number of dimensions.

    Put simply, such a method reduces the number of variables to deal with
    by grouping them into linear combinations that are as unrelated to one another as possible.

    So, given this necessity, feature selection provides us with the voxels we are
    going to include in our analysis. First of all, it is very common to select a region of interest
    (ROI) relevant to our study in which feature selection is going to take place (Haxby et al.,
    2001; Mur et al., 2009; Chadwick et al., 2010). Following Pereira et al. (2008), we can
    distinguish between filtering and wrapper feature selection methods. Wrapper methods add and
    remove voxels according to the impact they have on the classifier's
    performance; however, these methods involve combinatorial issues that make computation
    complicated (Norman et al., 2006), and filtering methods are normally preferred. Filtering
    involves creating a voxel ranking based on a specific criterion. We can rank voxels based on
    how active they are, how high their discrimination power between conditions is, their
    prediction accuracy, the consistency of their activation and so on.
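
    A filter of this kind can be sketched as follows. The ranking criterion used here, a
    Welch-style t statistic computed voxel by voxel, is an assumption chosen for illustration (any
    of the criteria listed above could take its place), and the function names and toy training
    patterns are hypothetical.

```python
from statistics import mean, stdev


def discrimination_score(values_a, values_b):
    """Welch-style t statistic for one voxel: mean difference over its standard error."""
    se = (stdev(values_a) ** 2 / len(values_a)
          + stdev(values_b) ** 2 / len(values_b)) ** 0.5
    return abs(mean(values_a) - mean(values_b)) / se


def rank_voxels(patterns_a, patterns_b):
    """Return voxel indices ordered from most to least discriminative."""
    n_voxels = len(patterns_a[0])
    scores = []
    for v in range(n_voxels):
        a = [p[v] for p in patterns_a]
        b = [p[v] for p in patterns_b]
        scores.append(discrimination_score(a, b))
    return sorted(range(n_voxels), key=lambda v: scores[v], reverse=True)


# Toy training patterns: one row per example, one column per voxel.
shoes = [[2.0, 0.1, 1.0], [2.2, -0.1, 0.8], [1.8, 0.0, 1.3]]
bottles = [[0.1, 0.0, 0.9], [-0.1, 0.2, 1.1], [0.0, -0.2, 1.2]]

ranking = rank_voxels(shoes, bottles)
top_voxels = ranking[:1]  # keep only the best-ranked voxel
print(ranking, top_voxels)
```

    Each voxel is scored on its own, without reference to any other voxel, which is why a filter
    of this type is a univariate operation.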

    It is important nonetheless to realize that by filtering we are considering voxels as
    separate entities again (so we are performing a univariate analysis). A popular option is to
    focus on voxels that show maximum activity and hold good discriminant power (Polyn et al.,
    2005, as quoted in Norman et al., 2006). With certain classifiers, a multivariate feature
    selection called "searchlight accuracy" can be used (Kriegeskorte et al., 2006). This method
    tries to add the information from a voxel's environment (its neighbouring voxels) by defining a
    spherical cluster, that is, a ball of voxels of radius x. The training data are used repeatedly
    so that the useful voxels can be detected within the radius of the spherical cluster.
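
    The geometric part of a searchlight, building the ball of voxels around a centre, can be
    sketched as below. The function names are hypothetical and the radius is expressed in voxel
    units, which is an assumption of this sketch; in a full analysis a classifier would then be
    trained and cross-validated on the voxels of each sphere, with the resulting accuracy assigned
    to the centre voxel.

```python
from itertools import product


def sphere_offsets(radius):
    """All integer (dx, dy, dz) offsets lying within `radius` voxels of the centre."""
    r = int(radius)
    return [(dx, dy, dz)
            for dx, dy, dz in product(range(-r, r + 1), repeat=3)
            if dx * dx + dy * dy + dz * dz <= radius * radius]


def searchlight_neighbours(centre, radius, shape):
    """Voxels of the sphere around `centre` that fall inside a volume of size `shape`."""
    cx, cy, cz = centre
    neighbours = []
    for dx, dy, dz in sphere_offsets(radius):
        x, y, z = cx + dx, cy + dy, cz + dz
        if 0 <= x < shape[0] and 0 <= y < shape[1] and 0 <= z < shape[2]:
            neighbours.append((x, y, z))
    return neighbours


# A radius of 1 voxel gives the centre plus its six face neighbours.
print(len(sphere_offsets(1)))
# At a corner of the volume, the sphere is clipped by the volume boundary.
print(searchlight_neighbours((0, 0, 0), 1, (4, 4, 4)))
```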

    Classifier Training

    This step uses the trials that we saved for training to supply examples of the
    activation patterns (characteristic of our experimental conditions) to a multivariate algorithm.
    This algorithm learns the statistically representative features of our stimuli, and generates
    a decision function (1c in the previous graph) that is used to make a call each time a stimulus
    is presented in our testing phase.

    The choice of algorithm is one of the decisive steps in MVPA. There is a
    great variety of classifiers available, and even a non-exhaustive discussion of them would take
    up the whole length of this essay. However, we cannot proceed without at least a brief
    discussion of this point.

    What is a classifier algorithm?

    The easiest way to introduce this notion is to describe the task that classifiers perform.
    Classifiers have to identify the relationship between the voxels' activity and the presentation
    of the stimulus, and be able to recognize that relationship in previously unpresented stimuli of
    the same category in the future.

    Thus, classifiers obtain a parametric profile of the activity pattern elicited by the stimulus
    or example. These parameters are acquired during the training phase with the data reserved for
    that purpose. When the training is finished, the classifier is supposed to be able to give us a
    prediction (or identification, or discrimination; that depends on the researcher's assumption 4)
    of which stimulus has been presented to a given subject. This prediction must be based on a
    different set of examples from the one used for training; otherwise, there would be a problem of
    overfitting (see the section on limitations).

    Classifiers differ in the type of function they learn (Pereira et al., 2008). Primarily,
    algorithms can be divided into linear and non-linear ones. The vast majority of MVPA
    studies have used linear classifiers due to their success, according to Mur et al. (2009).
    Additionally, according to the same authors, non-linear classifiers have not consistently
    demonstrated superior performance to date, and the solutions
    offered by these classifiers are difficult to interpret. Sheng (2011) suggests that one of the
    key strengths of linear classifiers is their simplicity and their ability to balance the
    influence of specific voxels between examples or stimuli. All linear classifiers build a
    weighted model that reflects the importance of the different voxel activity values. In the
    illustration below each voxel (represented by x) is assigned a specific weight (w). In a
    hypothetical situation, the category "shoe" could be defined by x·w > 0 and the class "bottle"
    by x·w < 0.

    This line is constructed from the feedback that the classifier receives while
    being trained with the training examples. Thus, given an example during the testing phase, the
    x·w model is submitted to the decision function that has already been constructed
    throughout the training phase, and that works as our linear threshold to determine which
    category has been presented.
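
    As a minimal sketch of how such a weighted model can be built from feedback, the code below
    trains a perceptron, one of the simplest linear classifiers. It is used here purely as an
    illustrative stand-in (the cited studies use other linear classifiers), the two-voxel toy
    patterns are invented, and the decision rule has no bias term, which is an assumption made to
    mirror the x·w threshold described above.

```python
def dot(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))


def predict(x, w):
    """Linear decision rule: 'shoe' if x.w > 0, otherwise 'bottle'."""
    return "shoe" if dot(x, w) > 0 else "bottle"


def train_perceptron(examples, labels, max_epochs=100):
    """Adjust the weights after every misclassified training example."""
    w = [0.0] * len(examples[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(examples, labels):
            if predict(x, w) != label:
                sign = 1.0 if label == "shoe" else -1.0
                w = [wi + sign * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every training example is on the correct side
            break
    return w


# Toy training patterns with two voxels each (hypothetical values).
train_x = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
train_y = ["shoe", "shoe", "bottle", "bottle"]

weights = train_perceptron(train_x, train_y)
print(weights)                        # one weight per voxel
print(predict([4.0, 1.0], weights))   # new pattern, not seen in training
print(predict([1.0, 4.0], weights))
```

    The sign of each learned weight shows in which direction that voxel pushes the decision, which
    is part of what makes linear solutions comparatively easy to interpret.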

    [Illustration: three two-voxel situations, (a), (b) and (c)]6

    The illustration above gives us a chance to explain part of the potential of MVPA.
    In the first situation (a), we see how the two distributions are completely segregated in a
    rather simple way, when voxel X1 (let's say blue) and voxel X2 (let's say red) have opposed
    activations. When condition A is presented, voxel X1 shows activation while X2 is inactive.
    In this situation the use of univariate analysis would yield optimal results. There is no
    overlap between conditions.

    However, the situation on the right (b) is more complex. It can
    nonetheless be approached by using MVPA with a linear classifier, which by assigning a weight to
    each voxel is able to code their influence. Then, given a specific point on the plane,
    the decision threshold allows us to determine which condition or example was more likely
    to have occurred.

    The last of the three situations in our illustration (c) can only be tackled with the help of a
    non-linear classifier. The idea is the same as with linear ones, but in this case the decision
    threshold is more complex.
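
    A toy case in which no linear threshold suffices can be sketched as follows. The four
    two-voxel patterns form an XOR-like arrangement, which is an assumption made for illustration
    and not necessarily the configuration shown in the original figure; the rules and the search
    grid are likewise hypothetical.

```python
from itertools import product

# Condition A: both voxels low or both high. Condition B: exactly one voxel high.
points = [(0, 0), (1, 1), (0, 1), (1, 0)]
labels = ["A", "A", "B", "B"]


def accuracy(rule):
    return sum(rule(x) == y for x, y in zip(points, labels)) / len(points)


def linear_rule(w1, w2, b):
    """A straight decision line: A on one side, B on the other."""
    return lambda x: "A" if w1 * x[0] + w2 * x[1] + b > 0 else "B"


def nonlinear_rule(x):
    """A curved boundary built from the product of the two (centred) voxel values."""
    return "A" if (x[0] - 0.5) * (x[1] - 0.5) > 0 else "B"


# Try every linear rule on a coarse grid of weights and offsets.
grid = [i / 2 for i in range(-4, 5)]
best_linear = max(accuracy(linear_rule(w1, w2, b))
                  for w1, w2, b in product(grid, repeat=3))

print(best_linear)               # the best straight line still misses one pattern
print(accuracy(nonlinear_rule))  # the non-linear rule separates all four
```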

    Although non-linear classifiers might be more powerful, most texts are not very
    enthusiastic about their use, in one way or another (Kamitani and Tong, 2005; Pereira et
    al., 2008; O'Toole et al., 2007; Norman et al., 2006; and others).

    6 Illustration from Cox and Savoy (2003).

    As we said, these methods are considered to yield results that are very difficult to interpret,
    and the gain in performance from using non-linear classifiers is unclear.

    Classification by Nearest-neighbour

    This method is one of the simplest, as it does not involve learning a function,
    properly speaking. The example presented is compared with the ones already seen in the training
    stage, so a decision is made based on the similarity between the training and the testing
    examples. There are ways to improve the performance of nearest-neighbour by averaging
    the patterns left by the testing examples, but again this removes variability that might be
    valuable for making a decision. According to Pereira (2008), nearest-neighbour works well as
    long as the number of voxels remains relatively low. This classification system was used in
    Haxby (2001).
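
    A nearest-neighbour decision can be sketched in a few lines. The similarity measure is
    assumed here to be the Pearson correlation between patterns; other measures, such as Euclidean
    distance, would fit the same scheme. The function names and the four-voxel toy patterns are
    hypothetical.

```python
def pearson(a, b):
    """Pearson correlation between two activation patterns of equal length."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def nearest_neighbour(test_pattern, train_patterns, train_labels):
    """Return the label of the training pattern most similar to the test pattern."""
    similarities = [pearson(test_pattern, p) for p in train_patterns]
    best = max(range(len(train_patterns)), key=lambda i: similarities[i])
    return train_labels[best]


# Stored training patterns (no function is learned; they are simply kept).
train_patterns = [[1.0, 0.2, 0.9, 0.1], [0.9, 0.1, 1.1, 0.0],
                  [0.1, 1.0, 0.2, 0.8], [0.0, 0.9, 0.1, 1.1]]
train_labels = ["shoe", "shoe", "bottle", "bottle"]

print(nearest_neighbour([0.8, 0.3, 1.0, 0.2], train_patterns, train_labels))
print(nearest_neighbour([0.2, 1.2, 0.0, 0.9], train_patterns, train_labels))
```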

    Generalization Testing

    At this point, the last step is simply to test the classifier by exposing it to new,
    previously unpresented data. The classifier makes a judgement in each case, saying which of
    the conditions has been presented. Comparing the presentation template with these predictions
    yields an accuracy percentage. If accuracy is
    above chance, training has been successful.
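
    Accuracy and its comparison with chance can be sketched as below. The one-sided binomial
    test shown is one common way of asking whether an accuracy exceeds chance; it is an assumption
    of this sketch and not a procedure described in the essay, and the presented and predicted
    labels are invented.

```python
from math import comb


def accuracy(predicted, presented):
    """Proportion of test examples whose predicted label matches the presented one."""
    return sum(p == t for p, t in zip(predicted, presented)) / len(presented)


def binomial_p_value(n_correct, n_trials, chance=0.5):
    """Probability of at least n_correct hits in n_trials if the classifier only guessed."""
    return sum(comb(n_trials, k) * chance ** k * (1 - chance) ** (n_trials - k)
               for k in range(n_correct, n_trials + 1))


presented = ["shoe", "bottle"] * 5                 # presentation template (10 test trials)
predicted = list(presented)
predicted[3] = "shoe"                              # one wrong call by the classifier

acc = accuracy(predicted, presented)
n_correct = round(acc * len(presented))
p_value = binomial_p_value(n_correct, len(presented), chance=0.5)
print(acc, p_value)
```

    With two equally frequent categories chance is 0.5; with more categories, or unbalanced
    ones, the chance level passed to the test has to be adjusted accordingly.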

    Limitations and Pitfalls using MVPA

    Like every method, MVPA has several weaknesses, some of which are more avoidable
    than others. Technical limitations due to spatial or temporal resolution are difficult to avoid
    (the temporal resolution of MVPA is inevitably limited by the dispersion of the hemodynamic
    response; Norman et al., 2006), whereas others, such as feature selection or classifier choice,
    can be controlled with the help of a good decision-making process.

    Capacity to deal with overlapping states

    For example, as we have seen, one of the strengths of MVPA is its ability to disentangle the
    activation patterns (spatial patterns) produced by two different stimuli or mental states that
    have taken place (Cox and Savoy, 2003). By contrast, as Haynes and Rees (2006) point out, there
    is currently no evidence that MVPA can distinguish between two stimuli that
    occur at the same time and whose spatial representations share the same set of neurons.
    It can be argued that this limitation might be solved with the arrival of higher spatial
    resolution but, as Haxby (2012) states, there is a necessary limit to the number of modules that
    can support only one kind of processing.

    A limited number of categories for an unlimited world

    As a logical extension of this reasoning, Haynes and Rees (2006) claim that while
    percepts or forms of stimulation are virtually infinite, the number of training categories is
    necessarily limited. Hence, our classifier will always be limited to a certain number of
    discriminations. Attempts to address this issue could come from studies that deal with the
    generalization problems of MVPA, like the one carried out by Kay et al. (2008). The
    classification in that report shows remarkable generalization when exposed to numerous
    unpresented images, reaching high prediction rates based on a training set of 1750 images.

    The presence of previous knowledge

    Although, as we said, whole-brain analysis is possible (Polyn et al., 2005, as
    quoted in Haynes and Rees, 2006), it certainly involves many challenges that are difficult to
    resolve (combinatorial limitations, overfitting...). A plausible alternative could be the use of
    searchlight feature analysis, which is supposed to alleviate the potential computational issues
    (Tong and Pratte, 2012). Thus, using MVPA implies having a reasonable knowledge of the
    features to study and at least some guidance on where to find them. As Pereira et al.
    (2008) mention, the definition of ROIs is a common step in the overwhelming majority of MVPA
    studies. This particular issue is supposed to have a lower impact in systems whose functional
    architecture is relatively well known (the visual system, according to Haynes and Rees, 2006)
    but stands as a notable issue for other cognitive functions whose functional basis has not yet
    been properly described. Feature selection stands as one of the biggest sources of problems in
    MVPA studies. According to Tong and Pratte (2012), studies on higher-level cognition have
    difficulty defining a coherent set of regions of interest, and therefore targeting the
    relevant voxel arrays correctly.

    Generalization issues

    This is a topic that is closely related to the strengths of MVPA. Pattern recognition
    involves the exploitation of the so-called fine-grained activity (Norman et al., 2006).
    Consequently, response patterns are highly idiosyncratic and difficult to extrapolate to other
    subjects. The pattern elicited by stimulus X in subject 1 should be, in an ideal situation,
    the same as the one elicited by the same stimulus in subject 2. Currently, although some
    extrapolations have been successful (Haxby, 2011, developed hyperalignment, which includes
    tuning functions, as quoted in Haxby, 2012), this is an unresolved problem.

    Studies normally conduct MVPA on a within-individual basis. However,
    generalization problems do not end there. As Haynes and Rees (2006) indicate, generalization
    across different contexts is even more complicated. That is, in our dummy example,
    subjects 1 and 2 receive the same stimulation assuming a similar setting, but what would
    happen if the context surrounding those presentations were not the same?

    Although classification accuracy does not drop uncontrollably, a severe worsening is observed
    even when the setting is the same but the scans are carried out on different days (Cox and
    Savoy, 2003). Generalization across different stimuli was nonetheless achieved in a working
    memory study (Harrison and Tong, 2009, as quoted in Mur et al., 2009) and between subjects in an
    auditory perception one (Formisano et al., 2008, as quoted in Tong and Pratte, 2012). Once again
    the study of Kay et al. (2008) stands as an example, as it demonstrated successful
    generalization across time. Haynes and Rees (2006) point out that improving generalization
    normally comes at the cost of individual discriminatory power.

    Finally, it is important to interpret mere differential activity carefully. Poldrack
    (2009) found that when subjects carry out cognitive tasks that differ enough, almost all of the
    cortex can show discrimination power. Differential activity can arise for a myriad of
    reasons, such as slight differences in memory processing load, difficulty, processing time or
    language requirements (Tong and Pratte, 2012). Future work has a great deal to do in developing
    calibration and adjustment protocols between subjects and situations.

    Circular Analysis or Overfitting

    The danger of circular analysis is a common cause of concern in MVPA studies.
    Reviewing the recent literature, it is evident that most researchers are aware of this issue, at
    least in its simplest form, which we explain now.

    Most MVPA studies have as their main goal to train a classifier able to identify a
    specific activity pattern and distinguish it from others of the same or other categories
    (examples of this are Haxby, 2001; Kay et al., 2008; Chadwick et al., 2010). To achieve this, we
    have stated that separate training and test data must exist, the former to train the classifier
    and the latter to test the classifier's performance in terms of the proportion of correct
    guesses. The independence of these two data sets is understood as crucial, and they must be
    split before proceeding to the feature selection phase, as the first step of the whole process
    (Pereira et al., 2009).

    O'Toole et al. (2009) explain the reason why the training data cannot be used to test
    the classifier. In fMRI studies, we normally have a large number of parameters (the
    voxels we take into our analysis) compared to the number of examples (stimulus
    presentations). The fact that the voxels contained in our ROIs largely outnumber our
    examples is relevant because the number of parameters that characterize each activation
    pattern will be enormous, and some of these parameters will contain noise (probably from both
    systematic and unsystematic sources of error).

    This overfitting, or characterization by a large number of parameters, leads to a situation in
    which a perfect classification of the training set is possible. By contrast, the same classifier
    will obtain poor results when tested with new data, for the same reason.

    Overfitting leads to lower accuracy on the test set and therefore to poorer generalization
    by the classifier (Mur et al., 2009).

    Overfitting can also lead to an overestimation of the classifier if the training data are used
    to assess the classifier's accuracy during the testing stage (Kriegeskorte et al., 2009). In
    this situation, the classifier built with many parameters can fit and identify a significant
    part of the testing material, regardless of the algorithm's ability to classify other stimuli of
    the same category.
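
    The effect can be reproduced with simulated data that contain no signal at all. In this
    sketch, whose setup is entirely hypothetical (pure-noise "voxels", random labels, and a
    perceptron as the linear classifier), there are far more voxels than training examples, so
    the classifier can fit the training set perfectly while having nothing to generalize.

```python
import random

random.seed(1)
N_VOXELS, N_TRAIN, N_TEST = 500, 10, 200  # far more voxels than training examples


def noise_pattern():
    """Pure noise: the voxel values carry no information about the label."""
    return [random.gauss(0.0, 1.0) for _ in range(N_VOXELS)]


def predict(x, w):
    return "A" if sum(xi * wi for xi, wi in zip(x, w)) > 0 else "B"


def train_perceptron(examples, labels, max_epochs=100):
    w = [0.0] * N_VOXELS
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(examples, labels):
            if predict(x, w) != label:
                sign = 1.0 if label == "A" else -1.0
                w = [wi + sign * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return w


def accuracy(examples, labels, w):
    return sum(predict(x, w) == y for x, y in zip(examples, labels)) / len(labels)


train_x = [noise_pattern() for _ in range(N_TRAIN)]
train_y = [random.choice("AB") for _ in range(N_TRAIN)]
test_x = [noise_pattern() for _ in range(N_TEST)]
test_y = [random.choice("AB") for _ in range(N_TEST)]

w = train_perceptron(train_x, train_y)
train_accuracy = accuracy(train_x, train_y, w)  # circular: tested on the training data
test_accuracy = accuracy(test_x, test_y, w)     # independent data
print(train_accuracy, test_accuracy)
```

    Reporting the first number as the classifier's performance would be an instance of circular
    analysis; only the second number says anything about generalization, and with labels that are
    unrelated to the data it is expected to hover around chance.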

    Multivariate Classification Procedure

    The figure above (taken from Chadwick et al., 2010, Supplementary Information) summarizes the
    multivariate classification procedure. For the sake of simplicity, as the authors note, only two
    of the videos are represented in the illustration. Panel A shows frame captures from two of the
    videos (each video lasting 7 s), and B shows a stimulus template. It is important to bear in mind
    that the stimuli are not the videos themselves, but the recall of them. Thus, ABBBAA... stands for
    evocation of video A, evocation of video B, and so on. C describes the process of feature
    selection using the multivariate searchlight method for each ROI (described in the feature
    selection methods section of this essay). The data were split into a training set and a testing
    set, and only the training set was used for voxel selection. The authors used a k-fold
    cross-validation strategy in which, in each fold, one example is left out for testing and the
    rest are used for training; this involves selecting new features by searchlight each time, as the
    authors say, with different training data. In D, once voxel selection is completed, the linear
    SVM classifier is fed with the training examples, to be tested afterwards on the example set
    aside for that purpose. Finally, E shows the test data (every example is used as test data at
    least once under the k-fold testing regime), which are used to determine the classifier's
    accuracy. Predictions are then compared with the actual video of each trial to establish the
    accuracy percentages.
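
    The logic of repeating feature selection inside each fold can be sketched as follows. This is
    not the authors' pipeline: it assumes simulated noise data, replaces the searchlight with a
    simple univariate ranking of voxels, and replaces the linear SVM with a nearest-centroid rule;
    every name and size is a hypothetical choice. The sketch compares leave-one-out accuracy when
    voxels are selected from the training examples of each fold only ("nested") with the accuracy
    obtained when voxels are selected once from all the data, test example included ("circular").

```python
import random

def class_means(X, y, voxels):
    """Mean pattern of each class over the chosen voxels."""
    means = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        means[label] = [sum(r[v] for r in rows) / len(rows) for v in voxels]
    return means

def select_voxels(X, y, k):
    """Keep the k voxels whose two class means differ most (univariate stand-in)."""
    all_voxels = list(range(len(X[0])))
    m = class_means(X, y, all_voxels)
    a, b = sorted(m)
    ranked = sorted(all_voxels, key=lambda v: abs(m[a][v] - m[b][v]), reverse=True)
    return ranked[:k]

def predict(X_train, y_train, x_new, voxels):
    """Nearest-centroid rule: pick the class whose mean pattern is closest."""
    m = class_means(X_train, y_train, voxels)
    def dist(label):
        return sum((x_new[v] - mu) ** 2 for v, mu in zip(voxels, m[label]))
    return min(m, key=dist)

def leave_one_out(X, y, k, nested):
    """Leave-one-out accuracy with nested or circular voxel selection."""
    fixed = None if nested else select_voxels(X, y, k)   # circular: sees test data
    hits = 0
    for i in range(len(y)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        voxels = select_voxels(X_tr, y_tr, k) if nested else fixed
        hits += predict(X_tr, y_tr, X[i], voxels) == y[i]
    return hits / len(y)

rng = random.Random(1)
n_examples, n_voxels, k = 40, 1000, 10                   # hypothetical sizes
X = [[rng.gauss(0, 1) for _ in range(n_voxels)] for _ in range(n_examples)]
y = ["A" if i % 2 == 0 else "B" for i in range(n_examples)]   # labels unrelated to X

acc_nested = leave_one_out(X, y, k, nested=True)
acc_circular = leave_one_out(X, y, k, nested=False)
print(f"nested selection:   {acc_nested:.2f}")
print(f"circular selection: {acc_circular:.2f}")
```

    Since the simulated labels are unrelated to the data, any accuracy clearly above chance in the
    circular variant is produced by the selection step alone, which is why voxel selection has to be
    redone with the training data of every fold.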

    Criticism and Remarks

    If I may, let me first point out that, as far as I know, this is the first critique of this
    article. I wanted to do my best to apply the knowledge I have acquired while writing this essay,
    which I have enjoyed as much as I have suffered. The attempt, then, is to go a bit further than
    a mere introduction of the technique and also to take a look at how it is applied outside the
    strictly machine-learning environment. Finally, I apologize for any reckless criticism I may be
    making.

    Multivariate Classification Procedure & Claims of functional differentiation

    This experiment is particularly difficult to picture, and it is important to keep a set of
    assumptions in mind. Firstly, the examples used to train and test the classifier are activity
    patterns that come from the recall of the three videos.

    Therefore, when the data set was split into testing and training data in each fold, the activity
    patterns from the three videos were separated into testing data (a single activity pattern) and
    training data (all the rest). All of the activity patterns can be regarded as different, because
    they are evoked at different times and because of the reconstructive nature that characterizes
    memory recall. This characteristic of memory recall makes the training data (of which the
    illustration above showed only 9 examples, so that we could think of about 14 with three videos)
    clearly insufficient, given the high similarity between the videos.

    The limited number of examples per stimulus, together with the similarity between them, probably
    led the linear classifier to acquire a small number of strong, overlapping parameters while
    adding a great number of voxels whose activity was barely consistent from one time to the next.
    That barely consistent activity contained the fine aspects that could have distinguished between
    the three stimuli, and it may have helped to raise the classification accuracy by periodically
    saturating a series of parameters. It is hard to see why the authors selected three memories so
    similar to one another (each shows a different woman who performs a similar action and walks
    away) if their intention was to distinguish memory traces. For all the reasons set out above, a
    peculiar variety of overfitting seems to have taken place, with accuracy barely surpassing
    chance level.

    A possible additional reason for this relatively poor performance is the unequal number of
    examples. When one example is reserved for testing, the other two categories have one example
    more. While this might look trivial, Pereira et al. (2009) point out that the classifier can
    tend to favor the category with more examples, and therefore predict it more frequently.
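
    The imbalance created by leave-one-out testing can be made concrete with a deliberately extreme
    sketch. It assumes a hypothetical "classifier" that ignores the brain data altogether and always
    answers with the most frequent label in its training set, and it assumes five recall examples
    per video (an arbitrary number chosen for illustration, not taken from the article). Because the
    category of the held-out example is always the one with fewest training examples, such a
    classifier never predicts it.

```python
from collections import Counter

def majority_label(labels):
    """Prior-only 'classifier': always answer the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical balanced design: three videos, five recall examples each.
labels = ["video_1"] * 5 + ["video_2"] * 5 + ["video_3"] * 5

hits = 0
for i, held_out in enumerate(labels):
    training = labels[:i] + labels[i + 1:]      # leave one example out
    hits += majority_label(training) == held_out
accuracy = hits / len(labels)

print(f"leave-one-out accuracy of the majority rule: {accuracy:.2f}")
print(f"nominal chance level: {1 / 3:.2f}")
```

    A real classifier is of course not driven by category frequencies alone, but to whatever extent
    it favors the better represented categories, that bias works against the held-out example and
    can pull the accuracy towards, or even below, the nominal chance level.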

    Our second issue concerns the following statement: "Our data provide further evidence for
    functional differentiation within the medial temporal lobe, in that we show the hippocampus
    contains significantly more episodic information than adjacent structures". The authors attach
    functional claims to the classification percentages, which show a slightly better performance
    in the hippocampal area. There is no doubt that some neurons in the hippocampus show activation
    when memories are evoked. Going even further, there is weak evidence that a trained classifier
    can discriminate between memories from this activity, but there is no evidence that those
    neurons carry episodic information. This is an example of reverse inference (Poldrack, 2006).

    Finally, a similar and more suggestive setting would have been to use a classifier to try to
    distinguish between memories that had not been presented to it. Such an experiment would have
    entailed memorizing those examples before feature selection, as they would be used only for
    testing purposes. The feature-selection and training data could be similar, to ensure that the
    voxel selection contains pertinent voxels.

    References

    Chadwick, M. J., Hassabis, D., Weiskopf, N., & Maguire, E. A. (2010). Decoding individual episodic memory traces in the human hippocampus. Current Biology, 20(6), 544-547.

    Cox, D. D., & Savoy, R. L. (2003). Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. Neuroimage, 19(2), 261-270.

    Davatzikos, C., Ruparel, K., Fan, Y., Shen, D. G., Acharyya, M., Loughead, J. W., ... & Langleben, D. D. (2005). Classifying spatial patterns of brain activity with machine learning methods: application to lie detection. Neuroimage, 28(3), 663-668.

    Downing, P. E., Wiggett, A. J., & Peelen, M. V. (2007). Functional magnetic resonance imaging investigation of overlapping lateral occipitotemporal activations using multi-voxel pattern analysis. The Journal of neuroscience, 27(1), 226-233.

    Downing, P. E., Chan, A. Y., Peelen, M. V., Dodds, C. M., & Kanwisher, N. (2006). Domain specificity in visual cortex. Cerebral cortex, 16(10), 1453-1461.

    Etzel, J. A., Gazzola, V., & Keysers, C. (2009). An introduction to anatomical ROI-based fMRI classification analysis. Brain Research, 1282, 114-125.

    Hanson, S. J., Matsuka, T., & Haxby, J. V. (2004). Combinatorial codes in ventral temporal lobe for object recognition: Haxby (2001) revisited: is there a face area?. Neuroimage, 23(1), 156-166.

    Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539), 2425-2430.

    Haxby, J. V. (2012). Multivariate pattern analysis of fMRI: The early beginnings. NeuroImage, 62(2), 852-855.

    Haynes, J. D., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7), 523-534.

    Kamitani, Y., & Tong, F. (2005). Decoding the visual and subjective contents of the human brain. Nature neuroscience, 8(5), 679-685.

    Kay, K. N., Naselaris, T., Prenger, R. J., & Gallant, J. L. (2008). Identifying natural images from human brain activity. Nature, 452(7185), 352-355.

    Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S., & Baker, C. I. (2009). Circular analysis in systems neuroscience: the dangers of double dipping. Nature neuroscience, 12(5), 535-540.

    Kriegeskorte, N., Goebel, R., & Bandettini, P. (2006). Information-based functional brain mapping. Proceedings of the National Academy of Sciences of the United States of America, 103(10), 3863-3868.

    Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in cognitive sciences, 10(9), 424-430.

    Nishimoto, S., Vu, A. T., Naselaris, T., Benjamini, Y., Yu, B., & Gallant, J. L. (2011). Reconstructing visual experiences from brain activity evoked by natural movies. Current Biology, 21(19), 1641-1646.

    Mur, M., Bandettini, P. A., & Kriegeskorte, N. (2009). Revealing representational content with pattern-information fMRI: an introductory guide. Social cognitive and affective neuroscience, 4(1), 101-109.

    Mitchell, T. M. (2008, January). Computational models of neural representations in the human brain. In Discovery Science (pp. 26-27). Springer Berlin Heidelberg.

    O'Toole, A. J., Jiang, F., Abdi, H., Pénard, N., Dunlop, J. P., & Parent, M. A. (2007). Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data. Journal of cognitive neuroscience, 19(11), 1735-1752.

    Pereira, F., Mitchell, T., & Botvinick, M. (2009). Machine learning classifiers and fMRI: a tutorial overview. Neuroimage, 45(1), S199-S209.

    Reddy, L., Tsuchiya, N., & Serre, T. (2010). Reading the mind's eye: decoding category information during mental imagery. Neuroimage, 50(2), 818-825.

    Spiers, H. J., & Maguire, E. A. (2007). Decoding human brain activity during real-world experiences. Trends in cognitive sciences, 11(8), 356-365.

    Poldrack, R. A., Halchenko, Y. O., & Hanson, S. J. (2009). Decoding the large-scale structure of brain function by classifying mental states across individuals. Psychological Science, 20(11), 1364-1372.

    Polyn, S. M., Kragel, J. E., Morton, N. W., McCluey, J. D., & Cohen, Z. D. (2012). The neural dynamics of task context in free recall. Neuropsychologia, 50(4), 447-457.

    Rosenberg, M., List, A., Sherman, A., Grabowecky, M., Suzuki, S., & Esterman, M. (2012). Decoding EEG data reveals dynamic spatiotemporal patterns in perceptual processing. Journal of Vision, 12(9), 1173-1173.

    Sheng, L. I. (2011). Multivariate pattern analysis in functional brain imaging. Acta Physiologica Sinica, 63(5), 472-476.

    Tong, F., & Pratte, M. S. (2012). Decoding patterns of human brain activity. Annual review of psychology, 63, 483-509.