
The animacy continuum in the human ventral vision pathway

Long Sha1,2, James V. Haxby1,3, Hervé Abdi4, J. Swaroop Guntupalli1, Nikolaas N. Oosterhof1,3, Yaroslav O. Halchenko1, and Andrew C. Connolly∗1

1Dartmouth College, Hanover, New Hampshire, USA
2New York University, New York, NY, USA
3University of Trento, Trento, Italy
4The University of Texas, Dallas, Texas, USA

To appear in Journal of Cognitive Neuroscience

Abstract

Major theories for explaining the organization of semantic memory in the human brain are premised on the often-observed dichotomous dissociation between living and non-living objects. Evidence from neuroimaging has been interpreted to suggest that this distinction is reflected in the functional topography of the ventral vision pathway as lateral-to-medial activation gradients. Recently, we observed that similar activation gradients also reflect differences among living stimuli, consistent with the semantic dimension of graded animacy. Here, we address whether the salient dichotomous distinction between living and non-living objects is actually reflected in measured brain activity, or whether previous observations of a dichotomous dissociation were the illusory result of stimulus sampling biases. Using fMRI, we measured neural responses while subjects viewed 10 animal species ranging from high to low animacy and 2 inanimate categories. Representational similarity analysis of the activity in ventral vision cortex revealed a main axis of variation with high-animacy species maximally different from artifacts and the least animate species closest to artifacts. While the associated functional topography mirrored activation gradients observed for animate-inanimate contrasts, we found no evidence for a dichotomous dissociation. We conclude that a central organizing principle of human object vision corresponds to the graded psychological property of animacy, with no clear distinction between living and non-living stimuli. The lack of evidence for a dichotomous dissociation in the measured brain activity challenges theories based on this premise.

∗Corresponding author, 6207 Moore Hall, Hanover, NH. [email protected]


INTRODUCTION

Evidence for the existence of an animate-inanimate division in the human ventral vision pathway has been well documented in neuropsychology (Warrington & Shallice, 1984), electrophysiology (Kiani et al., 2007), and neuroimaging (Hanson et al., 2004; O'Toole et al., 2005; Kriegeskorte et al., 2008b). This division is associated with distinctive brain activity topographies in ventral temporal (VT) cortex, lateral occipital (LO) cortex, and the superior temporal sulcus (STS) (Chao et al., 1999; Mahon et al., 2009), with lateral VT, superior LO, and STS showing more activation for animate objects, and medial VT and inferior LO showing more activation for inanimate objects.

Several theories have been proposed to account for this organization. Modality-specific accounts, such as the sensory-functional hypothesis (SFH; Warrington & Shallice, 1984; Allport, 1985), propose that it arises from an interaction between 1) the uneven distribution of sensory and motor properties central to different category domains and 2) neural representations that are distributed across property-specific cortical fields. In contrast, the domain-specificity hypothesis (DSH; Caramazza & Shelton, 1998; Caramazza & Mahon, 2003) proposes that evolutionary forces requiring efficient processing of important category domains led to the development of mental modules dedicated to representing specific category domains (see also Kanwisher & Dilks, 2013). Some (Martin, 2007; Mahon & Caramazza, 2009) have proposed a hybrid model that incorporates both domain-specific and property-based representations, and others have proposed that this organization arises from differences in retinal eccentricity biases across categories (Levy et al., 2001).

While the animate-inanimate distinction is a major feature of the representational space in the ventral vision pathway, it is not the only one. Multivariate pattern (MVP) analyses of functional neuroimaging data have indicated that a broad range of object categories evoke specific distributed patterns of neural activity (Haxby et al., 2001; Hanson et al., 2004; O'Toole et al., 2005; Eger et al., 2008; Kriegeskorte et al., 2008b; Connolly et al., 2012b), and that these patterns have a common basis across individuals (Haxby et al., 2011). Within the animate domain, our previous work (Connolly et al., 2012b) showed that patterns of response in lateral occipital and ventral temporal cortex could reliably differentiate animal species, including coarse distinctions, such as primates versus birds, and fine distinctions, such as monkeys versus lemurs. More surprisingly, using multidimensional scaling, we found that the dominant dimension was a continuum, which we interpreted as spanning the abstract psychological dimension of animacy from most animate (primates) to least animate (insects). This continuum was associated with a cortical topography similar to the topography described for the animate versus inanimate distinction, with patterns of response for "less animate" species similar to patterns for inanimate categories from other studies (Chao et al., 1999; Mahon et al., 2009; Martin, 2007). This pattern suggests that the animate side of the animate-inanimate division is really a continuous dimension and that the "least animate" animals may be close to the representation of bona fide inanimates.

Here, we test the hypothesis that the neural representational space in the ventral pathway is dominated by a continuous dimension from most animate to least animate to inanimate categories. Our question is whether the inanimate stimuli have distinct representations beyond the end of this continuum or whether they have representations similar to the animal species with "low animacy". We define level of animacy as the degree of similarity to the animate prototype, which we assume to be humans. Thus, animals that are phylogenetically close to humans, such as monkeys, rank higher on the animacy scale than more distant relatives such as fish. We do not expect an exact match based on the amount of shared genetic material, however, because the relevant properties of animate entities, such as perceived intelligence and sociability, do not always predict genetic similarity. For example, creatures like dogs may rank higher than some less familiar primate cousins—an effect we observed in a previous study in which dog faces were closer to human faces than were monkey faces in the neural representational space (Connolly et al., 2012a).

The notion of graded animacy is not new. The classical Greek philosopher Aristotle described an animacy hierarchy in his treatise On the Soul (Latin: De Anima; Aristotle, 1986), in which he placed plants at the very bottom, followed by the simple animals like worms, and proceeding through the higher animals with humans near the top—exceeded only by the gods. Ranking of animals and objects according


Figure 1. Experimental design, main behavioral and neural findings. a) We tested 2 inanimate categories—keys and hammers—and 10 animate categories that spanned the range of animacy from humans and chimpanzees to ladybugs and lobsters. Tiled stimulus arrays were used to mitigate any influence of category-specific biases in the retinal image. b) Left: The dissimilarity matrix derived from behavioral judgments shows a distinct separation of animate and inanimate objects, with a secondary axis reflecting animacy among the animates. Right: classical multidimensional scaling (MDS) of the behavioral DSM. c) Left: The neural similarity structure derived from activity in LO cortex shows no evidence of a clear animate-inanimate division. Right: MDS derived using STATIS. δ = dissimilarity; λ = eigenvalue of each PC; τ = inertia for each PC expressed as percentage of variance accounted for. Ellipses represent 95% confidence intervals for the factor scores based on bootstrapped data resampling.


to the abstract notion of animacy is a well-known phenomenon in linguistics, in which animacy hierarchies constrain the rules of grammar (Young & Morgan, 1987). The operationalization of animacy in this study is based on our own intuitive rankings of animals along this continuum, which we presume to be in line with both common intuition and cross-cultural folk taxonomies (Atran et al., 1998).

Our goals for the present study were: (1) to replicate our previous finding of an animacy continuum in object vision cortex (Connolly et al., 2012b), and (2) to determine whether a sharp boundary exists between the "least animate" living stimuli and the inanimate objects. Our stimuli were images of animal species with varying levels of animacy—primates, quadruped mammals, birds, fish, invertebrates—and of two inanimate categories of tools (Fig. 1a). We measured the patterns of neural response to these stimuli with fMRI and analyzed the results using a variety of MVPA techniques to characterize the high-dimensional neural representational space.

METHODS

Stimuli

All image stimuli were collected online from the following twelve categories: 1) humans, 2) chimpanzees, 3) cats, 4) giraffes, 5) pelicans, 6) warblers, 7) clownfish, 8) stingrays, 9) ladybugs, 10) lobsters, 11) hammers, and 12) keys (see Fig. 1a). These categories could be ranked and grouped based on a priori level of animacy from highest to lowest as primates, quadruped mammals, birds, fish, invertebrates, and tools. Within each level of animacy, we also tried to control for real-world size, so that a typically large object was paired with a small object in each level. This was done to eliminate a potential confound between levels of animacy and real-world size (Konkle & Oliva, 2012). To minimize the effect of low-level image statistics on our early visual area signals, we first tiled the images and then randomized the size of the individual images in each row (Fig. 1a). There were 20 different images within each category. For each image, one tiled array was created from the original and one from its left-right mirrored version, yielding 40 tiled images per category and 480 tiled images in total for the experiment. Each presentation was a randomly selected image from the image pool. Each visual stimulus was presented at approximately 13° × 13° of visual angle to subjects in the fMRI scanner.

Behavioral judgments

For each stimulus category in the experiment, we collected behavioral judgments using Amazon's Mechanical Turk (https://www.mturk.com). Forty anonymous raters were recruited for each test. Raters saw three images from different categories lined up on the same screen and were required to choose the odd one out based on general object category. Each triplet (out of 220 possible ones) was rated by four different individuals. We scored the results by creating a stimulus-category by stimulus-category dissimilarity matrix: for each image judged as odd, we added 1 to the two cells corresponding to the dissimilarity between the odd image and each of the other two images.
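To make the scoring scheme concrete, the following is a minimal Python sketch of the dissimilarity-matrix construction described above; the judgment data format and function name are illustrative assumptions, not the code used for the study.

```python
# A minimal sketch of the odd-one-out scoring described above (assumed
# data format; not the study's code). Each judgment is a
# (triplet_of_category_names, odd_category_name) pair; in the experiment
# every one of the 220 possible triplets was rated by four raters.
import numpy as np

categories = ["human", "chimpanzee", "cat", "giraffe", "pelican", "warbler",
              "clownfish", "stingray", "ladybug", "lobster", "hammer", "key"]
idx = {c: i for i, c in enumerate(categories)}

def score_judgments(judgments):
    dsm = np.zeros((len(categories), len(categories)))
    for triplet, odd in judgments:
        for other in (c for c in triplet if c != odd):
            # The odd item is counted as dissimilar to both remaining items.
            dsm[idx[odd], idx[other]] += 1
            dsm[idx[other], idx[odd]] += 1  # keep the matrix symmetric
    return dsm

# Example: one rater marks "hammer" as the odd one out of a triplet.
dsm = score_judgments([(("human", "cat", "hammer"), "hammer")])
```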

fMRI Participants

Eleven adult participants (7 males) from the Dartmouth College community participated in the experiment. All subjects provided informed consent and were screened for MRI safety before the experiment. Subjects were paid for their time. All experimental procedures and consent materials were approved by the Institutional Review Board of Dartmouth College, the Committee for the Protection of Human Subjects.

Procedure

On each trial, three images were shown consecutively to the subjects. Each image was presented for 500 ms without gaps between images, and events were followed by a 4500 ms inter-stimulus interval. During image presentation, a fixation cross appeared at the center of the screen. Subjects were instructed to pay attention to the images and were free to move their eyes while performing a simple one-back task for all image triplets. If the last image in the triplet belonged to the same category as the previous two, subjects were asked to report "same" by pushing the response button in the scanner, and otherwise to report "different". The mismatch trials and blank trials—in which only a fixation cross appeared for 6 s—were pseudo-randomly placed in the experiment to help subjects maintain attention. There were 7 functional acquisition runs. Within each run, a Type-1, Index-1 sequence for first-order counterbalancing of 14 trial types (12 categories, 1 mismatch trial, and 1 blank trial) was created for run-wise presentations (Finney & Outhwaite, 1956; Aguirre, 2007). Each trial type was repeated 6 times. In addition, 3 leading dummy trials and 1 trailing dummy trial were added, resulting in 88 trials per run.

Image Acquisition

Brain images were acquired using a 3-Tesla Philips Achieva Intera scanner with a 32-channel head coil. Functional imaging used gradient-echo echo-planar imaging with a SENSE reduction factor of 2. The MR parameters were TE/TR = 35 ms/2000 ms, flip angle = 90°, resolution = 3 × 3 mm, matrix size of 80 × 80, and FOV = 240 × 240 mm. There were 42 transverse slices with full-brain coverage, and the slice thickness was 3 mm with no gap. Slices were acquired in an interleaved order. Each of the 7 functional acquisition runs included 264 functional acquisitions and 12 dummy acquisitions, for a total time of 552 s per run. At the end of each experiment, a single high-resolution T1-weighted (TE/TR = 4.53 ms/9848 ms) anatomical scan was acquired with a 3D turbo field echo sequence. The voxel resolution was 0.938 × 0.938 × 1.0 mm with a bounding-box matrix of 256 × 256 × 160 (FOV = 240 × 240 × 160 mm).

Image Preprocessing

To reduce noise related to subjects' head movements and scanner drift, functional images were preprocessed using the following steps. First, images were corrected for slice acquisition time due to the interleaved slice order within each functional acquisition. Second, subjects' head movements were corrected by spatially registering all functional images to the last functional acquisition, which was closest in time to the high-resolution anatomical scan. Third, data were despiked to remove signals unlikely to reflect physiological activity. Fourth, each functional acquisition run was detrended using up to 3rd-order polynomials to remove any scanner drift, and motion parameters were also regressed out. Stimulus-specific blood-oxygen-level-dependent (BOLD) response patterns were estimated using the general linear model for each run separately, resulting in 12 full-brain beta patterns for each of the seven runs for each subject. The functional data were not spatially smoothed. All preprocessing steps were performed using AFNI (Cox, 1996).

Surface mapping

Brain surfaces for each participant were generated using FreeSurfer's recon-all software (Fischl, 2012). Surfaces were transformed into AFNI/SUMA format and resampled to a standard surface mesh grid at various resolutions. The preparation of the surfaces in this manner was facilitated by a wrapper script (mvpa2-prep-afni-surf), which is freely available through PyMVPA. The surfaces were created from the T1-weighted anatomical images. The mapping of functional data and results from volume space to surface space was done using AFNI's 3dVol2Surf program; the mapping from surface to volume was done using AFNI's 3dSurf2Vol program.

Pattern Classification

We used linear support vector machines (SVM; Cortes & Vapnik, 1995; Chang & Lin, 2011; implemented in PyMVPA, Hanke et al., 2009) for all classification analyses, with a leave-one-run-out cross-validation strategy.
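For illustration, a minimal PyMVPA sketch of this classification setup follows; the dataset ds (run-wise beta patterns with targets set to category labels and chunks set to run indices) is an assumed input, and this is a simplified stand-in for the analysis code accompanying the paper.

```python
# A minimal PyMVPA sketch (assumed dataset `ds`: run-wise beta patterns
# with .targets = category labels and .chunks = run indices).
from mvpa2.clfs.svm import LinearCSVMC
from mvpa2.generators.partition import NFoldPartitioner
from mvpa2.measures.base import CrossValidation
from mvpa2.misc.errorfx import mean_match_accuracy

clf = LinearCSVMC()
# Partitioning on 'chunks' leaves one run out per cross-validation fold.
cv = CrossValidation(clf, NFoldPartitioner(attr='chunks'),
                     errorfx=mean_match_accuracy)
fold_accuracies = cv(ds)              # one accuracy per held-out run
print(fold_accuracies.samples.mean())
```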

Whole brain surface-based searchlight

Conventional whole-brain searchlight analysis assigns a desired computation to a spherical local region of interest (ROI) centered on a given voxel and iterates the same computation through all the voxels in the brain volume (Kriegeskorte et al., 2006), mapping the results back to the center voxel. While this method is excellent for mapping informational content throughout the brain volume, it runs the risk of including voxels in the same searchlight that are not actually near each other on the cortical manifold, for example, voxels on either side of the sylvian fissure. Instead of a volume-based searchlight, here we use a surface-based searchlight (Oosterhof et al., 2010, 2011), which defines each local ROI centered on a surface node from a standard surface mesh (Fischl, 2012). For each surface node, the algorithm identifies the set of neighboring surface nodes and selects their corresponding voxels from the subject's native brain volume to define the searchlight's patterns. The results of the multivariate calculation are then mapped back to the corresponding central surface node. This method takes advantage of the surface anatomical alignment across subjects and defines more restricted searchlight ROIs containing only voxels that are adjacent on the cortical manifold, as distances between nodes are computed along the cortical surface rather than with the Euclidean measure used in volume-based approaches. Searchlight volumes were restricted to include 100 neighboring voxels per central surface node. For the definition of searchlight centers, we used a standard surface mesh with a resolution corresponding to an icosahedral tessellation with 32 linear divisions of each of the twenty triangles of a standard icosahedron (see AFNI's MapIcosahedron). This resolution results in 20,484 surface nodes for each full-brain surface reconstruction, which is close to the resolution provided by 3 mm isotropic voxels, with about 2.5 mm between node pairs. For the definition of the surface nodes that make up a local neighborhood, we used a higher-resolution standard mesh with an icosahedral tessellation with 128 linear divisions, yielding 327,684 nodes per full-brain surface reconstruction. Searchlight analyses were implemented using PyMVPA (Hanke et al., 2009).

For the first searchlight analysis, we computed the correlation between the behavioral dissimilarity matrix and each individual subject's voxel dissimilarity matrix within each local searchlight, iterating this process across the whole brain surface. We then performed a one-sample t-test against zero for each surface node across all subjects to generate a group map (Fig. 2a).
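The statistic computed within each searchlight can be sketched as follows; this omits PyMVPA's searchlight machinery and assumes `patterns` (a categories × voxels array for one searchlight) and `behav_dsm` (the 12 × 12 behavioral DSM) as inputs. Pearson correlation between the condensed DSMs is assumed here.

```python
# A minimal sketch of the per-searchlight RSA statistic. Assumed inputs:
# `patterns` (12 categories x voxels, one searchlight) and `behav_dsm`
# (the 12 x 12 behavioral DSM).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

def rsa_score(patterns, behav_dsm):
    neural = pdist(patterns, metric='correlation')   # condensed neural DSM
    behav = behav_dsm[np.triu_indices_from(behav_dsm, k=1)]
    return pearsonr(neural, behav)[0]
```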

For the second searchlight analysis (ROI definition), we performed 12-way classification using SVM classifiers with a leave-one-run-out cross-validation strategy within each local searchlight, iterating this process across the whole brain surface. We then performed a one-sample t-test against chance (.0833) for each surface node across all 11 subjects and constructed a group mask by thresholding the t-statistic map at t(10) > 4 (p = .003, two-tailed, uncorrected for multiple comparisons). We then used the SUMA clustering program (SurfClust) to isolate a contiguous group mask, with a threshold of t > 4 and a size criterion of 100 contiguous surface nodes. The ROI in each subject included the late visual areas (LO and VT), similar to our previous late visual mask (Connolly et al., 2012b). The surface masks were then registered to the brain volume. Because the mapping from the standard surface mesh to individual volumes is idiosyncratic, each subject's volume mask contained a unique number of voxels, with a mean of 2395 ± 88.
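The group-level test behind the mask can be sketched in a few lines; the accuracy array below is placeholder data standing in for the per-subject surface maps, and the surface clustering step (SurfClust) is not reproduced.

```python
# A minimal sketch of the group test for the ROI mask: a one-sample
# t-test of 12-way searchlight accuracies against chance at every node.
# `acc` is placeholder data; clustering is done separately with SurfClust.
import numpy as np
from scipy.stats import ttest_1samp

acc = 1.0 / 12 + 0.05 * np.random.randn(11, 20484)  # placeholder: 11 subjects x nodes
t_map, p_map = ttest_1samp(acc, popmean=1.0 / 12, axis=0)
node_mask = t_map > 4.0                             # the t(10) > 4 threshold used above
```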

Feature Scaling

MVP analyses often incorporate a voxel-selection step, which involves using some criterion to select the most informative voxels while discarding noisy ones. Voxel selection is an important step for reducing noise, which helps in assessing the true informational content of a dataset. However, this practice has some drawbacks. For example, when selecting voxels it is necessary to choose a threshold criterion, either keeping a set percentage or a set number of features. Also, different data folds used for cross-validation will likely select different sets of features, and selected features in one subject are not guaranteed to overlap with selected features in other subjects. These latter facts make it difficult to assess the topographical consistency of voxels that carry information. To avoid the arbitrary nature of threshold setting and to keep all of the voxels in our analysis, while at the same time attenuating noise, we chose voxel scaling as an alternative to voxel selection. For every data fold used for cross-validation, the training data were used to compute an F-statistic based on a one-way analysis of variance, which is a standard sensitivity measure often used in voxel selection. Then, instead of discarding voxels with an F-value below some criterion, we multiplied each voxel by its F-value (after normalizing the set of F-values to make their sum of squares equal to one). Conceptually, feature scaling and feature selection are equivalent: feature selection can be thought of as a special case of feature scaling in which the weights for voxels below the criterion are set to zero, while the weights for those above are set to one. For classification, it is important when selecting voxels to base the selection criterion only on the training data, to avoid over-fitting and artificially boosting classification accuracies (Kriegeskorte et al., 2009). This is also true for voxel scaling, which works the same as voxel selection for cross-validation purposes: a unique scaling F-map is generated for each data fold, based only on the training data.
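A minimal sketch of this scaling step, assuming `train` and `test` are samples × voxels arrays for one cross-validation fold and `labels` holds the training categories; scikit-learn's f_classif stands in for the one-way ANOVA.

```python
# A minimal sketch of the F-based voxel scaling, for one cross-validation
# fold. `train`, `test`, and `labels` are assumed inputs.
import numpy as np
from sklearn.feature_selection import f_classif

def scale_by_f(train, labels, test):
    f_vals, _ = f_classif(train, labels)       # one ANOVA F value per voxel
    f_vals = np.nan_to_num(f_vals)
    w = f_vals / np.sqrt(np.sum(f_vals ** 2))  # sum of squared weights = 1
    # Weights come from the training fold only, to avoid peeking.
    return train * w, test * w
```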

STATIS

STATIS is an extension of principal component analysis (PCA) used to analyze three-way data and has been used previously in neuroimaging (O'Toole et al., 2007; Kherif et al., 2003; Shinkareva et al., 2006; Abdi et al., 2009). STATIS optimally combines data tables from different sources and then performs PCA on the combined data table, which is called the compromise. STATIS provides a low-dimensional characterization of the informational structure, obtained by simply plotting the factor scores for the observations. The variable (voxel) loadings describe the contributions of each variable to each principal component. In addition, bootstrap data resampling provides for the estimation of confidence intervals for the factor scores of the stimuli; these confidence intervals are indicated in our plots by ellipses around the factor-score centers. STATIS is well suited for analyzing fMRI data because the only requirement for combining individual data tables is that they have the same number of rows (i.e., observations or stimuli). The columns (alternatively: variables, features, voxels) need not correspond across data tables, because the individual tables are combined horizontally—i.e., a table with i rows and j columns is combined with a table with i rows and k columns to form a table with i rows and j + k columns. Thus, voxels from different subjects may be combined in the same compromise table without aligning brains to a standard template. Because STATIS has been thoroughly described elsewhere, we refer the reader to Abdi et al. (2012) for a tutorial and detailed mathematical definitions.

We used STATIS at several stages in our analyses. First, we used STATIS to optimally combine the seven runs from each individual subject into a single 12 × n data table, where 12 is the number of stimulus categories and n is the size of the subject's ROI. This is the first step of STATIS, performed without PCA on the compromise tables. Before computing the subject-specific run-wise compromise tables, the columns (voxels) of each subject's dataset were scaled using the feature-scaling method described above, then centered by subtracting the column-wise means from the values in each column. The rows were centered in a similar manner. Finally, before generating the compromise, each row was scaled by its Euclidean norm (the square root of its sum of squares). We chose to center and scale the rows in this manner because this transformation is involved in calculating the Pearson correlation distance between observations, which is a standard way to calculate neural dissimilarity matrices with fMRI data. Because the voxels across the runs of a given subject correspond to each other, we computed the subject-specific compromise as the weighted linear sum of the run-wise data tables (Abdi et al., 2012). We then computed a group-wise STATIS solution on the eleven resulting compromise tables. Because the group compromise matrix was made up of subject-specific, horizontally stacked datasets, the PCA of this matrix yields voxel loadings for each voxel in each subject. The factor scores and voxel loadings for this solution are reported in the main text. To evaluate the shared functional topography associated with the voxel loadings across subjects, the volumetric results were mapped to the standard surface mesh grid as described above.
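A simplified numpy sketch of the table preprocessing and group compromise follows. Two simplifications are assumed: uniform run weights, whereas STATIS proper derives table weights from inter-table similarity (Abdi et al., 2012), and plain SVD for the PCA step; the function names are illustrative.

```python
# A simplified sketch of the STATIS pipeline described above (uniform
# weights and plain SVD are assumptions, not the full method).
import numpy as np

def preprocess_table(X):                    # X: 12 categories x n voxels
    X = X - X.mean(axis=0)                  # center columns (voxels)
    X = X - X.mean(axis=1, keepdims=True)   # center rows (categories)
    return X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm rows

def subject_compromise(run_tables):
    tables = [preprocess_table(X) for X in run_tables]
    return sum(tables) / len(tables)        # uniform weights: simplification

def group_compromise(subject_tables):
    # Subjects may contribute different voxel counts: stack horizontally.
    G = np.hstack(subject_tables)           # 12 x (n1 + n2 + ... + n11)
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    factor_scores = U * s                   # categories in PC space
    loadings = Vt                           # one loading per voxel per PC
    return factor_scores, loadings
```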

Because STATIS provides transformation matrices from the original high-dimensional space into the lower-dimensional PC space, it is possible to project new supplemental observations into the PC space if those observations are on the same set of original variables. We exploited this feature by computing STATIS on subsets of the stimuli (i.e., just primates and tools, or just the non-primate animals) and then mapping the left-out stimuli into the PC space as supplemental observations (Fig. 4). We also exploited this feature when using STATIS for classification with factor scores (Fig. 5): for each data fold in cross-validation, STATIS is computed using the training data only, and then both the training and testing data are mapped into the PC space based on the training data, to avoid "peeking".
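Projecting supplemental observations then amounts to applying the training-derived loadings to new rows that have been preprocessed identically; a one-line sketch, continuing the assumed names from the previous block:

```python
# Minimal sketch: map held-out rows into a PC space defined on training
# data only. `loadings` comes from group_compromise above; `X_new` must
# be preprocessed exactly like the training tables.
def project_supplementary(X_new, loadings):
    return X_new @ loadings.T               # factor scores for new rows
```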

Data and Code

The implementation of STATIS that we used in this paper is available as part of PyMVPA (http://www.pymvpa.org). In addition, we are supplying the data and the Python code, in the form of an IPython notebook (http://ipython.org/notebook.html), that we used to run the STATIS and classification analyses. The purpose of this material is to allow readers to reproduce the non-standard analyses reported in this paper and to explore first-hand the effects of changing various analysis parameters. The data included are anonymized, preprocessed, and masked to include just the ROI. To make it easier for readers to run these analyses, the IPython notebook and data are provided within a modified NeuroDebian (http://neuro.debian.net/) virtual machine that has been updated to include all of the Python dependencies needed to run the analyses. This material is available for download and is presently hosted at http://haxbylab.dartmouth.edu.

RESULTS

Behavioral ratings of similarity

We collected behavioral similarity ratings from 40 anonymous raters using an online survey. We computed an average behavioral dissimilarity matrix (DSM) based on these ratings (Fig. 1b). Multidimensional scaling of this DSM (Fig. 1b, right) reveals a sharp distinction between animate and inanimate categories expressed by the first dimension. The second dimension captures variation that suggests an animacy gradient ranging from mammals to invertebrates and fish. Thus, the behavioral ratings of the stimuli show both a clear distinction between animates and inanimates and a continuous animacy gradient within the animate domain.

Searchlight analyses and ROI definition

To assess the distribution of category-sensitive brain regions and to define a region of interest (ROI) for further analysis, we began our analyses using whole-brain multivariate searchlights.

First, we searched for brain regions with neural response DSMs that resembled the behavioral-rating DSM. We used a representational similarity analysis (RSA; Kriegeskorte et al., 2008a) surface-based searchlight (Kriegeskorte et al., 2006; Oosterhof et al., 2010, 2011; Connolly et al., 2012b) to map correlations between behavioral and neural DSMs. Correlations between the behavioral and neural DSMs were highest in bilateral lateral occipital (LO) and ventral temporal (VT) cortices (Fig. 2a). The behavioral ratings show two separate dimensions: 1) a strong animate-inanimate divide, and 2) an animacy gradient on the second dimension. The similarity searchlight showed reliable but somewhat weak correlations between neural similarity and behavioral similarity (Fig. 2a, r ≤ .3). Given these correlations alone, it is not possible to know which aspects of behavioral and neural similarity are correlated with each other. Therefore, it is necessary to further evaluate the neural representational structure to assess the relative contributions of an animacy dichotomy versus an animacy continuum.

Figure 2. Searchlight results. a) Correspondence between neural similarity and behavioral judgments was assessed as the correlation between the behavioral DSM and the neural DSM at each searchlight location. Mean correlation values were thresholded using a group t statistic for r > 0, t(10) = 3, p < .02. b) Twelve-way classification accuracy was assessed at each searchlight. The mean proportion correct was thresholded at t(10) = 4.0, p < .001, and this map was clustered to generate the group ROI mask. CS: collateral sulcus; FG: fusiform gyrus; OTS: occipito-temporal sulcus.

A second searchlight analysis used a 12-way support vector machine (SVM; Cortes & Vapnik, 1995; Chang & Lin, 2011) pattern classifier to map regions carrying information that distinguishes among our 12 categories. Again, bilateral LO and VT, i.e., nearly all of object-vision cortex, showed significantly above-chance classification accuracy (Fig. 2b). Next, we defined an ROI for further analysis using the surface nodes with the highest average MVP classification accuracy across subjects. Note that in both the RSA and SVM searchlight analyses, surface nodes in early vision along the calcarine sulcus are visible on the results maps (Fig. 2a and 2b). These surface nodes were not contiguous with the larger sets of surface nodes identified throughout LO and VT, and therefore did not cluster with the larger set, leaving only the voxels that can be seen as colored voxels in Fig. 3. The above-chance 12-way classification in early visual cortex (Fig. 2b) indicates that early vision carried some signal for distinguishing between our categories; however, the similarity structure in early vision was only weakly correlated with behavioral structure compared to later visual regions. Furthermore, our searchlight analysis using a dichotomous animate versus inanimate model and an animacy continuum model (Fig. 7) in a multiple regression to predict neural similarity shows that neither model strongly predicted structure in early vision. Because the focus of the present article is on structure in later vision, we do not report a detailed analysis of structure in early vision.

Structural analysis using STATIS

We used STATIS (Abdi et al., 2012) to analyze the structure of the category-related neural patterns within this ROI, which encompassed all of the lateral occipital complex (LOC) and extended to the posterior intraparietal sulcus. Like PCA, STATIS computes principal components (PCs), each of which has factor scores (Fig. 3a) for the categories and loadings (Fig. 3b) for the voxels in each subject. The reliability of the factor scores was assessed using a bootstrap method to derive confidence intervals around the factor-score centroids (the ellipses in Fig. 1c). The first PC accounts for 55% of the variance, the second PC accounts for 10%, and the third PC accounts for ≈ 7% (Fig. 3c). Factor scores on the first PC (Figs. 1c, 3a) show a continuous gradient of animacy: primates and quadruped mammals cluster on one end and tools on the other, with fish and invertebrates closer to tools and birds closer to mammals. We did not observe the sharp contrast between animate and inanimate categories that was evident in the behavioral ratings. Rather, the activation patterns for tools are similar to those for stingrays, ladybugs, and lobsters, placing the tools near the less animate end of the animacy gradient, while the activation patterns for humans are similar to those for chimpanzees and cats. These results are robust with respect to whether, instead of voxel scaling (see Methods), we used voxel selection with 300 or 2000 voxels, or no voxel selection or scaling at all (results not shown).
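The bootstrap behind the confidence ellipses can be sketched as resampling subjects with replacement and averaging their partial factor scores (each subject's table projected onto the group PCs); `partial_scores`, a subjects × categories × PCs array, is an assumed input.

```python
# A minimal sketch of the bootstrap for factor-score confidence regions.
# `partial_scores` (subjects x categories x PCs) is an assumed input.
import numpy as np

def bootstrap_factor_scores(partial_scores, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    n_subj = partial_scores.shape[0]
    draws = rng.integers(0, n_subj, size=(n_boot, n_subj))
    # Average the resampled subjects' partial scores per bootstrap draw.
    return partial_scores[draws].mean(axis=1)   # n_boot x categories x PCs
```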

To examine the relationship between the dominant first dimension, which was based on responses to both animals and tools, and the animacy dimension as defined solely within animal classes, we used STATIS to define PCs based on different subsets of our data: 1) PCs based on only primates and tools (Fig. 4a); and 2) PCs based on only the animal categories, without primates and tools (Fig. 4b). For each solution, we projected the remaining categorical patterns onto the corresponding PC space and found graded levels of animacy with little change compared to the first PC defined on all categories. These results indicate that neural activity in our ROI treats both animate and inanimate objects along a single continuous dimension that reflects the level of animacy, and the animate-inanimate distinction is one aspect of this dimension.

The functional topography of the animacy continuum

To characterize the functional topography of the animacy dimension, we examined the voxel loadings for each of our PCs (Fig. 3b). For PC1, positive loadings indicate more activation for highly animate categories, i.e., those categories with positive factor scores on PC1, and negative loadings indicate more activation for less animate or inanimate categories, i.e., those with negative factor scores on PC1. The functional topography in bilateral VT resembles previously reported coarse-scale animate versus inanimate activation topographies, in which the lateral fusiform gyrus responds more to animate objects and the medial fusiform responds more to inanimate objects (Chao et al., 1999; Mahon et al., 2009; Connolly et al., 2012b). Bilateral superior LO cortex showed more activation for highly animate objects, whereas bilateral inferior and posterior LO cortices showed more activation for less animate or inanimate objects. This topography is very similar to the map of the first PC based on primates and tools (Fig. 4a), as well as the map defined using just the 8 animal categories without primates and tools (Fig. 4b). While the voxel-loading maps convey the relative contributions of activation gradients to the overall representational space, the statistical maps show that even lower-order PCs have some topographical consistency across subjects (data not shown beyond PC2). Lower-order PCs, however, should be interpreted with caution. PCA determines a set of orthogonal principal components, but it does not follow that the underlying neural generators—the true latent variables that shape the representational space—are orthogonal or even independent of each other. We do not know the true number of underlying neural generators, and even the first PC may capture variation that is generated by two or more non-independent signals. The results of the PCA indicate that the representational space is high-dimensional, with numerous neural generators that are multiplexed in the cortex of the ventral vision pathway, each with a topography that is, at least partially, shared across subjects.


Figure 3. a) The complete set of factor scores for the STATIS analysis in the ROI. Error bars show the standard error of the mean. Each bar plot is shown on the same y-axis scale, which ranges from −0.06 to 0.06. Note that while some lower-order PCs (e.g., PC 7) show separation between low-animate and inanimate objects, none of these show a systematic difference. b) The functional topographies associated with the first two PCs, rendered as the mean voxel loadings across subjects. The voxel loadings were calculated for each voxel in the volume for each subject, then mapped to the standard surface mesh grid. The color scales are the same across PCs, ranging from −0.0018 to 0.0018. As for the factor scores, the absolute magnitudes of individual loadings are not meaningful because the data were normalized prior to analysis; only the relative magnitudes and the sign carry meaning. c) Scree plot showing the percentage of variance accounted for by each PC in the STATIS analysis of the ROI. PCs one and two together account for 65% of the variance. CS: collateral sulcus; FG: fusiform gyrus; OTS: occipito-temporal sulcus; STS: superior temporal sulcus; Occ. pole: occipital pole.



Figure 4. a) Multidimensional scaling showing the factor scores for the first two PCs from STATIS calculated using just primates and tools. The remaining stimulus patterns were then projected into the PC space as supplemental observations (dashed ellipses). b) MDS from STATIS based on just the non-primate animals, with primates and tools projected into the space as supplemental observations.


The animacy continuum captures most of the differentiation between biological classes

To further characterize the contributions of the PCs to the representation of our stimulus categories, we performed a series of analyses comparing classification performance as a function of the inclusion and exclusion of each PC. This was done by running classification analyses using just the factor scores of the PCs. Our stimuli included one superordinate class of inanimate stimuli (tools) and 5 superordinate biological classes with different levels of animacy. If the first PC captures animacy information, then this PC by itself should be a sufficient basis for classifying between-class pairings among animal classes and tools. However, within-class distinctions, such as warblers versus pelicans, may not be captured well by the animacy dimension, and classification of these pairings may depend on other dimensions. To test this hypothesis, we performed within-subject SVM classification for all 66 unique pairs of categories, and we present the results for within- and between-superordinate-class pairs separately (Fig. 5). Classification using all PCs resulted in mean accuracy of 0.87 ± 0.02 (mean ± s.e.m.) for between-class pairs and 0.77 ± 0.03 for within-class pairs. Classification based on just the first PC resulted in mean accuracy of 0.79 ± 0.01 for between-class pairs and 0.61 ± 0.02 for within-class pairs. Classification of between-class pairs based on all other PCs (0.78 ± 0.02) was equivalent to classification based on the first PC alone (t(10) = 0.99, ns), whereas classification of within-class pairs based on all other PCs (0.78 ± 0.03) was significantly better than classification based on the first PC (t(10) = 5.92, p < 0.001) and equivalent to classification using all PCs (t(10) = 1.27, ns). These results show that the first PC contains mostly information about between-class distinctions, whereas information about finer distinctions within classes is more distributed across PCs. For example, the distinction between pelicans and warblers was not evident on the first PC (Figs. 1c, 3a) but was seen in the second, third, fifth, and sixth PCs (Fig. 3a). Similarly, the distinction between humans and chimpanzees was evident only in the fourth, sixth, tenth, and eleventh PCs (Fig. 3a).



Figure 5. The contributions of PC1 versus all other PCs for within- and between-class pairwise distinctions, measured using support vector machine pattern classifiers. Error bars show 95% confidence intervals.

In addition, some between-class distinctions are carried on lower-order PCs: the fourth PC provides a between-class distinction for birds versus invertebrates, while not carrying much distinguishing information for within-class pairs for either birds or invertebrates.
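A simplified sketch of these factor-score classifications follows, using scikit-learn in place of PyMVPA and assuming precomputed `scores` (samples × PCs), `targets`, and `runs`; in the actual analysis, STATIS was recomputed within each training fold as described in Methods.

```python
# Minimal sketch: pairwise SVM classification restricted to a subset of
# PCs (e.g., pcs=[0] for PC1 only, pcs=range(1, 12) for PCs 2-12), with
# leave-one-run-out cross-validation. Simplification: uses precomputed
# factor scores rather than refitting STATIS per fold.
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def pairwise_accuracies(scores, targets, runs, pcs):
    accs = {}
    for a, b in combinations(np.unique(targets), 2):
        sel = np.isin(targets, [a, b])
        X, y = scores[sel][:, list(pcs)], targets[sel]
        # Run labels serve as CV groups: each fold holds out one run.
        accs[(a, b)] = cross_val_score(LinearSVC(), X, y, groups=runs[sel],
                                       cv=LeaveOneGroupOut()).mean()
    return accs
```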

Information content on the high and low ends of the animacy continuum topography

To examine whether the discriminating information about "high animacy" stimuli was localized in cortex with positive values on the animacy continuum dimension and information about "low animacy" stimuli was localized to cortex with negative values on the animacy continuum, we sub-sampled our ROI into the voxels that loaded most positively and most negatively on the first PC, keeping 300 voxels for each partition. We then tested pairwise classification for high-animacy categories (humans versus chimpanzees) and inanimate categories (hammers versus keys) in each division. To evaluate the classification accuracies, we used a two-way analysis of variance with voxel loading as one independent variable with two levels (positive versus negative loadings), animacy of the stimuli as the second independent variable, also with two levels (primates versus tools), and subjects as the random variable.


Figure 6. Proportion of times each stimulus was classified correctly in 12-way SVM classification using all voxels, only voxels that loaded positively on the first PC, and only voxels that loaded negatively on the first PC. Error bars show 95% confidence intervals based on the standard error of the mean across subjects.

Main effects were observed neither for voxel loading (mean proportion correct: positive = 0.70 ± 0.15; negative = 0.66 ± 0.16; F(1, 10) = 0.99, ns) nor for animacy (primates = 0.69 ± 0.16; tools = 0.67 ± 0.16; F(1, 10) = 0.24, ns), and there was no interaction (pos-primates = 0.72 ± 0.15; pos-tools = 0.69 ± 0.17; neg-primates = 0.66 ± 0.17; neg-tools = 0.66 ± 0.17; F(1, 10) = 0.14, ns). Classification accuracies for 12-way SVM classification were equivalent across all categories with one exception: accuracies for cats were higher in positive voxels than in negative voxels, but this result did not reflect any general trend (Fig. 6).

These results indicate that information that discriminates among high-animacy categories is not found predominantly in the cortex that responds most strongly to those stimuli, and, similarly, information that discriminates low-animacy and inanimate stimuli is not found predominantly in the cortex that responds most strongly to those stimuli. Rather, the information about high-animacy, low-animacy, and inanimate categories is distributed across the ventral vision pathway that contains the animacy continuum.

No evidence for a sharp animate-inanimate distinction in cortex

The dichotomous distinction between animate and inanimate objects is psychologically salient and prominent in the similarity space defined by behavioral judgments (Fig. 1b). By contrast, the similarity structure defined by the neural signal in the ventral vision pathway is dominated by the animacy continuum, with overlap between the lowest-animacy species and inanimate tools. To test whether the dichotomous distinction between animate and inanimate objects might be represented in some other brain region, we built two model-based dissimilarity matrices: 1) a dichotomous model, which posits minimal distance within the living and non-living categories and maximal distance between them; and 2) a continuum model, which posits minimal dissimilarity between objects at the same animacy level (primates, mammals, birds, etc.) and increasing dissimilarity with distance along the animacy continuum (Fig. 7a). We then used these models as predictors of neural DSMs in a multivariate linear least-squares regression searchlight analysis.
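To make the two models concrete, here is a sketch of how such model DSMs and the per-searchlight regression can be constructed; the animacy ranks below are an illustrative coding of the six levels (two categories per level, tools = 0, stimuli ordered as in Fig. 1), not necessarily the exact model values used in the paper.

```python
# A sketch of the two model DSMs and the regression fit at one searchlight.
# The rank coding is an illustrative assumption.
import numpy as np
from scipy.spatial.distance import pdist

rank = np.array([5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0, 0], dtype=float)
continuum_dsm = pdist(rank[:, None], metric='cityblock')    # |rank_i - rank_j|
dichotomous_dsm = pdist((rank > 0)[:, None].astype(float),
                        metric='cityblock')                 # 1 across the divide, else 0

def fit_models(neural_dsm):                 # neural_dsm: condensed, length 66
    X = np.column_stack([continuum_dsm, dichotomous_dsm,
                         np.ones_like(continuum_dsm)])      # two models + intercept
    beta, *_ = np.linalg.lstsq(X, neural_dsm, rcond=None)
    resid = neural_dsm - X @ beta
    r2 = 1 - resid.var() / neural_dsm.var()
    return beta[:2], r2                     # continuum & dichotomous betas, R^2
```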

The results of this analysis revealed a good fit between the model and neural activity throughout bilateral LO cortex. The model accounted for 51% of the variance in the searchlight DSM centered on the peak voxel in the middle of the right fusiform gyrus. Comparison of the β coefficients for the two model DSMs shows that the fit of the model throughout LO cortex is attributable to the high correlation between the continuum model and the neural representational structure in this region. β values for the dichotomous model were higher only in early visual cortex. No region outside of early visual cortex showed any evidence for representing the dichotomous distinction between animate and inanimate objects. Moreover, the dichotomous model was not a good fit for the neural structure in early vision, as it accounted for only a small percentage of the variance. Furthermore, structural analysis using STATIS in early visual cortex revealed overlapping representations for living and nonliving objects (results not shown). Thus, the neural basis for the psychologically predominant, dichotomous animate-inanimate distinction (the first dimension in Fig. 1b) was not revealed in our experimental data.


Figure 7. Multiple regression searchlight using model DSMs shows no evidence for a sharp animate-inanimate distinction. Model DSMs (top) for the animacy continuum model and the dichotomous living versus non-living model (stimuli arranged as in Fig. 1) were used as regressors in a multiple regression to predict neural similarity at each searchlight. The map shows the difference in beta weights for the continuum model minus the dichotomous model, thresholded by the R² fit of the regression model at R² > 0.2.

DISCUSSION

The results of this study suggest that the view of the ventral vision pathway as encoding a dichotomous living versus non-living distinction is incorrect; instead, a graded dimension of animacy describes a major part of the representational space in the human ventral vision pathway. Representations of inanimate objects share the same location on this dimension as animals with "low animacy", such as invertebrates and fish. The animacy continuum is carried by the same topography that has been associated with animate-inanimate domain specificity. We propose that the representation of the animate-inanimate distinction in the ventral vision pathway is better understood as one facet of an animacy continuum. Our previous results showed that a continuous dimension from "least animate" to "most animate" captured most of the variance in neural representation among biological classes (Connolly et al., 2012b). The current study shows that bona fide inanimate categories are represented along this continuum in a manner similar to animals with low animacy, a pattern that blurs the distinction between the animate and inanimate domains in neural representational space.

The similarity structure for neural responses in the ventral vision pathway differed significantly from the similarity structure based on behavioral ratings. In particular, the dominant dimension in the neural representational space is the animacy continuum, and the representation of tools is embedded in that continuum at the low-animacy end. By contrast, the primary distinction in the behavioral ratings was between all living animals and the inanimate tools (first dimension, Fig. 1b), with equivalent values for all animals except humans, suggesting invariance across stimuli depicting different species. The second dimension reflected a continuum of the living animals with an ordering that closely matched the animacy continuum in the neural representational space. The distinction between humans and other animals, especially chimpanzees, was not evident on the animacy continuum but was seen only on lower-order PCs. These discrepancies between the behavioral and neural representational spaces suggest that the categorical distinction between all living, animate entities and nonliving entities in the ventral vision pathway is actually just one facet of a more general organizational principle.

The neural representation of an animacy continuum may be shared with other primates. The between-category correlations for population responses in monkey IT cortex (Kiani et al., 2007) show that insects and butterflies are more highly correlated with fish and reptiles than with the categories of mammalian species, with birds in an intermediate position. The data of Kiani et al. (2007), however, appear to show a larger distinction between their inanimate and low-animacy categories (their Fig. 10) than we find in human VT cortex.

The contrast between responses to primates and tools replicates the topography for the distinction between animate and inanimate stimuli seen in many previous studies. We show, however, that the animacy continuum, which involves no nonliving stimuli, has essentially the same topography (Fig. 4b) and that the responses to "low animacy" animals are very similar to the responses to tools (Fig. 1c). This topography, therefore, does not reflect a binary animate-inanimate distinction with invariance across living animals but, rather, reflects a continuum in which the inanimate stimuli are embedded among low-animacy animals.

Information about high-animacy stimuli is not limited to areas that have positive values on this dimension, and likewise, information about low-animacy stimuli is not limited to areas with negative values on the animacy continuum. Rather, in general, the distinctions among high-animacy stimuli are as strong in the cortex with weak responses to these stimuli as in the cortex with strong responses, and likewise for distinctions among low-animacy animals and inanimates—consistent with previous reports by us and others (Haxby et al., 2001, 2011; Op de Beeck et al., 2010; Brants et al., 2011; Huth et al., 2012). Thus, both high- and low-animacy categories are represented in patterns of brain activity in both lateral and medial ventral temporal cortex.

The topographies of the PCs (Fig. 3b) are pattern basis functions that capture the differential patterns of response that carry stimulus distinctions. For example, although the animacy continuum does not capture the distinction between responses to warblers and pelicans, adding in the second and third PCs does capture the differential topographies for these responses. In general, the first PC—i.e., the animacy continuum—captures coarser distinctions among animal classes, whereas information about finer within-class distinctions is more distributed across PCs. The patterns that capture a distinction between two categories are not restricted to the cortex that has positive values for those categories on the dominant animacy continuum topography, indicating that information about animal species and objects is broadly distributed.

The blurring between tools and low-animacy animals is further corroborated by studies of clearly inanimate objects that behave in ways resembling the behavior of animate entities. Animations of simple geometric shapes that move in ways suggesting social interactions consistently evoke strong activity in the lateral parts of ventral temporal cortex (Castelli et al., 2002; Beauchamp et al., 2003; Schultz et al., 2003; Gobbini et al., 2007). Robots that have features of human faces but are clearly mechanical, and thus never mistaken for living faces, evoke strong activity in lateral fusiform cortex when they make recognizable facial expressions (Gobbini et al., 2011). Industrial robots that have minimal structural similarity to animals evoke stronger activity in the right fusiform cortex when they perform goal-directed actions, as compared to non-goal-directed actions (Shultz & McCarthy, 2012). Gobbini et al. (2011) proposed that the occipital face area, fusiform face area, and face-responsive posterior STS "participate in the representation of agentic forms and actions but do not distinguish between animate and inanimate agents" (p. 1915). Together, these studies show that the responses to inanimate agents in the ventral vision pathway fall on the higher-animacy side of the animacy continuum. The common denominator seems to be the performance of, or the potential to perform, self-initiated, complex, goal-directed actions. The word anima means "soul" in Latin. Clearly, robots do not have souls, but they can perform actions that are the result of complex internal states that must be inferred. Thus, the animacy continuum appears to be related to the perception of things, be they animate or mechanical, that involves the inference of internal states of varying levels of complexity. Perception of these internal states affords prediction of behavior and can condition how we interact with these things. Representation of level of animacy, as broadly defined here, is thus a necessary precondition for recruiting the appropriate cognitive resources for understanding and interacting effectively with animals and objects.

Where does the animacy continuum come from?

Previous theories of the animate-inanimate distinction were premised on observations of a dichotomous distinction. The strong form of the DSH proposed that evolution provided discrete modules for processing conspecifics, animals, and tools. Our observations that the least animate animals evoke activity similar to tools and that the most animate animals evoke activity similar to conspecifics challenge the notion of discrete modules. However, the idea that evolution played a role in determining the neural organization of categories is consistent with the animacy continuum hypothesis: It can be argued that the abstract property of animacy has ecological and evolutionary significance. Alternatively, the strong form of the SFH proposed that visual features were more important for identifying animals and functional-motor features were more important for tools. Our observation that the least animate animals share representational space with tools challenges this strong version of the SFH. There are no obvious functional-motor properties relevant to spiders, ladybugs, and stingrays that can explain their similarity to tools. It is also not obvious why visual properties would be less important for recognizing a ladybug than for recognizing a chimpanzee.

We interpret the animacy continuum as corresponding to an abstract psychological dimension of perceived animacy that is measurable in human behavioral experiments, such as that reported in Fig. 1b, where the animacy continuum is reflected on the second dimension of the MDS space. Evidence of animacy as a graded psychological dimension is found in the writings of Aristotle, in linguistics, and in folk taxonomies. The notion of graded animacy has ecological validity: It is a fact about the world that animals vary with respect to intelligence, agency, and the degree to which they share characteristics with the animate prototype, humans. Leading theories of brain evolution propose that the large size and computational power of the human brain arose from the pressures of living in large, complex social groups (Dunbar, 1998). The animacy continuum hypothesis is consistent with this view in that understanding the actions and intentions of other sentient beings is a central component of social cognition. By extension, the similarity of brain activity for perceiving animals and perceiving humans will be proportional to the degree to which the animals display agentic properties.
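For readers unfamiliar with how such a space is obtained, a minimal sketch follows. The study's behavioral space was derived with DISTATIS (Abdi et al., 2009, 2012), a multitable generalization of MDS; as a simplified stand-in, this sketch applies ordinary two-dimensional metric MDS to a single fabricated dissimilarity matrix. With real judgment data, the coordinates on one recovered dimension would then be inspected for an ordering of the stimuli by animacy, as in Fig. 1b.

    import numpy as np
    from sklearn.manifold import MDS

    rng = np.random.default_rng(2)
    n_stimuli = 12

    # Fabricated symmetric dissimilarity matrix with a zero diagonal,
    # standing in for averaged pairwise behavioral judgments.
    D = rng.uniform(0.0, 1.0, size=(n_stimuli, n_stimuli))
    D = (D + D.T) / 2.0
    np.fill_diagonal(D, 0.0)

    # Embed the stimuli in two dimensions from the precomputed
    # dissimilarities.
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(D)

    # Each row of coords is a stimulus; the second column is the
    # dimension one would examine for a graded animacy ordering.
    print(coords[:, 1])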

In concluding, we address a likely objection to our interpretation of the results: that animacy makes a poor candidate for an organizing principle of neural representation because it is not a simple, primary, or basic property. In the tradition of David Hume and John Locke, the modern empiricist notion that there is nothing in the brain that was not first in the senses pervades the cognitive sciences (Machery, 2006). Neo-empiricists will likely judge animacy too abstract to play a central role in representation. We disagree. As a perceptual act, perceiving the property of animacy is necessarily tied to sensory input, but we find that accounts of animacy in terms of sensory properties are convoluted and not parsimonious. No simple sensory feature, such as color, texture, or motion energy, has a clear relationship to the animacy continuum. More complex features, such as the presence of faces and bodies; gaze, expression, and other intentional movements; and complex vocalizations, are related to the continuum, but no single feature is invariably present or necessary by itself to perceive level of animacy or agency. The level of animacy can be perceived based on action without form (Castelli et al., 2002; Gobbini et al., 2007) and on form without action (still images), and both are associated with the lateral-to-medial gradient for animacy in the ventral pathway. The congenitally blind can perceive agents and show a normal lateral-to-medial gradient for animacy in the ventral pathway despite the lack of visual sensory input or experience (Mahon et al., 2009). The common denominator, therefore, is the abstract property of animacy or agency rather than a simple correlation with sensory features.

Acknowledgments

Funding was provided by National Institute of Mental Health grants F32MH085433-01A1 (Connolly) and 5R01MH075706 (Haxby), and by National Science Foundation grant NSF1129764 (Haxby). We also thank those who provided expert advice and other assistance, especially M. Ida Gobbini, Michael Hanke, Scott Pauls, and the members of the Haxby Lab.

REFERENCES

Abdi, H., Dunlop, J. P., & Williams, L. J. (2009). How to compute reliability estimates and display confidence and tolerance intervals for pattern classifiers using the bootstrap and 3-way multidimensional scaling (DISTATIS). NeuroImage, 45, 89–95.

Abdi, H., Williams, L. J., Valentin, D., & Bennani-Dosse, M. (2012). STATIS and DISTATIS: Optimum multitable principal component analysis and three-way metric multidimensional scaling. Wiley Interdisciplinary Reviews: Computational Statistics, 4, 124–167.

Aguirre, G. K. (2007). Continuous carry-over designs for fMRI. NeuroImage, 35, 1480–1494.

Allport, D. (1985). Distributed memory, modular subsystems and dysphasia. Current Perspectives in Dysphasia, pp. 32–60.

Aristotle (1986). De Anima: On the Soul. Penguin Classics.

Atran, S. (1998). Folk biology and the anthropology of science: Cognitive universals and cultural particulars. Behavioral and Brain Sciences, 21, 547–569.

Beauchamp, M. S., Lee, K. E., Haxby, J. V., & Martin, A. (2003). fMRI responses to video and point-light displays of moving humans and manipulable objects. Journal of Cognitive Neuroscience, 15, 991–1001.

Brants, M., Baeck, A., Wagemans, J., & Op de Beeck, H. P. (2011). Multiple scales of organization for object selectivity in ventral visual cortex. NeuroImage, 56, 1372–1381.

Caramazza, A. & Mahon, B. Z. (2003). The organization of conceptual knowledge: The evidence from category-specific semantic deficits. Trends in Cognitive Sciences, 7, 354–361.

Caramazza, A. & Shelton, J. R. (1998). Domain-specific knowledge systems in the brain: The animate-inanimate distinction. Journal of Cognitive Neuroscience, 10, 1–34.

Castelli, F., Frith, C., Happé, F., & Frith, U. (2002). Autism, Asperger syndrome and brain mechanisms for the attribution of mental states to animated shapes. Brain, 125, 1839–1849.

Chang, C.-C. & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2, 27.

Chao, L. L., Haxby, J. V., & Martin, A. (1999). Attribute-based neural substrates in temporal cortex for perceiving and knowing about objects. Nature Neuroscience, 2, 913–919.

Connolly, A. C., Gobbini, M. I., & Haxby, J. V. (2012a). Three virtues of similarity-based multivariate pattern analysis: An example from the human object vision pathway. In Visual Population Codes: Toward a Common Multivariate Framework for Cell Recording and Functional Imaging.

Connolly, A. C., Guntupalli, J. S., Gors, J., Hanke, M., Halchenko, Y. O., Wu, Y.-C., Abdi, H., & Haxby, J. V. (2012b). The representation of biological classes in the human brain. The Journal of Neuroscience, 32, 2608–2618.

Cortes, C. & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.

Cox, R. W. (1996). AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical Research, 29, 162–173.

Dunbar, R. I. (1998). The social brain hypothesis. Evolutionary Anthropology, 6, 178–190.

Eger, E., Ashburner, J., Haynes, J.-D., Dolan, R. J., & Rees, G. (2008). fMRI activity patterns in human LOC carry information about object exemplars within category. Journal of Cognitive Neuroscience, 20, 356–370.

Finney, D. & Outhwaite, A. D. (1956). Serially balanced sequences in bioassay. Proceedings of the Royal Society of London, Series B: Biological Sciences, 145, 493–507.

Fischl, B. (2012). FreeSurfer. NeuroImage, 62, 774–781.

Gobbini, M. I., Koralek, A. C., Bryan, R. E., Montgomery, K. J., & Haxby, J. V. (2007). Two takes on the social brain: A comparison of theory of mind tasks. Journal of Cognitive Neuroscience, 19, 1803–1814.

Gobbini, M. I., Gentili, C., Ricciardi, E., Bellucci, C., Salvini, P., Laschi, C., Guazzelli, M., & Pietrini, P. (2011). Distinct neural systems involved in agency and animacy detection. Journal of Cognitive Neuroscience, 23, 1911–1920.

Hanke, M., Halchenko, Y. O., Sederberg, P. B., Hanson, S. J., Haxby, J. V., & Pollmann, S. (2009). PyMVPA: A Python toolbox for multivariate pattern analysis of fMRI data. Neuroinformatics, 7, 37–53.

Hanson, S. J., Matsuka, T., & Haxby, J. V. (2004). Combinatorial codes in ventral temporal lobe for object recognition: Haxby (2001) revisited: Is there a face area? NeuroImage, 23, 156–166.

Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293, 2425–2430.

Haxby, J. V., Guntupalli, J. S., Connolly, A. C., Halchenko, Y. O., Conroy, B. R., Gobbini, M. I., Hanke, M., & Ramadge, P. J. (2011). A common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron, 72, 404–416.

Huth, A. G., Nishimoto, S., Vu, A. T., & Gallant, J. L. (2012). A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron, 76, 1210–1224.

Kherif, F., Poline, J.-B., Mériaux, S., Benali, H., Flandin, G., & Brett, M. (2003). Group analysis in functional neuroimaging: Selecting subjects using similarity measures. NeuroImage, 20, 2197–2208.

Kiani, R., Esteky, H., Mirpour, K., & Tanaka, K. (2007). Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. Journal of Neurophysiology, 97, 4296–4309.

Konkle, T. & Oliva, A. (2012). A real-world size organization of object responses in occipitotemporal cortex. Neuron, 74, 1114–1124.

Kriegeskorte, N., Goebel, R., & Bandettini, P. (2006). Information-based functional brain mapping. Proceedings of the National Academy of Sciences of the United States of America, 103, 3863–3868.

Kriegeskorte, N., Mur, M., & Bandettini, P. (2008a). Representational similarity analysis: Connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2, 4.

Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., Tanaka, K., & Bandettini, P. A. (2008b). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60, 1126–1141.

Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S., & Baker, C. I. (2009). Circular analysis in systems neuroscience: The dangers of double dipping. Nature Neuroscience, 12, 535–540.

Levy, I., Hasson, U., Avidan, G., Hendler, T., & Malach, R. (2001). Center–periphery organization of human object areas. Nature Neuroscience, 4, 533–539.

Machery, E. (2006). Two dogmas of neo-empiricism. Philosophy Compass, 1, 398–412.

Mahon, B. Z., Anzellotti, S., Schwarzbach, J., Zampini, M., & Caramazza, A. (2009). Category-specific organization in the human brain does not require visual experience. Neuron, 63, 397–405.

Mahon, B. Z. & Caramazza, A. (2009). Concepts and categories: A cognitive neuropsychological perspective. Annual Review of Psychology, 60, 27–51.

Martin, A. (2007). The representation of object concepts in the brain. Annual Review of Psychology, 58, 25–45.

Oosterhof, N. N., Wiestler, T., Downing, P. E., & Diedrichsen, J. (2011). A comparison of volume-based and surface-based multi-voxel pattern analysis. NeuroImage, 56, 593–600.


Oosterhof, N. N., Wiggett, A. J., Diedrichsen, J., Tipper, S. P., & Downing, P. E. (2010). Surface-based information mapping reveals crossmodal vision–action representations in human parietal and occipitotemporal cortex. Journal of Neurophysiology, 104, 1077–1089.

Op de Beeck, H. P., Brants, M., Baeck, A., & Wagemans, J. (2010). Distributed subordinate specificity for bodies, faces, and buildings in human ventral visual cortex. NeuroImage, 49, 3414–3425.

O'Toole, A. J., Jiang, F., Abdi, H., & Haxby, J. V. (2005). Partially distributed representations of objects and faces in ventral temporal cortex. Journal of Cognitive Neuroscience, 17, 580–590.

O'Toole, A. J., Jiang, F., Abdi, H., Pénard, N., Dunlop, J. P., & Parent, M. A. (2007). Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data. Journal of Cognitive Neuroscience, 19, 1735–1752.

Schultz, R. T., Grelotti, D. J., Klin, A., Kleinman, J., Van der Gaag, C., Marois, R., & Skudlarski, P. (2003). The role of the fusiform face area in social cognition: Implications for the pathobiology of autism. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 358, 415–427.

Serre, T., Oliva, A., & Poggio, T. (2007). A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences, 104, 6424–6429.

Shinkareva, S. V., Ombao, H. C., Sutton, B. P., Mohanty, A., & Miller, G. A. (2006). Classification of functional brain images with a spatio-temporal dissimilarity map. NeuroImage, 33, 63–71.

Shultz, S. & McCarthy, G. (2012). Goal-directed actions activate the face-sensitive posterior superior temporal sulcus and fusiform gyrus in the absence of human-like perceptual cues. Cerebral Cortex, 22, 1098–1106.

Warrington, E. K. & Shallice, T. (1984). Category specific semantic impairments. Brain, 107, 829–853.

Young, R. W. & Morgan, W. (1987). The Navajo Language: A Grammar and Colloquial Dictionary. University of New Mexico Press, Albuquerque.
