17
Spatial statistics and attentional dynamics in scene viewing Ralf Engbert # $ University of Potsdam, Germany Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany Hans A. Trukenbrod # $ University of Potsdam, Germany Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany Simon Barthelm ´ e # $ Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany University of Geneva, Geneva, Switzerland Felix A. Wichmann # $ Eberhard Karls University of T ¨ ubingen, Germany Bernstein Center for Computational Neuroscience T ¨ ubingen, Germany Max Planck Institute for Intelligent Systems, T ¨ ubingen, Germany In humans and in foveated animals visual acuity is highly concentrated at the center of gaze, so that choosing where to look next is an important example of online, rapid decision-making. Computational neuroscientists have developed biologically-inspired models of visual attention, termed saliency maps, which successfully predict where people fixate on average. Using point process theory for spatial statistics, we show that scanpaths contain, however, important statistical structure, such as spatial clustering on top of distributions of gaze positions. Here, we develop a dynamical model of saccadic selection that accurately predicts the distribution of gaze positions as well as spatial clustering along individual scanpaths. Our model relies on activation dynamics via spatially-limited (foveated) access to saliency information, and, second, a leaky memory process controlling the re-inspection of target regions. This theoretical framework models a form of context-dependent decision-making, linking neural dynamics of attention to behavioral gaze data. Introduction Research on visual attention models over the past 25 years has resulted in a number of computational models (Borji & Itti, 2013)—using diverse computational mechanisms—often capable of predicting fixation lo- cations for a given input image with reasonable accuracy (Itti, Koch, & Niebur, 1998; Kienzle, Franz, Sch ¨ olkopf, & Wichmann, 2009; Torralba, Oliva, Castelhano, & Henderson, 2006; Tsotsos et al., 1995). The models compute so-called saliency maps, highlighting those parts of an input image that stand out relative to the surrounding areas (Itti & Koch, 2001). However, the human visual system is foveated, i.e., it is only able to acquire high-resolution information from a very limited region surrounding the current gaze position (the fovea). Outside the foveal region, visual acuity falls off rapidly, while the effects of visual crowding increase, so that visual processing in the periphery has very limited resolution (Jones & Higgins, 1947; Levi, 2008; Rose- nholtz, Huang, & Ehinger, 2012). As a consequence, to explore an entire visual scene we must shift our gaze continually to new regions of interest by producing rapid eye movements (saccades) about three to four times per second (Findlay & Gilchrist, 2012). Thus, given the progress on mathematical models of visual attention, there is an increasing need for computational models that bridge the gap between static saliency maps—which a human observer’s visual system can only know after exploring the entire image with its fovea—and the dynamic principles of saccadic selection underlying the generation of scanpaths by human Citation: Engbert, R., Trukenbrod, H. A., Barthelm ´ e, S., & Wichmann, F. A. (2015). Spatial statistics and attentional dynamics in scene viewing. Journal of Vision, 15(1):14, 1–17. http://www.journalofvision.org/content/15/1/14, doi: 10.1167/15.1.14. Journal of Vision (2015) 15(1):14, 1–17 1 http://www.journalofvision.org/content/15/1/14 doi: 10.1167/15.1.14 ISSN 1534-7362 Ó 2015 ARVO Received May 9, 2014; published January 14, 2015

Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

Spatial statistics and attentional dynamics in scene viewing

Ralf Engbert $

University of Potsdam GermanyBernstein Center for Computational Neuroscience Berlin

Berlin Germany

Hans A Trukenbrod $

University of Potsdam GermanyBernstein Center for Computational Neuroscience Berlin

Berlin Germany

Simon Barthelme $

Bernstein Center for Computational Neuroscience BerlinBerlin Germany

University of GenevaGeneva Switzerland

Felix A Wichmann $

Eberhard Karls University of Tubingen GermanyBernstein Center for Computational Neuroscience

Tubingen GermanyMax Planck Institute for Intelligent Systems

Tubingen Germany

In humans and in foveated animals visual acuity is highlyconcentrated at the center of gaze so that choosing whereto look next is an important example of online rapiddecision-making Computational neuroscientists havedeveloped biologically-inspired models of visual attentiontermed saliency maps which successfully predict wherepeople fixate on average Using point process theory forspatial statistics we show that scanpaths containhowever important statistical structure such as spatialclustering on top of distributions of gaze positions Herewe develop a dynamical model of saccadic selection thataccurately predicts the distribution of gaze positions aswell as spatial clustering along individual scanpaths Ourmodel relies on activation dynamics via spatially-limited(foveated) access to saliency information and second aleaky memory process controlling the re-inspection oftarget regions This theoretical framework models a formof context-dependent decision-making linking neuraldynamics of attention to behavioral gaze data

Introduction

Research on visual attention models over the past 25years has resulted in a number of computational models(Borji amp Itti 2013)mdashusing diverse computational

mechanismsmdashoften capable of predicting fixation lo-cations for a given input image with reasonable accuracy(Itti Koch amp Niebur 1998 Kienzle Franz Scholkopfamp Wichmann 2009 Torralba Oliva Castelhano ampHenderson 2006 Tsotsos et al 1995) The modelscompute so-called saliency maps highlighting thoseparts of an input image that stand out relative to thesurrounding areas (Itti amp Koch 2001) However thehuman visual system is foveated ie it is only able toacquire high-resolution information from a very limitedregion surrounding the current gaze position (thefovea) Outside the foveal region visual acuity falls offrapidly while the effects of visual crowding increase sothat visual processing in the periphery has very limitedresolution (Jones amp Higgins 1947 Levi 2008 Rose-nholtz Huang amp Ehinger 2012)

As a consequence to explore an entire visual scene wemust shift our gaze continually to new regions of interestby producing rapid eye movements (saccades) aboutthree to four times per second (Findlay amp Gilchrist2012) Thus given the progress on mathematical modelsof visual attention there is an increasing need forcomputationalmodels that bridge the gap between staticsaliency mapsmdashwhich a human observerrsquos visual systemcan only know after exploring the entire image with itsfoveamdashand the dynamic principles of saccadic selectionunderlying the generation of scanpaths by human

Citation Engbert R Trukenbrod H A Barthelme S amp Wichmann F A (2015) Spatial statistics and attentional dynamics inscene viewing Journal of Vision 15(1)14 1ndash17 httpwwwjournalofvisionorgcontent15114 doi 10116715114

Journal of Vision (2015) 15(1)14 1ndash17 1httpwwwjournalofvisionorgcontent15114

doi 10 1167 15 1 14 ISSN 1534-7362 2015 ARVOReceived May 9 2014 published January 14 2015

observers Moreover part of the mismatch betweencomputer-generated saliency maps and actual gazepatterns might be explained by properties of thevisuomotor system (Findlay ampWalker 1999) Recentlya number of publications addressed specific aspects ofthis problem eg different roles for short and longsaccades (Tatler Baddeley amp Vincent 2006) or returnsaccades (Ludwig Farrell Ellis amp Gilchrist 2009Wilming Harst Schmidt amp Konig 2013) Moreoverbehavioral biases might produce an important contri-bution to eye-movement statistics (Tatler amp Vincent2009) What is currently missing is an integrativecomputational model that addresses the key aspects ofvisuomotor control in a coherent theoretical frameworkWe set out to develop one possible integrative model

Spatial patterns of gaze positions carry rich infor-mation on the processes of saccadic selection by thehuman visual system and this information can beanalyzed applying methods from the theory of spatialpoint processes (Illian Penttinen Stoyan amp Stoyan2008 Barthelme Trukenbrod Engbert amp Wichmann2013) Saliency maps aim at the prediction of two-dimensional (2D) densities of gaze patterns (first-orderspatial statistics) However saliency maps do notcontain the rich information about spatial interactionsinherent in experimental eye-tracking data Fixationsare interdependent Second-order statistics providequantitative tools to investigate interactions in gazepatterns Such interactions in turn may be used to gaininformation about the processes (Law et al 2009)underlying the generation of neighboring gaze posi-tions which themselves are directly related to models ofsaccadic selection

We start with analyzing the spatial statistics of gazepatterns using point process theory (Illian et al 2008)and show that gaze patterns are characterized by small-scale clustering in addition to the inhibition-of-returnmechanism (Klein 2000) that is thought to represent thedominant dynamical principle in extant attentionmodels (Itti amp Koch 2001) Next since these resultsprovide strong constraints for possible neural mecha-nisms of saccadic selection we develop a dynamicalmodel for real-time attention allocation and gazecontrol based on activation-based maps (Engbert 2012Engbert Mergenthaler Sinn amp Pikovsky 2011)Finally the model is compared against a range ofstatistical null models using methods of spatial statistics

Methods

Experiment

Stimulus material

A set of 30 randomly selected natural landscapephotographs (color) was presented to human observers

on a 20 in CRT monitor (Mitsubishi Diamond Pro2070 frame rate 120 Hz resolution 1280 middot 1024 pixelsMitsubishi Electric Corporation Tokyo Japan) Im-ages were classified into two categories natural object-based scenes (image set 1 15 images) versus imagesshowing abstract natural patterns (image set 2 15images) All images were presented centrally with grayborders extending 32 pixels to the topbottom and 40pixels to the leftright of the image since accuracy ofeye tracking systems falls off toward the monitor edges

Task and procedure

Participants were instructed to position their headson a chin rest in front of a computer screen at a viewingdistance of 70 cm Eye movements were recordedbinocularly using an Eyelink 1000 video-based eye-tracker (SR-Research OsgoodeON Canada) with asampling rate of 1000 Hz Trials began with a blackfixation cross presented on gray background at arandom location within the image boundaries Aftersuccessful fixation the fixation cross was replaced bythe image for 10 s Participants were instructed toexplore each scene for a subsequent memory testDuring the experiment we presented 30 images twiceHere we limit our analysis to the first presentation

Participants

We recorded eye movements from 35 participants(20 female 15 male) aged between 17 and 36 years(mean age 24 years) with normal or corrected-to-normal vision Participants were recruited from theUniversity of Potsdam and from a local school (32students three pupils) All participants received creditpoints or 8E for (about US $950) for participation

Data preprocessing and saccade detection

We applied a velocity-based algorithm for saccadedetection (Engbert amp Kliegl 2003 Engbert amp Mer-genthaler 2006) Saccades had a minimum amplitudeof 058 and exceeded the average velocity during a trialby six standard deviations for at least 6 ms Eye tracesbetween two successive saccades were tagged asfixations with a mean fixation position averaged acrossboth eyes Since eye position was determined by thepresentation of a fixation cross at the beginning of atrial we excluded all initial fixations from the data set(image set 1 525 image set 2 525) Furthermore weremoved fixations containing a blink or with a blinkduring an adjacent saccade (image set 1 580 image set2 588) Overall the number of fixations remaining forfurther analyses was 13349 (image set 1) and 12740(image set 2)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 2

Spatial statistics

Gaze positions can be interpreted as realizationsfrom a spatial point process (Illian et al 2008) thatcan be represented as the random set of points N frac14x1 x2 x3 (also called a point pattern) The 2Ddensity (or intensity) k of the spatial point process isgiven as the expectation or mean value of the numberof points in an observation window B ie kfrac14E(n(B))where n() is a counting measure A process isstatistically homogeneous (or stationary) if N and thetranslated set Nxfrac14 x1thorn x x2thorn x x3thorn x have thesame distribution for all x For a stationary spatialpoint process the intensity k is constant over spaceFor a nonstationary process the intensity is a functionof location kfrac14 k(x) For the computation of densitiesfrom experimental data we used kernel-densityestimates with bandwidth parameters chosen accord-ing to Scottrsquos rule (Baddeley amp Turner 2005 Scott1992) To compute deviations between 2D densitiesPkl and Qkl at grid position (k l) we used a symmetricversion of the Kullback-Leibler divergence derivedfrom information gain (Beck amp Schlogl 1993) ie

DKLD frac141

2

X

kl

PkllogPkl

QklthornQkllog

Qkl

Pkl

eth1THORN

Second-order statistics (see also the illustrated notesin the Appendix) are based on the pair density q(x1x2) which gives the probability q(x1 x2) dx1 dx2 ofobserving points in each of two disks b1 and b2 withlinear dimensions dx1 and dx2 respectively Pointpatterns can be characterized by the pair densitywhich is typically a function of the pair distance ie q(x1 x2) frac14 q(r) with r frac14 jjx1ndashx2jj for two arbitraryrealizations x1 and x2 Using a kernel-based method aestimator for the pair density can be written as

qethrTHORN frac14X6frac14

x1x2W

kethjjx1 x2jj rTHORN2prAjjx1x2jj

eth2THORN

where k() is an appropriate kernel and Ajjx1x2jjdenotes an edge correction at distance jjx1 ndash x2jj(Baddeley amp Turner 2005) For numerical computa-tions we used the Epanechnikov kernel (Illian et al2008) ie

kethxTHORN frac143

4h1 x2

h2

for h amp x amp h

0 otherwise

8gtlt

gteth3THORN

The problem of choosing the bandwidth h appro-priately is frequently discussed in the literature (Illian etal 2008) The bottom line from this discussion is thatthe behavior of the estimator should be analyzed over a

range of bandwidths We will run such an analysisbelow (Figure 2)

The pair correlation function g(r) is a normaliza-tion of the pair density with respect to first-orderintensity k so that the estimator for the paircorrelation is given by g(r) frac14 q(r) k2 Theinterpretation of the pair correlation function for agiven point pattern is straightforward For a randompattern without clustering the pair correlationfunction is g(r) rsquo 1 across the full range of distancesr If g(r) 1 then pairs of fixations are moreabundant than on average at distance r If g(r)1then pairs of fixations are less abundant than onaverage at a distance r Thus the pair correlationfunction g(r) measures how selection of a particularpoint location (ie fixation position) is influenced byother fixations at distance r

Using the inhomogeneous pair correlation functionginhom(r) we can remove the first-order inhomogeneityfrom the second-order spatial statistics ie

ginhomethrTHORN frac14X6frac14

x1x2W

1

kethx1THORNkethx2THORNkethjjx1 x2jj rTHORN

2prAjjx1x2jj

eth4THORNEstimation of ginhom(r) involves two steps First we

estimated the overall intensity k(x) for all fixationpositions obtained for a given scene In this procedurewe borrow strength from the full set of observations toobtain reliable estimates of the inhomogeneity Secondwe computed the pair correlation function from a singletrial with respect to the inhomogeneous density of thefull data set

In case of a given pair correlation function g(r) thescalar quantity (Illian et al 2008)

Dg frac14Z lsquo

0ethgethrTHORN 1THORN2dr eth5THORN

denoted as PCF deviation in the following serves as auseful test statistic that quantifies the deviations fromrandomness for a given point pattern with inhomoge-neous density k(x) The integral in Equation (5) wasevaluated numerically for pair distances r between 018and 58 (image set 1 and 2) and between 018 and 38 (LeMeur Le Callet Barba amp Thoreau [2006] data seebelow)

Results

We conducted an eye-tracking experiment on sceneviewing with 35 human observers using 15 object-basednatural scenes (image set 1) Resulting gaze data wereevaluated using first- and second-order spatial statis-tics we found that data exhibit unexpected spatial

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 3

aggregation (or clustering) We reproduced this findingfor a set of 15 abstract natural patterns (image set 2)and for an external dataset (Le Meur et al 2006) thatwas made publicly available (Bylinskii Judd DurandOliva amp Torralba 2012) Based on these results wedeveloped a dynamical model for saccadic selectionthat was evaluated by the spatial-statistics approachintroduced in this section

Spatial statistics and pair correlation function

We began by numerically computing the spatial (2D)density of gaze positions from experimental data

collected for image set 1 (Figure 1a) Fixation positionsare indicated by red dots (a total of 930 fixations from35 observers for image 2) Densities were computedusing a 2D kernel density estimator (Baddeley ampTurner 2005 R Core Team 2013 see Methods) andare visualized by gray shading in the plot Thebandwidth parameter hdensity for the kernel densityestimation was computed according to Scottrsquos rule(Scott 1992 range from 188 to 228 for h over allimages from image set 1) The obtained 2D densityk(xy) is inhomogeneous because of the dependence onposition (x y) A representative sample trajectory froma single trial is given in Figure 1d where the second andlast fixation of the scanpath is highlighted by white

Figure 1 Analysis of pair correlation functions for experimental gaze sequences (image set 1) and for computer-generated surrogate

data (a) Experimental data of gaze positions from human observers (red) and estimated intensity from kernel density estimate (gray

levels) for image 2 (b c) Realizations of gaze positions generated by inhomogeneous and homogeneous point processes

respectively (d e) Typical single-trial fixation sequences from experiment (red) and inhomogeneous point process (yellow) (f)

Kullback-Leibler divergence (KLD) indicates that the inhomogeneous point process approximates the experimental 2D density of gaze

positions (g) Pair correlation functions (PCFs) for experimental data (single trials light gray single trial from (d) black averaged over

trials red) (h) Mean PCFs for experimental data inhomogeneous and homogeneous Poisson process (i) The PCF deviation shows

that the experimental data are spatially correlated while the two surrogate datasets fail to reproduce this statistical pattern

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 4

color and by their serial numbers The first fixation wasomitted since all trials started at a random positionwithin an image determined by our experimentalprocedure (see Methods)

The pair correlation function g(r) gives a quantita-tive summary of interactions in fixation patterns bymeasuring how distance patterns between fixationsdiffer from what we would expect from independentlydistributed data (see Appendix) A value of g(r) above 1for a particular distance r indicates clustering meaningthat there are more pairs of points separated by adistance r than we would expect if fixation locationswere statistically independent

For the estimation of the pair correlation functionshort sequences of gaze positions from single experi-mental trials were considered However spatial inho-mogeneity of the 2D density was taken into accountTo obtain a reliable estimate of the spatial inhomo-geneity the 2D density was estimated from the fulldata-set (Figure 1a) of all fixations on a given imagetaken from all participants and trials It is important tonote however that in the computation of the kerneldensity estimate k(x) used for the inhomogeneous paircorrelation function Equation (4) an optimal band-width parameter h is needed to avoid two possibleartifacts First if h is very small then spatialcorrelations might be underestimated due to overfittingof the inhomogeneity of the density Second if h is toolarge then spatial correlations might be overestimatedsince first-order inhomogeneity is not adequatelyremoved from the second-order spatial statistics Wesolved this problem by computing the PCF deviationDg for the inhomogeneous point process for varyingvalues of the bandwidth h (Figure 2) Since theinhomogeneous point process generates uncorrelated

fixations ie gtheo(r) frac14 1 the optimal bandwidth forthe dataset corresponds to a minimum of the PCFdeviation Dg (quantifying the deviation from the idealvalue g(r)frac14 1) For image set 1 the optimal value wasestimated as h1 frac14 408 (Figure 2a)

Based on this density estimate we can compute theinhomogeneous pair correlation function ginhom(r) inwhich first-order inhomogeneity is removed from thesecond-order spatial correlations (see Methods) As aresult we obtained pair correlations from individualtrials (Figure 1g gray lines) Deviations from ginhom(r)rsquo 1 indicate spatial clustering at a specific distance rThe mean pair correlation function ginhom(r) providesevidence for clustering at small spatial scales with r 48 (Figure 1g red line) Such a scale is greater than thefoveal zone (r 28) and might provide an estimate ofthe size of the effective perceptual window in free sceneviewing This result is compatible with earlier findingsthat the zone of active selection of saccade targetsextends beyond the fovea into the parafovea up toeccentricities of 48 (Reinagel amp Zador 1999)

Next we carried out the same numerical computa-tions for two sets of surrogate data The surrogate datawere generated to test the null hypotheses of completespatial randomness both for an inhomogeneous pointprocess with position-dependent intensity k(x) and fora homogeneous point process with constant intensityk0 For the inhomogeneous point process we sampledfrom the estimated intensity k(x) (Figure 1b) whereasa constant intensity k0 obtained from spatial averag-ing was used for the homogeneous point process(Figure 1c) Both surrogate datasets are important forchecking the reliability of the computation of the paircorrelation function for the original data (Figure 1g)First the inhomogeneous point process gives a flat

Figure 2 Optimal bandwidth parameter for inhomogeneous pair correlation function (PCF) for three different image sets For

simulated data from the inhomogeneous point process the PCF deviation Dg Equation (5) was computed as a function of the

bandwidth h for the underlying kernel density estimate The optimal bandwidth corresponds to the position of the minimum of Dg (a)

For image set 1 h1frac14 408 (b) For image set 2 h2frac14 468 (c) For the Le Meur dataset h3frac14 228 due to a much smaller presentation

display compared with our experiments

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 5

mean correlation function with g(r) rsquo 1 (Figure 1h)which demonstrates the absence of clustering (exceptfor the divergence at very small scales as an effect ofnumerical computation issues) Thus the spatialcorrelations in the experimental data are not a simpleconsequence of spatial inhomogeneity Second theresult for the homogeneous point process (Figure 1h) isthe same as for the inhomogeneous point processwhich indicates that the correction for inhomogeneityneeded for computations in Figure 1g does notproduce unwanted artefacts due to possible overfittingof the spatial inhomogeneities We conclude that ourexperimental data give a clear indication for spatialclustering at length-scales smaller than 48 of visualangle Additionally we checked the hypothesis thatthis effect of spatial clustering might be due to saccadicundershoot and subsequent short correction saccadesby excluding all fixations with durations shorter than200 ms A related analysis of the PCF indicates noqualitative differences from the original data (Figure1h)

Results reported so far were obtained for a singlerepresentative image Over the full set of 15 images weanalyzed the 2D densities using a symmetrized form ofthe Kullback-Leibler divergence (KLD) based on theconcept of information gain (Beck amp Schlogl 1993)For the experimental data we applied a split-halfprocedure (first half of participants vs second half ofparticipants) and computed the KLD between the twoexperimental densities The corresponding KLD valuesdemonstrate that the inhomogeneous point processreproduces the 2D density (Figure 1f) while thehomogeneous point process clearly fails to approximatethe systematic inhomogeneity in the image Model typehad a significant effect on KLD v2(2)frac141054 p 001Contrasts revealed that (1) spatially inhomogeneousdata-sets were different from the homogeneous data bfrac14 0319 t(28) frac14 2285 p 001 and that (2)experimental data were significantly different frominhomogeneous surrogate data bfrac14 0062 t(28)frac14 256pfrac14 0016

Both sets of surrogate data were submitted to ananalysis of the pair correlation function where thedeviations from g(r) rsquo 1 were computed to obtain aPCF measure indicating the amount of spatial corre-lation averaged over distances (see Methods) Resultsindicate that the surrogate data producemdashas de-signedmdashuncorrelated gaze positions (low PCF devia-tion) while the experimental data by human observersexhibit spatially correlated gaze positions (Figure 1i)Model type had a significant effect on PCF v2(2)frac14984 p 001

To investigate the reliability of our finding of spatialclustering on short-length scales during natural sceneviewing we investigated two other datasets using thesame procedures with an optimal bandwidth adjusted

for each dataset (Figure 2andashc) First we compared themean pair correlation functions per image for naturalobject-based scenes (image set 1 Figure 3a d) andabstract natural patterns (image set 2 Figure 3b e)While variability of the pair correlation function mightbe greater for the abstract scenes than for the object-based scenes (Figure 3d e gray lines) the mean paircorrelation functions for the images sets (Figure 3d ered lines) are very similar This result indicates that theclustering on short length scales is a robust phenom-enon which does not seem to depend sensitively onscene content Second we analyzed a publicly availabledataset (Le Meur et al 2006) from the MIT saliencybenchmark (Bylinskii et al 2012 Judd Durand ampTorralba 2012) consisting of 40 participants whoviewed 27 color images for 15 s For these data thespatial scale of the presentation display for the images(Figure 3c) was considerably smaller than in ourexperiment Consequently we observed pair correla-tion functions that indicate clustering on smaller scales( 158 Figure 3f) Thus while scene content does notseem to exert a strong influence on spatial correlationsspatial scale of the image modulates the spatial scale ofthe clustering of fixations

We conclude that second-order spatial statisticsobtained for the experimental data are significantlydifferent from stochastic processes implementing theassumption of spatial randomness Furthermore themere presence of spatial inhomogeneity in the experi-mental data cannot explain by itself the observedspatial correlations which is evident in the results forthe inhomogeneous point process While inhibition-of-return (Klein 2000) has been discussed frequently asone of the key principles added to saliency maps forsaccadic selection (Itti amp Koch 2001) spatial clusteringof gaze positions is an additional statistical propertythat is highly informative on mechanisms of gazeplanning but has been neglected so far Next we usethese results to develop and test a dynamical model forsaccade generation that uses activation field dynamicsto reproduce spatial statistics of first- and second-order

A dynamical model of saccade generation

A key assumption for the model we propose is thecombination of two neural activation maps toimplement dynamical principles for saccadic selectionFirst a fixation map f(x y t) is keeping track of thesequence of fixations by inhibitory tagging (Itti ampKoch 2001) Second an attention map a(x y t) thatis driven by early visual processing controls thedistribution of attention Physiologically the as-sumption of the dynamical maps is supported by thepresence of an allocentric motor map of visual spacein the primate entorhinal cortex (Killian Jutras amp

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 6

Buffalo 2012) Moreover this map is spatiallydiscrete (Stensola et al 2012) and serves as abiological motivation for the fixation and attentionmaps in our model

We implemented activation maps for attention andfixation (inhibitory tagging) on a discrete square latticeof dimension L middot L Lattice points (i j) haveequidistant spatial positions (xi yj) for i j frac14 1 Lwhere xifrac14x0thorn iDx and yjfrac14y0thorn jDy As a consequenceattention and fixation maps are implemented inspatially discrete forms aij (t) and fij(t) respec-tively For the numerical simulations time wasdiscretized in steps of Dtfrac14 10 ms with tfrac14 k middot Dt and kfrac14 0 1 2 T

If the observerrsquos gaze is at position (xg yg) at time tthen a position-dependent activation change Fij(xg yg)and a global decay proportional to the currentactivation ndashxfij(t) are added to all lattice positions toupdate the activation map at time t thorn 1 ie

fijethtthorn 1THORN frac14 Fijethxg ygTHORN thorn eth1 xTHORNfijethtTHORN eth6THORN

where the activation change Fij(xg yg) rdquo Fij(t) isimplicitly time-dependent because of the time-depen-dence of gaze positions xg(t) yg(t) The constant x 1determines the strength of the decay of activation Forthe spatial distribution of the activation change Fij(t)

we assume a Gaussian profile ie

FijethtTHORN frac14R0ffiffiffiffiffiffi2pp

r0

exp ethxi xgethtTHORNTHORN2 thorn ethyj ygethtTHORNTHORN2

2r20

eth7THORNwith the free parameters r0 and R0 controlling thespatial extent of the activation change and the strengthof the activation change respectively In our model thebuild-up of activation in the fixation map is amechanism of inhibitory tagging (Itti amp Koch 2001) toreduce the amount of refixations on recently visitedimage patches

For the attention map aij(t) we assume similardynamics however the width of Gaussian activationchange Aij(t) is assumed to be proportional to the staticsaliency map ij The updating rule for the attentionmap is given by

aijethtthorn 1THORN frac14ijAijethtTHORNX

kl

klAklethtTHORNthorn eth1 qTHORNaijethtTHORN eth8THORN

with decay constant q 1 As a result the saliencymap ij is accessed locally through a Gaussian aperturewith size r1 and scale parameter R1 similar to Equation

Figure 3 Comparison of the mean pair correlation functions for three different image sets (a) Natural object-based scenes (b)

Abstract natural patterns (c) Natural scenes from the Le Meur et al dataset Mean pair correlation function are similar for object-

based (d) and abstract (e) natural scenes The presentation of images on a smaller display in the Le Meur dataset compared to our

experiments (f) resulted in a smaller length scale of spatial clustering

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 7

(7) Using the local read-out mechanism information isprovided for the attention map to identify regions ofinterest for eye guidance

The fixation map monitors recently visited fixationlocations by increasing local activation at the corre-sponding lattice points (Engbert et al 2011 Freund ampGrassberger 1992) If the observerrsquos gaze position is ata position corresponding to lattice position (i j) then aposition-dependent activation change Fij in the form ofa Gaussian profile is added locally in each time stepwhile a global decay proportional to the currentactivation is applied to all lattice positions The widthof the Gaussian activation r0 and the decay x are thetwo free parameters controlling activation in thefixation map For the attention map aij(t) we assumesimilar dynamics including local increase of activationwith size r1 and global decay q However the amountof activation change Aij is assumed to be proportionalto the time-independent saliency map ij so that thelocal increase of activation isthorn ijAij

PklAkl

Our modeling assumptions are related to specifichypotheses on model parameters We expect that thesize of the Gaussian profile for the attention map islarger than the corresponding size of the fixation mapr1 r0 since attention is the process driving eyemovements into new regions of visual space while theinhibitory tagging process should be more localized Asimilar expectation can be formulated on the decayconstants Since inhibitory tagging is needed on alonger time scale as a foraging facilitator we expect aslower decay in the fixation map compared to theattention map ie x q

Next we assume that given a saccade command attime t both maps are evaluated to select the nextsaccade target First we apply a normalization of bothattention and fixation maps as a general neuralprinciple to obtain relative activations (Carandini ampHeeger 2011) Second we introduce a potentialfunction as the difference of the normalized maps

uijethtTHORN frac14 aijethtTHORN$ kX

kl

aklethtTHORNfrac12 )kthorn

fijethtTHORN$ cX

kl

fklethtTHORNfrac12 )c eth9THORN

where the exponents k and c are free parametersHowever a value of kfrac14 1 is a necessary boundarycondition to obtain a model that accurately reproducesthe densities of gaze positions In a qualitative analysisof the model (see Appendix) pilot simulations showedthat c is an important control parameter determiningspatial correlations where c rsquo 03 was used toreproduce spatial correlations observed in our experi-mental data

The potential uij(t) Equation (9) can be positive ornegative at position (i j) Lattice positions with apositive potential uij 0 are excluded from saccadic

selection since corresponding regions were visitedrecently with high probability Among the latticepositions with negative activations we implementedstochastic selection of saccade targets proportional torelative activations also known as Lucersquos choice rule(Luce 1959) We implemented this form of stochasticselection from the set S frac14 fethi jTHORNjuij 0g where theprobability pij(t) to select lattice position (i j) at time tas the next saccade target is given by

pijethtTHORN frac14 maxuijethtTHORNX

ethklTHORNSuklethtTHORN

g

0

BB

1

CCA eth10THORN

The noise term g is an additional parametercontrolling the amount of noise in target selection

Numerical simulations of the model

Our computational modeling approach to saccadicselection has been developed to propose a minimalmodel that captures the types of spatial statisticsobserved in experimental data After fixing modelparameters all that is needed to run the model on aparticular image is a 2D density estimate of gazepatterns (from experimental data) or a corresponding2D prediction from one of the available saliency models(Borji amp Itti 2013) To reduce computational com-plexity in the current study and to exclude potentialmismatches between data and model simulations due tothe saliency models we run the model on experimen-tally realized densities of gaze positions which isequivalent to assuming an exact saliency modelHowever our modeling approach is compatible withfuture dynamical saliency models that provide time-and position-dependent saliency during a sequence ofgaze shifts thus our model introduces a generaldynamical framework and is not tied to using empiricaldata

The numerical values of the five model parameterswere estimated from experimental data recorded for thefirst five images of natural object-based scenes (imageset 1) using a genetic algorithm approach (Mitchell1998 see Appendix Table 1) The remaining 10 imagesof image set 1 were used for model evaluationsmdashas the15 images of image set 2 The objective function forparameter estimation was based on evaluation of first-order statistics (2D density of gaze positions) and thedistribution of saccade lengths In agreement with ourfirst expectation the estimated optimal values for thespatial extent of the inhibitory tagging process in thefixation map r0frac14 228 is considerably smaller than thecorresponding size of the build-up function for theattention map r1frac14 498 Our second expectation wasrelated to the decay constants which turned out to be

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 8

larger for the attention map qfrac14 0066 than for thefixation map xfrac14 93 middot 105 so q was greater than xagain as expected Finally the noise level in the targetmap is g frac14 91 middot 105

An example for the simulation of the modeldemonstrates the interplay between inhibitory pro-cesses from the fixation map and the attention mapduring gaze planning (Figure 4) The fixation mapbuilds up activation at fixated lattice positions (yellowto red) while the attention map identifies new regionsof interest for saccadic selection (blue) These simula-tions show on a qualitative level how the modelimplements the interplay of the assumed mechanisms of

inhibitory tagging and saccadic selection of gazepositions (see Supplementary Video)

To investigate model performance qualitatively weran simulations for one image (image 6 930 fixationFigure 5a) and obtained a number of fixations similarto the experimental data (882 fixations Figure 5b)Single-trial scanpaths from experiments and simula-tions are shown additionally in Figure 5a and 5brespectively) The resulting distributions of saccadelengths indicate that our dynamical model is in goodagreement with experimental data while the twosurrogate datasets (homogeneous and inhomogeneouspoint processes) fail to reproduce the distribution(Figure 5c) An analysis of the pair correlation

Figure 4 Illustration of a simulated sequence of gaze positions and the activation dynamics of the model (a) The density of gazepositions (empirical saliency) is used as a proxy for a computed saliency map that drives activation in the attention map (bndashf)Sequence of snapshots of the potential (bluefrac14 low yellowfrac14 high) Note that blue color indicates density of fixations in (a) while itrefers to low values of the potential in (b)ndash(f)

Parameter Symbol Mean Error Min Max Reference

Fixation mapActivation span [8] r0 216 011 03 100 Eq (7)Decay log10 x ndash403 028 ndash50 ndash10 Eq (6)

Attention mapActivation span [8] r1 488 025 03 100 Eq (8)Decay log10 q ndash118 008 ndash30 ndash10 Eq (8)

Target selectionAdditive noise log10 g ndash404 007 ndash90 ndash30 Eq (10)

Table 1 Model parameters

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 9

functions indicates that the spatial correlations presentin the experimental data were approximated by thedynamical model (Figure 5d) however the twosurrogate datasets representing uncorrelated sequencesby construction produce qualitatively different spatialcorrelations

To investigate the influences of saccade-lengthdistributions on pair correlations we constructedanother statistical control model that had access to theimage-specific saccade-length distribution It is impor-tant to note that this model was not introduced as acompetitor to the dynamical model which is able topredict saccade-length distributions The statisticalcontrol model approximated the distribution of saccadelengths l and 2D densities of gaze positions x bysampling from the joint probability distribution p(x l)under the assumption of statistical independence ofsaccade lengths and gaze positions ie p(x l)frac14p(x)p(l) This model by construction approximates thedistribution of saccade lengths and 2D density of gazepositions (Figure 5c) The simulations indicate how-

ever that even the combination of inhomogeneousdensity of gaze positions and non-normal distributionof saccade lengths used by the statistical control modelcannot explain spatial correlations in the experimentaldata characterized by the pair correlations function(Figure 5d)

For the statistical analysis of model performance onnew images we carried out additional numericalsimulations We fitted model parameters to dataobtained for the first five images only (see above) andpredicted data for the remaining 10 images by the newsimulations to isolate parameter estimation from modelevaluation (calculating test errors rather than trainingerrors)

Our simulations show that the dynamical modelpredicted the 2D density of gaze positions accurately(Figure 6a) The obtained KLD values for the model(blue) were comparable to KLD values calculated bythe split-half procedure for the experimental data (red)and to the KLD values obtained for the statisticalcontrol models (green frac14Homogeneous point process

Figure 5 Distribution of saccade lengths and pair correlation functions from model simulations (image 6) (a) Experimental

distribution of gaze positions (red dots) and a representative sample trial (red lines) (b) Corresponding plot of simulated data

obtained from our dynamical model (blue dots) A single-trial simulation is highlighted (blue line) (c) Distributions of saccade lengths

for experimental data (red) dynamical model (blue) homogeneous point process (green) inhomogeneous point process (yellow) and

a statistical control model (magenta) (d) Pair correlation functions for the different models The dynamical model (blue line) produces

spatial correlations similar to the experimental data (red line)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 10

yellow frac14 Inhomogeneous point process magentafrac14Control model) Model type had a significant effect onKLD v2(4)frac14 1094 p 001 Post hoc comparisonsindicated significant effects between all models (p 001) except for the comparison between experimentaldata and the dynamical model (p frac14 0298) and for thecomparison between dynamical model and inhomoge-neous point process (p frac14 0679) In an analysis of thePCF estimated from the same set of simulated data(Figure 6b) the dynamical model (blue) produceddeviations from an uncorrelated point process that arein good agreement with the experimental data (red)Model type had a significant effect on PCF deviationv2(4)frac14 916 p 001 Post hoc comparisons indicatedsignificant effects for all comparisons (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0990) and between homoge-neous and inhomogeneous point processes (pfrac14 0996)

To check the reliability of the results we performedall corresponding calculations for the images showingabstract natural scenes (image set 2) For thesesimulations we used the set of model parameters fittedto the first five images of image set 1 whichcorresponds to the hypothesis that scene content(object-based scenes vs abstract natural patterns) doesnot have a strong impact on spatial correlations in thescanpath data For the simulated densities (Figure 6c)we reproduced the statistical results from image set 1Again model type had a significant effect on KLDv2(4)frac14 1626 p 001 Post hoc comparisons indicatedsignificant effects between all models (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0251) for the comparisonbetween dynamical model and inhomogeneous pointprocess (p frac14 0784) and for the comparison betweeninhomogeneous point process and experimental data (pfrac14 0012) For the pair correlation function (Figure 6d)model type had a significant effect v2(4)frac14 1406 p 001 and post hoc comparisons indicated significanteffects for all comparisons (p 001) except for thecomparison between experiment and dynamical model(pfrac14 0453) and between homogeneous and inhomo-geneous point processes (p frac14 0700) Thus the mainstatistical results obtained from image set 1 werereproduced for images of abstract natural patterns(image set 2) These results lend support to thehypothesis that scene content does not have a stronginfluence on second-order spatial statistics of gazepatterns

Thus the dynamical model performed better thanany of the statistical models in predicting the averagepair correlations Although one of our statisticalcontrol models generated data by using image-specificsaccade-length information in addition to the 2Ddensity of gaze position it could not predict the spatialcorrelations as accurately as the dynamical model that

was uninformed about the image-specific saccade-length distribution

Discussion

Current theoretical models of visual attentionallocation in natural scenes are limited to the predictionof first-order spatial statistics (2D densities) of gazepatterns We were interested in attentional dynamicsthat can be characterized by spatial interactions (asfound in the second-order statistics) Using the theoryof spatial point processes we discovered that gazepatterns can be characterized by clustering at smalllength scales which cannot be explained by spatialinhomogeneity of the 2D density We proposed andanalyzed a model based on dynamical activation mapsfor attentional selection and inhibitory control of gazepositions The model reproduced 2D densities of gazemaps (first-order statistics) and distributions of saccadelengths as well as pair correlations (second-orderspatial statistics)

Spatial statistics

While research on the computation of visual saliencyhas been a highly active field of research (Borji amp Itti2013) there is currently a lack of computational modelsfor the generation of scanpaths on the basis of knownsaliency Inhibition-of-return (Klein 2000) has beenproposed as a key principle to prevent continuingrefixation within regions of highest saliency Howeverour analysis of the pair correlation function demon-strates that saccadic selection at small length scales isdominated by spatial clustering Thus our findings arehighly compatible with the view that inhibition-of-return cannot easily be observed in eye-movementbehavior in natural scene viewing experiments (Smith ampHenderson 2011) However the spatial correlationscan be exploited to investigate dynamical rules under-lying attentional processing in the visual system Ourexperimental data show a clear effect of spatialclustering for length scales shorter than about 48However these results are incompatible with thecurrent theory of saliency-based attention allocationcombined with an inhibition-of-return mechanismFuture simulations will investigate the predictive powerof the model when saliency models are used as input

Modeling spatial correlations

In a biologically plausible computational model ofsaccade generation a limited perceptual span needs to

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 11

be implemented for attentional selection (Findlay ampGilchrist 2012) We addressed this problem byassuming a Gaussian read-out mechanism with localretrieval from the saliency map through a limitedaperture analogous to the limited extent of high-fidelity information uptake through the fovea We usedthe experimentally observed density of fixations as aproxy for visual saliency

First our results indicate that a very limitedattentional span (Gaussian with standard deviationparameter 498) of about twice the size of theactivation mechanism for tracking the gaze positions(228) is sufficient for saccade planning This atten-

tional span is efficient however since the combinationof fixation and attention maps in our model activelydrives the modelrsquos gaze position to new salient regionscomputed via normalization of activations (Carandiniamp Heeger 2011)

Second our model correctly predicted spatialclustering of gaze positions at small-length scales Thepair correlation function indicates that there is apronounced contribution by refixations very close tothe current gaze position This effect is compatible withthe distribution of saccade lengths however a statis-tical control model that generated data from statisti-cally independent probabilities of 2D density and

Figure 6 Model predictions on images not used for parameter estimation Predicted data were generated from the dynamical model

for the 10 images of image set 1 not used for parameter estimation (a b) and for all 15 images from image set 2 (c d) (a) Modell

simulations of the KLD for experiments on image set 1 (10 images not used for model parameter estimation) and dynamical model

and three different statistical models For the experimental data a split-half procedure was applied to compute KLD (b)

Corresponding PCF deviations for the same model-generated and experimental data on image set 1 (c) KLD measures for image set 2

(d) PCF deviation for image set 2

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 12

saccade lengths could not reproduce the pair correla-tions adequately

Implications for computational models of activevision

During active vision our visual system relies onfrequent gaze shifts to optimize retinal input Using asecond-order statistical analysis we demonstrated thatspatial correlations across scanpaths might provideimportant constraints for computational models of eyeguidance In our dynamical model spatial clustering atsmall scales is the result of two principles First thefixation map is driven by an activation function with asmall spatial extent (258) Second the time-scale ofactivation build-up in the fixation map is slowcompared to the build-up of activation in the attentionmap Both mechanisms permit refixations at positionsvery close to the current gaze position before the systemmoves on to new regions of visual space

Limitations of the current approach

The current work focused on spatial statistics of gazepatterns and we proposed and analyzed dynamicalmechanisms of eye guidance in scene viewing In ourmodel a Gaussian read-out mechanism for the staticempirical saliency map was implemented as a simpli-fication A more biologically plausible combination ofour model of eye guidance with a dynamical saliencymodel (Borji amp Itti 2013) is a natural extension of thecurrent framework and the development of such amodel is work in progress in our laboratories Clearlythe current modeling architecture is not limited to inputfrom static saliency maps

Another simplification is related to the timing ofsaccades (Nuthmann Smith Engbert amp Henderson2010) In the current version of our model weimplemented random timing and sampled fixationdurations randomly from a predefined distributionMore adequate models of fixation durations howeverwill need to include interactions of processing difficultybetween fovea and periphery (Laubrock Cajar ampEngbert 2013)

Supplemental information

This work includes a supplemental video animationof the model simulations Experimental data onfixation patterns and computer code for statisticalanalysis and model simulations will be made availablevia the Potsdam Mind Research Respository (PMR2httpreadpsychuni-potsdamde)

Keywords scene perception eye movements atten-tion saccades modeling spatial statistics

Acknowledgments

This work was supported by Bundesministerium furBildung und Forschung (BMBF) through the BernsteinComputational Neuroscience Programs Berlin (ProjectB3 FKZ 01GQ1001F and FKZ 01GQ1001B to R Eand F A W resp) and Tubingen (FKZ 01GQ1002 toFAW) and by Deutsche Forschungsgemeinschaft(grants EN 47113ndash1 and WI 21034ndash1 to R E and FA W resp)

Commercial relationships noneCorresponding author Ralf EngbertEmail ralfengbertuni-potsdamdeAddress Cognitive Science Program amp Department ofPsychology University of Potsdam Potsdam Ger-many

References

Baddeley A amp Turner R (2005) spatstat An Rpackage for analyzing spatial point patternsJournal of Statistical Software 12(6) 1ndash42 Avail-able from httpwwwjstatsoftorg

Barthelme S Trukenbrod H Engbert R ampWichmann F (2013) Modeling fixation locationsusing spatial point processes Journal of Vision13(12) 1 1ndash34 httpwwwjournalofvisionorgcontent13121 doi10116713121 [PubMed][Article]

Beck C amp Schlogl F (1993) Thermodynamics ofchaotic systems Cambridge UK Cambridge Uni-versity Press

Borji A amp Itti L (2013) State-of-the-art in visualattention modeling IEEE Transactions on PatternAnalysis and Machine Intelligence 35(1) 185ndash207

Bylinskii Z Judd T Durand F Oliva A ampTorralba A (2012) MIT Saliency BenchmarkAvailable at httpsaliencymitedu

Carandini M amp Heeger D J (2011) Normalizationas a canonical neural computation Nature ReviewsNeuroscience 13 51ndash62

Engbert R (2012) Computational modeling of col-licular integration of perceptual responses andattention in microsaccades The Journal of Neuro-science 32(23) 8035ndash8039

Engbert R amp Kliegl R (2003) Microsaccades

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 13

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 2: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

observers Moreover part of the mismatch betweencomputer-generated saliency maps and actual gazepatterns might be explained by properties of thevisuomotor system (Findlay ampWalker 1999) Recentlya number of publications addressed specific aspects ofthis problem eg different roles for short and longsaccades (Tatler Baddeley amp Vincent 2006) or returnsaccades (Ludwig Farrell Ellis amp Gilchrist 2009Wilming Harst Schmidt amp Konig 2013) Moreoverbehavioral biases might produce an important contri-bution to eye-movement statistics (Tatler amp Vincent2009) What is currently missing is an integrativecomputational model that addresses the key aspects ofvisuomotor control in a coherent theoretical frameworkWe set out to develop one possible integrative model

Spatial patterns of gaze positions carry rich infor-mation on the processes of saccadic selection by thehuman visual system and this information can beanalyzed applying methods from the theory of spatialpoint processes (Illian Penttinen Stoyan amp Stoyan2008 Barthelme Trukenbrod Engbert amp Wichmann2013) Saliency maps aim at the prediction of two-dimensional (2D) densities of gaze patterns (first-orderspatial statistics) However saliency maps do notcontain the rich information about spatial interactionsinherent in experimental eye-tracking data Fixationsare interdependent Second-order statistics providequantitative tools to investigate interactions in gazepatterns Such interactions in turn may be used to gaininformation about the processes (Law et al 2009)underlying the generation of neighboring gaze posi-tions which themselves are directly related to models ofsaccadic selection

We start with analyzing the spatial statistics of gazepatterns using point process theory (Illian et al 2008)and show that gaze patterns are characterized by small-scale clustering in addition to the inhibition-of-returnmechanism (Klein 2000) that is thought to represent thedominant dynamical principle in extant attentionmodels (Itti amp Koch 2001) Next since these resultsprovide strong constraints for possible neural mecha-nisms of saccadic selection we develop a dynamicalmodel for real-time attention allocation and gazecontrol based on activation-based maps (Engbert 2012Engbert Mergenthaler Sinn amp Pikovsky 2011)Finally the model is compared against a range ofstatistical null models using methods of spatial statistics

Methods

Experiment

Stimulus material

A set of 30 randomly selected natural landscapephotographs (color) was presented to human observers

on a 20 in CRT monitor (Mitsubishi Diamond Pro2070 frame rate 120 Hz resolution 1280 middot 1024 pixelsMitsubishi Electric Corporation Tokyo Japan) Im-ages were classified into two categories natural object-based scenes (image set 1 15 images) versus imagesshowing abstract natural patterns (image set 2 15images) All images were presented centrally with grayborders extending 32 pixels to the topbottom and 40pixels to the leftright of the image since accuracy ofeye tracking systems falls off toward the monitor edges

Task and procedure

Participants were instructed to position their headson a chin rest in front of a computer screen at a viewingdistance of 70 cm Eye movements were recordedbinocularly using an Eyelink 1000 video-based eye-tracker (SR-Research OsgoodeON Canada) with asampling rate of 1000 Hz Trials began with a blackfixation cross presented on gray background at arandom location within the image boundaries Aftersuccessful fixation the fixation cross was replaced bythe image for 10 s Participants were instructed toexplore each scene for a subsequent memory testDuring the experiment we presented 30 images twiceHere we limit our analysis to the first presentation

Participants

We recorded eye movements from 35 participants(20 female 15 male) aged between 17 and 36 years(mean age 24 years) with normal or corrected-to-normal vision Participants were recruited from theUniversity of Potsdam and from a local school (32students three pupils) All participants received creditpoints or 8E for (about US $950) for participation

Data preprocessing and saccade detection

We applied a velocity-based algorithm for saccadedetection (Engbert amp Kliegl 2003 Engbert amp Mer-genthaler 2006) Saccades had a minimum amplitudeof 058 and exceeded the average velocity during a trialby six standard deviations for at least 6 ms Eye tracesbetween two successive saccades were tagged asfixations with a mean fixation position averaged acrossboth eyes Since eye position was determined by thepresentation of a fixation cross at the beginning of atrial we excluded all initial fixations from the data set(image set 1 525 image set 2 525) Furthermore weremoved fixations containing a blink or with a blinkduring an adjacent saccade (image set 1 580 image set2 588) Overall the number of fixations remaining forfurther analyses was 13349 (image set 1) and 12740(image set 2)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 2

Spatial statistics

Gaze positions can be interpreted as realizationsfrom a spatial point process (Illian et al 2008) thatcan be represented as the random set of points N frac14x1 x2 x3 (also called a point pattern) The 2Ddensity (or intensity) k of the spatial point process isgiven as the expectation or mean value of the numberof points in an observation window B ie kfrac14E(n(B))where n() is a counting measure A process isstatistically homogeneous (or stationary) if N and thetranslated set Nxfrac14 x1thorn x x2thorn x x3thorn x have thesame distribution for all x For a stationary spatialpoint process the intensity k is constant over spaceFor a nonstationary process the intensity is a functionof location kfrac14 k(x) For the computation of densitiesfrom experimental data we used kernel-densityestimates with bandwidth parameters chosen accord-ing to Scottrsquos rule (Baddeley amp Turner 2005 Scott1992) To compute deviations between 2D densitiesPkl and Qkl at grid position (k l) we used a symmetricversion of the Kullback-Leibler divergence derivedfrom information gain (Beck amp Schlogl 1993) ie

DKLD frac141

2

X

kl

PkllogPkl

QklthornQkllog

Qkl

Pkl

eth1THORN

Second-order statistics (see also the illustrated notesin the Appendix) are based on the pair density q(x1x2) which gives the probability q(x1 x2) dx1 dx2 ofobserving points in each of two disks b1 and b2 withlinear dimensions dx1 and dx2 respectively Pointpatterns can be characterized by the pair densitywhich is typically a function of the pair distance ie q(x1 x2) frac14 q(r) with r frac14 jjx1ndashx2jj for two arbitraryrealizations x1 and x2 Using a kernel-based method aestimator for the pair density can be written as

qethrTHORN frac14X6frac14

x1x2W

kethjjx1 x2jj rTHORN2prAjjx1x2jj

eth2THORN

where k() is an appropriate kernel and Ajjx1x2jjdenotes an edge correction at distance jjx1 ndash x2jj(Baddeley amp Turner 2005) For numerical computa-tions we used the Epanechnikov kernel (Illian et al2008) ie

kethxTHORN frac143

4h1 x2

h2

for h amp x amp h

0 otherwise

8gtlt

gteth3THORN

The problem of choosing the bandwidth h appro-priately is frequently discussed in the literature (Illian etal 2008) The bottom line from this discussion is thatthe behavior of the estimator should be analyzed over a

range of bandwidths We will run such an analysisbelow (Figure 2)

The pair correlation function g(r) is a normaliza-tion of the pair density with respect to first-orderintensity k so that the estimator for the paircorrelation is given by g(r) frac14 q(r) k2 Theinterpretation of the pair correlation function for agiven point pattern is straightforward For a randompattern without clustering the pair correlationfunction is g(r) rsquo 1 across the full range of distancesr If g(r) 1 then pairs of fixations are moreabundant than on average at distance r If g(r)1then pairs of fixations are less abundant than onaverage at a distance r Thus the pair correlationfunction g(r) measures how selection of a particularpoint location (ie fixation position) is influenced byother fixations at distance r

Using the inhomogeneous pair correlation functionginhom(r) we can remove the first-order inhomogeneityfrom the second-order spatial statistics ie

ginhomethrTHORN frac14X6frac14

x1x2W

1

kethx1THORNkethx2THORNkethjjx1 x2jj rTHORN

2prAjjx1x2jj

eth4THORNEstimation of ginhom(r) involves two steps First we

estimated the overall intensity k(x) for all fixationpositions obtained for a given scene In this procedurewe borrow strength from the full set of observations toobtain reliable estimates of the inhomogeneity Secondwe computed the pair correlation function from a singletrial with respect to the inhomogeneous density of thefull data set

In case of a given pair correlation function g(r) thescalar quantity (Illian et al 2008)

Dg frac14Z lsquo

0ethgethrTHORN 1THORN2dr eth5THORN

denoted as PCF deviation in the following serves as auseful test statistic that quantifies the deviations fromrandomness for a given point pattern with inhomoge-neous density k(x) The integral in Equation (5) wasevaluated numerically for pair distances r between 018and 58 (image set 1 and 2) and between 018 and 38 (LeMeur Le Callet Barba amp Thoreau [2006] data seebelow)

Results

We conducted an eye-tracking experiment on sceneviewing with 35 human observers using 15 object-basednatural scenes (image set 1) Resulting gaze data wereevaluated using first- and second-order spatial statis-tics we found that data exhibit unexpected spatial

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 3

aggregation (or clustering) We reproduced this findingfor a set of 15 abstract natural patterns (image set 2)and for an external dataset (Le Meur et al 2006) thatwas made publicly available (Bylinskii Judd DurandOliva amp Torralba 2012) Based on these results wedeveloped a dynamical model for saccadic selectionthat was evaluated by the spatial-statistics approachintroduced in this section

Spatial statistics and pair correlation function

We began by numerically computing the spatial (2D)density of gaze positions from experimental data

collected for image set 1 (Figure 1a) Fixation positionsare indicated by red dots (a total of 930 fixations from35 observers for image 2) Densities were computedusing a 2D kernel density estimator (Baddeley ampTurner 2005 R Core Team 2013 see Methods) andare visualized by gray shading in the plot Thebandwidth parameter hdensity for the kernel densityestimation was computed according to Scottrsquos rule(Scott 1992 range from 188 to 228 for h over allimages from image set 1) The obtained 2D densityk(xy) is inhomogeneous because of the dependence onposition (x y) A representative sample trajectory froma single trial is given in Figure 1d where the second andlast fixation of the scanpath is highlighted by white

Figure 1 Analysis of pair correlation functions for experimental gaze sequences (image set 1) and for computer-generated surrogate

data (a) Experimental data of gaze positions from human observers (red) and estimated intensity from kernel density estimate (gray

levels) for image 2 (b c) Realizations of gaze positions generated by inhomogeneous and homogeneous point processes

respectively (d e) Typical single-trial fixation sequences from experiment (red) and inhomogeneous point process (yellow) (f)

Kullback-Leibler divergence (KLD) indicates that the inhomogeneous point process approximates the experimental 2D density of gaze

positions (g) Pair correlation functions (PCFs) for experimental data (single trials light gray single trial from (d) black averaged over

trials red) (h) Mean PCFs for experimental data inhomogeneous and homogeneous Poisson process (i) The PCF deviation shows

that the experimental data are spatially correlated while the two surrogate datasets fail to reproduce this statistical pattern

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 4

color and by their serial numbers The first fixation wasomitted since all trials started at a random positionwithin an image determined by our experimentalprocedure (see Methods)

The pair correlation function g(r) gives a quantita-tive summary of interactions in fixation patterns bymeasuring how distance patterns between fixationsdiffer from what we would expect from independentlydistributed data (see Appendix) A value of g(r) above 1for a particular distance r indicates clustering meaningthat there are more pairs of points separated by adistance r than we would expect if fixation locationswere statistically independent

For the estimation of the pair correlation functionshort sequences of gaze positions from single experi-mental trials were considered However spatial inho-mogeneity of the 2D density was taken into accountTo obtain a reliable estimate of the spatial inhomo-geneity the 2D density was estimated from the fulldata-set (Figure 1a) of all fixations on a given imagetaken from all participants and trials It is important tonote however that in the computation of the kerneldensity estimate k(x) used for the inhomogeneous paircorrelation function Equation (4) an optimal band-width parameter h is needed to avoid two possibleartifacts First if h is very small then spatialcorrelations might be underestimated due to overfittingof the inhomogeneity of the density Second if h is toolarge then spatial correlations might be overestimatedsince first-order inhomogeneity is not adequatelyremoved from the second-order spatial statistics Wesolved this problem by computing the PCF deviationDg for the inhomogeneous point process for varyingvalues of the bandwidth h (Figure 2) Since theinhomogeneous point process generates uncorrelated

fixations ie gtheo(r) frac14 1 the optimal bandwidth forthe dataset corresponds to a minimum of the PCFdeviation Dg (quantifying the deviation from the idealvalue g(r)frac14 1) For image set 1 the optimal value wasestimated as h1 frac14 408 (Figure 2a)

Based on this density estimate we can compute theinhomogeneous pair correlation function ginhom(r) inwhich first-order inhomogeneity is removed from thesecond-order spatial correlations (see Methods) As aresult we obtained pair correlations from individualtrials (Figure 1g gray lines) Deviations from ginhom(r)rsquo 1 indicate spatial clustering at a specific distance rThe mean pair correlation function ginhom(r) providesevidence for clustering at small spatial scales with r 48 (Figure 1g red line) Such a scale is greater than thefoveal zone (r 28) and might provide an estimate ofthe size of the effective perceptual window in free sceneviewing This result is compatible with earlier findingsthat the zone of active selection of saccade targetsextends beyond the fovea into the parafovea up toeccentricities of 48 (Reinagel amp Zador 1999)

Next we carried out the same numerical computa-tions for two sets of surrogate data The surrogate datawere generated to test the null hypotheses of completespatial randomness both for an inhomogeneous pointprocess with position-dependent intensity k(x) and fora homogeneous point process with constant intensityk0 For the inhomogeneous point process we sampledfrom the estimated intensity k(x) (Figure 1b) whereasa constant intensity k0 obtained from spatial averag-ing was used for the homogeneous point process(Figure 1c) Both surrogate datasets are important forchecking the reliability of the computation of the paircorrelation function for the original data (Figure 1g)First the inhomogeneous point process gives a flat

Figure 2 Optimal bandwidth parameter for inhomogeneous pair correlation function (PCF) for three different image sets For

simulated data from the inhomogeneous point process the PCF deviation Dg Equation (5) was computed as a function of the

bandwidth h for the underlying kernel density estimate The optimal bandwidth corresponds to the position of the minimum of Dg (a)

For image set 1 h1frac14 408 (b) For image set 2 h2frac14 468 (c) For the Le Meur dataset h3frac14 228 due to a much smaller presentation

display compared with our experiments

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 5

mean correlation function with g(r) rsquo 1 (Figure 1h)which demonstrates the absence of clustering (exceptfor the divergence at very small scales as an effect ofnumerical computation issues) Thus the spatialcorrelations in the experimental data are not a simpleconsequence of spatial inhomogeneity Second theresult for the homogeneous point process (Figure 1h) isthe same as for the inhomogeneous point processwhich indicates that the correction for inhomogeneityneeded for computations in Figure 1g does notproduce unwanted artefacts due to possible overfittingof the spatial inhomogeneities We conclude that ourexperimental data give a clear indication for spatialclustering at length-scales smaller than 48 of visualangle Additionally we checked the hypothesis thatthis effect of spatial clustering might be due to saccadicundershoot and subsequent short correction saccadesby excluding all fixations with durations shorter than200 ms A related analysis of the PCF indicates noqualitative differences from the original data (Figure1h)

Results reported so far were obtained for a singlerepresentative image Over the full set of 15 images weanalyzed the 2D densities using a symmetrized form ofthe Kullback-Leibler divergence (KLD) based on theconcept of information gain (Beck amp Schlogl 1993)For the experimental data we applied a split-halfprocedure (first half of participants vs second half ofparticipants) and computed the KLD between the twoexperimental densities The corresponding KLD valuesdemonstrate that the inhomogeneous point processreproduces the 2D density (Figure 1f) while thehomogeneous point process clearly fails to approximatethe systematic inhomogeneity in the image Model typehad a significant effect on KLD v2(2)frac141054 p 001Contrasts revealed that (1) spatially inhomogeneousdata-sets were different from the homogeneous data bfrac14 0319 t(28) frac14 2285 p 001 and that (2)experimental data were significantly different frominhomogeneous surrogate data bfrac14 0062 t(28)frac14 256pfrac14 0016

Both sets of surrogate data were submitted to ananalysis of the pair correlation function where thedeviations from g(r) rsquo 1 were computed to obtain aPCF measure indicating the amount of spatial corre-lation averaged over distances (see Methods) Resultsindicate that the surrogate data producemdashas de-signedmdashuncorrelated gaze positions (low PCF devia-tion) while the experimental data by human observersexhibit spatially correlated gaze positions (Figure 1i)Model type had a significant effect on PCF v2(2)frac14984 p 001

To investigate the reliability of our finding of spatialclustering on short-length scales during natural sceneviewing we investigated two other datasets using thesame procedures with an optimal bandwidth adjusted

for each dataset (Figure 2andashc) First we compared themean pair correlation functions per image for naturalobject-based scenes (image set 1 Figure 3a d) andabstract natural patterns (image set 2 Figure 3b e)While variability of the pair correlation function mightbe greater for the abstract scenes than for the object-based scenes (Figure 3d e gray lines) the mean paircorrelation functions for the images sets (Figure 3d ered lines) are very similar This result indicates that theclustering on short length scales is a robust phenom-enon which does not seem to depend sensitively onscene content Second we analyzed a publicly availabledataset (Le Meur et al 2006) from the MIT saliencybenchmark (Bylinskii et al 2012 Judd Durand ampTorralba 2012) consisting of 40 participants whoviewed 27 color images for 15 s For these data thespatial scale of the presentation display for the images(Figure 3c) was considerably smaller than in ourexperiment Consequently we observed pair correla-tion functions that indicate clustering on smaller scales( 158 Figure 3f) Thus while scene content does notseem to exert a strong influence on spatial correlationsspatial scale of the image modulates the spatial scale ofthe clustering of fixations

We conclude that second-order spatial statisticsobtained for the experimental data are significantlydifferent from stochastic processes implementing theassumption of spatial randomness Furthermore themere presence of spatial inhomogeneity in the experi-mental data cannot explain by itself the observedspatial correlations which is evident in the results forthe inhomogeneous point process While inhibition-of-return (Klein 2000) has been discussed frequently asone of the key principles added to saliency maps forsaccadic selection (Itti amp Koch 2001) spatial clusteringof gaze positions is an additional statistical propertythat is highly informative on mechanisms of gazeplanning but has been neglected so far Next we usethese results to develop and test a dynamical model forsaccade generation that uses activation field dynamicsto reproduce spatial statistics of first- and second-order

A dynamical model of saccade generation

A key assumption for the model we propose is thecombination of two neural activation maps toimplement dynamical principles for saccadic selectionFirst a fixation map f(x y t) is keeping track of thesequence of fixations by inhibitory tagging (Itti ampKoch 2001) Second an attention map a(x y t) thatis driven by early visual processing controls thedistribution of attention Physiologically the as-sumption of the dynamical maps is supported by thepresence of an allocentric motor map of visual spacein the primate entorhinal cortex (Killian Jutras amp

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 6

Buffalo 2012) Moreover this map is spatiallydiscrete (Stensola et al 2012) and serves as abiological motivation for the fixation and attentionmaps in our model

We implemented activation maps for attention andfixation (inhibitory tagging) on a discrete square latticeof dimension L middot L Lattice points (i j) haveequidistant spatial positions (xi yj) for i j frac14 1 Lwhere xifrac14x0thorn iDx and yjfrac14y0thorn jDy As a consequenceattention and fixation maps are implemented inspatially discrete forms aij (t) and fij(t) respec-tively For the numerical simulations time wasdiscretized in steps of Dtfrac14 10 ms with tfrac14 k middot Dt and kfrac14 0 1 2 T

If the observerrsquos gaze is at position (xg yg) at time tthen a position-dependent activation change Fij(xg yg)and a global decay proportional to the currentactivation ndashxfij(t) are added to all lattice positions toupdate the activation map at time t thorn 1 ie

fijethtthorn 1THORN frac14 Fijethxg ygTHORN thorn eth1 xTHORNfijethtTHORN eth6THORN

where the activation change Fij(xg yg) rdquo Fij(t) isimplicitly time-dependent because of the time-depen-dence of gaze positions xg(t) yg(t) The constant x 1determines the strength of the decay of activation Forthe spatial distribution of the activation change Fij(t)

we assume a Gaussian profile ie

FijethtTHORN frac14R0ffiffiffiffiffiffi2pp

r0

exp ethxi xgethtTHORNTHORN2 thorn ethyj ygethtTHORNTHORN2

2r20

eth7THORNwith the free parameters r0 and R0 controlling thespatial extent of the activation change and the strengthof the activation change respectively In our model thebuild-up of activation in the fixation map is amechanism of inhibitory tagging (Itti amp Koch 2001) toreduce the amount of refixations on recently visitedimage patches

For the attention map aij(t) we assume similardynamics however the width of Gaussian activationchange Aij(t) is assumed to be proportional to the staticsaliency map ij The updating rule for the attentionmap is given by

aijethtthorn 1THORN frac14ijAijethtTHORNX

kl

klAklethtTHORNthorn eth1 qTHORNaijethtTHORN eth8THORN

with decay constant q 1 As a result the saliencymap ij is accessed locally through a Gaussian aperturewith size r1 and scale parameter R1 similar to Equation

Figure 3 Comparison of the mean pair correlation functions for three different image sets (a) Natural object-based scenes (b)

Abstract natural patterns (c) Natural scenes from the Le Meur et al dataset Mean pair correlation function are similar for object-

based (d) and abstract (e) natural scenes The presentation of images on a smaller display in the Le Meur dataset compared to our

experiments (f) resulted in a smaller length scale of spatial clustering

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 7

(7) Using the local read-out mechanism information isprovided for the attention map to identify regions ofinterest for eye guidance

The fixation map monitors recently visited fixationlocations by increasing local activation at the corre-sponding lattice points (Engbert et al 2011 Freund ampGrassberger 1992) If the observerrsquos gaze position is ata position corresponding to lattice position (i j) then aposition-dependent activation change Fij in the form ofa Gaussian profile is added locally in each time stepwhile a global decay proportional to the currentactivation is applied to all lattice positions The widthof the Gaussian activation r0 and the decay x are thetwo free parameters controlling activation in thefixation map For the attention map aij(t) we assumesimilar dynamics including local increase of activationwith size r1 and global decay q However the amountof activation change Aij is assumed to be proportionalto the time-independent saliency map ij so that thelocal increase of activation isthorn ijAij

PklAkl

Our modeling assumptions are related to specifichypotheses on model parameters We expect that thesize of the Gaussian profile for the attention map islarger than the corresponding size of the fixation mapr1 r0 since attention is the process driving eyemovements into new regions of visual space while theinhibitory tagging process should be more localized Asimilar expectation can be formulated on the decayconstants Since inhibitory tagging is needed on alonger time scale as a foraging facilitator we expect aslower decay in the fixation map compared to theattention map ie x q

Next we assume that given a saccade command attime t both maps are evaluated to select the nextsaccade target First we apply a normalization of bothattention and fixation maps as a general neuralprinciple to obtain relative activations (Carandini ampHeeger 2011) Second we introduce a potentialfunction as the difference of the normalized maps

uijethtTHORN frac14 aijethtTHORN$ kX

kl

aklethtTHORNfrac12 )kthorn

fijethtTHORN$ cX

kl

fklethtTHORNfrac12 )c eth9THORN

where the exponents k and c are free parametersHowever a value of kfrac14 1 is a necessary boundarycondition to obtain a model that accurately reproducesthe densities of gaze positions In a qualitative analysisof the model (see Appendix) pilot simulations showedthat c is an important control parameter determiningspatial correlations where c rsquo 03 was used toreproduce spatial correlations observed in our experi-mental data

The potential uij(t) Equation (9) can be positive ornegative at position (i j) Lattice positions with apositive potential uij 0 are excluded from saccadic

selection since corresponding regions were visitedrecently with high probability Among the latticepositions with negative activations we implementedstochastic selection of saccade targets proportional torelative activations also known as Lucersquos choice rule(Luce 1959) We implemented this form of stochasticselection from the set S frac14 fethi jTHORNjuij 0g where theprobability pij(t) to select lattice position (i j) at time tas the next saccade target is given by

pijethtTHORN frac14 maxuijethtTHORNX

ethklTHORNSuklethtTHORN

g

0

BB

1

CCA eth10THORN

The noise term g is an additional parametercontrolling the amount of noise in target selection

Numerical simulations of the model

Our computational modeling approach to saccadicselection has been developed to propose a minimalmodel that captures the types of spatial statisticsobserved in experimental data After fixing modelparameters all that is needed to run the model on aparticular image is a 2D density estimate of gazepatterns (from experimental data) or a corresponding2D prediction from one of the available saliency models(Borji amp Itti 2013) To reduce computational com-plexity in the current study and to exclude potentialmismatches between data and model simulations due tothe saliency models we run the model on experimen-tally realized densities of gaze positions which isequivalent to assuming an exact saliency modelHowever our modeling approach is compatible withfuture dynamical saliency models that provide time-and position-dependent saliency during a sequence ofgaze shifts thus our model introduces a generaldynamical framework and is not tied to using empiricaldata

The numerical values of the five model parameterswere estimated from experimental data recorded for thefirst five images of natural object-based scenes (imageset 1) using a genetic algorithm approach (Mitchell1998 see Appendix Table 1) The remaining 10 imagesof image set 1 were used for model evaluationsmdashas the15 images of image set 2 The objective function forparameter estimation was based on evaluation of first-order statistics (2D density of gaze positions) and thedistribution of saccade lengths In agreement with ourfirst expectation the estimated optimal values for thespatial extent of the inhibitory tagging process in thefixation map r0frac14 228 is considerably smaller than thecorresponding size of the build-up function for theattention map r1frac14 498 Our second expectation wasrelated to the decay constants which turned out to be

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 8

larger for the attention map qfrac14 0066 than for thefixation map xfrac14 93 middot 105 so q was greater than xagain as expected Finally the noise level in the targetmap is g frac14 91 middot 105

An example for the simulation of the modeldemonstrates the interplay between inhibitory pro-cesses from the fixation map and the attention mapduring gaze planning (Figure 4) The fixation mapbuilds up activation at fixated lattice positions (yellowto red) while the attention map identifies new regionsof interest for saccadic selection (blue) These simula-tions show on a qualitative level how the modelimplements the interplay of the assumed mechanisms of

inhibitory tagging and saccadic selection of gazepositions (see Supplementary Video)

To investigate model performance qualitatively weran simulations for one image (image 6 930 fixationFigure 5a) and obtained a number of fixations similarto the experimental data (882 fixations Figure 5b)Single-trial scanpaths from experiments and simula-tions are shown additionally in Figure 5a and 5brespectively) The resulting distributions of saccadelengths indicate that our dynamical model is in goodagreement with experimental data while the twosurrogate datasets (homogeneous and inhomogeneouspoint processes) fail to reproduce the distribution(Figure 5c) An analysis of the pair correlation

Figure 4 Illustration of a simulated sequence of gaze positions and the activation dynamics of the model (a) The density of gazepositions (empirical saliency) is used as a proxy for a computed saliency map that drives activation in the attention map (bndashf)Sequence of snapshots of the potential (bluefrac14 low yellowfrac14 high) Note that blue color indicates density of fixations in (a) while itrefers to low values of the potential in (b)ndash(f)

Parameter Symbol Mean Error Min Max Reference

Fixation mapActivation span [8] r0 216 011 03 100 Eq (7)Decay log10 x ndash403 028 ndash50 ndash10 Eq (6)

Attention mapActivation span [8] r1 488 025 03 100 Eq (8)Decay log10 q ndash118 008 ndash30 ndash10 Eq (8)

Target selectionAdditive noise log10 g ndash404 007 ndash90 ndash30 Eq (10)

Table 1 Model parameters

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 9

functions indicates that the spatial correlations presentin the experimental data were approximated by thedynamical model (Figure 5d) however the twosurrogate datasets representing uncorrelated sequencesby construction produce qualitatively different spatialcorrelations

To investigate the influences of saccade-lengthdistributions on pair correlations we constructedanother statistical control model that had access to theimage-specific saccade-length distribution It is impor-tant to note that this model was not introduced as acompetitor to the dynamical model which is able topredict saccade-length distributions The statisticalcontrol model approximated the distribution of saccadelengths l and 2D densities of gaze positions x bysampling from the joint probability distribution p(x l)under the assumption of statistical independence ofsaccade lengths and gaze positions ie p(x l)frac14p(x)p(l) This model by construction approximates thedistribution of saccade lengths and 2D density of gazepositions (Figure 5c) The simulations indicate how-

ever that even the combination of inhomogeneousdensity of gaze positions and non-normal distributionof saccade lengths used by the statistical control modelcannot explain spatial correlations in the experimentaldata characterized by the pair correlations function(Figure 5d)

For the statistical analysis of model performance onnew images we carried out additional numericalsimulations We fitted model parameters to dataobtained for the first five images only (see above) andpredicted data for the remaining 10 images by the newsimulations to isolate parameter estimation from modelevaluation (calculating test errors rather than trainingerrors)

Our simulations show that the dynamical modelpredicted the 2D density of gaze positions accurately(Figure 6a) The obtained KLD values for the model(blue) were comparable to KLD values calculated bythe split-half procedure for the experimental data (red)and to the KLD values obtained for the statisticalcontrol models (green frac14Homogeneous point process

Figure 5 Distribution of saccade lengths and pair correlation functions from model simulations (image 6) (a) Experimental

distribution of gaze positions (red dots) and a representative sample trial (red lines) (b) Corresponding plot of simulated data

obtained from our dynamical model (blue dots) A single-trial simulation is highlighted (blue line) (c) Distributions of saccade lengths

for experimental data (red) dynamical model (blue) homogeneous point process (green) inhomogeneous point process (yellow) and

a statistical control model (magenta) (d) Pair correlation functions for the different models The dynamical model (blue line) produces

spatial correlations similar to the experimental data (red line)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 10

yellow frac14 Inhomogeneous point process magentafrac14Control model) Model type had a significant effect onKLD v2(4)frac14 1094 p 001 Post hoc comparisonsindicated significant effects between all models (p 001) except for the comparison between experimentaldata and the dynamical model (p frac14 0298) and for thecomparison between dynamical model and inhomoge-neous point process (p frac14 0679) In an analysis of thePCF estimated from the same set of simulated data(Figure 6b) the dynamical model (blue) produceddeviations from an uncorrelated point process that arein good agreement with the experimental data (red)Model type had a significant effect on PCF deviationv2(4)frac14 916 p 001 Post hoc comparisons indicatedsignificant effects for all comparisons (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0990) and between homoge-neous and inhomogeneous point processes (pfrac14 0996)

To check the reliability of the results we performedall corresponding calculations for the images showingabstract natural scenes (image set 2) For thesesimulations we used the set of model parameters fittedto the first five images of image set 1 whichcorresponds to the hypothesis that scene content(object-based scenes vs abstract natural patterns) doesnot have a strong impact on spatial correlations in thescanpath data For the simulated densities (Figure 6c)we reproduced the statistical results from image set 1Again model type had a significant effect on KLDv2(4)frac14 1626 p 001 Post hoc comparisons indicatedsignificant effects between all models (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0251) for the comparisonbetween dynamical model and inhomogeneous pointprocess (p frac14 0784) and for the comparison betweeninhomogeneous point process and experimental data (pfrac14 0012) For the pair correlation function (Figure 6d)model type had a significant effect v2(4)frac14 1406 p 001 and post hoc comparisons indicated significanteffects for all comparisons (p 001) except for thecomparison between experiment and dynamical model(pfrac14 0453) and between homogeneous and inhomo-geneous point processes (p frac14 0700) Thus the mainstatistical results obtained from image set 1 werereproduced for images of abstract natural patterns(image set 2) These results lend support to thehypothesis that scene content does not have a stronginfluence on second-order spatial statistics of gazepatterns

Thus the dynamical model performed better thanany of the statistical models in predicting the averagepair correlations Although one of our statisticalcontrol models generated data by using image-specificsaccade-length information in addition to the 2Ddensity of gaze position it could not predict the spatialcorrelations as accurately as the dynamical model that

was uninformed about the image-specific saccade-length distribution

Discussion

Current theoretical models of visual attentionallocation in natural scenes are limited to the predictionof first-order spatial statistics (2D densities) of gazepatterns We were interested in attentional dynamicsthat can be characterized by spatial interactions (asfound in the second-order statistics) Using the theoryof spatial point processes we discovered that gazepatterns can be characterized by clustering at smalllength scales which cannot be explained by spatialinhomogeneity of the 2D density We proposed andanalyzed a model based on dynamical activation mapsfor attentional selection and inhibitory control of gazepositions The model reproduced 2D densities of gazemaps (first-order statistics) and distributions of saccadelengths as well as pair correlations (second-orderspatial statistics)

Spatial statistics

While research on the computation of visual saliencyhas been a highly active field of research (Borji amp Itti2013) there is currently a lack of computational modelsfor the generation of scanpaths on the basis of knownsaliency Inhibition-of-return (Klein 2000) has beenproposed as a key principle to prevent continuingrefixation within regions of highest saliency Howeverour analysis of the pair correlation function demon-strates that saccadic selection at small length scales isdominated by spatial clustering Thus our findings arehighly compatible with the view that inhibition-of-return cannot easily be observed in eye-movementbehavior in natural scene viewing experiments (Smith ampHenderson 2011) However the spatial correlationscan be exploited to investigate dynamical rules under-lying attentional processing in the visual system Ourexperimental data show a clear effect of spatialclustering for length scales shorter than about 48However these results are incompatible with thecurrent theory of saliency-based attention allocationcombined with an inhibition-of-return mechanismFuture simulations will investigate the predictive powerof the model when saliency models are used as input

Modeling spatial correlations

In a biologically plausible computational model ofsaccade generation a limited perceptual span needs to

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 11

be implemented for attentional selection (Findlay ampGilchrist 2012) We addressed this problem byassuming a Gaussian read-out mechanism with localretrieval from the saliency map through a limitedaperture analogous to the limited extent of high-fidelity information uptake through the fovea We usedthe experimentally observed density of fixations as aproxy for visual saliency

First our results indicate that a very limitedattentional span (Gaussian with standard deviationparameter 498) of about twice the size of theactivation mechanism for tracking the gaze positions(228) is sufficient for saccade planning This atten-

tional span is efficient however since the combinationof fixation and attention maps in our model activelydrives the modelrsquos gaze position to new salient regionscomputed via normalization of activations (Carandiniamp Heeger 2011)

Second our model correctly predicted spatialclustering of gaze positions at small-length scales Thepair correlation function indicates that there is apronounced contribution by refixations very close tothe current gaze position This effect is compatible withthe distribution of saccade lengths however a statis-tical control model that generated data from statisti-cally independent probabilities of 2D density and

Figure 6 Model predictions on images not used for parameter estimation Predicted data were generated from the dynamical model

for the 10 images of image set 1 not used for parameter estimation (a b) and for all 15 images from image set 2 (c d) (a) Modell

simulations of the KLD for experiments on image set 1 (10 images not used for model parameter estimation) and dynamical model

and three different statistical models For the experimental data a split-half procedure was applied to compute KLD (b)

Corresponding PCF deviations for the same model-generated and experimental data on image set 1 (c) KLD measures for image set 2

(d) PCF deviation for image set 2

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 12

saccade lengths could not reproduce the pair correla-tions adequately

Implications for computational models of activevision

During active vision our visual system relies onfrequent gaze shifts to optimize retinal input Using asecond-order statistical analysis we demonstrated thatspatial correlations across scanpaths might provideimportant constraints for computational models of eyeguidance In our dynamical model spatial clustering atsmall scales is the result of two principles First thefixation map is driven by an activation function with asmall spatial extent (258) Second the time-scale ofactivation build-up in the fixation map is slowcompared to the build-up of activation in the attentionmap Both mechanisms permit refixations at positionsvery close to the current gaze position before the systemmoves on to new regions of visual space

Limitations of the current approach

The current work focused on spatial statistics of gazepatterns and we proposed and analyzed dynamicalmechanisms of eye guidance in scene viewing In ourmodel a Gaussian read-out mechanism for the staticempirical saliency map was implemented as a simpli-fication A more biologically plausible combination ofour model of eye guidance with a dynamical saliencymodel (Borji amp Itti 2013) is a natural extension of thecurrent framework and the development of such amodel is work in progress in our laboratories Clearlythe current modeling architecture is not limited to inputfrom static saliency maps

Another simplification is related to the timing ofsaccades (Nuthmann Smith Engbert amp Henderson2010) In the current version of our model weimplemented random timing and sampled fixationdurations randomly from a predefined distributionMore adequate models of fixation durations howeverwill need to include interactions of processing difficultybetween fovea and periphery (Laubrock Cajar ampEngbert 2013)

Supplemental information

This work includes a supplemental video animationof the model simulations Experimental data onfixation patterns and computer code for statisticalanalysis and model simulations will be made availablevia the Potsdam Mind Research Respository (PMR2httpreadpsychuni-potsdamde)

Keywords scene perception eye movements atten-tion saccades modeling spatial statistics

Acknowledgments

This work was supported by Bundesministerium furBildung und Forschung (BMBF) through the BernsteinComputational Neuroscience Programs Berlin (ProjectB3 FKZ 01GQ1001F and FKZ 01GQ1001B to R Eand F A W resp) and Tubingen (FKZ 01GQ1002 toFAW) and by Deutsche Forschungsgemeinschaft(grants EN 47113ndash1 and WI 21034ndash1 to R E and FA W resp)

Commercial relationships noneCorresponding author Ralf EngbertEmail ralfengbertuni-potsdamdeAddress Cognitive Science Program amp Department ofPsychology University of Potsdam Potsdam Ger-many

References

Baddeley A amp Turner R (2005) spatstat An Rpackage for analyzing spatial point patternsJournal of Statistical Software 12(6) 1ndash42 Avail-able from httpwwwjstatsoftorg

Barthelme S Trukenbrod H Engbert R ampWichmann F (2013) Modeling fixation locationsusing spatial point processes Journal of Vision13(12) 1 1ndash34 httpwwwjournalofvisionorgcontent13121 doi10116713121 [PubMed][Article]

Beck C amp Schlogl F (1993) Thermodynamics ofchaotic systems Cambridge UK Cambridge Uni-versity Press

Borji A amp Itti L (2013) State-of-the-art in visualattention modeling IEEE Transactions on PatternAnalysis and Machine Intelligence 35(1) 185ndash207

Bylinskii Z Judd T Durand F Oliva A ampTorralba A (2012) MIT Saliency BenchmarkAvailable at httpsaliencymitedu

Carandini M amp Heeger D J (2011) Normalizationas a canonical neural computation Nature ReviewsNeuroscience 13 51ndash62

Engbert R (2012) Computational modeling of col-licular integration of perceptual responses andattention in microsaccades The Journal of Neuro-science 32(23) 8035ndash8039

Engbert R amp Kliegl R (2003) Microsaccades

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 13

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 3: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

Spatial statistics

Gaze positions can be interpreted as realizationsfrom a spatial point process (Illian et al 2008) thatcan be represented as the random set of points N frac14x1 x2 x3 (also called a point pattern) The 2Ddensity (or intensity) k of the spatial point process isgiven as the expectation or mean value of the numberof points in an observation window B ie kfrac14E(n(B))where n() is a counting measure A process isstatistically homogeneous (or stationary) if N and thetranslated set Nxfrac14 x1thorn x x2thorn x x3thorn x have thesame distribution for all x For a stationary spatialpoint process the intensity k is constant over spaceFor a nonstationary process the intensity is a functionof location kfrac14 k(x) For the computation of densitiesfrom experimental data we used kernel-densityestimates with bandwidth parameters chosen accord-ing to Scottrsquos rule (Baddeley amp Turner 2005 Scott1992) To compute deviations between 2D densitiesPkl and Qkl at grid position (k l) we used a symmetricversion of the Kullback-Leibler divergence derivedfrom information gain (Beck amp Schlogl 1993) ie

DKLD frac141

2

X

kl

PkllogPkl

QklthornQkllog

Qkl

Pkl

eth1THORN

Second-order statistics (see also the illustrated notesin the Appendix) are based on the pair density q(x1x2) which gives the probability q(x1 x2) dx1 dx2 ofobserving points in each of two disks b1 and b2 withlinear dimensions dx1 and dx2 respectively Pointpatterns can be characterized by the pair densitywhich is typically a function of the pair distance ie q(x1 x2) frac14 q(r) with r frac14 jjx1ndashx2jj for two arbitraryrealizations x1 and x2 Using a kernel-based method aestimator for the pair density can be written as

qethrTHORN frac14X6frac14

x1x2W

kethjjx1 x2jj rTHORN2prAjjx1x2jj

eth2THORN

where k() is an appropriate kernel and Ajjx1x2jjdenotes an edge correction at distance jjx1 ndash x2jj(Baddeley amp Turner 2005) For numerical computa-tions we used the Epanechnikov kernel (Illian et al2008) ie

kethxTHORN frac143

4h1 x2

h2

for h amp x amp h

0 otherwise

8gtlt

gteth3THORN

The problem of choosing the bandwidth h appro-priately is frequently discussed in the literature (Illian etal 2008) The bottom line from this discussion is thatthe behavior of the estimator should be analyzed over a

range of bandwidths We will run such an analysisbelow (Figure 2)

The pair correlation function g(r) is a normaliza-tion of the pair density with respect to first-orderintensity k so that the estimator for the paircorrelation is given by g(r) frac14 q(r) k2 Theinterpretation of the pair correlation function for agiven point pattern is straightforward For a randompattern without clustering the pair correlationfunction is g(r) rsquo 1 across the full range of distancesr If g(r) 1 then pairs of fixations are moreabundant than on average at distance r If g(r)1then pairs of fixations are less abundant than onaverage at a distance r Thus the pair correlationfunction g(r) measures how selection of a particularpoint location (ie fixation position) is influenced byother fixations at distance r

Using the inhomogeneous pair correlation functionginhom(r) we can remove the first-order inhomogeneityfrom the second-order spatial statistics ie

ginhomethrTHORN frac14X6frac14

x1x2W

1

kethx1THORNkethx2THORNkethjjx1 x2jj rTHORN

2prAjjx1x2jj

eth4THORNEstimation of ginhom(r) involves two steps First we

estimated the overall intensity k(x) for all fixationpositions obtained for a given scene In this procedurewe borrow strength from the full set of observations toobtain reliable estimates of the inhomogeneity Secondwe computed the pair correlation function from a singletrial with respect to the inhomogeneous density of thefull data set

In case of a given pair correlation function g(r) thescalar quantity (Illian et al 2008)

Dg frac14Z lsquo

0ethgethrTHORN 1THORN2dr eth5THORN

denoted as PCF deviation in the following serves as auseful test statistic that quantifies the deviations fromrandomness for a given point pattern with inhomoge-neous density k(x) The integral in Equation (5) wasevaluated numerically for pair distances r between 018and 58 (image set 1 and 2) and between 018 and 38 (LeMeur Le Callet Barba amp Thoreau [2006] data seebelow)

Results

We conducted an eye-tracking experiment on sceneviewing with 35 human observers using 15 object-basednatural scenes (image set 1) Resulting gaze data wereevaluated using first- and second-order spatial statis-tics we found that data exhibit unexpected spatial

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 3

aggregation (or clustering) We reproduced this findingfor a set of 15 abstract natural patterns (image set 2)and for an external dataset (Le Meur et al 2006) thatwas made publicly available (Bylinskii Judd DurandOliva amp Torralba 2012) Based on these results wedeveloped a dynamical model for saccadic selectionthat was evaluated by the spatial-statistics approachintroduced in this section

Spatial statistics and pair correlation function

We began by numerically computing the spatial (2D)density of gaze positions from experimental data

collected for image set 1 (Figure 1a) Fixation positionsare indicated by red dots (a total of 930 fixations from35 observers for image 2) Densities were computedusing a 2D kernel density estimator (Baddeley ampTurner 2005 R Core Team 2013 see Methods) andare visualized by gray shading in the plot Thebandwidth parameter hdensity for the kernel densityestimation was computed according to Scottrsquos rule(Scott 1992 range from 188 to 228 for h over allimages from image set 1) The obtained 2D densityk(xy) is inhomogeneous because of the dependence onposition (x y) A representative sample trajectory froma single trial is given in Figure 1d where the second andlast fixation of the scanpath is highlighted by white

Figure 1 Analysis of pair correlation functions for experimental gaze sequences (image set 1) and for computer-generated surrogate

data (a) Experimental data of gaze positions from human observers (red) and estimated intensity from kernel density estimate (gray

levels) for image 2 (b c) Realizations of gaze positions generated by inhomogeneous and homogeneous point processes

respectively (d e) Typical single-trial fixation sequences from experiment (red) and inhomogeneous point process (yellow) (f)

Kullback-Leibler divergence (KLD) indicates that the inhomogeneous point process approximates the experimental 2D density of gaze

positions (g) Pair correlation functions (PCFs) for experimental data (single trials light gray single trial from (d) black averaged over

trials red) (h) Mean PCFs for experimental data inhomogeneous and homogeneous Poisson process (i) The PCF deviation shows

that the experimental data are spatially correlated while the two surrogate datasets fail to reproduce this statistical pattern

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 4

color and by their serial numbers The first fixation wasomitted since all trials started at a random positionwithin an image determined by our experimentalprocedure (see Methods)

The pair correlation function g(r) gives a quantita-tive summary of interactions in fixation patterns bymeasuring how distance patterns between fixationsdiffer from what we would expect from independentlydistributed data (see Appendix) A value of g(r) above 1for a particular distance r indicates clustering meaningthat there are more pairs of points separated by adistance r than we would expect if fixation locationswere statistically independent

For the estimation of the pair correlation functionshort sequences of gaze positions from single experi-mental trials were considered However spatial inho-mogeneity of the 2D density was taken into accountTo obtain a reliable estimate of the spatial inhomo-geneity the 2D density was estimated from the fulldata-set (Figure 1a) of all fixations on a given imagetaken from all participants and trials It is important tonote however that in the computation of the kerneldensity estimate k(x) used for the inhomogeneous paircorrelation function Equation (4) an optimal band-width parameter h is needed to avoid two possibleartifacts First if h is very small then spatialcorrelations might be underestimated due to overfittingof the inhomogeneity of the density Second if h is toolarge then spatial correlations might be overestimatedsince first-order inhomogeneity is not adequatelyremoved from the second-order spatial statistics Wesolved this problem by computing the PCF deviationDg for the inhomogeneous point process for varyingvalues of the bandwidth h (Figure 2) Since theinhomogeneous point process generates uncorrelated

fixations ie gtheo(r) frac14 1 the optimal bandwidth forthe dataset corresponds to a minimum of the PCFdeviation Dg (quantifying the deviation from the idealvalue g(r)frac14 1) For image set 1 the optimal value wasestimated as h1 frac14 408 (Figure 2a)

Based on this density estimate we can compute theinhomogeneous pair correlation function ginhom(r) inwhich first-order inhomogeneity is removed from thesecond-order spatial correlations (see Methods) As aresult we obtained pair correlations from individualtrials (Figure 1g gray lines) Deviations from ginhom(r)rsquo 1 indicate spatial clustering at a specific distance rThe mean pair correlation function ginhom(r) providesevidence for clustering at small spatial scales with r 48 (Figure 1g red line) Such a scale is greater than thefoveal zone (r 28) and might provide an estimate ofthe size of the effective perceptual window in free sceneviewing This result is compatible with earlier findingsthat the zone of active selection of saccade targetsextends beyond the fovea into the parafovea up toeccentricities of 48 (Reinagel amp Zador 1999)

Next we carried out the same numerical computa-tions for two sets of surrogate data The surrogate datawere generated to test the null hypotheses of completespatial randomness both for an inhomogeneous pointprocess with position-dependent intensity k(x) and fora homogeneous point process with constant intensityk0 For the inhomogeneous point process we sampledfrom the estimated intensity k(x) (Figure 1b) whereasa constant intensity k0 obtained from spatial averag-ing was used for the homogeneous point process(Figure 1c) Both surrogate datasets are important forchecking the reliability of the computation of the paircorrelation function for the original data (Figure 1g)First the inhomogeneous point process gives a flat

Figure 2 Optimal bandwidth parameter for inhomogeneous pair correlation function (PCF) for three different image sets For

simulated data from the inhomogeneous point process the PCF deviation Dg Equation (5) was computed as a function of the

bandwidth h for the underlying kernel density estimate The optimal bandwidth corresponds to the position of the minimum of Dg (a)

For image set 1 h1frac14 408 (b) For image set 2 h2frac14 468 (c) For the Le Meur dataset h3frac14 228 due to a much smaller presentation

display compared with our experiments

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 5

mean correlation function with g(r) rsquo 1 (Figure 1h)which demonstrates the absence of clustering (exceptfor the divergence at very small scales as an effect ofnumerical computation issues) Thus the spatialcorrelations in the experimental data are not a simpleconsequence of spatial inhomogeneity Second theresult for the homogeneous point process (Figure 1h) isthe same as for the inhomogeneous point processwhich indicates that the correction for inhomogeneityneeded for computations in Figure 1g does notproduce unwanted artefacts due to possible overfittingof the spatial inhomogeneities We conclude that ourexperimental data give a clear indication for spatialclustering at length-scales smaller than 48 of visualangle Additionally we checked the hypothesis thatthis effect of spatial clustering might be due to saccadicundershoot and subsequent short correction saccadesby excluding all fixations with durations shorter than200 ms A related analysis of the PCF indicates noqualitative differences from the original data (Figure1h)

Results reported so far were obtained for a singlerepresentative image Over the full set of 15 images weanalyzed the 2D densities using a symmetrized form ofthe Kullback-Leibler divergence (KLD) based on theconcept of information gain (Beck amp Schlogl 1993)For the experimental data we applied a split-halfprocedure (first half of participants vs second half ofparticipants) and computed the KLD between the twoexperimental densities The corresponding KLD valuesdemonstrate that the inhomogeneous point processreproduces the 2D density (Figure 1f) while thehomogeneous point process clearly fails to approximatethe systematic inhomogeneity in the image Model typehad a significant effect on KLD v2(2)frac141054 p 001Contrasts revealed that (1) spatially inhomogeneousdata-sets were different from the homogeneous data bfrac14 0319 t(28) frac14 2285 p 001 and that (2)experimental data were significantly different frominhomogeneous surrogate data bfrac14 0062 t(28)frac14 256pfrac14 0016

Both sets of surrogate data were submitted to ananalysis of the pair correlation function where thedeviations from g(r) rsquo 1 were computed to obtain aPCF measure indicating the amount of spatial corre-lation averaged over distances (see Methods) Resultsindicate that the surrogate data producemdashas de-signedmdashuncorrelated gaze positions (low PCF devia-tion) while the experimental data by human observersexhibit spatially correlated gaze positions (Figure 1i)Model type had a significant effect on PCF v2(2)frac14984 p 001

To investigate the reliability of our finding of spatialclustering on short-length scales during natural sceneviewing we investigated two other datasets using thesame procedures with an optimal bandwidth adjusted

for each dataset (Figure 2andashc) First we compared themean pair correlation functions per image for naturalobject-based scenes (image set 1 Figure 3a d) andabstract natural patterns (image set 2 Figure 3b e)While variability of the pair correlation function mightbe greater for the abstract scenes than for the object-based scenes (Figure 3d e gray lines) the mean paircorrelation functions for the images sets (Figure 3d ered lines) are very similar This result indicates that theclustering on short length scales is a robust phenom-enon which does not seem to depend sensitively onscene content Second we analyzed a publicly availabledataset (Le Meur et al 2006) from the MIT saliencybenchmark (Bylinskii et al 2012 Judd Durand ampTorralba 2012) consisting of 40 participants whoviewed 27 color images for 15 s For these data thespatial scale of the presentation display for the images(Figure 3c) was considerably smaller than in ourexperiment Consequently we observed pair correla-tion functions that indicate clustering on smaller scales( 158 Figure 3f) Thus while scene content does notseem to exert a strong influence on spatial correlationsspatial scale of the image modulates the spatial scale ofthe clustering of fixations

We conclude that second-order spatial statisticsobtained for the experimental data are significantlydifferent from stochastic processes implementing theassumption of spatial randomness Furthermore themere presence of spatial inhomogeneity in the experi-mental data cannot explain by itself the observedspatial correlations which is evident in the results forthe inhomogeneous point process While inhibition-of-return (Klein 2000) has been discussed frequently asone of the key principles added to saliency maps forsaccadic selection (Itti amp Koch 2001) spatial clusteringof gaze positions is an additional statistical propertythat is highly informative on mechanisms of gazeplanning but has been neglected so far Next we usethese results to develop and test a dynamical model forsaccade generation that uses activation field dynamicsto reproduce spatial statistics of first- and second-order

A dynamical model of saccade generation

A key assumption for the model we propose is thecombination of two neural activation maps toimplement dynamical principles for saccadic selectionFirst a fixation map f(x y t) is keeping track of thesequence of fixations by inhibitory tagging (Itti ampKoch 2001) Second an attention map a(x y t) thatis driven by early visual processing controls thedistribution of attention Physiologically the as-sumption of the dynamical maps is supported by thepresence of an allocentric motor map of visual spacein the primate entorhinal cortex (Killian Jutras amp

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 6

Buffalo 2012) Moreover this map is spatiallydiscrete (Stensola et al 2012) and serves as abiological motivation for the fixation and attentionmaps in our model

We implemented activation maps for attention andfixation (inhibitory tagging) on a discrete square latticeof dimension L middot L Lattice points (i j) haveequidistant spatial positions (xi yj) for i j frac14 1 Lwhere xifrac14x0thorn iDx and yjfrac14y0thorn jDy As a consequenceattention and fixation maps are implemented inspatially discrete forms aij (t) and fij(t) respec-tively For the numerical simulations time wasdiscretized in steps of Dtfrac14 10 ms with tfrac14 k middot Dt and kfrac14 0 1 2 T

If the observerrsquos gaze is at position (xg yg) at time tthen a position-dependent activation change Fij(xg yg)and a global decay proportional to the currentactivation ndashxfij(t) are added to all lattice positions toupdate the activation map at time t thorn 1 ie

fijethtthorn 1THORN frac14 Fijethxg ygTHORN thorn eth1 xTHORNfijethtTHORN eth6THORN

where the activation change Fij(xg yg) rdquo Fij(t) isimplicitly time-dependent because of the time-depen-dence of gaze positions xg(t) yg(t) The constant x 1determines the strength of the decay of activation Forthe spatial distribution of the activation change Fij(t)

we assume a Gaussian profile ie

FijethtTHORN frac14R0ffiffiffiffiffiffi2pp

r0

exp ethxi xgethtTHORNTHORN2 thorn ethyj ygethtTHORNTHORN2

2r20

eth7THORNwith the free parameters r0 and R0 controlling thespatial extent of the activation change and the strengthof the activation change respectively In our model thebuild-up of activation in the fixation map is amechanism of inhibitory tagging (Itti amp Koch 2001) toreduce the amount of refixations on recently visitedimage patches

For the attention map aij(t) we assume similardynamics however the width of Gaussian activationchange Aij(t) is assumed to be proportional to the staticsaliency map ij The updating rule for the attentionmap is given by

aijethtthorn 1THORN frac14ijAijethtTHORNX

kl

klAklethtTHORNthorn eth1 qTHORNaijethtTHORN eth8THORN

with decay constant q 1 As a result the saliencymap ij is accessed locally through a Gaussian aperturewith size r1 and scale parameter R1 similar to Equation

Figure 3 Comparison of the mean pair correlation functions for three different image sets (a) Natural object-based scenes (b)

Abstract natural patterns (c) Natural scenes from the Le Meur et al dataset Mean pair correlation function are similar for object-

based (d) and abstract (e) natural scenes The presentation of images on a smaller display in the Le Meur dataset compared to our

experiments (f) resulted in a smaller length scale of spatial clustering

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 7

(7) Using the local read-out mechanism information isprovided for the attention map to identify regions ofinterest for eye guidance

The fixation map monitors recently visited fixationlocations by increasing local activation at the corre-sponding lattice points (Engbert et al 2011 Freund ampGrassberger 1992) If the observerrsquos gaze position is ata position corresponding to lattice position (i j) then aposition-dependent activation change Fij in the form ofa Gaussian profile is added locally in each time stepwhile a global decay proportional to the currentactivation is applied to all lattice positions The widthof the Gaussian activation r0 and the decay x are thetwo free parameters controlling activation in thefixation map For the attention map aij(t) we assumesimilar dynamics including local increase of activationwith size r1 and global decay q However the amountof activation change Aij is assumed to be proportionalto the time-independent saliency map ij so that thelocal increase of activation isthorn ijAij

PklAkl

Our modeling assumptions are related to specifichypotheses on model parameters We expect that thesize of the Gaussian profile for the attention map islarger than the corresponding size of the fixation mapr1 r0 since attention is the process driving eyemovements into new regions of visual space while theinhibitory tagging process should be more localized Asimilar expectation can be formulated on the decayconstants Since inhibitory tagging is needed on alonger time scale as a foraging facilitator we expect aslower decay in the fixation map compared to theattention map ie x q

Next we assume that given a saccade command attime t both maps are evaluated to select the nextsaccade target First we apply a normalization of bothattention and fixation maps as a general neuralprinciple to obtain relative activations (Carandini ampHeeger 2011) Second we introduce a potentialfunction as the difference of the normalized maps

uijethtTHORN frac14 aijethtTHORN$ kX

kl

aklethtTHORNfrac12 )kthorn

fijethtTHORN$ cX

kl

fklethtTHORNfrac12 )c eth9THORN

where the exponents k and c are free parametersHowever a value of kfrac14 1 is a necessary boundarycondition to obtain a model that accurately reproducesthe densities of gaze positions In a qualitative analysisof the model (see Appendix) pilot simulations showedthat c is an important control parameter determiningspatial correlations where c rsquo 03 was used toreproduce spatial correlations observed in our experi-mental data

The potential uij(t) Equation (9) can be positive ornegative at position (i j) Lattice positions with apositive potential uij 0 are excluded from saccadic

selection since corresponding regions were visitedrecently with high probability Among the latticepositions with negative activations we implementedstochastic selection of saccade targets proportional torelative activations also known as Lucersquos choice rule(Luce 1959) We implemented this form of stochasticselection from the set S frac14 fethi jTHORNjuij 0g where theprobability pij(t) to select lattice position (i j) at time tas the next saccade target is given by

pijethtTHORN frac14 maxuijethtTHORNX

ethklTHORNSuklethtTHORN

g

0

BB

1

CCA eth10THORN

The noise term g is an additional parametercontrolling the amount of noise in target selection

Numerical simulations of the model

Our computational modeling approach to saccadicselection has been developed to propose a minimalmodel that captures the types of spatial statisticsobserved in experimental data After fixing modelparameters all that is needed to run the model on aparticular image is a 2D density estimate of gazepatterns (from experimental data) or a corresponding2D prediction from one of the available saliency models(Borji amp Itti 2013) To reduce computational com-plexity in the current study and to exclude potentialmismatches between data and model simulations due tothe saliency models we run the model on experimen-tally realized densities of gaze positions which isequivalent to assuming an exact saliency modelHowever our modeling approach is compatible withfuture dynamical saliency models that provide time-and position-dependent saliency during a sequence ofgaze shifts thus our model introduces a generaldynamical framework and is not tied to using empiricaldata

The numerical values of the five model parameterswere estimated from experimental data recorded for thefirst five images of natural object-based scenes (imageset 1) using a genetic algorithm approach (Mitchell1998 see Appendix Table 1) The remaining 10 imagesof image set 1 were used for model evaluationsmdashas the15 images of image set 2 The objective function forparameter estimation was based on evaluation of first-order statistics (2D density of gaze positions) and thedistribution of saccade lengths In agreement with ourfirst expectation the estimated optimal values for thespatial extent of the inhibitory tagging process in thefixation map r0frac14 228 is considerably smaller than thecorresponding size of the build-up function for theattention map r1frac14 498 Our second expectation wasrelated to the decay constants which turned out to be

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 8

larger for the attention map qfrac14 0066 than for thefixation map xfrac14 93 middot 105 so q was greater than xagain as expected Finally the noise level in the targetmap is g frac14 91 middot 105

An example for the simulation of the modeldemonstrates the interplay between inhibitory pro-cesses from the fixation map and the attention mapduring gaze planning (Figure 4) The fixation mapbuilds up activation at fixated lattice positions (yellowto red) while the attention map identifies new regionsof interest for saccadic selection (blue) These simula-tions show on a qualitative level how the modelimplements the interplay of the assumed mechanisms of

inhibitory tagging and saccadic selection of gazepositions (see Supplementary Video)

To investigate model performance qualitatively weran simulations for one image (image 6 930 fixationFigure 5a) and obtained a number of fixations similarto the experimental data (882 fixations Figure 5b)Single-trial scanpaths from experiments and simula-tions are shown additionally in Figure 5a and 5brespectively) The resulting distributions of saccadelengths indicate that our dynamical model is in goodagreement with experimental data while the twosurrogate datasets (homogeneous and inhomogeneouspoint processes) fail to reproduce the distribution(Figure 5c) An analysis of the pair correlation

Figure 4 Illustration of a simulated sequence of gaze positions and the activation dynamics of the model (a) The density of gazepositions (empirical saliency) is used as a proxy for a computed saliency map that drives activation in the attention map (bndashf)Sequence of snapshots of the potential (bluefrac14 low yellowfrac14 high) Note that blue color indicates density of fixations in (a) while itrefers to low values of the potential in (b)ndash(f)

Parameter Symbol Mean Error Min Max Reference

Fixation mapActivation span [8] r0 216 011 03 100 Eq (7)Decay log10 x ndash403 028 ndash50 ndash10 Eq (6)

Attention mapActivation span [8] r1 488 025 03 100 Eq (8)Decay log10 q ndash118 008 ndash30 ndash10 Eq (8)

Target selectionAdditive noise log10 g ndash404 007 ndash90 ndash30 Eq (10)

Table 1 Model parameters

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 9

functions indicates that the spatial correlations presentin the experimental data were approximated by thedynamical model (Figure 5d) however the twosurrogate datasets representing uncorrelated sequencesby construction produce qualitatively different spatialcorrelations

To investigate the influences of saccade-lengthdistributions on pair correlations we constructedanother statistical control model that had access to theimage-specific saccade-length distribution It is impor-tant to note that this model was not introduced as acompetitor to the dynamical model which is able topredict saccade-length distributions The statisticalcontrol model approximated the distribution of saccadelengths l and 2D densities of gaze positions x bysampling from the joint probability distribution p(x l)under the assumption of statistical independence ofsaccade lengths and gaze positions ie p(x l)frac14p(x)p(l) This model by construction approximates thedistribution of saccade lengths and 2D density of gazepositions (Figure 5c) The simulations indicate how-

ever that even the combination of inhomogeneousdensity of gaze positions and non-normal distributionof saccade lengths used by the statistical control modelcannot explain spatial correlations in the experimentaldata characterized by the pair correlations function(Figure 5d)

For the statistical analysis of model performance onnew images we carried out additional numericalsimulations We fitted model parameters to dataobtained for the first five images only (see above) andpredicted data for the remaining 10 images by the newsimulations to isolate parameter estimation from modelevaluation (calculating test errors rather than trainingerrors)

Our simulations show that the dynamical modelpredicted the 2D density of gaze positions accurately(Figure 6a) The obtained KLD values for the model(blue) were comparable to KLD values calculated bythe split-half procedure for the experimental data (red)and to the KLD values obtained for the statisticalcontrol models (green frac14Homogeneous point process

Figure 5 Distribution of saccade lengths and pair correlation functions from model simulations (image 6) (a) Experimental

distribution of gaze positions (red dots) and a representative sample trial (red lines) (b) Corresponding plot of simulated data

obtained from our dynamical model (blue dots) A single-trial simulation is highlighted (blue line) (c) Distributions of saccade lengths

for experimental data (red) dynamical model (blue) homogeneous point process (green) inhomogeneous point process (yellow) and

a statistical control model (magenta) (d) Pair correlation functions for the different models The dynamical model (blue line) produces

spatial correlations similar to the experimental data (red line)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 10

yellow frac14 Inhomogeneous point process magentafrac14Control model) Model type had a significant effect onKLD v2(4)frac14 1094 p 001 Post hoc comparisonsindicated significant effects between all models (p 001) except for the comparison between experimentaldata and the dynamical model (p frac14 0298) and for thecomparison between dynamical model and inhomoge-neous point process (p frac14 0679) In an analysis of thePCF estimated from the same set of simulated data(Figure 6b) the dynamical model (blue) produceddeviations from an uncorrelated point process that arein good agreement with the experimental data (red)Model type had a significant effect on PCF deviationv2(4)frac14 916 p 001 Post hoc comparisons indicatedsignificant effects for all comparisons (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0990) and between homoge-neous and inhomogeneous point processes (pfrac14 0996)

To check the reliability of the results we performedall corresponding calculations for the images showingabstract natural scenes (image set 2) For thesesimulations we used the set of model parameters fittedto the first five images of image set 1 whichcorresponds to the hypothesis that scene content(object-based scenes vs abstract natural patterns) doesnot have a strong impact on spatial correlations in thescanpath data For the simulated densities (Figure 6c)we reproduced the statistical results from image set 1Again model type had a significant effect on KLDv2(4)frac14 1626 p 001 Post hoc comparisons indicatedsignificant effects between all models (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0251) for the comparisonbetween dynamical model and inhomogeneous pointprocess (p frac14 0784) and for the comparison betweeninhomogeneous point process and experimental data (pfrac14 0012) For the pair correlation function (Figure 6d)model type had a significant effect v2(4)frac14 1406 p 001 and post hoc comparisons indicated significanteffects for all comparisons (p 001) except for thecomparison between experiment and dynamical model(pfrac14 0453) and between homogeneous and inhomo-geneous point processes (p frac14 0700) Thus the mainstatistical results obtained from image set 1 werereproduced for images of abstract natural patterns(image set 2) These results lend support to thehypothesis that scene content does not have a stronginfluence on second-order spatial statistics of gazepatterns

Thus the dynamical model performed better thanany of the statistical models in predicting the averagepair correlations Although one of our statisticalcontrol models generated data by using image-specificsaccade-length information in addition to the 2Ddensity of gaze position it could not predict the spatialcorrelations as accurately as the dynamical model that

was uninformed about the image-specific saccade-length distribution

Discussion

Current theoretical models of visual attentionallocation in natural scenes are limited to the predictionof first-order spatial statistics (2D densities) of gazepatterns We were interested in attentional dynamicsthat can be characterized by spatial interactions (asfound in the second-order statistics) Using the theoryof spatial point processes we discovered that gazepatterns can be characterized by clustering at smalllength scales which cannot be explained by spatialinhomogeneity of the 2D density We proposed andanalyzed a model based on dynamical activation mapsfor attentional selection and inhibitory control of gazepositions The model reproduced 2D densities of gazemaps (first-order statistics) and distributions of saccadelengths as well as pair correlations (second-orderspatial statistics)

Spatial statistics

While research on the computation of visual saliencyhas been a highly active field of research (Borji amp Itti2013) there is currently a lack of computational modelsfor the generation of scanpaths on the basis of knownsaliency Inhibition-of-return (Klein 2000) has beenproposed as a key principle to prevent continuingrefixation within regions of highest saliency Howeverour analysis of the pair correlation function demon-strates that saccadic selection at small length scales isdominated by spatial clustering Thus our findings arehighly compatible with the view that inhibition-of-return cannot easily be observed in eye-movementbehavior in natural scene viewing experiments (Smith ampHenderson 2011) However the spatial correlationscan be exploited to investigate dynamical rules under-lying attentional processing in the visual system Ourexperimental data show a clear effect of spatialclustering for length scales shorter than about 48However these results are incompatible with thecurrent theory of saliency-based attention allocationcombined with an inhibition-of-return mechanismFuture simulations will investigate the predictive powerof the model when saliency models are used as input

Modeling spatial correlations

In a biologically plausible computational model ofsaccade generation a limited perceptual span needs to

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 11

be implemented for attentional selection (Findlay ampGilchrist 2012) We addressed this problem byassuming a Gaussian read-out mechanism with localretrieval from the saliency map through a limitedaperture analogous to the limited extent of high-fidelity information uptake through the fovea We usedthe experimentally observed density of fixations as aproxy for visual saliency

First our results indicate that a very limitedattentional span (Gaussian with standard deviationparameter 498) of about twice the size of theactivation mechanism for tracking the gaze positions(228) is sufficient for saccade planning This atten-

tional span is efficient however since the combinationof fixation and attention maps in our model activelydrives the modelrsquos gaze position to new salient regionscomputed via normalization of activations (Carandiniamp Heeger 2011)

Second our model correctly predicted spatialclustering of gaze positions at small-length scales Thepair correlation function indicates that there is apronounced contribution by refixations very close tothe current gaze position This effect is compatible withthe distribution of saccade lengths however a statis-tical control model that generated data from statisti-cally independent probabilities of 2D density and

Figure 6 Model predictions on images not used for parameter estimation Predicted data were generated from the dynamical model

for the 10 images of image set 1 not used for parameter estimation (a b) and for all 15 images from image set 2 (c d) (a) Modell

simulations of the KLD for experiments on image set 1 (10 images not used for model parameter estimation) and dynamical model

and three different statistical models For the experimental data a split-half procedure was applied to compute KLD (b)

Corresponding PCF deviations for the same model-generated and experimental data on image set 1 (c) KLD measures for image set 2

(d) PCF deviation for image set 2

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 12

saccade lengths could not reproduce the pair correla-tions adequately

Implications for computational models of activevision

During active vision our visual system relies onfrequent gaze shifts to optimize retinal input Using asecond-order statistical analysis we demonstrated thatspatial correlations across scanpaths might provideimportant constraints for computational models of eyeguidance In our dynamical model spatial clustering atsmall scales is the result of two principles First thefixation map is driven by an activation function with asmall spatial extent (258) Second the time-scale ofactivation build-up in the fixation map is slowcompared to the build-up of activation in the attentionmap Both mechanisms permit refixations at positionsvery close to the current gaze position before the systemmoves on to new regions of visual space

Limitations of the current approach

The current work focused on spatial statistics of gazepatterns and we proposed and analyzed dynamicalmechanisms of eye guidance in scene viewing In ourmodel a Gaussian read-out mechanism for the staticempirical saliency map was implemented as a simpli-fication A more biologically plausible combination ofour model of eye guidance with a dynamical saliencymodel (Borji amp Itti 2013) is a natural extension of thecurrent framework and the development of such amodel is work in progress in our laboratories Clearlythe current modeling architecture is not limited to inputfrom static saliency maps

Another simplification is related to the timing ofsaccades (Nuthmann Smith Engbert amp Henderson2010) In the current version of our model weimplemented random timing and sampled fixationdurations randomly from a predefined distributionMore adequate models of fixation durations howeverwill need to include interactions of processing difficultybetween fovea and periphery (Laubrock Cajar ampEngbert 2013)

Supplemental information

This work includes a supplemental video animationof the model simulations Experimental data onfixation patterns and computer code for statisticalanalysis and model simulations will be made availablevia the Potsdam Mind Research Respository (PMR2httpreadpsychuni-potsdamde)

Keywords scene perception eye movements atten-tion saccades modeling spatial statistics

Acknowledgments

This work was supported by Bundesministerium furBildung und Forschung (BMBF) through the BernsteinComputational Neuroscience Programs Berlin (ProjectB3 FKZ 01GQ1001F and FKZ 01GQ1001B to R Eand F A W resp) and Tubingen (FKZ 01GQ1002 toFAW) and by Deutsche Forschungsgemeinschaft(grants EN 47113ndash1 and WI 21034ndash1 to R E and FA W resp)

Commercial relationships noneCorresponding author Ralf EngbertEmail ralfengbertuni-potsdamdeAddress Cognitive Science Program amp Department ofPsychology University of Potsdam Potsdam Ger-many

References

Baddeley A amp Turner R (2005) spatstat An Rpackage for analyzing spatial point patternsJournal of Statistical Software 12(6) 1ndash42 Avail-able from httpwwwjstatsoftorg

Barthelme S Trukenbrod H Engbert R ampWichmann F (2013) Modeling fixation locationsusing spatial point processes Journal of Vision13(12) 1 1ndash34 httpwwwjournalofvisionorgcontent13121 doi10116713121 [PubMed][Article]

Beck C amp Schlogl F (1993) Thermodynamics ofchaotic systems Cambridge UK Cambridge Uni-versity Press

Borji A amp Itti L (2013) State-of-the-art in visualattention modeling IEEE Transactions on PatternAnalysis and Machine Intelligence 35(1) 185ndash207

Bylinskii Z Judd T Durand F Oliva A ampTorralba A (2012) MIT Saliency BenchmarkAvailable at httpsaliencymitedu

Carandini M amp Heeger D J (2011) Normalizationas a canonical neural computation Nature ReviewsNeuroscience 13 51ndash62

Engbert R (2012) Computational modeling of col-licular integration of perceptual responses andattention in microsaccades The Journal of Neuro-science 32(23) 8035ndash8039

Engbert R amp Kliegl R (2003) Microsaccades

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 13

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 4: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

aggregation (or clustering) We reproduced this findingfor a set of 15 abstract natural patterns (image set 2)and for an external dataset (Le Meur et al 2006) thatwas made publicly available (Bylinskii Judd DurandOliva amp Torralba 2012) Based on these results wedeveloped a dynamical model for saccadic selectionthat was evaluated by the spatial-statistics approachintroduced in this section

Spatial statistics and pair correlation function

We began by numerically computing the spatial (2D)density of gaze positions from experimental data

collected for image set 1 (Figure 1a) Fixation positionsare indicated by red dots (a total of 930 fixations from35 observers for image 2) Densities were computedusing a 2D kernel density estimator (Baddeley ampTurner 2005 R Core Team 2013 see Methods) andare visualized by gray shading in the plot Thebandwidth parameter hdensity for the kernel densityestimation was computed according to Scottrsquos rule(Scott 1992 range from 188 to 228 for h over allimages from image set 1) The obtained 2D densityk(xy) is inhomogeneous because of the dependence onposition (x y) A representative sample trajectory froma single trial is given in Figure 1d where the second andlast fixation of the scanpath is highlighted by white

Figure 1 Analysis of pair correlation functions for experimental gaze sequences (image set 1) and for computer-generated surrogate

data (a) Experimental data of gaze positions from human observers (red) and estimated intensity from kernel density estimate (gray

levels) for image 2 (b c) Realizations of gaze positions generated by inhomogeneous and homogeneous point processes

respectively (d e) Typical single-trial fixation sequences from experiment (red) and inhomogeneous point process (yellow) (f)

Kullback-Leibler divergence (KLD) indicates that the inhomogeneous point process approximates the experimental 2D density of gaze

positions (g) Pair correlation functions (PCFs) for experimental data (single trials light gray single trial from (d) black averaged over

trials red) (h) Mean PCFs for experimental data inhomogeneous and homogeneous Poisson process (i) The PCF deviation shows

that the experimental data are spatially correlated while the two surrogate datasets fail to reproduce this statistical pattern

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 4

color and by their serial numbers The first fixation wasomitted since all trials started at a random positionwithin an image determined by our experimentalprocedure (see Methods)

The pair correlation function g(r) gives a quantita-tive summary of interactions in fixation patterns bymeasuring how distance patterns between fixationsdiffer from what we would expect from independentlydistributed data (see Appendix) A value of g(r) above 1for a particular distance r indicates clustering meaningthat there are more pairs of points separated by adistance r than we would expect if fixation locationswere statistically independent

For the estimation of the pair correlation functionshort sequences of gaze positions from single experi-mental trials were considered However spatial inho-mogeneity of the 2D density was taken into accountTo obtain a reliable estimate of the spatial inhomo-geneity the 2D density was estimated from the fulldata-set (Figure 1a) of all fixations on a given imagetaken from all participants and trials It is important tonote however that in the computation of the kerneldensity estimate k(x) used for the inhomogeneous paircorrelation function Equation (4) an optimal band-width parameter h is needed to avoid two possibleartifacts First if h is very small then spatialcorrelations might be underestimated due to overfittingof the inhomogeneity of the density Second if h is toolarge then spatial correlations might be overestimatedsince first-order inhomogeneity is not adequatelyremoved from the second-order spatial statistics Wesolved this problem by computing the PCF deviationDg for the inhomogeneous point process for varyingvalues of the bandwidth h (Figure 2) Since theinhomogeneous point process generates uncorrelated

fixations ie gtheo(r) frac14 1 the optimal bandwidth forthe dataset corresponds to a minimum of the PCFdeviation Dg (quantifying the deviation from the idealvalue g(r)frac14 1) For image set 1 the optimal value wasestimated as h1 frac14 408 (Figure 2a)

Based on this density estimate we can compute theinhomogeneous pair correlation function ginhom(r) inwhich first-order inhomogeneity is removed from thesecond-order spatial correlations (see Methods) As aresult we obtained pair correlations from individualtrials (Figure 1g gray lines) Deviations from ginhom(r)rsquo 1 indicate spatial clustering at a specific distance rThe mean pair correlation function ginhom(r) providesevidence for clustering at small spatial scales with r 48 (Figure 1g red line) Such a scale is greater than thefoveal zone (r 28) and might provide an estimate ofthe size of the effective perceptual window in free sceneviewing This result is compatible with earlier findingsthat the zone of active selection of saccade targetsextends beyond the fovea into the parafovea up toeccentricities of 48 (Reinagel amp Zador 1999)

Next we carried out the same numerical computa-tions for two sets of surrogate data The surrogate datawere generated to test the null hypotheses of completespatial randomness both for an inhomogeneous pointprocess with position-dependent intensity k(x) and fora homogeneous point process with constant intensityk0 For the inhomogeneous point process we sampledfrom the estimated intensity k(x) (Figure 1b) whereasa constant intensity k0 obtained from spatial averag-ing was used for the homogeneous point process(Figure 1c) Both surrogate datasets are important forchecking the reliability of the computation of the paircorrelation function for the original data (Figure 1g)First the inhomogeneous point process gives a flat

Figure 2 Optimal bandwidth parameter for inhomogeneous pair correlation function (PCF) for three different image sets For

simulated data from the inhomogeneous point process the PCF deviation Dg Equation (5) was computed as a function of the

bandwidth h for the underlying kernel density estimate The optimal bandwidth corresponds to the position of the minimum of Dg (a)

For image set 1 h1frac14 408 (b) For image set 2 h2frac14 468 (c) For the Le Meur dataset h3frac14 228 due to a much smaller presentation

display compared with our experiments

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 5

mean correlation function with g(r) rsquo 1 (Figure 1h)which demonstrates the absence of clustering (exceptfor the divergence at very small scales as an effect ofnumerical computation issues) Thus the spatialcorrelations in the experimental data are not a simpleconsequence of spatial inhomogeneity Second theresult for the homogeneous point process (Figure 1h) isthe same as for the inhomogeneous point processwhich indicates that the correction for inhomogeneityneeded for computations in Figure 1g does notproduce unwanted artefacts due to possible overfittingof the spatial inhomogeneities We conclude that ourexperimental data give a clear indication for spatialclustering at length-scales smaller than 48 of visualangle Additionally we checked the hypothesis thatthis effect of spatial clustering might be due to saccadicundershoot and subsequent short correction saccadesby excluding all fixations with durations shorter than200 ms A related analysis of the PCF indicates noqualitative differences from the original data (Figure1h)

Results reported so far were obtained for a singlerepresentative image Over the full set of 15 images weanalyzed the 2D densities using a symmetrized form ofthe Kullback-Leibler divergence (KLD) based on theconcept of information gain (Beck amp Schlogl 1993)For the experimental data we applied a split-halfprocedure (first half of participants vs second half ofparticipants) and computed the KLD between the twoexperimental densities The corresponding KLD valuesdemonstrate that the inhomogeneous point processreproduces the 2D density (Figure 1f) while thehomogeneous point process clearly fails to approximatethe systematic inhomogeneity in the image Model typehad a significant effect on KLD v2(2)frac141054 p 001Contrasts revealed that (1) spatially inhomogeneousdata-sets were different from the homogeneous data bfrac14 0319 t(28) frac14 2285 p 001 and that (2)experimental data were significantly different frominhomogeneous surrogate data bfrac14 0062 t(28)frac14 256pfrac14 0016

Both sets of surrogate data were submitted to ananalysis of the pair correlation function where thedeviations from g(r) rsquo 1 were computed to obtain aPCF measure indicating the amount of spatial corre-lation averaged over distances (see Methods) Resultsindicate that the surrogate data producemdashas de-signedmdashuncorrelated gaze positions (low PCF devia-tion) while the experimental data by human observersexhibit spatially correlated gaze positions (Figure 1i)Model type had a significant effect on PCF v2(2)frac14984 p 001

To investigate the reliability of our finding of spatialclustering on short-length scales during natural sceneviewing we investigated two other datasets using thesame procedures with an optimal bandwidth adjusted

for each dataset (Figure 2andashc) First we compared themean pair correlation functions per image for naturalobject-based scenes (image set 1 Figure 3a d) andabstract natural patterns (image set 2 Figure 3b e)While variability of the pair correlation function mightbe greater for the abstract scenes than for the object-based scenes (Figure 3d e gray lines) the mean paircorrelation functions for the images sets (Figure 3d ered lines) are very similar This result indicates that theclustering on short length scales is a robust phenom-enon which does not seem to depend sensitively onscene content Second we analyzed a publicly availabledataset (Le Meur et al 2006) from the MIT saliencybenchmark (Bylinskii et al 2012 Judd Durand ampTorralba 2012) consisting of 40 participants whoviewed 27 color images for 15 s For these data thespatial scale of the presentation display for the images(Figure 3c) was considerably smaller than in ourexperiment Consequently we observed pair correla-tion functions that indicate clustering on smaller scales( 158 Figure 3f) Thus while scene content does notseem to exert a strong influence on spatial correlationsspatial scale of the image modulates the spatial scale ofthe clustering of fixations

We conclude that second-order spatial statisticsobtained for the experimental data are significantlydifferent from stochastic processes implementing theassumption of spatial randomness Furthermore themere presence of spatial inhomogeneity in the experi-mental data cannot explain by itself the observedspatial correlations which is evident in the results forthe inhomogeneous point process While inhibition-of-return (Klein 2000) has been discussed frequently asone of the key principles added to saliency maps forsaccadic selection (Itti amp Koch 2001) spatial clusteringof gaze positions is an additional statistical propertythat is highly informative on mechanisms of gazeplanning but has been neglected so far Next we usethese results to develop and test a dynamical model forsaccade generation that uses activation field dynamicsto reproduce spatial statistics of first- and second-order

A dynamical model of saccade generation

A key assumption for the model we propose is thecombination of two neural activation maps toimplement dynamical principles for saccadic selectionFirst a fixation map f(x y t) is keeping track of thesequence of fixations by inhibitory tagging (Itti ampKoch 2001) Second an attention map a(x y t) thatis driven by early visual processing controls thedistribution of attention Physiologically the as-sumption of the dynamical maps is supported by thepresence of an allocentric motor map of visual spacein the primate entorhinal cortex (Killian Jutras amp

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 6

Buffalo 2012) Moreover this map is spatiallydiscrete (Stensola et al 2012) and serves as abiological motivation for the fixation and attentionmaps in our model

We implemented activation maps for attention andfixation (inhibitory tagging) on a discrete square latticeof dimension L middot L Lattice points (i j) haveequidistant spatial positions (xi yj) for i j frac14 1 Lwhere xifrac14x0thorn iDx and yjfrac14y0thorn jDy As a consequenceattention and fixation maps are implemented inspatially discrete forms aij (t) and fij(t) respec-tively For the numerical simulations time wasdiscretized in steps of Dtfrac14 10 ms with tfrac14 k middot Dt and kfrac14 0 1 2 T

If the observerrsquos gaze is at position (xg yg) at time tthen a position-dependent activation change Fij(xg yg)and a global decay proportional to the currentactivation ndashxfij(t) are added to all lattice positions toupdate the activation map at time t thorn 1 ie

fijethtthorn 1THORN frac14 Fijethxg ygTHORN thorn eth1 xTHORNfijethtTHORN eth6THORN

where the activation change Fij(xg yg) rdquo Fij(t) isimplicitly time-dependent because of the time-depen-dence of gaze positions xg(t) yg(t) The constant x 1determines the strength of the decay of activation Forthe spatial distribution of the activation change Fij(t)

we assume a Gaussian profile ie

FijethtTHORN frac14R0ffiffiffiffiffiffi2pp

r0

exp ethxi xgethtTHORNTHORN2 thorn ethyj ygethtTHORNTHORN2

2r20

eth7THORNwith the free parameters r0 and R0 controlling thespatial extent of the activation change and the strengthof the activation change respectively In our model thebuild-up of activation in the fixation map is amechanism of inhibitory tagging (Itti amp Koch 2001) toreduce the amount of refixations on recently visitedimage patches

For the attention map aij(t) we assume similardynamics however the width of Gaussian activationchange Aij(t) is assumed to be proportional to the staticsaliency map ij The updating rule for the attentionmap is given by

aijethtthorn 1THORN frac14ijAijethtTHORNX

kl

klAklethtTHORNthorn eth1 qTHORNaijethtTHORN eth8THORN

with decay constant q 1 As a result the saliencymap ij is accessed locally through a Gaussian aperturewith size r1 and scale parameter R1 similar to Equation

Figure 3 Comparison of the mean pair correlation functions for three different image sets (a) Natural object-based scenes (b)

Abstract natural patterns (c) Natural scenes from the Le Meur et al dataset Mean pair correlation function are similar for object-

based (d) and abstract (e) natural scenes The presentation of images on a smaller display in the Le Meur dataset compared to our

experiments (f) resulted in a smaller length scale of spatial clustering

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 7

(7) Using the local read-out mechanism information isprovided for the attention map to identify regions ofinterest for eye guidance

The fixation map monitors recently visited fixationlocations by increasing local activation at the corre-sponding lattice points (Engbert et al 2011 Freund ampGrassberger 1992) If the observerrsquos gaze position is ata position corresponding to lattice position (i j) then aposition-dependent activation change Fij in the form ofa Gaussian profile is added locally in each time stepwhile a global decay proportional to the currentactivation is applied to all lattice positions The widthof the Gaussian activation r0 and the decay x are thetwo free parameters controlling activation in thefixation map For the attention map aij(t) we assumesimilar dynamics including local increase of activationwith size r1 and global decay q However the amountof activation change Aij is assumed to be proportionalto the time-independent saliency map ij so that thelocal increase of activation isthorn ijAij

PklAkl

Our modeling assumptions are related to specifichypotheses on model parameters We expect that thesize of the Gaussian profile for the attention map islarger than the corresponding size of the fixation mapr1 r0 since attention is the process driving eyemovements into new regions of visual space while theinhibitory tagging process should be more localized Asimilar expectation can be formulated on the decayconstants Since inhibitory tagging is needed on alonger time scale as a foraging facilitator we expect aslower decay in the fixation map compared to theattention map ie x q

Next we assume that given a saccade command attime t both maps are evaluated to select the nextsaccade target First we apply a normalization of bothattention and fixation maps as a general neuralprinciple to obtain relative activations (Carandini ampHeeger 2011) Second we introduce a potentialfunction as the difference of the normalized maps

uijethtTHORN frac14 aijethtTHORN$ kX

kl

aklethtTHORNfrac12 )kthorn

fijethtTHORN$ cX

kl

fklethtTHORNfrac12 )c eth9THORN

where the exponents k and c are free parametersHowever a value of kfrac14 1 is a necessary boundarycondition to obtain a model that accurately reproducesthe densities of gaze positions In a qualitative analysisof the model (see Appendix) pilot simulations showedthat c is an important control parameter determiningspatial correlations where c rsquo 03 was used toreproduce spatial correlations observed in our experi-mental data

The potential uij(t) Equation (9) can be positive ornegative at position (i j) Lattice positions with apositive potential uij 0 are excluded from saccadic

selection since corresponding regions were visitedrecently with high probability Among the latticepositions with negative activations we implementedstochastic selection of saccade targets proportional torelative activations also known as Lucersquos choice rule(Luce 1959) We implemented this form of stochasticselection from the set S frac14 fethi jTHORNjuij 0g where theprobability pij(t) to select lattice position (i j) at time tas the next saccade target is given by

pijethtTHORN frac14 maxuijethtTHORNX

ethklTHORNSuklethtTHORN

g

0

BB

1

CCA eth10THORN

The noise term g is an additional parametercontrolling the amount of noise in target selection

Numerical simulations of the model

Our computational modeling approach to saccadicselection has been developed to propose a minimalmodel that captures the types of spatial statisticsobserved in experimental data After fixing modelparameters all that is needed to run the model on aparticular image is a 2D density estimate of gazepatterns (from experimental data) or a corresponding2D prediction from one of the available saliency models(Borji amp Itti 2013) To reduce computational com-plexity in the current study and to exclude potentialmismatches between data and model simulations due tothe saliency models we run the model on experimen-tally realized densities of gaze positions which isequivalent to assuming an exact saliency modelHowever our modeling approach is compatible withfuture dynamical saliency models that provide time-and position-dependent saliency during a sequence ofgaze shifts thus our model introduces a generaldynamical framework and is not tied to using empiricaldata

The numerical values of the five model parameterswere estimated from experimental data recorded for thefirst five images of natural object-based scenes (imageset 1) using a genetic algorithm approach (Mitchell1998 see Appendix Table 1) The remaining 10 imagesof image set 1 were used for model evaluationsmdashas the15 images of image set 2 The objective function forparameter estimation was based on evaluation of first-order statistics (2D density of gaze positions) and thedistribution of saccade lengths In agreement with ourfirst expectation the estimated optimal values for thespatial extent of the inhibitory tagging process in thefixation map r0frac14 228 is considerably smaller than thecorresponding size of the build-up function for theattention map r1frac14 498 Our second expectation wasrelated to the decay constants which turned out to be

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 8

larger for the attention map qfrac14 0066 than for thefixation map xfrac14 93 middot 105 so q was greater than xagain as expected Finally the noise level in the targetmap is g frac14 91 middot 105

An example for the simulation of the modeldemonstrates the interplay between inhibitory pro-cesses from the fixation map and the attention mapduring gaze planning (Figure 4) The fixation mapbuilds up activation at fixated lattice positions (yellowto red) while the attention map identifies new regionsof interest for saccadic selection (blue) These simula-tions show on a qualitative level how the modelimplements the interplay of the assumed mechanisms of

inhibitory tagging and saccadic selection of gazepositions (see Supplementary Video)

To investigate model performance qualitatively weran simulations for one image (image 6 930 fixationFigure 5a) and obtained a number of fixations similarto the experimental data (882 fixations Figure 5b)Single-trial scanpaths from experiments and simula-tions are shown additionally in Figure 5a and 5brespectively) The resulting distributions of saccadelengths indicate that our dynamical model is in goodagreement with experimental data while the twosurrogate datasets (homogeneous and inhomogeneouspoint processes) fail to reproduce the distribution(Figure 5c) An analysis of the pair correlation

Figure 4 Illustration of a simulated sequence of gaze positions and the activation dynamics of the model (a) The density of gazepositions (empirical saliency) is used as a proxy for a computed saliency map that drives activation in the attention map (bndashf)Sequence of snapshots of the potential (bluefrac14 low yellowfrac14 high) Note that blue color indicates density of fixations in (a) while itrefers to low values of the potential in (b)ndash(f)

Parameter Symbol Mean Error Min Max Reference

Fixation mapActivation span [8] r0 216 011 03 100 Eq (7)Decay log10 x ndash403 028 ndash50 ndash10 Eq (6)

Attention mapActivation span [8] r1 488 025 03 100 Eq (8)Decay log10 q ndash118 008 ndash30 ndash10 Eq (8)

Target selectionAdditive noise log10 g ndash404 007 ndash90 ndash30 Eq (10)

Table 1 Model parameters

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 9

functions indicates that the spatial correlations presentin the experimental data were approximated by thedynamical model (Figure 5d) however the twosurrogate datasets representing uncorrelated sequencesby construction produce qualitatively different spatialcorrelations

To investigate the influences of saccade-lengthdistributions on pair correlations we constructedanother statistical control model that had access to theimage-specific saccade-length distribution It is impor-tant to note that this model was not introduced as acompetitor to the dynamical model which is able topredict saccade-length distributions The statisticalcontrol model approximated the distribution of saccadelengths l and 2D densities of gaze positions x bysampling from the joint probability distribution p(x l)under the assumption of statistical independence ofsaccade lengths and gaze positions ie p(x l)frac14p(x)p(l) This model by construction approximates thedistribution of saccade lengths and 2D density of gazepositions (Figure 5c) The simulations indicate how-

ever that even the combination of inhomogeneousdensity of gaze positions and non-normal distributionof saccade lengths used by the statistical control modelcannot explain spatial correlations in the experimentaldata characterized by the pair correlations function(Figure 5d)

For the statistical analysis of model performance onnew images we carried out additional numericalsimulations We fitted model parameters to dataobtained for the first five images only (see above) andpredicted data for the remaining 10 images by the newsimulations to isolate parameter estimation from modelevaluation (calculating test errors rather than trainingerrors)

Our simulations show that the dynamical modelpredicted the 2D density of gaze positions accurately(Figure 6a) The obtained KLD values for the model(blue) were comparable to KLD values calculated bythe split-half procedure for the experimental data (red)and to the KLD values obtained for the statisticalcontrol models (green frac14Homogeneous point process

Figure 5 Distribution of saccade lengths and pair correlation functions from model simulations (image 6) (a) Experimental

distribution of gaze positions (red dots) and a representative sample trial (red lines) (b) Corresponding plot of simulated data

obtained from our dynamical model (blue dots) A single-trial simulation is highlighted (blue line) (c) Distributions of saccade lengths

for experimental data (red) dynamical model (blue) homogeneous point process (green) inhomogeneous point process (yellow) and

a statistical control model (magenta) (d) Pair correlation functions for the different models The dynamical model (blue line) produces

spatial correlations similar to the experimental data (red line)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 10

yellow frac14 Inhomogeneous point process magentafrac14Control model) Model type had a significant effect onKLD v2(4)frac14 1094 p 001 Post hoc comparisonsindicated significant effects between all models (p 001) except for the comparison between experimentaldata and the dynamical model (p frac14 0298) and for thecomparison between dynamical model and inhomoge-neous point process (p frac14 0679) In an analysis of thePCF estimated from the same set of simulated data(Figure 6b) the dynamical model (blue) produceddeviations from an uncorrelated point process that arein good agreement with the experimental data (red)Model type had a significant effect on PCF deviationv2(4)frac14 916 p 001 Post hoc comparisons indicatedsignificant effects for all comparisons (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0990) and between homoge-neous and inhomogeneous point processes (pfrac14 0996)

To check the reliability of the results we performedall corresponding calculations for the images showingabstract natural scenes (image set 2) For thesesimulations we used the set of model parameters fittedto the first five images of image set 1 whichcorresponds to the hypothesis that scene content(object-based scenes vs abstract natural patterns) doesnot have a strong impact on spatial correlations in thescanpath data For the simulated densities (Figure 6c)we reproduced the statistical results from image set 1Again model type had a significant effect on KLDv2(4)frac14 1626 p 001 Post hoc comparisons indicatedsignificant effects between all models (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0251) for the comparisonbetween dynamical model and inhomogeneous pointprocess (p frac14 0784) and for the comparison betweeninhomogeneous point process and experimental data (pfrac14 0012) For the pair correlation function (Figure 6d)model type had a significant effect v2(4)frac14 1406 p 001 and post hoc comparisons indicated significanteffects for all comparisons (p 001) except for thecomparison between experiment and dynamical model(pfrac14 0453) and between homogeneous and inhomo-geneous point processes (p frac14 0700) Thus the mainstatistical results obtained from image set 1 werereproduced for images of abstract natural patterns(image set 2) These results lend support to thehypothesis that scene content does not have a stronginfluence on second-order spatial statistics of gazepatterns

Thus the dynamical model performed better thanany of the statistical models in predicting the averagepair correlations Although one of our statisticalcontrol models generated data by using image-specificsaccade-length information in addition to the 2Ddensity of gaze position it could not predict the spatialcorrelations as accurately as the dynamical model that

was uninformed about the image-specific saccade-length distribution

Discussion

Current theoretical models of visual attentionallocation in natural scenes are limited to the predictionof first-order spatial statistics (2D densities) of gazepatterns We were interested in attentional dynamicsthat can be characterized by spatial interactions (asfound in the second-order statistics) Using the theoryof spatial point processes we discovered that gazepatterns can be characterized by clustering at smalllength scales which cannot be explained by spatialinhomogeneity of the 2D density We proposed andanalyzed a model based on dynamical activation mapsfor attentional selection and inhibitory control of gazepositions The model reproduced 2D densities of gazemaps (first-order statistics) and distributions of saccadelengths as well as pair correlations (second-orderspatial statistics)

Spatial statistics

While research on the computation of visual saliencyhas been a highly active field of research (Borji amp Itti2013) there is currently a lack of computational modelsfor the generation of scanpaths on the basis of knownsaliency Inhibition-of-return (Klein 2000) has beenproposed as a key principle to prevent continuingrefixation within regions of highest saliency Howeverour analysis of the pair correlation function demon-strates that saccadic selection at small length scales isdominated by spatial clustering Thus our findings arehighly compatible with the view that inhibition-of-return cannot easily be observed in eye-movementbehavior in natural scene viewing experiments (Smith ampHenderson 2011) However the spatial correlationscan be exploited to investigate dynamical rules under-lying attentional processing in the visual system Ourexperimental data show a clear effect of spatialclustering for length scales shorter than about 48However these results are incompatible with thecurrent theory of saliency-based attention allocationcombined with an inhibition-of-return mechanismFuture simulations will investigate the predictive powerof the model when saliency models are used as input

Modeling spatial correlations

In a biologically plausible computational model ofsaccade generation a limited perceptual span needs to

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 11

be implemented for attentional selection (Findlay ampGilchrist 2012) We addressed this problem byassuming a Gaussian read-out mechanism with localretrieval from the saliency map through a limitedaperture analogous to the limited extent of high-fidelity information uptake through the fovea We usedthe experimentally observed density of fixations as aproxy for visual saliency

First our results indicate that a very limitedattentional span (Gaussian with standard deviationparameter 498) of about twice the size of theactivation mechanism for tracking the gaze positions(228) is sufficient for saccade planning This atten-

tional span is efficient however since the combinationof fixation and attention maps in our model activelydrives the modelrsquos gaze position to new salient regionscomputed via normalization of activations (Carandiniamp Heeger 2011)

Second our model correctly predicted spatialclustering of gaze positions at small-length scales Thepair correlation function indicates that there is apronounced contribution by refixations very close tothe current gaze position This effect is compatible withthe distribution of saccade lengths however a statis-tical control model that generated data from statisti-cally independent probabilities of 2D density and

Figure 6 Model predictions on images not used for parameter estimation Predicted data were generated from the dynamical model

for the 10 images of image set 1 not used for parameter estimation (a b) and for all 15 images from image set 2 (c d) (a) Modell

simulations of the KLD for experiments on image set 1 (10 images not used for model parameter estimation) and dynamical model

and three different statistical models For the experimental data a split-half procedure was applied to compute KLD (b)

Corresponding PCF deviations for the same model-generated and experimental data on image set 1 (c) KLD measures for image set 2

(d) PCF deviation for image set 2

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 12

saccade lengths could not reproduce the pair correla-tions adequately

Implications for computational models of activevision

During active vision our visual system relies onfrequent gaze shifts to optimize retinal input Using asecond-order statistical analysis we demonstrated thatspatial correlations across scanpaths might provideimportant constraints for computational models of eyeguidance In our dynamical model spatial clustering atsmall scales is the result of two principles First thefixation map is driven by an activation function with asmall spatial extent (258) Second the time-scale ofactivation build-up in the fixation map is slowcompared to the build-up of activation in the attentionmap Both mechanisms permit refixations at positionsvery close to the current gaze position before the systemmoves on to new regions of visual space

Limitations of the current approach

The current work focused on spatial statistics of gazepatterns and we proposed and analyzed dynamicalmechanisms of eye guidance in scene viewing In ourmodel a Gaussian read-out mechanism for the staticempirical saliency map was implemented as a simpli-fication A more biologically plausible combination ofour model of eye guidance with a dynamical saliencymodel (Borji amp Itti 2013) is a natural extension of thecurrent framework and the development of such amodel is work in progress in our laboratories Clearlythe current modeling architecture is not limited to inputfrom static saliency maps

Another simplification is related to the timing ofsaccades (Nuthmann Smith Engbert amp Henderson2010) In the current version of our model weimplemented random timing and sampled fixationdurations randomly from a predefined distributionMore adequate models of fixation durations howeverwill need to include interactions of processing difficultybetween fovea and periphery (Laubrock Cajar ampEngbert 2013)

Supplemental information

This work includes a supplemental video animationof the model simulations Experimental data onfixation patterns and computer code for statisticalanalysis and model simulations will be made availablevia the Potsdam Mind Research Respository (PMR2httpreadpsychuni-potsdamde)

Keywords scene perception eye movements atten-tion saccades modeling spatial statistics

Acknowledgments

This work was supported by Bundesministerium furBildung und Forschung (BMBF) through the BernsteinComputational Neuroscience Programs Berlin (ProjectB3 FKZ 01GQ1001F and FKZ 01GQ1001B to R Eand F A W resp) and Tubingen (FKZ 01GQ1002 toFAW) and by Deutsche Forschungsgemeinschaft(grants EN 47113ndash1 and WI 21034ndash1 to R E and FA W resp)

Commercial relationships noneCorresponding author Ralf EngbertEmail ralfengbertuni-potsdamdeAddress Cognitive Science Program amp Department ofPsychology University of Potsdam Potsdam Ger-many

References

Baddeley A amp Turner R (2005) spatstat An Rpackage for analyzing spatial point patternsJournal of Statistical Software 12(6) 1ndash42 Avail-able from httpwwwjstatsoftorg

Barthelme S Trukenbrod H Engbert R ampWichmann F (2013) Modeling fixation locationsusing spatial point processes Journal of Vision13(12) 1 1ndash34 httpwwwjournalofvisionorgcontent13121 doi10116713121 [PubMed][Article]

Beck C amp Schlogl F (1993) Thermodynamics ofchaotic systems Cambridge UK Cambridge Uni-versity Press

Borji A amp Itti L (2013) State-of-the-art in visualattention modeling IEEE Transactions on PatternAnalysis and Machine Intelligence 35(1) 185ndash207

Bylinskii Z Judd T Durand F Oliva A ampTorralba A (2012) MIT Saliency BenchmarkAvailable at httpsaliencymitedu

Carandini M amp Heeger D J (2011) Normalizationas a canonical neural computation Nature ReviewsNeuroscience 13 51ndash62

Engbert R (2012) Computational modeling of col-licular integration of perceptual responses andattention in microsaccades The Journal of Neuro-science 32(23) 8035ndash8039

Engbert R amp Kliegl R (2003) Microsaccades

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 13

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 5: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

color and by their serial numbers The first fixation wasomitted since all trials started at a random positionwithin an image determined by our experimentalprocedure (see Methods)

The pair correlation function g(r) gives a quantita-tive summary of interactions in fixation patterns bymeasuring how distance patterns between fixationsdiffer from what we would expect from independentlydistributed data (see Appendix) A value of g(r) above 1for a particular distance r indicates clustering meaningthat there are more pairs of points separated by adistance r than we would expect if fixation locationswere statistically independent

For the estimation of the pair correlation functionshort sequences of gaze positions from single experi-mental trials were considered However spatial inho-mogeneity of the 2D density was taken into accountTo obtain a reliable estimate of the spatial inhomo-geneity the 2D density was estimated from the fulldata-set (Figure 1a) of all fixations on a given imagetaken from all participants and trials It is important tonote however that in the computation of the kerneldensity estimate k(x) used for the inhomogeneous paircorrelation function Equation (4) an optimal band-width parameter h is needed to avoid two possibleartifacts First if h is very small then spatialcorrelations might be underestimated due to overfittingof the inhomogeneity of the density Second if h is toolarge then spatial correlations might be overestimatedsince first-order inhomogeneity is not adequatelyremoved from the second-order spatial statistics Wesolved this problem by computing the PCF deviationDg for the inhomogeneous point process for varyingvalues of the bandwidth h (Figure 2) Since theinhomogeneous point process generates uncorrelated

fixations ie gtheo(r) frac14 1 the optimal bandwidth forthe dataset corresponds to a minimum of the PCFdeviation Dg (quantifying the deviation from the idealvalue g(r)frac14 1) For image set 1 the optimal value wasestimated as h1 frac14 408 (Figure 2a)

Based on this density estimate we can compute theinhomogeneous pair correlation function ginhom(r) inwhich first-order inhomogeneity is removed from thesecond-order spatial correlations (see Methods) As aresult we obtained pair correlations from individualtrials (Figure 1g gray lines) Deviations from ginhom(r)rsquo 1 indicate spatial clustering at a specific distance rThe mean pair correlation function ginhom(r) providesevidence for clustering at small spatial scales with r 48 (Figure 1g red line) Such a scale is greater than thefoveal zone (r 28) and might provide an estimate ofthe size of the effective perceptual window in free sceneviewing This result is compatible with earlier findingsthat the zone of active selection of saccade targetsextends beyond the fovea into the parafovea up toeccentricities of 48 (Reinagel amp Zador 1999)

Next we carried out the same numerical computa-tions for two sets of surrogate data The surrogate datawere generated to test the null hypotheses of completespatial randomness both for an inhomogeneous pointprocess with position-dependent intensity k(x) and fora homogeneous point process with constant intensityk0 For the inhomogeneous point process we sampledfrom the estimated intensity k(x) (Figure 1b) whereasa constant intensity k0 obtained from spatial averag-ing was used for the homogeneous point process(Figure 1c) Both surrogate datasets are important forchecking the reliability of the computation of the paircorrelation function for the original data (Figure 1g)First the inhomogeneous point process gives a flat

Figure 2 Optimal bandwidth parameter for inhomogeneous pair correlation function (PCF) for three different image sets For

simulated data from the inhomogeneous point process the PCF deviation Dg Equation (5) was computed as a function of the

bandwidth h for the underlying kernel density estimate The optimal bandwidth corresponds to the position of the minimum of Dg (a)

For image set 1 h1frac14 408 (b) For image set 2 h2frac14 468 (c) For the Le Meur dataset h3frac14 228 due to a much smaller presentation

display compared with our experiments

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 5

mean correlation function with g(r) rsquo 1 (Figure 1h)which demonstrates the absence of clustering (exceptfor the divergence at very small scales as an effect ofnumerical computation issues) Thus the spatialcorrelations in the experimental data are not a simpleconsequence of spatial inhomogeneity Second theresult for the homogeneous point process (Figure 1h) isthe same as for the inhomogeneous point processwhich indicates that the correction for inhomogeneityneeded for computations in Figure 1g does notproduce unwanted artefacts due to possible overfittingof the spatial inhomogeneities We conclude that ourexperimental data give a clear indication for spatialclustering at length-scales smaller than 48 of visualangle Additionally we checked the hypothesis thatthis effect of spatial clustering might be due to saccadicundershoot and subsequent short correction saccadesby excluding all fixations with durations shorter than200 ms A related analysis of the PCF indicates noqualitative differences from the original data (Figure1h)

Results reported so far were obtained for a singlerepresentative image Over the full set of 15 images weanalyzed the 2D densities using a symmetrized form ofthe Kullback-Leibler divergence (KLD) based on theconcept of information gain (Beck amp Schlogl 1993)For the experimental data we applied a split-halfprocedure (first half of participants vs second half ofparticipants) and computed the KLD between the twoexperimental densities The corresponding KLD valuesdemonstrate that the inhomogeneous point processreproduces the 2D density (Figure 1f) while thehomogeneous point process clearly fails to approximatethe systematic inhomogeneity in the image Model typehad a significant effect on KLD v2(2)frac141054 p 001Contrasts revealed that (1) spatially inhomogeneousdata-sets were different from the homogeneous data bfrac14 0319 t(28) frac14 2285 p 001 and that (2)experimental data were significantly different frominhomogeneous surrogate data bfrac14 0062 t(28)frac14 256pfrac14 0016

Both sets of surrogate data were submitted to ananalysis of the pair correlation function where thedeviations from g(r) rsquo 1 were computed to obtain aPCF measure indicating the amount of spatial corre-lation averaged over distances (see Methods) Resultsindicate that the surrogate data producemdashas de-signedmdashuncorrelated gaze positions (low PCF devia-tion) while the experimental data by human observersexhibit spatially correlated gaze positions (Figure 1i)Model type had a significant effect on PCF v2(2)frac14984 p 001

To investigate the reliability of our finding of spatialclustering on short-length scales during natural sceneviewing we investigated two other datasets using thesame procedures with an optimal bandwidth adjusted

for each dataset (Figure 2andashc) First we compared themean pair correlation functions per image for naturalobject-based scenes (image set 1 Figure 3a d) andabstract natural patterns (image set 2 Figure 3b e)While variability of the pair correlation function mightbe greater for the abstract scenes than for the object-based scenes (Figure 3d e gray lines) the mean paircorrelation functions for the images sets (Figure 3d ered lines) are very similar This result indicates that theclustering on short length scales is a robust phenom-enon which does not seem to depend sensitively onscene content Second we analyzed a publicly availabledataset (Le Meur et al 2006) from the MIT saliencybenchmark (Bylinskii et al 2012 Judd Durand ampTorralba 2012) consisting of 40 participants whoviewed 27 color images for 15 s For these data thespatial scale of the presentation display for the images(Figure 3c) was considerably smaller than in ourexperiment Consequently we observed pair correla-tion functions that indicate clustering on smaller scales( 158 Figure 3f) Thus while scene content does notseem to exert a strong influence on spatial correlationsspatial scale of the image modulates the spatial scale ofthe clustering of fixations

We conclude that second-order spatial statisticsobtained for the experimental data are significantlydifferent from stochastic processes implementing theassumption of spatial randomness Furthermore themere presence of spatial inhomogeneity in the experi-mental data cannot explain by itself the observedspatial correlations which is evident in the results forthe inhomogeneous point process While inhibition-of-return (Klein 2000) has been discussed frequently asone of the key principles added to saliency maps forsaccadic selection (Itti amp Koch 2001) spatial clusteringof gaze positions is an additional statistical propertythat is highly informative on mechanisms of gazeplanning but has been neglected so far Next we usethese results to develop and test a dynamical model forsaccade generation that uses activation field dynamicsto reproduce spatial statistics of first- and second-order

A dynamical model of saccade generation

A key assumption for the model we propose is thecombination of two neural activation maps toimplement dynamical principles for saccadic selectionFirst a fixation map f(x y t) is keeping track of thesequence of fixations by inhibitory tagging (Itti ampKoch 2001) Second an attention map a(x y t) thatis driven by early visual processing controls thedistribution of attention Physiologically the as-sumption of the dynamical maps is supported by thepresence of an allocentric motor map of visual spacein the primate entorhinal cortex (Killian Jutras amp

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 6

Buffalo 2012) Moreover this map is spatiallydiscrete (Stensola et al 2012) and serves as abiological motivation for the fixation and attentionmaps in our model

We implemented activation maps for attention andfixation (inhibitory tagging) on a discrete square latticeof dimension L middot L Lattice points (i j) haveequidistant spatial positions (xi yj) for i j frac14 1 Lwhere xifrac14x0thorn iDx and yjfrac14y0thorn jDy As a consequenceattention and fixation maps are implemented inspatially discrete forms aij (t) and fij(t) respec-tively For the numerical simulations time wasdiscretized in steps of Dtfrac14 10 ms with tfrac14 k middot Dt and kfrac14 0 1 2 T

If the observerrsquos gaze is at position (xg yg) at time tthen a position-dependent activation change Fij(xg yg)and a global decay proportional to the currentactivation ndashxfij(t) are added to all lattice positions toupdate the activation map at time t thorn 1 ie

fijethtthorn 1THORN frac14 Fijethxg ygTHORN thorn eth1 xTHORNfijethtTHORN eth6THORN

where the activation change Fij(xg yg) rdquo Fij(t) isimplicitly time-dependent because of the time-depen-dence of gaze positions xg(t) yg(t) The constant x 1determines the strength of the decay of activation Forthe spatial distribution of the activation change Fij(t)

we assume a Gaussian profile ie

FijethtTHORN frac14R0ffiffiffiffiffiffi2pp

r0

exp ethxi xgethtTHORNTHORN2 thorn ethyj ygethtTHORNTHORN2

2r20

eth7THORNwith the free parameters r0 and R0 controlling thespatial extent of the activation change and the strengthof the activation change respectively In our model thebuild-up of activation in the fixation map is amechanism of inhibitory tagging (Itti amp Koch 2001) toreduce the amount of refixations on recently visitedimage patches

For the attention map aij(t) we assume similardynamics however the width of Gaussian activationchange Aij(t) is assumed to be proportional to the staticsaliency map ij The updating rule for the attentionmap is given by

aijethtthorn 1THORN frac14ijAijethtTHORNX

kl

klAklethtTHORNthorn eth1 qTHORNaijethtTHORN eth8THORN

with decay constant q 1 As a result the saliencymap ij is accessed locally through a Gaussian aperturewith size r1 and scale parameter R1 similar to Equation

Figure 3 Comparison of the mean pair correlation functions for three different image sets (a) Natural object-based scenes (b)

Abstract natural patterns (c) Natural scenes from the Le Meur et al dataset Mean pair correlation function are similar for object-

based (d) and abstract (e) natural scenes The presentation of images on a smaller display in the Le Meur dataset compared to our

experiments (f) resulted in a smaller length scale of spatial clustering

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 7

(7) Using the local read-out mechanism information isprovided for the attention map to identify regions ofinterest for eye guidance

The fixation map monitors recently visited fixationlocations by increasing local activation at the corre-sponding lattice points (Engbert et al 2011 Freund ampGrassberger 1992) If the observerrsquos gaze position is ata position corresponding to lattice position (i j) then aposition-dependent activation change Fij in the form ofa Gaussian profile is added locally in each time stepwhile a global decay proportional to the currentactivation is applied to all lattice positions The widthof the Gaussian activation r0 and the decay x are thetwo free parameters controlling activation in thefixation map For the attention map aij(t) we assumesimilar dynamics including local increase of activationwith size r1 and global decay q However the amountof activation change Aij is assumed to be proportionalto the time-independent saliency map ij so that thelocal increase of activation isthorn ijAij

PklAkl

Our modeling assumptions are related to specifichypotheses on model parameters We expect that thesize of the Gaussian profile for the attention map islarger than the corresponding size of the fixation mapr1 r0 since attention is the process driving eyemovements into new regions of visual space while theinhibitory tagging process should be more localized Asimilar expectation can be formulated on the decayconstants Since inhibitory tagging is needed on alonger time scale as a foraging facilitator we expect aslower decay in the fixation map compared to theattention map ie x q

Next we assume that given a saccade command attime t both maps are evaluated to select the nextsaccade target First we apply a normalization of bothattention and fixation maps as a general neuralprinciple to obtain relative activations (Carandini ampHeeger 2011) Second we introduce a potentialfunction as the difference of the normalized maps

uijethtTHORN frac14 aijethtTHORN$ kX

kl

aklethtTHORNfrac12 )kthorn

fijethtTHORN$ cX

kl

fklethtTHORNfrac12 )c eth9THORN

where the exponents k and c are free parametersHowever a value of kfrac14 1 is a necessary boundarycondition to obtain a model that accurately reproducesthe densities of gaze positions In a qualitative analysisof the model (see Appendix) pilot simulations showedthat c is an important control parameter determiningspatial correlations where c rsquo 03 was used toreproduce spatial correlations observed in our experi-mental data

The potential uij(t) Equation (9) can be positive ornegative at position (i j) Lattice positions with apositive potential uij 0 are excluded from saccadic

selection since corresponding regions were visitedrecently with high probability Among the latticepositions with negative activations we implementedstochastic selection of saccade targets proportional torelative activations also known as Lucersquos choice rule(Luce 1959) We implemented this form of stochasticselection from the set S frac14 fethi jTHORNjuij 0g where theprobability pij(t) to select lattice position (i j) at time tas the next saccade target is given by

pijethtTHORN frac14 maxuijethtTHORNX

ethklTHORNSuklethtTHORN

g

0

BB

1

CCA eth10THORN

The noise term g is an additional parametercontrolling the amount of noise in target selection

Numerical simulations of the model

Our computational modeling approach to saccadicselection has been developed to propose a minimalmodel that captures the types of spatial statisticsobserved in experimental data After fixing modelparameters all that is needed to run the model on aparticular image is a 2D density estimate of gazepatterns (from experimental data) or a corresponding2D prediction from one of the available saliency models(Borji amp Itti 2013) To reduce computational com-plexity in the current study and to exclude potentialmismatches between data and model simulations due tothe saliency models we run the model on experimen-tally realized densities of gaze positions which isequivalent to assuming an exact saliency modelHowever our modeling approach is compatible withfuture dynamical saliency models that provide time-and position-dependent saliency during a sequence ofgaze shifts thus our model introduces a generaldynamical framework and is not tied to using empiricaldata

The numerical values of the five model parameterswere estimated from experimental data recorded for thefirst five images of natural object-based scenes (imageset 1) using a genetic algorithm approach (Mitchell1998 see Appendix Table 1) The remaining 10 imagesof image set 1 were used for model evaluationsmdashas the15 images of image set 2 The objective function forparameter estimation was based on evaluation of first-order statistics (2D density of gaze positions) and thedistribution of saccade lengths In agreement with ourfirst expectation the estimated optimal values for thespatial extent of the inhibitory tagging process in thefixation map r0frac14 228 is considerably smaller than thecorresponding size of the build-up function for theattention map r1frac14 498 Our second expectation wasrelated to the decay constants which turned out to be

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 8

larger for the attention map qfrac14 0066 than for thefixation map xfrac14 93 middot 105 so q was greater than xagain as expected Finally the noise level in the targetmap is g frac14 91 middot 105

An example for the simulation of the modeldemonstrates the interplay between inhibitory pro-cesses from the fixation map and the attention mapduring gaze planning (Figure 4) The fixation mapbuilds up activation at fixated lattice positions (yellowto red) while the attention map identifies new regionsof interest for saccadic selection (blue) These simula-tions show on a qualitative level how the modelimplements the interplay of the assumed mechanisms of

inhibitory tagging and saccadic selection of gazepositions (see Supplementary Video)

To investigate model performance qualitatively weran simulations for one image (image 6 930 fixationFigure 5a) and obtained a number of fixations similarto the experimental data (882 fixations Figure 5b)Single-trial scanpaths from experiments and simula-tions are shown additionally in Figure 5a and 5brespectively) The resulting distributions of saccadelengths indicate that our dynamical model is in goodagreement with experimental data while the twosurrogate datasets (homogeneous and inhomogeneouspoint processes) fail to reproduce the distribution(Figure 5c) An analysis of the pair correlation

Figure 4 Illustration of a simulated sequence of gaze positions and the activation dynamics of the model (a) The density of gazepositions (empirical saliency) is used as a proxy for a computed saliency map that drives activation in the attention map (bndashf)Sequence of snapshots of the potential (bluefrac14 low yellowfrac14 high) Note that blue color indicates density of fixations in (a) while itrefers to low values of the potential in (b)ndash(f)

Parameter Symbol Mean Error Min Max Reference

Fixation mapActivation span [8] r0 216 011 03 100 Eq (7)Decay log10 x ndash403 028 ndash50 ndash10 Eq (6)

Attention mapActivation span [8] r1 488 025 03 100 Eq (8)Decay log10 q ndash118 008 ndash30 ndash10 Eq (8)

Target selectionAdditive noise log10 g ndash404 007 ndash90 ndash30 Eq (10)

Table 1 Model parameters

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 9

functions indicates that the spatial correlations presentin the experimental data were approximated by thedynamical model (Figure 5d) however the twosurrogate datasets representing uncorrelated sequencesby construction produce qualitatively different spatialcorrelations

To investigate the influences of saccade-lengthdistributions on pair correlations we constructedanother statistical control model that had access to theimage-specific saccade-length distribution It is impor-tant to note that this model was not introduced as acompetitor to the dynamical model which is able topredict saccade-length distributions The statisticalcontrol model approximated the distribution of saccadelengths l and 2D densities of gaze positions x bysampling from the joint probability distribution p(x l)under the assumption of statistical independence ofsaccade lengths and gaze positions ie p(x l)frac14p(x)p(l) This model by construction approximates thedistribution of saccade lengths and 2D density of gazepositions (Figure 5c) The simulations indicate how-

ever that even the combination of inhomogeneousdensity of gaze positions and non-normal distributionof saccade lengths used by the statistical control modelcannot explain spatial correlations in the experimentaldata characterized by the pair correlations function(Figure 5d)

For the statistical analysis of model performance onnew images we carried out additional numericalsimulations We fitted model parameters to dataobtained for the first five images only (see above) andpredicted data for the remaining 10 images by the newsimulations to isolate parameter estimation from modelevaluation (calculating test errors rather than trainingerrors)

Our simulations show that the dynamical modelpredicted the 2D density of gaze positions accurately(Figure 6a) The obtained KLD values for the model(blue) were comparable to KLD values calculated bythe split-half procedure for the experimental data (red)and to the KLD values obtained for the statisticalcontrol models (green frac14Homogeneous point process

Figure 5 Distribution of saccade lengths and pair correlation functions from model simulations (image 6) (a) Experimental

distribution of gaze positions (red dots) and a representative sample trial (red lines) (b) Corresponding plot of simulated data

obtained from our dynamical model (blue dots) A single-trial simulation is highlighted (blue line) (c) Distributions of saccade lengths

for experimental data (red) dynamical model (blue) homogeneous point process (green) inhomogeneous point process (yellow) and

a statistical control model (magenta) (d) Pair correlation functions for the different models The dynamical model (blue line) produces

spatial correlations similar to the experimental data (red line)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 10

yellow frac14 Inhomogeneous point process magentafrac14Control model) Model type had a significant effect onKLD v2(4)frac14 1094 p 001 Post hoc comparisonsindicated significant effects between all models (p 001) except for the comparison between experimentaldata and the dynamical model (p frac14 0298) and for thecomparison between dynamical model and inhomoge-neous point process (p frac14 0679) In an analysis of thePCF estimated from the same set of simulated data(Figure 6b) the dynamical model (blue) produceddeviations from an uncorrelated point process that arein good agreement with the experimental data (red)Model type had a significant effect on PCF deviationv2(4)frac14 916 p 001 Post hoc comparisons indicatedsignificant effects for all comparisons (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0990) and between homoge-neous and inhomogeneous point processes (pfrac14 0996)

To check the reliability of the results we performedall corresponding calculations for the images showingabstract natural scenes (image set 2) For thesesimulations we used the set of model parameters fittedto the first five images of image set 1 whichcorresponds to the hypothesis that scene content(object-based scenes vs abstract natural patterns) doesnot have a strong impact on spatial correlations in thescanpath data For the simulated densities (Figure 6c)we reproduced the statistical results from image set 1Again model type had a significant effect on KLDv2(4)frac14 1626 p 001 Post hoc comparisons indicatedsignificant effects between all models (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0251) for the comparisonbetween dynamical model and inhomogeneous pointprocess (p frac14 0784) and for the comparison betweeninhomogeneous point process and experimental data (pfrac14 0012) For the pair correlation function (Figure 6d)model type had a significant effect v2(4)frac14 1406 p 001 and post hoc comparisons indicated significanteffects for all comparisons (p 001) except for thecomparison between experiment and dynamical model(pfrac14 0453) and between homogeneous and inhomo-geneous point processes (p frac14 0700) Thus the mainstatistical results obtained from image set 1 werereproduced for images of abstract natural patterns(image set 2) These results lend support to thehypothesis that scene content does not have a stronginfluence on second-order spatial statistics of gazepatterns

Thus the dynamical model performed better thanany of the statistical models in predicting the averagepair correlations Although one of our statisticalcontrol models generated data by using image-specificsaccade-length information in addition to the 2Ddensity of gaze position it could not predict the spatialcorrelations as accurately as the dynamical model that

was uninformed about the image-specific saccade-length distribution

Discussion

Current theoretical models of visual attentionallocation in natural scenes are limited to the predictionof first-order spatial statistics (2D densities) of gazepatterns We were interested in attentional dynamicsthat can be characterized by spatial interactions (asfound in the second-order statistics) Using the theoryof spatial point processes we discovered that gazepatterns can be characterized by clustering at smalllength scales which cannot be explained by spatialinhomogeneity of the 2D density We proposed andanalyzed a model based on dynamical activation mapsfor attentional selection and inhibitory control of gazepositions The model reproduced 2D densities of gazemaps (first-order statistics) and distributions of saccadelengths as well as pair correlations (second-orderspatial statistics)

Spatial statistics

While research on the computation of visual saliencyhas been a highly active field of research (Borji amp Itti2013) there is currently a lack of computational modelsfor the generation of scanpaths on the basis of knownsaliency Inhibition-of-return (Klein 2000) has beenproposed as a key principle to prevent continuingrefixation within regions of highest saliency Howeverour analysis of the pair correlation function demon-strates that saccadic selection at small length scales isdominated by spatial clustering Thus our findings arehighly compatible with the view that inhibition-of-return cannot easily be observed in eye-movementbehavior in natural scene viewing experiments (Smith ampHenderson 2011) However the spatial correlationscan be exploited to investigate dynamical rules under-lying attentional processing in the visual system Ourexperimental data show a clear effect of spatialclustering for length scales shorter than about 48However these results are incompatible with thecurrent theory of saliency-based attention allocationcombined with an inhibition-of-return mechanismFuture simulations will investigate the predictive powerof the model when saliency models are used as input

Modeling spatial correlations

In a biologically plausible computational model ofsaccade generation a limited perceptual span needs to

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 11

be implemented for attentional selection (Findlay ampGilchrist 2012) We addressed this problem byassuming a Gaussian read-out mechanism with localretrieval from the saliency map through a limitedaperture analogous to the limited extent of high-fidelity information uptake through the fovea We usedthe experimentally observed density of fixations as aproxy for visual saliency

First our results indicate that a very limitedattentional span (Gaussian with standard deviationparameter 498) of about twice the size of theactivation mechanism for tracking the gaze positions(228) is sufficient for saccade planning This atten-

tional span is efficient however since the combinationof fixation and attention maps in our model activelydrives the modelrsquos gaze position to new salient regionscomputed via normalization of activations (Carandiniamp Heeger 2011)

Second our model correctly predicted spatialclustering of gaze positions at small-length scales Thepair correlation function indicates that there is apronounced contribution by refixations very close tothe current gaze position This effect is compatible withthe distribution of saccade lengths however a statis-tical control model that generated data from statisti-cally independent probabilities of 2D density and

Figure 6 Model predictions on images not used for parameter estimation Predicted data were generated from the dynamical model

for the 10 images of image set 1 not used for parameter estimation (a b) and for all 15 images from image set 2 (c d) (a) Modell

simulations of the KLD for experiments on image set 1 (10 images not used for model parameter estimation) and dynamical model

and three different statistical models For the experimental data a split-half procedure was applied to compute KLD (b)

Corresponding PCF deviations for the same model-generated and experimental data on image set 1 (c) KLD measures for image set 2

(d) PCF deviation for image set 2

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 12

saccade lengths could not reproduce the pair correla-tions adequately

Implications for computational models of activevision

During active vision our visual system relies onfrequent gaze shifts to optimize retinal input Using asecond-order statistical analysis we demonstrated thatspatial correlations across scanpaths might provideimportant constraints for computational models of eyeguidance In our dynamical model spatial clustering atsmall scales is the result of two principles First thefixation map is driven by an activation function with asmall spatial extent (258) Second the time-scale ofactivation build-up in the fixation map is slowcompared to the build-up of activation in the attentionmap Both mechanisms permit refixations at positionsvery close to the current gaze position before the systemmoves on to new regions of visual space

Limitations of the current approach

The current work focused on spatial statistics of gazepatterns and we proposed and analyzed dynamicalmechanisms of eye guidance in scene viewing In ourmodel a Gaussian read-out mechanism for the staticempirical saliency map was implemented as a simpli-fication A more biologically plausible combination ofour model of eye guidance with a dynamical saliencymodel (Borji amp Itti 2013) is a natural extension of thecurrent framework and the development of such amodel is work in progress in our laboratories Clearlythe current modeling architecture is not limited to inputfrom static saliency maps

Another simplification is related to the timing ofsaccades (Nuthmann Smith Engbert amp Henderson2010) In the current version of our model weimplemented random timing and sampled fixationdurations randomly from a predefined distributionMore adequate models of fixation durations howeverwill need to include interactions of processing difficultybetween fovea and periphery (Laubrock Cajar ampEngbert 2013)

Supplemental information

This work includes a supplemental video animationof the model simulations Experimental data onfixation patterns and computer code for statisticalanalysis and model simulations will be made availablevia the Potsdam Mind Research Respository (PMR2httpreadpsychuni-potsdamde)

Keywords scene perception eye movements atten-tion saccades modeling spatial statistics

Acknowledgments

This work was supported by Bundesministerium furBildung und Forschung (BMBF) through the BernsteinComputational Neuroscience Programs Berlin (ProjectB3 FKZ 01GQ1001F and FKZ 01GQ1001B to R Eand F A W resp) and Tubingen (FKZ 01GQ1002 toFAW) and by Deutsche Forschungsgemeinschaft(grants EN 47113ndash1 and WI 21034ndash1 to R E and FA W resp)

Commercial relationships noneCorresponding author Ralf EngbertEmail ralfengbertuni-potsdamdeAddress Cognitive Science Program amp Department ofPsychology University of Potsdam Potsdam Ger-many

References

Baddeley A amp Turner R (2005) spatstat An Rpackage for analyzing spatial point patternsJournal of Statistical Software 12(6) 1ndash42 Avail-able from httpwwwjstatsoftorg

Barthelme S Trukenbrod H Engbert R ampWichmann F (2013) Modeling fixation locationsusing spatial point processes Journal of Vision13(12) 1 1ndash34 httpwwwjournalofvisionorgcontent13121 doi10116713121 [PubMed][Article]

Beck C amp Schlogl F (1993) Thermodynamics ofchaotic systems Cambridge UK Cambridge Uni-versity Press

Borji A amp Itti L (2013) State-of-the-art in visualattention modeling IEEE Transactions on PatternAnalysis and Machine Intelligence 35(1) 185ndash207

Bylinskii Z Judd T Durand F Oliva A ampTorralba A (2012) MIT Saliency BenchmarkAvailable at httpsaliencymitedu

Carandini M amp Heeger D J (2011) Normalizationas a canonical neural computation Nature ReviewsNeuroscience 13 51ndash62

Engbert R (2012) Computational modeling of col-licular integration of perceptual responses andattention in microsaccades The Journal of Neuro-science 32(23) 8035ndash8039

Engbert R amp Kliegl R (2003) Microsaccades

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 13

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 6: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

mean correlation function with g(r) rsquo 1 (Figure 1h)which demonstrates the absence of clustering (exceptfor the divergence at very small scales as an effect ofnumerical computation issues) Thus the spatialcorrelations in the experimental data are not a simpleconsequence of spatial inhomogeneity Second theresult for the homogeneous point process (Figure 1h) isthe same as for the inhomogeneous point processwhich indicates that the correction for inhomogeneityneeded for computations in Figure 1g does notproduce unwanted artefacts due to possible overfittingof the spatial inhomogeneities We conclude that ourexperimental data give a clear indication for spatialclustering at length-scales smaller than 48 of visualangle Additionally we checked the hypothesis thatthis effect of spatial clustering might be due to saccadicundershoot and subsequent short correction saccadesby excluding all fixations with durations shorter than200 ms A related analysis of the PCF indicates noqualitative differences from the original data (Figure1h)

Results reported so far were obtained for a singlerepresentative image Over the full set of 15 images weanalyzed the 2D densities using a symmetrized form ofthe Kullback-Leibler divergence (KLD) based on theconcept of information gain (Beck amp Schlogl 1993)For the experimental data we applied a split-halfprocedure (first half of participants vs second half ofparticipants) and computed the KLD between the twoexperimental densities The corresponding KLD valuesdemonstrate that the inhomogeneous point processreproduces the 2D density (Figure 1f) while thehomogeneous point process clearly fails to approximatethe systematic inhomogeneity in the image Model typehad a significant effect on KLD v2(2)frac141054 p 001Contrasts revealed that (1) spatially inhomogeneousdata-sets were different from the homogeneous data bfrac14 0319 t(28) frac14 2285 p 001 and that (2)experimental data were significantly different frominhomogeneous surrogate data bfrac14 0062 t(28)frac14 256pfrac14 0016

Both sets of surrogate data were submitted to ananalysis of the pair correlation function where thedeviations from g(r) rsquo 1 were computed to obtain aPCF measure indicating the amount of spatial corre-lation averaged over distances (see Methods) Resultsindicate that the surrogate data producemdashas de-signedmdashuncorrelated gaze positions (low PCF devia-tion) while the experimental data by human observersexhibit spatially correlated gaze positions (Figure 1i)Model type had a significant effect on PCF v2(2)frac14984 p 001

To investigate the reliability of our finding of spatialclustering on short-length scales during natural sceneviewing we investigated two other datasets using thesame procedures with an optimal bandwidth adjusted

for each dataset (Figure 2andashc) First we compared themean pair correlation functions per image for naturalobject-based scenes (image set 1 Figure 3a d) andabstract natural patterns (image set 2 Figure 3b e)While variability of the pair correlation function mightbe greater for the abstract scenes than for the object-based scenes (Figure 3d e gray lines) the mean paircorrelation functions for the images sets (Figure 3d ered lines) are very similar This result indicates that theclustering on short length scales is a robust phenom-enon which does not seem to depend sensitively onscene content Second we analyzed a publicly availabledataset (Le Meur et al 2006) from the MIT saliencybenchmark (Bylinskii et al 2012 Judd Durand ampTorralba 2012) consisting of 40 participants whoviewed 27 color images for 15 s For these data thespatial scale of the presentation display for the images(Figure 3c) was considerably smaller than in ourexperiment Consequently we observed pair correla-tion functions that indicate clustering on smaller scales( 158 Figure 3f) Thus while scene content does notseem to exert a strong influence on spatial correlationsspatial scale of the image modulates the spatial scale ofthe clustering of fixations

We conclude that second-order spatial statisticsobtained for the experimental data are significantlydifferent from stochastic processes implementing theassumption of spatial randomness Furthermore themere presence of spatial inhomogeneity in the experi-mental data cannot explain by itself the observedspatial correlations which is evident in the results forthe inhomogeneous point process While inhibition-of-return (Klein 2000) has been discussed frequently asone of the key principles added to saliency maps forsaccadic selection (Itti amp Koch 2001) spatial clusteringof gaze positions is an additional statistical propertythat is highly informative on mechanisms of gazeplanning but has been neglected so far Next we usethese results to develop and test a dynamical model forsaccade generation that uses activation field dynamicsto reproduce spatial statistics of first- and second-order

A dynamical model of saccade generation

A key assumption for the model we propose is thecombination of two neural activation maps toimplement dynamical principles for saccadic selectionFirst a fixation map f(x y t) is keeping track of thesequence of fixations by inhibitory tagging (Itti ampKoch 2001) Second an attention map a(x y t) thatis driven by early visual processing controls thedistribution of attention Physiologically the as-sumption of the dynamical maps is supported by thepresence of an allocentric motor map of visual spacein the primate entorhinal cortex (Killian Jutras amp

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 6

Buffalo 2012) Moreover this map is spatiallydiscrete (Stensola et al 2012) and serves as abiological motivation for the fixation and attentionmaps in our model

We implemented activation maps for attention andfixation (inhibitory tagging) on a discrete square latticeof dimension L middot L Lattice points (i j) haveequidistant spatial positions (xi yj) for i j frac14 1 Lwhere xifrac14x0thorn iDx and yjfrac14y0thorn jDy As a consequenceattention and fixation maps are implemented inspatially discrete forms aij (t) and fij(t) respec-tively For the numerical simulations time wasdiscretized in steps of Dtfrac14 10 ms with tfrac14 k middot Dt and kfrac14 0 1 2 T

If the observerrsquos gaze is at position (xg yg) at time tthen a position-dependent activation change Fij(xg yg)and a global decay proportional to the currentactivation ndashxfij(t) are added to all lattice positions toupdate the activation map at time t thorn 1 ie

fijethtthorn 1THORN frac14 Fijethxg ygTHORN thorn eth1 xTHORNfijethtTHORN eth6THORN

where the activation change Fij(xg yg) rdquo Fij(t) isimplicitly time-dependent because of the time-depen-dence of gaze positions xg(t) yg(t) The constant x 1determines the strength of the decay of activation Forthe spatial distribution of the activation change Fij(t)

we assume a Gaussian profile ie

FijethtTHORN frac14R0ffiffiffiffiffiffi2pp

r0

exp ethxi xgethtTHORNTHORN2 thorn ethyj ygethtTHORNTHORN2

2r20

eth7THORNwith the free parameters r0 and R0 controlling thespatial extent of the activation change and the strengthof the activation change respectively In our model thebuild-up of activation in the fixation map is amechanism of inhibitory tagging (Itti amp Koch 2001) toreduce the amount of refixations on recently visitedimage patches

For the attention map aij(t) we assume similardynamics however the width of Gaussian activationchange Aij(t) is assumed to be proportional to the staticsaliency map ij The updating rule for the attentionmap is given by

aijethtthorn 1THORN frac14ijAijethtTHORNX

kl

klAklethtTHORNthorn eth1 qTHORNaijethtTHORN eth8THORN

with decay constant q 1 As a result the saliencymap ij is accessed locally through a Gaussian aperturewith size r1 and scale parameter R1 similar to Equation

Figure 3 Comparison of the mean pair correlation functions for three different image sets (a) Natural object-based scenes (b)

Abstract natural patterns (c) Natural scenes from the Le Meur et al dataset Mean pair correlation function are similar for object-

based (d) and abstract (e) natural scenes The presentation of images on a smaller display in the Le Meur dataset compared to our

experiments (f) resulted in a smaller length scale of spatial clustering

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 7

(7) Using the local read-out mechanism information isprovided for the attention map to identify regions ofinterest for eye guidance

The fixation map monitors recently visited fixationlocations by increasing local activation at the corre-sponding lattice points (Engbert et al 2011 Freund ampGrassberger 1992) If the observerrsquos gaze position is ata position corresponding to lattice position (i j) then aposition-dependent activation change Fij in the form ofa Gaussian profile is added locally in each time stepwhile a global decay proportional to the currentactivation is applied to all lattice positions The widthof the Gaussian activation r0 and the decay x are thetwo free parameters controlling activation in thefixation map For the attention map aij(t) we assumesimilar dynamics including local increase of activationwith size r1 and global decay q However the amountof activation change Aij is assumed to be proportionalto the time-independent saliency map ij so that thelocal increase of activation isthorn ijAij

PklAkl

Our modeling assumptions are related to specifichypotheses on model parameters We expect that thesize of the Gaussian profile for the attention map islarger than the corresponding size of the fixation mapr1 r0 since attention is the process driving eyemovements into new regions of visual space while theinhibitory tagging process should be more localized Asimilar expectation can be formulated on the decayconstants Since inhibitory tagging is needed on alonger time scale as a foraging facilitator we expect aslower decay in the fixation map compared to theattention map ie x q

Next we assume that given a saccade command attime t both maps are evaluated to select the nextsaccade target First we apply a normalization of bothattention and fixation maps as a general neuralprinciple to obtain relative activations (Carandini ampHeeger 2011) Second we introduce a potentialfunction as the difference of the normalized maps

uijethtTHORN frac14 aijethtTHORN$ kX

kl

aklethtTHORNfrac12 )kthorn

fijethtTHORN$ cX

kl

fklethtTHORNfrac12 )c eth9THORN

where the exponents k and c are free parametersHowever a value of kfrac14 1 is a necessary boundarycondition to obtain a model that accurately reproducesthe densities of gaze positions In a qualitative analysisof the model (see Appendix) pilot simulations showedthat c is an important control parameter determiningspatial correlations where c rsquo 03 was used toreproduce spatial correlations observed in our experi-mental data

The potential uij(t) Equation (9) can be positive ornegative at position (i j) Lattice positions with apositive potential uij 0 are excluded from saccadic

selection since corresponding regions were visitedrecently with high probability Among the latticepositions with negative activations we implementedstochastic selection of saccade targets proportional torelative activations also known as Lucersquos choice rule(Luce 1959) We implemented this form of stochasticselection from the set S frac14 fethi jTHORNjuij 0g where theprobability pij(t) to select lattice position (i j) at time tas the next saccade target is given by

pijethtTHORN frac14 maxuijethtTHORNX

ethklTHORNSuklethtTHORN

g

0

BB

1

CCA eth10THORN

The noise term g is an additional parametercontrolling the amount of noise in target selection

Numerical simulations of the model

Our computational modeling approach to saccadicselection has been developed to propose a minimalmodel that captures the types of spatial statisticsobserved in experimental data After fixing modelparameters all that is needed to run the model on aparticular image is a 2D density estimate of gazepatterns (from experimental data) or a corresponding2D prediction from one of the available saliency models(Borji amp Itti 2013) To reduce computational com-plexity in the current study and to exclude potentialmismatches between data and model simulations due tothe saliency models we run the model on experimen-tally realized densities of gaze positions which isequivalent to assuming an exact saliency modelHowever our modeling approach is compatible withfuture dynamical saliency models that provide time-and position-dependent saliency during a sequence ofgaze shifts thus our model introduces a generaldynamical framework and is not tied to using empiricaldata

The numerical values of the five model parameterswere estimated from experimental data recorded for thefirst five images of natural object-based scenes (imageset 1) using a genetic algorithm approach (Mitchell1998 see Appendix Table 1) The remaining 10 imagesof image set 1 were used for model evaluationsmdashas the15 images of image set 2 The objective function forparameter estimation was based on evaluation of first-order statistics (2D density of gaze positions) and thedistribution of saccade lengths In agreement with ourfirst expectation the estimated optimal values for thespatial extent of the inhibitory tagging process in thefixation map r0frac14 228 is considerably smaller than thecorresponding size of the build-up function for theattention map r1frac14 498 Our second expectation wasrelated to the decay constants which turned out to be

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 8

larger for the attention map qfrac14 0066 than for thefixation map xfrac14 93 middot 105 so q was greater than xagain as expected Finally the noise level in the targetmap is g frac14 91 middot 105

An example for the simulation of the modeldemonstrates the interplay between inhibitory pro-cesses from the fixation map and the attention mapduring gaze planning (Figure 4) The fixation mapbuilds up activation at fixated lattice positions (yellowto red) while the attention map identifies new regionsof interest for saccadic selection (blue) These simula-tions show on a qualitative level how the modelimplements the interplay of the assumed mechanisms of

inhibitory tagging and saccadic selection of gazepositions (see Supplementary Video)

To investigate model performance qualitatively weran simulations for one image (image 6 930 fixationFigure 5a) and obtained a number of fixations similarto the experimental data (882 fixations Figure 5b)Single-trial scanpaths from experiments and simula-tions are shown additionally in Figure 5a and 5brespectively) The resulting distributions of saccadelengths indicate that our dynamical model is in goodagreement with experimental data while the twosurrogate datasets (homogeneous and inhomogeneouspoint processes) fail to reproduce the distribution(Figure 5c) An analysis of the pair correlation

Figure 4 Illustration of a simulated sequence of gaze positions and the activation dynamics of the model (a) The density of gazepositions (empirical saliency) is used as a proxy for a computed saliency map that drives activation in the attention map (bndashf)Sequence of snapshots of the potential (bluefrac14 low yellowfrac14 high) Note that blue color indicates density of fixations in (a) while itrefers to low values of the potential in (b)ndash(f)

Parameter Symbol Mean Error Min Max Reference

Fixation mapActivation span [8] r0 216 011 03 100 Eq (7)Decay log10 x ndash403 028 ndash50 ndash10 Eq (6)

Attention mapActivation span [8] r1 488 025 03 100 Eq (8)Decay log10 q ndash118 008 ndash30 ndash10 Eq (8)

Target selectionAdditive noise log10 g ndash404 007 ndash90 ndash30 Eq (10)

Table 1 Model parameters

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 9

functions indicates that the spatial correlations presentin the experimental data were approximated by thedynamical model (Figure 5d) however the twosurrogate datasets representing uncorrelated sequencesby construction produce qualitatively different spatialcorrelations

To investigate the influences of saccade-lengthdistributions on pair correlations we constructedanother statistical control model that had access to theimage-specific saccade-length distribution It is impor-tant to note that this model was not introduced as acompetitor to the dynamical model which is able topredict saccade-length distributions The statisticalcontrol model approximated the distribution of saccadelengths l and 2D densities of gaze positions x bysampling from the joint probability distribution p(x l)under the assumption of statistical independence ofsaccade lengths and gaze positions ie p(x l)frac14p(x)p(l) This model by construction approximates thedistribution of saccade lengths and 2D density of gazepositions (Figure 5c) The simulations indicate how-

ever that even the combination of inhomogeneousdensity of gaze positions and non-normal distributionof saccade lengths used by the statistical control modelcannot explain spatial correlations in the experimentaldata characterized by the pair correlations function(Figure 5d)

For the statistical analysis of model performance onnew images we carried out additional numericalsimulations We fitted model parameters to dataobtained for the first five images only (see above) andpredicted data for the remaining 10 images by the newsimulations to isolate parameter estimation from modelevaluation (calculating test errors rather than trainingerrors)

Our simulations show that the dynamical modelpredicted the 2D density of gaze positions accurately(Figure 6a) The obtained KLD values for the model(blue) were comparable to KLD values calculated bythe split-half procedure for the experimental data (red)and to the KLD values obtained for the statisticalcontrol models (green frac14Homogeneous point process

Figure 5 Distribution of saccade lengths and pair correlation functions from model simulations (image 6) (a) Experimental

distribution of gaze positions (red dots) and a representative sample trial (red lines) (b) Corresponding plot of simulated data

obtained from our dynamical model (blue dots) A single-trial simulation is highlighted (blue line) (c) Distributions of saccade lengths

for experimental data (red) dynamical model (blue) homogeneous point process (green) inhomogeneous point process (yellow) and

a statistical control model (magenta) (d) Pair correlation functions for the different models The dynamical model (blue line) produces

spatial correlations similar to the experimental data (red line)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 10

yellow frac14 Inhomogeneous point process magentafrac14Control model) Model type had a significant effect onKLD v2(4)frac14 1094 p 001 Post hoc comparisonsindicated significant effects between all models (p 001) except for the comparison between experimentaldata and the dynamical model (p frac14 0298) and for thecomparison between dynamical model and inhomoge-neous point process (p frac14 0679) In an analysis of thePCF estimated from the same set of simulated data(Figure 6b) the dynamical model (blue) produceddeviations from an uncorrelated point process that arein good agreement with the experimental data (red)Model type had a significant effect on PCF deviationv2(4)frac14 916 p 001 Post hoc comparisons indicatedsignificant effects for all comparisons (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0990) and between homoge-neous and inhomogeneous point processes (pfrac14 0996)

To check the reliability of the results we performedall corresponding calculations for the images showingabstract natural scenes (image set 2) For thesesimulations we used the set of model parameters fittedto the first five images of image set 1 whichcorresponds to the hypothesis that scene content(object-based scenes vs abstract natural patterns) doesnot have a strong impact on spatial correlations in thescanpath data For the simulated densities (Figure 6c)we reproduced the statistical results from image set 1Again model type had a significant effect on KLDv2(4)frac14 1626 p 001 Post hoc comparisons indicatedsignificant effects between all models (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0251) for the comparisonbetween dynamical model and inhomogeneous pointprocess (p frac14 0784) and for the comparison betweeninhomogeneous point process and experimental data (pfrac14 0012) For the pair correlation function (Figure 6d)model type had a significant effect v2(4)frac14 1406 p 001 and post hoc comparisons indicated significanteffects for all comparisons (p 001) except for thecomparison between experiment and dynamical model(pfrac14 0453) and between homogeneous and inhomo-geneous point processes (p frac14 0700) Thus the mainstatistical results obtained from image set 1 werereproduced for images of abstract natural patterns(image set 2) These results lend support to thehypothesis that scene content does not have a stronginfluence on second-order spatial statistics of gazepatterns

Thus the dynamical model performed better thanany of the statistical models in predicting the averagepair correlations Although one of our statisticalcontrol models generated data by using image-specificsaccade-length information in addition to the 2Ddensity of gaze position it could not predict the spatialcorrelations as accurately as the dynamical model that

was uninformed about the image-specific saccade-length distribution

Discussion

Current theoretical models of visual attentionallocation in natural scenes are limited to the predictionof first-order spatial statistics (2D densities) of gazepatterns We were interested in attentional dynamicsthat can be characterized by spatial interactions (asfound in the second-order statistics) Using the theoryof spatial point processes we discovered that gazepatterns can be characterized by clustering at smalllength scales which cannot be explained by spatialinhomogeneity of the 2D density We proposed andanalyzed a model based on dynamical activation mapsfor attentional selection and inhibitory control of gazepositions The model reproduced 2D densities of gazemaps (first-order statistics) and distributions of saccadelengths as well as pair correlations (second-orderspatial statistics)

Spatial statistics

While research on the computation of visual saliencyhas been a highly active field of research (Borji amp Itti2013) there is currently a lack of computational modelsfor the generation of scanpaths on the basis of knownsaliency Inhibition-of-return (Klein 2000) has beenproposed as a key principle to prevent continuingrefixation within regions of highest saliency Howeverour analysis of the pair correlation function demon-strates that saccadic selection at small length scales isdominated by spatial clustering Thus our findings arehighly compatible with the view that inhibition-of-return cannot easily be observed in eye-movementbehavior in natural scene viewing experiments (Smith ampHenderson 2011) However the spatial correlationscan be exploited to investigate dynamical rules under-lying attentional processing in the visual system Ourexperimental data show a clear effect of spatialclustering for length scales shorter than about 48However these results are incompatible with thecurrent theory of saliency-based attention allocationcombined with an inhibition-of-return mechanismFuture simulations will investigate the predictive powerof the model when saliency models are used as input

Modeling spatial correlations

In a biologically plausible computational model ofsaccade generation a limited perceptual span needs to

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 11

be implemented for attentional selection (Findlay ampGilchrist 2012) We addressed this problem byassuming a Gaussian read-out mechanism with localretrieval from the saliency map through a limitedaperture analogous to the limited extent of high-fidelity information uptake through the fovea We usedthe experimentally observed density of fixations as aproxy for visual saliency

First our results indicate that a very limitedattentional span (Gaussian with standard deviationparameter 498) of about twice the size of theactivation mechanism for tracking the gaze positions(228) is sufficient for saccade planning This atten-

tional span is efficient however since the combinationof fixation and attention maps in our model activelydrives the modelrsquos gaze position to new salient regionscomputed via normalization of activations (Carandiniamp Heeger 2011)

Second our model correctly predicted spatialclustering of gaze positions at small-length scales Thepair correlation function indicates that there is apronounced contribution by refixations very close tothe current gaze position This effect is compatible withthe distribution of saccade lengths however a statis-tical control model that generated data from statisti-cally independent probabilities of 2D density and

Figure 6 Model predictions on images not used for parameter estimation Predicted data were generated from the dynamical model

for the 10 images of image set 1 not used for parameter estimation (a b) and for all 15 images from image set 2 (c d) (a) Modell

simulations of the KLD for experiments on image set 1 (10 images not used for model parameter estimation) and dynamical model

and three different statistical models For the experimental data a split-half procedure was applied to compute KLD (b)

Corresponding PCF deviations for the same model-generated and experimental data on image set 1 (c) KLD measures for image set 2

(d) PCF deviation for image set 2

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 12

saccade lengths could not reproduce the pair correla-tions adequately

Implications for computational models of activevision

During active vision our visual system relies onfrequent gaze shifts to optimize retinal input Using asecond-order statistical analysis we demonstrated thatspatial correlations across scanpaths might provideimportant constraints for computational models of eyeguidance In our dynamical model spatial clustering atsmall scales is the result of two principles First thefixation map is driven by an activation function with asmall spatial extent (258) Second the time-scale ofactivation build-up in the fixation map is slowcompared to the build-up of activation in the attentionmap Both mechanisms permit refixations at positionsvery close to the current gaze position before the systemmoves on to new regions of visual space

Limitations of the current approach

The current work focused on spatial statistics of gazepatterns and we proposed and analyzed dynamicalmechanisms of eye guidance in scene viewing In ourmodel a Gaussian read-out mechanism for the staticempirical saliency map was implemented as a simpli-fication A more biologically plausible combination ofour model of eye guidance with a dynamical saliencymodel (Borji amp Itti 2013) is a natural extension of thecurrent framework and the development of such amodel is work in progress in our laboratories Clearlythe current modeling architecture is not limited to inputfrom static saliency maps

Another simplification is related to the timing ofsaccades (Nuthmann Smith Engbert amp Henderson2010) In the current version of our model weimplemented random timing and sampled fixationdurations randomly from a predefined distributionMore adequate models of fixation durations howeverwill need to include interactions of processing difficultybetween fovea and periphery (Laubrock Cajar ampEngbert 2013)

Supplemental information

This work includes a supplemental video animationof the model simulations Experimental data onfixation patterns and computer code for statisticalanalysis and model simulations will be made availablevia the Potsdam Mind Research Respository (PMR2httpreadpsychuni-potsdamde)

Keywords scene perception eye movements atten-tion saccades modeling spatial statistics

Acknowledgments

This work was supported by Bundesministerium furBildung und Forschung (BMBF) through the BernsteinComputational Neuroscience Programs Berlin (ProjectB3 FKZ 01GQ1001F and FKZ 01GQ1001B to R Eand F A W resp) and Tubingen (FKZ 01GQ1002 toFAW) and by Deutsche Forschungsgemeinschaft(grants EN 47113ndash1 and WI 21034ndash1 to R E and FA W resp)

Commercial relationships noneCorresponding author Ralf EngbertEmail ralfengbertuni-potsdamdeAddress Cognitive Science Program amp Department ofPsychology University of Potsdam Potsdam Ger-many

References

Baddeley A amp Turner R (2005) spatstat An Rpackage for analyzing spatial point patternsJournal of Statistical Software 12(6) 1ndash42 Avail-able from httpwwwjstatsoftorg

Barthelme S Trukenbrod H Engbert R ampWichmann F (2013) Modeling fixation locationsusing spatial point processes Journal of Vision13(12) 1 1ndash34 httpwwwjournalofvisionorgcontent13121 doi10116713121 [PubMed][Article]

Beck C amp Schlogl F (1993) Thermodynamics ofchaotic systems Cambridge UK Cambridge Uni-versity Press

Borji A amp Itti L (2013) State-of-the-art in visualattention modeling IEEE Transactions on PatternAnalysis and Machine Intelligence 35(1) 185ndash207

Bylinskii Z Judd T Durand F Oliva A ampTorralba A (2012) MIT Saliency BenchmarkAvailable at httpsaliencymitedu

Carandini M amp Heeger D J (2011) Normalizationas a canonical neural computation Nature ReviewsNeuroscience 13 51ndash62

Engbert R (2012) Computational modeling of col-licular integration of perceptual responses andattention in microsaccades The Journal of Neuro-science 32(23) 8035ndash8039

Engbert R amp Kliegl R (2003) Microsaccades

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 13

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 7: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

Buffalo 2012) Moreover this map is spatiallydiscrete (Stensola et al 2012) and serves as abiological motivation for the fixation and attentionmaps in our model

We implemented activation maps for attention andfixation (inhibitory tagging) on a discrete square latticeof dimension L middot L Lattice points (i j) haveequidistant spatial positions (xi yj) for i j frac14 1 Lwhere xifrac14x0thorn iDx and yjfrac14y0thorn jDy As a consequenceattention and fixation maps are implemented inspatially discrete forms aij (t) and fij(t) respec-tively For the numerical simulations time wasdiscretized in steps of Dtfrac14 10 ms with tfrac14 k middot Dt and kfrac14 0 1 2 T

If the observerrsquos gaze is at position (xg yg) at time tthen a position-dependent activation change Fij(xg yg)and a global decay proportional to the currentactivation ndashxfij(t) are added to all lattice positions toupdate the activation map at time t thorn 1 ie

fijethtthorn 1THORN frac14 Fijethxg ygTHORN thorn eth1 xTHORNfijethtTHORN eth6THORN

where the activation change Fij(xg yg) rdquo Fij(t) isimplicitly time-dependent because of the time-depen-dence of gaze positions xg(t) yg(t) The constant x 1determines the strength of the decay of activation Forthe spatial distribution of the activation change Fij(t)

we assume a Gaussian profile ie

FijethtTHORN frac14R0ffiffiffiffiffiffi2pp

r0

exp ethxi xgethtTHORNTHORN2 thorn ethyj ygethtTHORNTHORN2

2r20

eth7THORNwith the free parameters r0 and R0 controlling thespatial extent of the activation change and the strengthof the activation change respectively In our model thebuild-up of activation in the fixation map is amechanism of inhibitory tagging (Itti amp Koch 2001) toreduce the amount of refixations on recently visitedimage patches

For the attention map aij(t) we assume similardynamics however the width of Gaussian activationchange Aij(t) is assumed to be proportional to the staticsaliency map ij The updating rule for the attentionmap is given by

aijethtthorn 1THORN frac14ijAijethtTHORNX

kl

klAklethtTHORNthorn eth1 qTHORNaijethtTHORN eth8THORN

with decay constant q 1 As a result the saliencymap ij is accessed locally through a Gaussian aperturewith size r1 and scale parameter R1 similar to Equation

Figure 3 Comparison of the mean pair correlation functions for three different image sets (a) Natural object-based scenes (b)

Abstract natural patterns (c) Natural scenes from the Le Meur et al dataset Mean pair correlation function are similar for object-

based (d) and abstract (e) natural scenes The presentation of images on a smaller display in the Le Meur dataset compared to our

experiments (f) resulted in a smaller length scale of spatial clustering

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 7

(7) Using the local read-out mechanism information isprovided for the attention map to identify regions ofinterest for eye guidance

The fixation map monitors recently visited fixationlocations by increasing local activation at the corre-sponding lattice points (Engbert et al 2011 Freund ampGrassberger 1992) If the observerrsquos gaze position is ata position corresponding to lattice position (i j) then aposition-dependent activation change Fij in the form ofa Gaussian profile is added locally in each time stepwhile a global decay proportional to the currentactivation is applied to all lattice positions The widthof the Gaussian activation r0 and the decay x are thetwo free parameters controlling activation in thefixation map For the attention map aij(t) we assumesimilar dynamics including local increase of activationwith size r1 and global decay q However the amountof activation change Aij is assumed to be proportionalto the time-independent saliency map ij so that thelocal increase of activation isthorn ijAij

PklAkl

Our modeling assumptions are related to specifichypotheses on model parameters We expect that thesize of the Gaussian profile for the attention map islarger than the corresponding size of the fixation mapr1 r0 since attention is the process driving eyemovements into new regions of visual space while theinhibitory tagging process should be more localized Asimilar expectation can be formulated on the decayconstants Since inhibitory tagging is needed on alonger time scale as a foraging facilitator we expect aslower decay in the fixation map compared to theattention map ie x q

Next we assume that given a saccade command attime t both maps are evaluated to select the nextsaccade target First we apply a normalization of bothattention and fixation maps as a general neuralprinciple to obtain relative activations (Carandini ampHeeger 2011) Second we introduce a potentialfunction as the difference of the normalized maps

uijethtTHORN frac14 aijethtTHORN$ kX

kl

aklethtTHORNfrac12 )kthorn

fijethtTHORN$ cX

kl

fklethtTHORNfrac12 )c eth9THORN

where the exponents k and c are free parametersHowever a value of kfrac14 1 is a necessary boundarycondition to obtain a model that accurately reproducesthe densities of gaze positions In a qualitative analysisof the model (see Appendix) pilot simulations showedthat c is an important control parameter determiningspatial correlations where c rsquo 03 was used toreproduce spatial correlations observed in our experi-mental data

The potential uij(t) Equation (9) can be positive ornegative at position (i j) Lattice positions with apositive potential uij 0 are excluded from saccadic

selection since corresponding regions were visitedrecently with high probability Among the latticepositions with negative activations we implementedstochastic selection of saccade targets proportional torelative activations also known as Lucersquos choice rule(Luce 1959) We implemented this form of stochasticselection from the set S frac14 fethi jTHORNjuij 0g where theprobability pij(t) to select lattice position (i j) at time tas the next saccade target is given by

pijethtTHORN frac14 maxuijethtTHORNX

ethklTHORNSuklethtTHORN

g

0

BB

1

CCA eth10THORN

The noise term g is an additional parametercontrolling the amount of noise in target selection

Numerical simulations of the model

Our computational modeling approach to saccadicselection has been developed to propose a minimalmodel that captures the types of spatial statisticsobserved in experimental data After fixing modelparameters all that is needed to run the model on aparticular image is a 2D density estimate of gazepatterns (from experimental data) or a corresponding2D prediction from one of the available saliency models(Borji amp Itti 2013) To reduce computational com-plexity in the current study and to exclude potentialmismatches between data and model simulations due tothe saliency models we run the model on experimen-tally realized densities of gaze positions which isequivalent to assuming an exact saliency modelHowever our modeling approach is compatible withfuture dynamical saliency models that provide time-and position-dependent saliency during a sequence ofgaze shifts thus our model introduces a generaldynamical framework and is not tied to using empiricaldata

The numerical values of the five model parameterswere estimated from experimental data recorded for thefirst five images of natural object-based scenes (imageset 1) using a genetic algorithm approach (Mitchell1998 see Appendix Table 1) The remaining 10 imagesof image set 1 were used for model evaluationsmdashas the15 images of image set 2 The objective function forparameter estimation was based on evaluation of first-order statistics (2D density of gaze positions) and thedistribution of saccade lengths In agreement with ourfirst expectation the estimated optimal values for thespatial extent of the inhibitory tagging process in thefixation map r0frac14 228 is considerably smaller than thecorresponding size of the build-up function for theattention map r1frac14 498 Our second expectation wasrelated to the decay constants which turned out to be

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 8

larger for the attention map qfrac14 0066 than for thefixation map xfrac14 93 middot 105 so q was greater than xagain as expected Finally the noise level in the targetmap is g frac14 91 middot 105

An example for the simulation of the modeldemonstrates the interplay between inhibitory pro-cesses from the fixation map and the attention mapduring gaze planning (Figure 4) The fixation mapbuilds up activation at fixated lattice positions (yellowto red) while the attention map identifies new regionsof interest for saccadic selection (blue) These simula-tions show on a qualitative level how the modelimplements the interplay of the assumed mechanisms of

inhibitory tagging and saccadic selection of gazepositions (see Supplementary Video)

To investigate model performance qualitatively weran simulations for one image (image 6 930 fixationFigure 5a) and obtained a number of fixations similarto the experimental data (882 fixations Figure 5b)Single-trial scanpaths from experiments and simula-tions are shown additionally in Figure 5a and 5brespectively) The resulting distributions of saccadelengths indicate that our dynamical model is in goodagreement with experimental data while the twosurrogate datasets (homogeneous and inhomogeneouspoint processes) fail to reproduce the distribution(Figure 5c) An analysis of the pair correlation

Figure 4 Illustration of a simulated sequence of gaze positions and the activation dynamics of the model (a) The density of gazepositions (empirical saliency) is used as a proxy for a computed saliency map that drives activation in the attention map (bndashf)Sequence of snapshots of the potential (bluefrac14 low yellowfrac14 high) Note that blue color indicates density of fixations in (a) while itrefers to low values of the potential in (b)ndash(f)

Parameter Symbol Mean Error Min Max Reference

Fixation mapActivation span [8] r0 216 011 03 100 Eq (7)Decay log10 x ndash403 028 ndash50 ndash10 Eq (6)

Attention mapActivation span [8] r1 488 025 03 100 Eq (8)Decay log10 q ndash118 008 ndash30 ndash10 Eq (8)

Target selectionAdditive noise log10 g ndash404 007 ndash90 ndash30 Eq (10)

Table 1 Model parameters

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 9

functions indicates that the spatial correlations presentin the experimental data were approximated by thedynamical model (Figure 5d) however the twosurrogate datasets representing uncorrelated sequencesby construction produce qualitatively different spatialcorrelations

To investigate the influences of saccade-lengthdistributions on pair correlations we constructedanother statistical control model that had access to theimage-specific saccade-length distribution It is impor-tant to note that this model was not introduced as acompetitor to the dynamical model which is able topredict saccade-length distributions The statisticalcontrol model approximated the distribution of saccadelengths l and 2D densities of gaze positions x bysampling from the joint probability distribution p(x l)under the assumption of statistical independence ofsaccade lengths and gaze positions ie p(x l)frac14p(x)p(l) This model by construction approximates thedistribution of saccade lengths and 2D density of gazepositions (Figure 5c) The simulations indicate how-

ever that even the combination of inhomogeneousdensity of gaze positions and non-normal distributionof saccade lengths used by the statistical control modelcannot explain spatial correlations in the experimentaldata characterized by the pair correlations function(Figure 5d)

For the statistical analysis of model performance onnew images we carried out additional numericalsimulations We fitted model parameters to dataobtained for the first five images only (see above) andpredicted data for the remaining 10 images by the newsimulations to isolate parameter estimation from modelevaluation (calculating test errors rather than trainingerrors)

Our simulations show that the dynamical modelpredicted the 2D density of gaze positions accurately(Figure 6a) The obtained KLD values for the model(blue) were comparable to KLD values calculated bythe split-half procedure for the experimental data (red)and to the KLD values obtained for the statisticalcontrol models (green frac14Homogeneous point process

Figure 5 Distribution of saccade lengths and pair correlation functions from model simulations (image 6) (a) Experimental

distribution of gaze positions (red dots) and a representative sample trial (red lines) (b) Corresponding plot of simulated data

obtained from our dynamical model (blue dots) A single-trial simulation is highlighted (blue line) (c) Distributions of saccade lengths

for experimental data (red) dynamical model (blue) homogeneous point process (green) inhomogeneous point process (yellow) and

a statistical control model (magenta) (d) Pair correlation functions for the different models The dynamical model (blue line) produces

spatial correlations similar to the experimental data (red line)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 10

yellow frac14 Inhomogeneous point process magentafrac14Control model) Model type had a significant effect onKLD v2(4)frac14 1094 p 001 Post hoc comparisonsindicated significant effects between all models (p 001) except for the comparison between experimentaldata and the dynamical model (p frac14 0298) and for thecomparison between dynamical model and inhomoge-neous point process (p frac14 0679) In an analysis of thePCF estimated from the same set of simulated data(Figure 6b) the dynamical model (blue) produceddeviations from an uncorrelated point process that arein good agreement with the experimental data (red)Model type had a significant effect on PCF deviationv2(4)frac14 916 p 001 Post hoc comparisons indicatedsignificant effects for all comparisons (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0990) and between homoge-neous and inhomogeneous point processes (pfrac14 0996)

To check the reliability of the results we performedall corresponding calculations for the images showingabstract natural scenes (image set 2) For thesesimulations we used the set of model parameters fittedto the first five images of image set 1 whichcorresponds to the hypothesis that scene content(object-based scenes vs abstract natural patterns) doesnot have a strong impact on spatial correlations in thescanpath data For the simulated densities (Figure 6c)we reproduced the statistical results from image set 1Again model type had a significant effect on KLDv2(4)frac14 1626 p 001 Post hoc comparisons indicatedsignificant effects between all models (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0251) for the comparisonbetween dynamical model and inhomogeneous pointprocess (p frac14 0784) and for the comparison betweeninhomogeneous point process and experimental data (pfrac14 0012) For the pair correlation function (Figure 6d)model type had a significant effect v2(4)frac14 1406 p 001 and post hoc comparisons indicated significanteffects for all comparisons (p 001) except for thecomparison between experiment and dynamical model(pfrac14 0453) and between homogeneous and inhomo-geneous point processes (p frac14 0700) Thus the mainstatistical results obtained from image set 1 werereproduced for images of abstract natural patterns(image set 2) These results lend support to thehypothesis that scene content does not have a stronginfluence on second-order spatial statistics of gazepatterns

Thus the dynamical model performed better thanany of the statistical models in predicting the averagepair correlations Although one of our statisticalcontrol models generated data by using image-specificsaccade-length information in addition to the 2Ddensity of gaze position it could not predict the spatialcorrelations as accurately as the dynamical model that

was uninformed about the image-specific saccade-length distribution

Discussion

Current theoretical models of visual attentionallocation in natural scenes are limited to the predictionof first-order spatial statistics (2D densities) of gazepatterns We were interested in attentional dynamicsthat can be characterized by spatial interactions (asfound in the second-order statistics) Using the theoryof spatial point processes we discovered that gazepatterns can be characterized by clustering at smalllength scales which cannot be explained by spatialinhomogeneity of the 2D density We proposed andanalyzed a model based on dynamical activation mapsfor attentional selection and inhibitory control of gazepositions The model reproduced 2D densities of gazemaps (first-order statistics) and distributions of saccadelengths as well as pair correlations (second-orderspatial statistics)

Spatial statistics

While research on the computation of visual saliencyhas been a highly active field of research (Borji amp Itti2013) there is currently a lack of computational modelsfor the generation of scanpaths on the basis of knownsaliency Inhibition-of-return (Klein 2000) has beenproposed as a key principle to prevent continuingrefixation within regions of highest saliency Howeverour analysis of the pair correlation function demon-strates that saccadic selection at small length scales isdominated by spatial clustering Thus our findings arehighly compatible with the view that inhibition-of-return cannot easily be observed in eye-movementbehavior in natural scene viewing experiments (Smith ampHenderson 2011) However the spatial correlationscan be exploited to investigate dynamical rules under-lying attentional processing in the visual system Ourexperimental data show a clear effect of spatialclustering for length scales shorter than about 48However these results are incompatible with thecurrent theory of saliency-based attention allocationcombined with an inhibition-of-return mechanismFuture simulations will investigate the predictive powerof the model when saliency models are used as input

Modeling spatial correlations

In a biologically plausible computational model ofsaccade generation a limited perceptual span needs to

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 11

be implemented for attentional selection (Findlay ampGilchrist 2012) We addressed this problem byassuming a Gaussian read-out mechanism with localretrieval from the saliency map through a limitedaperture analogous to the limited extent of high-fidelity information uptake through the fovea We usedthe experimentally observed density of fixations as aproxy for visual saliency

First our results indicate that a very limitedattentional span (Gaussian with standard deviationparameter 498) of about twice the size of theactivation mechanism for tracking the gaze positions(228) is sufficient for saccade planning This atten-

tional span is efficient however since the combinationof fixation and attention maps in our model activelydrives the modelrsquos gaze position to new salient regionscomputed via normalization of activations (Carandiniamp Heeger 2011)

Second our model correctly predicted spatialclustering of gaze positions at small-length scales Thepair correlation function indicates that there is apronounced contribution by refixations very close tothe current gaze position This effect is compatible withthe distribution of saccade lengths however a statis-tical control model that generated data from statisti-cally independent probabilities of 2D density and

Figure 6 Model predictions on images not used for parameter estimation Predicted data were generated from the dynamical model

for the 10 images of image set 1 not used for parameter estimation (a b) and for all 15 images from image set 2 (c d) (a) Modell

simulations of the KLD for experiments on image set 1 (10 images not used for model parameter estimation) and dynamical model

and three different statistical models For the experimental data a split-half procedure was applied to compute KLD (b)

Corresponding PCF deviations for the same model-generated and experimental data on image set 1 (c) KLD measures for image set 2

(d) PCF deviation for image set 2

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 12

saccade lengths could not reproduce the pair correla-tions adequately

Implications for computational models of activevision

During active vision our visual system relies onfrequent gaze shifts to optimize retinal input Using asecond-order statistical analysis we demonstrated thatspatial correlations across scanpaths might provideimportant constraints for computational models of eyeguidance In our dynamical model spatial clustering atsmall scales is the result of two principles First thefixation map is driven by an activation function with asmall spatial extent (258) Second the time-scale ofactivation build-up in the fixation map is slowcompared to the build-up of activation in the attentionmap Both mechanisms permit refixations at positionsvery close to the current gaze position before the systemmoves on to new regions of visual space

Limitations of the current approach

The current work focused on spatial statistics of gazepatterns and we proposed and analyzed dynamicalmechanisms of eye guidance in scene viewing In ourmodel a Gaussian read-out mechanism for the staticempirical saliency map was implemented as a simpli-fication A more biologically plausible combination ofour model of eye guidance with a dynamical saliencymodel (Borji amp Itti 2013) is a natural extension of thecurrent framework and the development of such amodel is work in progress in our laboratories Clearlythe current modeling architecture is not limited to inputfrom static saliency maps

Another simplification is related to the timing ofsaccades (Nuthmann Smith Engbert amp Henderson2010) In the current version of our model weimplemented random timing and sampled fixationdurations randomly from a predefined distributionMore adequate models of fixation durations howeverwill need to include interactions of processing difficultybetween fovea and periphery (Laubrock Cajar ampEngbert 2013)

Supplemental information

This work includes a supplemental video animationof the model simulations Experimental data onfixation patterns and computer code for statisticalanalysis and model simulations will be made availablevia the Potsdam Mind Research Respository (PMR2httpreadpsychuni-potsdamde)

Keywords scene perception eye movements atten-tion saccades modeling spatial statistics

Acknowledgments

This work was supported by Bundesministerium furBildung und Forschung (BMBF) through the BernsteinComputational Neuroscience Programs Berlin (ProjectB3 FKZ 01GQ1001F and FKZ 01GQ1001B to R Eand F A W resp) and Tubingen (FKZ 01GQ1002 toFAW) and by Deutsche Forschungsgemeinschaft(grants EN 47113ndash1 and WI 21034ndash1 to R E and FA W resp)

Commercial relationships noneCorresponding author Ralf EngbertEmail ralfengbertuni-potsdamdeAddress Cognitive Science Program amp Department ofPsychology University of Potsdam Potsdam Ger-many

References

Baddeley A amp Turner R (2005) spatstat An Rpackage for analyzing spatial point patternsJournal of Statistical Software 12(6) 1ndash42 Avail-able from httpwwwjstatsoftorg

Barthelme S Trukenbrod H Engbert R ampWichmann F (2013) Modeling fixation locationsusing spatial point processes Journal of Vision13(12) 1 1ndash34 httpwwwjournalofvisionorgcontent13121 doi10116713121 [PubMed][Article]

Beck C amp Schlogl F (1993) Thermodynamics ofchaotic systems Cambridge UK Cambridge Uni-versity Press

Borji A amp Itti L (2013) State-of-the-art in visualattention modeling IEEE Transactions on PatternAnalysis and Machine Intelligence 35(1) 185ndash207

Bylinskii Z Judd T Durand F Oliva A ampTorralba A (2012) MIT Saliency BenchmarkAvailable at httpsaliencymitedu

Carandini M amp Heeger D J (2011) Normalizationas a canonical neural computation Nature ReviewsNeuroscience 13 51ndash62

Engbert R (2012) Computational modeling of col-licular integration of perceptual responses andattention in microsaccades The Journal of Neuro-science 32(23) 8035ndash8039

Engbert R amp Kliegl R (2003) Microsaccades

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 13

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 8: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

(7) Using the local read-out mechanism information isprovided for the attention map to identify regions ofinterest for eye guidance

The fixation map monitors recently visited fixationlocations by increasing local activation at the corre-sponding lattice points (Engbert et al 2011 Freund ampGrassberger 1992) If the observerrsquos gaze position is ata position corresponding to lattice position (i j) then aposition-dependent activation change Fij in the form ofa Gaussian profile is added locally in each time stepwhile a global decay proportional to the currentactivation is applied to all lattice positions The widthof the Gaussian activation r0 and the decay x are thetwo free parameters controlling activation in thefixation map For the attention map aij(t) we assumesimilar dynamics including local increase of activationwith size r1 and global decay q However the amountof activation change Aij is assumed to be proportionalto the time-independent saliency map ij so that thelocal increase of activation isthorn ijAij

PklAkl

Our modeling assumptions are related to specifichypotheses on model parameters We expect that thesize of the Gaussian profile for the attention map islarger than the corresponding size of the fixation mapr1 r0 since attention is the process driving eyemovements into new regions of visual space while theinhibitory tagging process should be more localized Asimilar expectation can be formulated on the decayconstants Since inhibitory tagging is needed on alonger time scale as a foraging facilitator we expect aslower decay in the fixation map compared to theattention map ie x q

Next we assume that given a saccade command attime t both maps are evaluated to select the nextsaccade target First we apply a normalization of bothattention and fixation maps as a general neuralprinciple to obtain relative activations (Carandini ampHeeger 2011) Second we introduce a potentialfunction as the difference of the normalized maps

uijethtTHORN frac14 aijethtTHORN$ kX

kl

aklethtTHORNfrac12 )kthorn

fijethtTHORN$ cX

kl

fklethtTHORNfrac12 )c eth9THORN

where the exponents k and c are free parametersHowever a value of kfrac14 1 is a necessary boundarycondition to obtain a model that accurately reproducesthe densities of gaze positions In a qualitative analysisof the model (see Appendix) pilot simulations showedthat c is an important control parameter determiningspatial correlations where c rsquo 03 was used toreproduce spatial correlations observed in our experi-mental data

The potential uij(t) Equation (9) can be positive ornegative at position (i j) Lattice positions with apositive potential uij 0 are excluded from saccadic

selection since corresponding regions were visitedrecently with high probability Among the latticepositions with negative activations we implementedstochastic selection of saccade targets proportional torelative activations also known as Lucersquos choice rule(Luce 1959) We implemented this form of stochasticselection from the set S frac14 fethi jTHORNjuij 0g where theprobability pij(t) to select lattice position (i j) at time tas the next saccade target is given by

pijethtTHORN frac14 maxuijethtTHORNX

ethklTHORNSuklethtTHORN

g

0

BB

1

CCA eth10THORN

The noise term g is an additional parametercontrolling the amount of noise in target selection

Numerical simulations of the model

Our computational modeling approach to saccadicselection has been developed to propose a minimalmodel that captures the types of spatial statisticsobserved in experimental data After fixing modelparameters all that is needed to run the model on aparticular image is a 2D density estimate of gazepatterns (from experimental data) or a corresponding2D prediction from one of the available saliency models(Borji amp Itti 2013) To reduce computational com-plexity in the current study and to exclude potentialmismatches between data and model simulations due tothe saliency models we run the model on experimen-tally realized densities of gaze positions which isequivalent to assuming an exact saliency modelHowever our modeling approach is compatible withfuture dynamical saliency models that provide time-and position-dependent saliency during a sequence ofgaze shifts thus our model introduces a generaldynamical framework and is not tied to using empiricaldata

The numerical values of the five model parameterswere estimated from experimental data recorded for thefirst five images of natural object-based scenes (imageset 1) using a genetic algorithm approach (Mitchell1998 see Appendix Table 1) The remaining 10 imagesof image set 1 were used for model evaluationsmdashas the15 images of image set 2 The objective function forparameter estimation was based on evaluation of first-order statistics (2D density of gaze positions) and thedistribution of saccade lengths In agreement with ourfirst expectation the estimated optimal values for thespatial extent of the inhibitory tagging process in thefixation map r0frac14 228 is considerably smaller than thecorresponding size of the build-up function for theattention map r1frac14 498 Our second expectation wasrelated to the decay constants which turned out to be

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 8

larger for the attention map qfrac14 0066 than for thefixation map xfrac14 93 middot 105 so q was greater than xagain as expected Finally the noise level in the targetmap is g frac14 91 middot 105

An example for the simulation of the modeldemonstrates the interplay between inhibitory pro-cesses from the fixation map and the attention mapduring gaze planning (Figure 4) The fixation mapbuilds up activation at fixated lattice positions (yellowto red) while the attention map identifies new regionsof interest for saccadic selection (blue) These simula-tions show on a qualitative level how the modelimplements the interplay of the assumed mechanisms of

inhibitory tagging and saccadic selection of gazepositions (see Supplementary Video)

To investigate model performance qualitatively weran simulations for one image (image 6 930 fixationFigure 5a) and obtained a number of fixations similarto the experimental data (882 fixations Figure 5b)Single-trial scanpaths from experiments and simula-tions are shown additionally in Figure 5a and 5brespectively) The resulting distributions of saccadelengths indicate that our dynamical model is in goodagreement with experimental data while the twosurrogate datasets (homogeneous and inhomogeneouspoint processes) fail to reproduce the distribution(Figure 5c) An analysis of the pair correlation

Figure 4 Illustration of a simulated sequence of gaze positions and the activation dynamics of the model (a) The density of gazepositions (empirical saliency) is used as a proxy for a computed saliency map that drives activation in the attention map (bndashf)Sequence of snapshots of the potential (bluefrac14 low yellowfrac14 high) Note that blue color indicates density of fixations in (a) while itrefers to low values of the potential in (b)ndash(f)

Parameter Symbol Mean Error Min Max Reference

Fixation mapActivation span [8] r0 216 011 03 100 Eq (7)Decay log10 x ndash403 028 ndash50 ndash10 Eq (6)

Attention mapActivation span [8] r1 488 025 03 100 Eq (8)Decay log10 q ndash118 008 ndash30 ndash10 Eq (8)

Target selectionAdditive noise log10 g ndash404 007 ndash90 ndash30 Eq (10)

Table 1 Model parameters

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 9

functions indicates that the spatial correlations presentin the experimental data were approximated by thedynamical model (Figure 5d) however the twosurrogate datasets representing uncorrelated sequencesby construction produce qualitatively different spatialcorrelations

To investigate the influences of saccade-lengthdistributions on pair correlations we constructedanother statistical control model that had access to theimage-specific saccade-length distribution It is impor-tant to note that this model was not introduced as acompetitor to the dynamical model which is able topredict saccade-length distributions The statisticalcontrol model approximated the distribution of saccadelengths l and 2D densities of gaze positions x bysampling from the joint probability distribution p(x l)under the assumption of statistical independence ofsaccade lengths and gaze positions ie p(x l)frac14p(x)p(l) This model by construction approximates thedistribution of saccade lengths and 2D density of gazepositions (Figure 5c) The simulations indicate how-

ever that even the combination of inhomogeneousdensity of gaze positions and non-normal distributionof saccade lengths used by the statistical control modelcannot explain spatial correlations in the experimentaldata characterized by the pair correlations function(Figure 5d)

For the statistical analysis of model performance onnew images we carried out additional numericalsimulations We fitted model parameters to dataobtained for the first five images only (see above) andpredicted data for the remaining 10 images by the newsimulations to isolate parameter estimation from modelevaluation (calculating test errors rather than trainingerrors)

Our simulations show that the dynamical modelpredicted the 2D density of gaze positions accurately(Figure 6a) The obtained KLD values for the model(blue) were comparable to KLD values calculated bythe split-half procedure for the experimental data (red)and to the KLD values obtained for the statisticalcontrol models (green frac14Homogeneous point process

Figure 5 Distribution of saccade lengths and pair correlation functions from model simulations (image 6) (a) Experimental

distribution of gaze positions (red dots) and a representative sample trial (red lines) (b) Corresponding plot of simulated data

obtained from our dynamical model (blue dots) A single-trial simulation is highlighted (blue line) (c) Distributions of saccade lengths

for experimental data (red) dynamical model (blue) homogeneous point process (green) inhomogeneous point process (yellow) and

a statistical control model (magenta) (d) Pair correlation functions for the different models The dynamical model (blue line) produces

spatial correlations similar to the experimental data (red line)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 10

yellow frac14 Inhomogeneous point process magentafrac14Control model) Model type had a significant effect onKLD v2(4)frac14 1094 p 001 Post hoc comparisonsindicated significant effects between all models (p 001) except for the comparison between experimentaldata and the dynamical model (p frac14 0298) and for thecomparison between dynamical model and inhomoge-neous point process (p frac14 0679) In an analysis of thePCF estimated from the same set of simulated data(Figure 6b) the dynamical model (blue) produceddeviations from an uncorrelated point process that arein good agreement with the experimental data (red)Model type had a significant effect on PCF deviationv2(4)frac14 916 p 001 Post hoc comparisons indicatedsignificant effects for all comparisons (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0990) and between homoge-neous and inhomogeneous point processes (pfrac14 0996)

To check the reliability of the results we performedall corresponding calculations for the images showingabstract natural scenes (image set 2) For thesesimulations we used the set of model parameters fittedto the first five images of image set 1 whichcorresponds to the hypothesis that scene content(object-based scenes vs abstract natural patterns) doesnot have a strong impact on spatial correlations in thescanpath data For the simulated densities (Figure 6c)we reproduced the statistical results from image set 1Again model type had a significant effect on KLDv2(4)frac14 1626 p 001 Post hoc comparisons indicatedsignificant effects between all models (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0251) for the comparisonbetween dynamical model and inhomogeneous pointprocess (p frac14 0784) and for the comparison betweeninhomogeneous point process and experimental data (pfrac14 0012) For the pair correlation function (Figure 6d)model type had a significant effect v2(4)frac14 1406 p 001 and post hoc comparisons indicated significanteffects for all comparisons (p 001) except for thecomparison between experiment and dynamical model(pfrac14 0453) and between homogeneous and inhomo-geneous point processes (p frac14 0700) Thus the mainstatistical results obtained from image set 1 werereproduced for images of abstract natural patterns(image set 2) These results lend support to thehypothesis that scene content does not have a stronginfluence on second-order spatial statistics of gazepatterns

Thus the dynamical model performed better thanany of the statistical models in predicting the averagepair correlations Although one of our statisticalcontrol models generated data by using image-specificsaccade-length information in addition to the 2Ddensity of gaze position it could not predict the spatialcorrelations as accurately as the dynamical model that

was uninformed about the image-specific saccade-length distribution

Discussion

Current theoretical models of visual attentionallocation in natural scenes are limited to the predictionof first-order spatial statistics (2D densities) of gazepatterns We were interested in attentional dynamicsthat can be characterized by spatial interactions (asfound in the second-order statistics) Using the theoryof spatial point processes we discovered that gazepatterns can be characterized by clustering at smalllength scales which cannot be explained by spatialinhomogeneity of the 2D density We proposed andanalyzed a model based on dynamical activation mapsfor attentional selection and inhibitory control of gazepositions The model reproduced 2D densities of gazemaps (first-order statistics) and distributions of saccadelengths as well as pair correlations (second-orderspatial statistics)

Spatial statistics

While research on the computation of visual saliencyhas been a highly active field of research (Borji amp Itti2013) there is currently a lack of computational modelsfor the generation of scanpaths on the basis of knownsaliency Inhibition-of-return (Klein 2000) has beenproposed as a key principle to prevent continuingrefixation within regions of highest saliency Howeverour analysis of the pair correlation function demon-strates that saccadic selection at small length scales isdominated by spatial clustering Thus our findings arehighly compatible with the view that inhibition-of-return cannot easily be observed in eye-movementbehavior in natural scene viewing experiments (Smith ampHenderson 2011) However the spatial correlationscan be exploited to investigate dynamical rules under-lying attentional processing in the visual system Ourexperimental data show a clear effect of spatialclustering for length scales shorter than about 48However these results are incompatible with thecurrent theory of saliency-based attention allocationcombined with an inhibition-of-return mechanismFuture simulations will investigate the predictive powerof the model when saliency models are used as input

Modeling spatial correlations

In a biologically plausible computational model ofsaccade generation a limited perceptual span needs to

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 11

be implemented for attentional selection (Findlay ampGilchrist 2012) We addressed this problem byassuming a Gaussian read-out mechanism with localretrieval from the saliency map through a limitedaperture analogous to the limited extent of high-fidelity information uptake through the fovea We usedthe experimentally observed density of fixations as aproxy for visual saliency

First our results indicate that a very limitedattentional span (Gaussian with standard deviationparameter 498) of about twice the size of theactivation mechanism for tracking the gaze positions(228) is sufficient for saccade planning This atten-

tional span is efficient however since the combinationof fixation and attention maps in our model activelydrives the modelrsquos gaze position to new salient regionscomputed via normalization of activations (Carandiniamp Heeger 2011)

Second our model correctly predicted spatialclustering of gaze positions at small-length scales Thepair correlation function indicates that there is apronounced contribution by refixations very close tothe current gaze position This effect is compatible withthe distribution of saccade lengths however a statis-tical control model that generated data from statisti-cally independent probabilities of 2D density and

Figure 6 Model predictions on images not used for parameter estimation Predicted data were generated from the dynamical model

for the 10 images of image set 1 not used for parameter estimation (a b) and for all 15 images from image set 2 (c d) (a) Modell

simulations of the KLD for experiments on image set 1 (10 images not used for model parameter estimation) and dynamical model

and three different statistical models For the experimental data a split-half procedure was applied to compute KLD (b)

Corresponding PCF deviations for the same model-generated and experimental data on image set 1 (c) KLD measures for image set 2

(d) PCF deviation for image set 2

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 12

saccade lengths could not reproduce the pair correla-tions adequately

Implications for computational models of activevision

During active vision our visual system relies onfrequent gaze shifts to optimize retinal input Using asecond-order statistical analysis we demonstrated thatspatial correlations across scanpaths might provideimportant constraints for computational models of eyeguidance In our dynamical model spatial clustering atsmall scales is the result of two principles First thefixation map is driven by an activation function with asmall spatial extent (258) Second the time-scale ofactivation build-up in the fixation map is slowcompared to the build-up of activation in the attentionmap Both mechanisms permit refixations at positionsvery close to the current gaze position before the systemmoves on to new regions of visual space

Limitations of the current approach

The current work focused on spatial statistics of gazepatterns and we proposed and analyzed dynamicalmechanisms of eye guidance in scene viewing In ourmodel a Gaussian read-out mechanism for the staticempirical saliency map was implemented as a simpli-fication A more biologically plausible combination ofour model of eye guidance with a dynamical saliencymodel (Borji amp Itti 2013) is a natural extension of thecurrent framework and the development of such amodel is work in progress in our laboratories Clearlythe current modeling architecture is not limited to inputfrom static saliency maps

Another simplification is related to the timing ofsaccades (Nuthmann Smith Engbert amp Henderson2010) In the current version of our model weimplemented random timing and sampled fixationdurations randomly from a predefined distributionMore adequate models of fixation durations howeverwill need to include interactions of processing difficultybetween fovea and periphery (Laubrock Cajar ampEngbert 2013)

Supplemental information

This work includes a supplemental video animationof the model simulations Experimental data onfixation patterns and computer code for statisticalanalysis and model simulations will be made availablevia the Potsdam Mind Research Respository (PMR2httpreadpsychuni-potsdamde)

Keywords scene perception eye movements atten-tion saccades modeling spatial statistics

Acknowledgments

This work was supported by Bundesministerium furBildung und Forschung (BMBF) through the BernsteinComputational Neuroscience Programs Berlin (ProjectB3 FKZ 01GQ1001F and FKZ 01GQ1001B to R Eand F A W resp) and Tubingen (FKZ 01GQ1002 toFAW) and by Deutsche Forschungsgemeinschaft(grants EN 47113ndash1 and WI 21034ndash1 to R E and FA W resp)

Commercial relationships noneCorresponding author Ralf EngbertEmail ralfengbertuni-potsdamdeAddress Cognitive Science Program amp Department ofPsychology University of Potsdam Potsdam Ger-many

References

Baddeley A amp Turner R (2005) spatstat An Rpackage for analyzing spatial point patternsJournal of Statistical Software 12(6) 1ndash42 Avail-able from httpwwwjstatsoftorg

Barthelme S Trukenbrod H Engbert R ampWichmann F (2013) Modeling fixation locationsusing spatial point processes Journal of Vision13(12) 1 1ndash34 httpwwwjournalofvisionorgcontent13121 doi10116713121 [PubMed][Article]

Beck C amp Schlogl F (1993) Thermodynamics ofchaotic systems Cambridge UK Cambridge Uni-versity Press

Borji A amp Itti L (2013) State-of-the-art in visualattention modeling IEEE Transactions on PatternAnalysis and Machine Intelligence 35(1) 185ndash207

Bylinskii Z Judd T Durand F Oliva A ampTorralba A (2012) MIT Saliency BenchmarkAvailable at httpsaliencymitedu

Carandini M amp Heeger D J (2011) Normalizationas a canonical neural computation Nature ReviewsNeuroscience 13 51ndash62

Engbert R (2012) Computational modeling of col-licular integration of perceptual responses andattention in microsaccades The Journal of Neuro-science 32(23) 8035ndash8039

Engbert R amp Kliegl R (2003) Microsaccades

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 13

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 9: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

larger for the attention map qfrac14 0066 than for thefixation map xfrac14 93 middot 105 so q was greater than xagain as expected Finally the noise level in the targetmap is g frac14 91 middot 105

An example for the simulation of the modeldemonstrates the interplay between inhibitory pro-cesses from the fixation map and the attention mapduring gaze planning (Figure 4) The fixation mapbuilds up activation at fixated lattice positions (yellowto red) while the attention map identifies new regionsof interest for saccadic selection (blue) These simula-tions show on a qualitative level how the modelimplements the interplay of the assumed mechanisms of

inhibitory tagging and saccadic selection of gazepositions (see Supplementary Video)

To investigate model performance qualitatively weran simulations for one image (image 6 930 fixationFigure 5a) and obtained a number of fixations similarto the experimental data (882 fixations Figure 5b)Single-trial scanpaths from experiments and simula-tions are shown additionally in Figure 5a and 5brespectively) The resulting distributions of saccadelengths indicate that our dynamical model is in goodagreement with experimental data while the twosurrogate datasets (homogeneous and inhomogeneouspoint processes) fail to reproduce the distribution(Figure 5c) An analysis of the pair correlation

Figure 4 Illustration of a simulated sequence of gaze positions and the activation dynamics of the model (a) The density of gazepositions (empirical saliency) is used as a proxy for a computed saliency map that drives activation in the attention map (bndashf)Sequence of snapshots of the potential (bluefrac14 low yellowfrac14 high) Note that blue color indicates density of fixations in (a) while itrefers to low values of the potential in (b)ndash(f)

Parameter Symbol Mean Error Min Max Reference

Fixation mapActivation span [8] r0 216 011 03 100 Eq (7)Decay log10 x ndash403 028 ndash50 ndash10 Eq (6)

Attention mapActivation span [8] r1 488 025 03 100 Eq (8)Decay log10 q ndash118 008 ndash30 ndash10 Eq (8)

Target selectionAdditive noise log10 g ndash404 007 ndash90 ndash30 Eq (10)

Table 1 Model parameters

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 9

functions indicates that the spatial correlations presentin the experimental data were approximated by thedynamical model (Figure 5d) however the twosurrogate datasets representing uncorrelated sequencesby construction produce qualitatively different spatialcorrelations

To investigate the influences of saccade-lengthdistributions on pair correlations we constructedanother statistical control model that had access to theimage-specific saccade-length distribution It is impor-tant to note that this model was not introduced as acompetitor to the dynamical model which is able topredict saccade-length distributions The statisticalcontrol model approximated the distribution of saccadelengths l and 2D densities of gaze positions x bysampling from the joint probability distribution p(x l)under the assumption of statistical independence ofsaccade lengths and gaze positions ie p(x l)frac14p(x)p(l) This model by construction approximates thedistribution of saccade lengths and 2D density of gazepositions (Figure 5c) The simulations indicate how-

ever that even the combination of inhomogeneousdensity of gaze positions and non-normal distributionof saccade lengths used by the statistical control modelcannot explain spatial correlations in the experimentaldata characterized by the pair correlations function(Figure 5d)

For the statistical analysis of model performance onnew images we carried out additional numericalsimulations We fitted model parameters to dataobtained for the first five images only (see above) andpredicted data for the remaining 10 images by the newsimulations to isolate parameter estimation from modelevaluation (calculating test errors rather than trainingerrors)

Our simulations show that the dynamical modelpredicted the 2D density of gaze positions accurately(Figure 6a) The obtained KLD values for the model(blue) were comparable to KLD values calculated bythe split-half procedure for the experimental data (red)and to the KLD values obtained for the statisticalcontrol models (green frac14Homogeneous point process

Figure 5 Distribution of saccade lengths and pair correlation functions from model simulations (image 6) (a) Experimental

distribution of gaze positions (red dots) and a representative sample trial (red lines) (b) Corresponding plot of simulated data

obtained from our dynamical model (blue dots) A single-trial simulation is highlighted (blue line) (c) Distributions of saccade lengths

for experimental data (red) dynamical model (blue) homogeneous point process (green) inhomogeneous point process (yellow) and

a statistical control model (magenta) (d) Pair correlation functions for the different models The dynamical model (blue line) produces

spatial correlations similar to the experimental data (red line)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 10

yellow frac14 Inhomogeneous point process magentafrac14Control model) Model type had a significant effect onKLD v2(4)frac14 1094 p 001 Post hoc comparisonsindicated significant effects between all models (p 001) except for the comparison between experimentaldata and the dynamical model (p frac14 0298) and for thecomparison between dynamical model and inhomoge-neous point process (p frac14 0679) In an analysis of thePCF estimated from the same set of simulated data(Figure 6b) the dynamical model (blue) produceddeviations from an uncorrelated point process that arein good agreement with the experimental data (red)Model type had a significant effect on PCF deviationv2(4)frac14 916 p 001 Post hoc comparisons indicatedsignificant effects for all comparisons (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0990) and between homoge-neous and inhomogeneous point processes (pfrac14 0996)

To check the reliability of the results we performedall corresponding calculations for the images showingabstract natural scenes (image set 2) For thesesimulations we used the set of model parameters fittedto the first five images of image set 1 whichcorresponds to the hypothesis that scene content(object-based scenes vs abstract natural patterns) doesnot have a strong impact on spatial correlations in thescanpath data For the simulated densities (Figure 6c)we reproduced the statistical results from image set 1Again model type had a significant effect on KLDv2(4)frac14 1626 p 001 Post hoc comparisons indicatedsignificant effects between all models (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0251) for the comparisonbetween dynamical model and inhomogeneous pointprocess (p frac14 0784) and for the comparison betweeninhomogeneous point process and experimental data (pfrac14 0012) For the pair correlation function (Figure 6d)model type had a significant effect v2(4)frac14 1406 p 001 and post hoc comparisons indicated significanteffects for all comparisons (p 001) except for thecomparison between experiment and dynamical model(pfrac14 0453) and between homogeneous and inhomo-geneous point processes (p frac14 0700) Thus the mainstatistical results obtained from image set 1 werereproduced for images of abstract natural patterns(image set 2) These results lend support to thehypothesis that scene content does not have a stronginfluence on second-order spatial statistics of gazepatterns

Thus the dynamical model performed better thanany of the statistical models in predicting the averagepair correlations Although one of our statisticalcontrol models generated data by using image-specificsaccade-length information in addition to the 2Ddensity of gaze position it could not predict the spatialcorrelations as accurately as the dynamical model that

was uninformed about the image-specific saccade-length distribution

Discussion

Current theoretical models of visual attentionallocation in natural scenes are limited to the predictionof first-order spatial statistics (2D densities) of gazepatterns We were interested in attentional dynamicsthat can be characterized by spatial interactions (asfound in the second-order statistics) Using the theoryof spatial point processes we discovered that gazepatterns can be characterized by clustering at smalllength scales which cannot be explained by spatialinhomogeneity of the 2D density We proposed andanalyzed a model based on dynamical activation mapsfor attentional selection and inhibitory control of gazepositions The model reproduced 2D densities of gazemaps (first-order statistics) and distributions of saccadelengths as well as pair correlations (second-orderspatial statistics)

Spatial statistics

While research on the computation of visual saliencyhas been a highly active field of research (Borji amp Itti2013) there is currently a lack of computational modelsfor the generation of scanpaths on the basis of knownsaliency Inhibition-of-return (Klein 2000) has beenproposed as a key principle to prevent continuingrefixation within regions of highest saliency Howeverour analysis of the pair correlation function demon-strates that saccadic selection at small length scales isdominated by spatial clustering Thus our findings arehighly compatible with the view that inhibition-of-return cannot easily be observed in eye-movementbehavior in natural scene viewing experiments (Smith ampHenderson 2011) However the spatial correlationscan be exploited to investigate dynamical rules under-lying attentional processing in the visual system Ourexperimental data show a clear effect of spatialclustering for length scales shorter than about 48However these results are incompatible with thecurrent theory of saliency-based attention allocationcombined with an inhibition-of-return mechanismFuture simulations will investigate the predictive powerof the model when saliency models are used as input

Modeling spatial correlations

In a biologically plausible computational model ofsaccade generation a limited perceptual span needs to

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 11

be implemented for attentional selection (Findlay ampGilchrist 2012) We addressed this problem byassuming a Gaussian read-out mechanism with localretrieval from the saliency map through a limitedaperture analogous to the limited extent of high-fidelity information uptake through the fovea We usedthe experimentally observed density of fixations as aproxy for visual saliency

First our results indicate that a very limitedattentional span (Gaussian with standard deviationparameter 498) of about twice the size of theactivation mechanism for tracking the gaze positions(228) is sufficient for saccade planning This atten-

tional span is efficient however since the combinationof fixation and attention maps in our model activelydrives the modelrsquos gaze position to new salient regionscomputed via normalization of activations (Carandiniamp Heeger 2011)

Second our model correctly predicted spatialclustering of gaze positions at small-length scales Thepair correlation function indicates that there is apronounced contribution by refixations very close tothe current gaze position This effect is compatible withthe distribution of saccade lengths however a statis-tical control model that generated data from statisti-cally independent probabilities of 2D density and

Figure 6 Model predictions on images not used for parameter estimation Predicted data were generated from the dynamical model

for the 10 images of image set 1 not used for parameter estimation (a b) and for all 15 images from image set 2 (c d) (a) Modell

simulations of the KLD for experiments on image set 1 (10 images not used for model parameter estimation) and dynamical model

and three different statistical models For the experimental data a split-half procedure was applied to compute KLD (b)

Corresponding PCF deviations for the same model-generated and experimental data on image set 1 (c) KLD measures for image set 2

(d) PCF deviation for image set 2

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 12

saccade lengths could not reproduce the pair correla-tions adequately

Implications for computational models of activevision

During active vision our visual system relies onfrequent gaze shifts to optimize retinal input Using asecond-order statistical analysis we demonstrated thatspatial correlations across scanpaths might provideimportant constraints for computational models of eyeguidance In our dynamical model spatial clustering atsmall scales is the result of two principles First thefixation map is driven by an activation function with asmall spatial extent (258) Second the time-scale ofactivation build-up in the fixation map is slowcompared to the build-up of activation in the attentionmap Both mechanisms permit refixations at positionsvery close to the current gaze position before the systemmoves on to new regions of visual space

Limitations of the current approach

The current work focused on spatial statistics of gazepatterns and we proposed and analyzed dynamicalmechanisms of eye guidance in scene viewing In ourmodel a Gaussian read-out mechanism for the staticempirical saliency map was implemented as a simpli-fication A more biologically plausible combination ofour model of eye guidance with a dynamical saliencymodel (Borji amp Itti 2013) is a natural extension of thecurrent framework and the development of such amodel is work in progress in our laboratories Clearlythe current modeling architecture is not limited to inputfrom static saliency maps

Another simplification is related to the timing ofsaccades (Nuthmann Smith Engbert amp Henderson2010) In the current version of our model weimplemented random timing and sampled fixationdurations randomly from a predefined distributionMore adequate models of fixation durations howeverwill need to include interactions of processing difficultybetween fovea and periphery (Laubrock Cajar ampEngbert 2013)

Supplemental information

This work includes a supplemental video animationof the model simulations Experimental data onfixation patterns and computer code for statisticalanalysis and model simulations will be made availablevia the Potsdam Mind Research Respository (PMR2httpreadpsychuni-potsdamde)

Keywords scene perception eye movements atten-tion saccades modeling spatial statistics

Acknowledgments

This work was supported by Bundesministerium furBildung und Forschung (BMBF) through the BernsteinComputational Neuroscience Programs Berlin (ProjectB3 FKZ 01GQ1001F and FKZ 01GQ1001B to R Eand F A W resp) and Tubingen (FKZ 01GQ1002 toFAW) and by Deutsche Forschungsgemeinschaft(grants EN 47113ndash1 and WI 21034ndash1 to R E and FA W resp)

Commercial relationships noneCorresponding author Ralf EngbertEmail ralfengbertuni-potsdamdeAddress Cognitive Science Program amp Department ofPsychology University of Potsdam Potsdam Ger-many

References

Baddeley A amp Turner R (2005) spatstat An Rpackage for analyzing spatial point patternsJournal of Statistical Software 12(6) 1ndash42 Avail-able from httpwwwjstatsoftorg

Barthelme S Trukenbrod H Engbert R ampWichmann F (2013) Modeling fixation locationsusing spatial point processes Journal of Vision13(12) 1 1ndash34 httpwwwjournalofvisionorgcontent13121 doi10116713121 [PubMed][Article]

Beck C amp Schlogl F (1993) Thermodynamics ofchaotic systems Cambridge UK Cambridge Uni-versity Press

Borji A amp Itti L (2013) State-of-the-art in visualattention modeling IEEE Transactions on PatternAnalysis and Machine Intelligence 35(1) 185ndash207

Bylinskii Z Judd T Durand F Oliva A ampTorralba A (2012) MIT Saliency BenchmarkAvailable at httpsaliencymitedu

Carandini M amp Heeger D J (2011) Normalizationas a canonical neural computation Nature ReviewsNeuroscience 13 51ndash62

Engbert R (2012) Computational modeling of col-licular integration of perceptual responses andattention in microsaccades The Journal of Neuro-science 32(23) 8035ndash8039

Engbert R amp Kliegl R (2003) Microsaccades

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 13

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 10: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

functions indicates that the spatial correlations presentin the experimental data were approximated by thedynamical model (Figure 5d) however the twosurrogate datasets representing uncorrelated sequencesby construction produce qualitatively different spatialcorrelations

To investigate the influences of saccade-lengthdistributions on pair correlations we constructedanother statistical control model that had access to theimage-specific saccade-length distribution It is impor-tant to note that this model was not introduced as acompetitor to the dynamical model which is able topredict saccade-length distributions The statisticalcontrol model approximated the distribution of saccadelengths l and 2D densities of gaze positions x bysampling from the joint probability distribution p(x l)under the assumption of statistical independence ofsaccade lengths and gaze positions ie p(x l)frac14p(x)p(l) This model by construction approximates thedistribution of saccade lengths and 2D density of gazepositions (Figure 5c) The simulations indicate how-

ever that even the combination of inhomogeneousdensity of gaze positions and non-normal distributionof saccade lengths used by the statistical control modelcannot explain spatial correlations in the experimentaldata characterized by the pair correlations function(Figure 5d)

For the statistical analysis of model performance onnew images we carried out additional numericalsimulations We fitted model parameters to dataobtained for the first five images only (see above) andpredicted data for the remaining 10 images by the newsimulations to isolate parameter estimation from modelevaluation (calculating test errors rather than trainingerrors)

Our simulations show that the dynamical modelpredicted the 2D density of gaze positions accurately(Figure 6a) The obtained KLD values for the model(blue) were comparable to KLD values calculated bythe split-half procedure for the experimental data (red)and to the KLD values obtained for the statisticalcontrol models (green frac14Homogeneous point process

Figure 5 Distribution of saccade lengths and pair correlation functions from model simulations (image 6) (a) Experimental

distribution of gaze positions (red dots) and a representative sample trial (red lines) (b) Corresponding plot of simulated data

obtained from our dynamical model (blue dots) A single-trial simulation is highlighted (blue line) (c) Distributions of saccade lengths

for experimental data (red) dynamical model (blue) homogeneous point process (green) inhomogeneous point process (yellow) and

a statistical control model (magenta) (d) Pair correlation functions for the different models The dynamical model (blue line) produces

spatial correlations similar to the experimental data (red line)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 10

yellow frac14 Inhomogeneous point process magentafrac14Control model) Model type had a significant effect onKLD v2(4)frac14 1094 p 001 Post hoc comparisonsindicated significant effects between all models (p 001) except for the comparison between experimentaldata and the dynamical model (p frac14 0298) and for thecomparison between dynamical model and inhomoge-neous point process (p frac14 0679) In an analysis of thePCF estimated from the same set of simulated data(Figure 6b) the dynamical model (blue) produceddeviations from an uncorrelated point process that arein good agreement with the experimental data (red)Model type had a significant effect on PCF deviationv2(4)frac14 916 p 001 Post hoc comparisons indicatedsignificant effects for all comparisons (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0990) and between homoge-neous and inhomogeneous point processes (pfrac14 0996)

To check the reliability of the results we performedall corresponding calculations for the images showingabstract natural scenes (image set 2) For thesesimulations we used the set of model parameters fittedto the first five images of image set 1 whichcorresponds to the hypothesis that scene content(object-based scenes vs abstract natural patterns) doesnot have a strong impact on spatial correlations in thescanpath data For the simulated densities (Figure 6c)we reproduced the statistical results from image set 1Again model type had a significant effect on KLDv2(4)frac14 1626 p 001 Post hoc comparisons indicatedsignificant effects between all models (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0251) for the comparisonbetween dynamical model and inhomogeneous pointprocess (p frac14 0784) and for the comparison betweeninhomogeneous point process and experimental data (pfrac14 0012) For the pair correlation function (Figure 6d)model type had a significant effect v2(4)frac14 1406 p 001 and post hoc comparisons indicated significanteffects for all comparisons (p 001) except for thecomparison between experiment and dynamical model(pfrac14 0453) and between homogeneous and inhomo-geneous point processes (p frac14 0700) Thus the mainstatistical results obtained from image set 1 werereproduced for images of abstract natural patterns(image set 2) These results lend support to thehypothesis that scene content does not have a stronginfluence on second-order spatial statistics of gazepatterns

Thus the dynamical model performed better thanany of the statistical models in predicting the averagepair correlations Although one of our statisticalcontrol models generated data by using image-specificsaccade-length information in addition to the 2Ddensity of gaze position it could not predict the spatialcorrelations as accurately as the dynamical model that

was uninformed about the image-specific saccade-length distribution

Discussion

Current theoretical models of visual attentionallocation in natural scenes are limited to the predictionof first-order spatial statistics (2D densities) of gazepatterns We were interested in attentional dynamicsthat can be characterized by spatial interactions (asfound in the second-order statistics) Using the theoryof spatial point processes we discovered that gazepatterns can be characterized by clustering at smalllength scales which cannot be explained by spatialinhomogeneity of the 2D density We proposed andanalyzed a model based on dynamical activation mapsfor attentional selection and inhibitory control of gazepositions The model reproduced 2D densities of gazemaps (first-order statistics) and distributions of saccadelengths as well as pair correlations (second-orderspatial statistics)

Spatial statistics

While research on the computation of visual saliencyhas been a highly active field of research (Borji amp Itti2013) there is currently a lack of computational modelsfor the generation of scanpaths on the basis of knownsaliency Inhibition-of-return (Klein 2000) has beenproposed as a key principle to prevent continuingrefixation within regions of highest saliency Howeverour analysis of the pair correlation function demon-strates that saccadic selection at small length scales isdominated by spatial clustering Thus our findings arehighly compatible with the view that inhibition-of-return cannot easily be observed in eye-movementbehavior in natural scene viewing experiments (Smith ampHenderson 2011) However the spatial correlationscan be exploited to investigate dynamical rules under-lying attentional processing in the visual system Ourexperimental data show a clear effect of spatialclustering for length scales shorter than about 48However these results are incompatible with thecurrent theory of saliency-based attention allocationcombined with an inhibition-of-return mechanismFuture simulations will investigate the predictive powerof the model when saliency models are used as input

Modeling spatial correlations

In a biologically plausible computational model ofsaccade generation a limited perceptual span needs to

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 11

be implemented for attentional selection (Findlay ampGilchrist 2012) We addressed this problem byassuming a Gaussian read-out mechanism with localretrieval from the saliency map through a limitedaperture analogous to the limited extent of high-fidelity information uptake through the fovea We usedthe experimentally observed density of fixations as aproxy for visual saliency

First our results indicate that a very limitedattentional span (Gaussian with standard deviationparameter 498) of about twice the size of theactivation mechanism for tracking the gaze positions(228) is sufficient for saccade planning This atten-

tional span is efficient however since the combinationof fixation and attention maps in our model activelydrives the modelrsquos gaze position to new salient regionscomputed via normalization of activations (Carandiniamp Heeger 2011)

Second our model correctly predicted spatialclustering of gaze positions at small-length scales Thepair correlation function indicates that there is apronounced contribution by refixations very close tothe current gaze position This effect is compatible withthe distribution of saccade lengths however a statis-tical control model that generated data from statisti-cally independent probabilities of 2D density and

Figure 6 Model predictions on images not used for parameter estimation Predicted data were generated from the dynamical model

for the 10 images of image set 1 not used for parameter estimation (a b) and for all 15 images from image set 2 (c d) (a) Modell

simulations of the KLD for experiments on image set 1 (10 images not used for model parameter estimation) and dynamical model

and three different statistical models For the experimental data a split-half procedure was applied to compute KLD (b)

Corresponding PCF deviations for the same model-generated and experimental data on image set 1 (c) KLD measures for image set 2

(d) PCF deviation for image set 2

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 12

saccade lengths could not reproduce the pair correla-tions adequately

Implications for computational models of activevision

During active vision our visual system relies onfrequent gaze shifts to optimize retinal input Using asecond-order statistical analysis we demonstrated thatspatial correlations across scanpaths might provideimportant constraints for computational models of eyeguidance In our dynamical model spatial clustering atsmall scales is the result of two principles First thefixation map is driven by an activation function with asmall spatial extent (258) Second the time-scale ofactivation build-up in the fixation map is slowcompared to the build-up of activation in the attentionmap Both mechanisms permit refixations at positionsvery close to the current gaze position before the systemmoves on to new regions of visual space

Limitations of the current approach

The current work focused on spatial statistics of gazepatterns and we proposed and analyzed dynamicalmechanisms of eye guidance in scene viewing In ourmodel a Gaussian read-out mechanism for the staticempirical saliency map was implemented as a simpli-fication A more biologically plausible combination ofour model of eye guidance with a dynamical saliencymodel (Borji amp Itti 2013) is a natural extension of thecurrent framework and the development of such amodel is work in progress in our laboratories Clearlythe current modeling architecture is not limited to inputfrom static saliency maps

Another simplification is related to the timing ofsaccades (Nuthmann Smith Engbert amp Henderson2010) In the current version of our model weimplemented random timing and sampled fixationdurations randomly from a predefined distributionMore adequate models of fixation durations howeverwill need to include interactions of processing difficultybetween fovea and periphery (Laubrock Cajar ampEngbert 2013)

Supplemental information

This work includes a supplemental video animationof the model simulations Experimental data onfixation patterns and computer code for statisticalanalysis and model simulations will be made availablevia the Potsdam Mind Research Respository (PMR2httpreadpsychuni-potsdamde)

Keywords scene perception eye movements atten-tion saccades modeling spatial statistics

Acknowledgments

This work was supported by Bundesministerium furBildung und Forschung (BMBF) through the BernsteinComputational Neuroscience Programs Berlin (ProjectB3 FKZ 01GQ1001F and FKZ 01GQ1001B to R Eand F A W resp) and Tubingen (FKZ 01GQ1002 toFAW) and by Deutsche Forschungsgemeinschaft(grants EN 47113ndash1 and WI 21034ndash1 to R E and FA W resp)

Commercial relationships noneCorresponding author Ralf EngbertEmail ralfengbertuni-potsdamdeAddress Cognitive Science Program amp Department ofPsychology University of Potsdam Potsdam Ger-many

References

Baddeley A amp Turner R (2005) spatstat An Rpackage for analyzing spatial point patternsJournal of Statistical Software 12(6) 1ndash42 Avail-able from httpwwwjstatsoftorg

Barthelme S Trukenbrod H Engbert R ampWichmann F (2013) Modeling fixation locationsusing spatial point processes Journal of Vision13(12) 1 1ndash34 httpwwwjournalofvisionorgcontent13121 doi10116713121 [PubMed][Article]

Beck C amp Schlogl F (1993) Thermodynamics ofchaotic systems Cambridge UK Cambridge Uni-versity Press

Borji A amp Itti L (2013) State-of-the-art in visualattention modeling IEEE Transactions on PatternAnalysis and Machine Intelligence 35(1) 185ndash207

Bylinskii Z Judd T Durand F Oliva A ampTorralba A (2012) MIT Saliency BenchmarkAvailable at httpsaliencymitedu

Carandini M amp Heeger D J (2011) Normalizationas a canonical neural computation Nature ReviewsNeuroscience 13 51ndash62

Engbert R (2012) Computational modeling of col-licular integration of perceptual responses andattention in microsaccades The Journal of Neuro-science 32(23) 8035ndash8039

Engbert R amp Kliegl R (2003) Microsaccades

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 13

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 11: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

yellow frac14 Inhomogeneous point process magentafrac14Control model) Model type had a significant effect onKLD v2(4)frac14 1094 p 001 Post hoc comparisonsindicated significant effects between all models (p 001) except for the comparison between experimentaldata and the dynamical model (p frac14 0298) and for thecomparison between dynamical model and inhomoge-neous point process (p frac14 0679) In an analysis of thePCF estimated from the same set of simulated data(Figure 6b) the dynamical model (blue) produceddeviations from an uncorrelated point process that arein good agreement with the experimental data (red)Model type had a significant effect on PCF deviationv2(4)frac14 916 p 001 Post hoc comparisons indicatedsignificant effects for all comparisons (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0990) and between homoge-neous and inhomogeneous point processes (pfrac14 0996)

To check the reliability of the results we performedall corresponding calculations for the images showingabstract natural scenes (image set 2) For thesesimulations we used the set of model parameters fittedto the first five images of image set 1 whichcorresponds to the hypothesis that scene content(object-based scenes vs abstract natural patterns) doesnot have a strong impact on spatial correlations in thescanpath data For the simulated densities (Figure 6c)we reproduced the statistical results from image set 1Again model type had a significant effect on KLDv2(4)frac14 1626 p 001 Post hoc comparisons indicatedsignificant effects between all models (p 001) exceptfor the comparison between experimental data and thedynamical model (pfrac14 0251) for the comparisonbetween dynamical model and inhomogeneous pointprocess (p frac14 0784) and for the comparison betweeninhomogeneous point process and experimental data (pfrac14 0012) For the pair correlation function (Figure 6d)model type had a significant effect v2(4)frac14 1406 p 001 and post hoc comparisons indicated significanteffects for all comparisons (p 001) except for thecomparison between experiment and dynamical model(pfrac14 0453) and between homogeneous and inhomo-geneous point processes (p frac14 0700) Thus the mainstatistical results obtained from image set 1 werereproduced for images of abstract natural patterns(image set 2) These results lend support to thehypothesis that scene content does not have a stronginfluence on second-order spatial statistics of gazepatterns

Thus the dynamical model performed better thanany of the statistical models in predicting the averagepair correlations Although one of our statisticalcontrol models generated data by using image-specificsaccade-length information in addition to the 2Ddensity of gaze position it could not predict the spatialcorrelations as accurately as the dynamical model that

was uninformed about the image-specific saccade-length distribution

Discussion

Current theoretical models of visual attentionallocation in natural scenes are limited to the predictionof first-order spatial statistics (2D densities) of gazepatterns We were interested in attentional dynamicsthat can be characterized by spatial interactions (asfound in the second-order statistics) Using the theoryof spatial point processes we discovered that gazepatterns can be characterized by clustering at smalllength scales which cannot be explained by spatialinhomogeneity of the 2D density We proposed andanalyzed a model based on dynamical activation mapsfor attentional selection and inhibitory control of gazepositions The model reproduced 2D densities of gazemaps (first-order statistics) and distributions of saccadelengths as well as pair correlations (second-orderspatial statistics)

Spatial statistics

While research on the computation of visual saliencyhas been a highly active field of research (Borji amp Itti2013) there is currently a lack of computational modelsfor the generation of scanpaths on the basis of knownsaliency Inhibition-of-return (Klein 2000) has beenproposed as a key principle to prevent continuingrefixation within regions of highest saliency Howeverour analysis of the pair correlation function demon-strates that saccadic selection at small length scales isdominated by spatial clustering Thus our findings arehighly compatible with the view that inhibition-of-return cannot easily be observed in eye-movementbehavior in natural scene viewing experiments (Smith ampHenderson 2011) However the spatial correlationscan be exploited to investigate dynamical rules under-lying attentional processing in the visual system Ourexperimental data show a clear effect of spatialclustering for length scales shorter than about 48However these results are incompatible with thecurrent theory of saliency-based attention allocationcombined with an inhibition-of-return mechanismFuture simulations will investigate the predictive powerof the model when saliency models are used as input

Modeling spatial correlations

In a biologically plausible computational model ofsaccade generation a limited perceptual span needs to

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 11

be implemented for attentional selection (Findlay ampGilchrist 2012) We addressed this problem byassuming a Gaussian read-out mechanism with localretrieval from the saliency map through a limitedaperture analogous to the limited extent of high-fidelity information uptake through the fovea We usedthe experimentally observed density of fixations as aproxy for visual saliency

First our results indicate that a very limitedattentional span (Gaussian with standard deviationparameter 498) of about twice the size of theactivation mechanism for tracking the gaze positions(228) is sufficient for saccade planning This atten-

tional span is efficient however since the combinationof fixation and attention maps in our model activelydrives the modelrsquos gaze position to new salient regionscomputed via normalization of activations (Carandiniamp Heeger 2011)

Second our model correctly predicted spatialclustering of gaze positions at small-length scales Thepair correlation function indicates that there is apronounced contribution by refixations very close tothe current gaze position This effect is compatible withthe distribution of saccade lengths however a statis-tical control model that generated data from statisti-cally independent probabilities of 2D density and

Figure 6 Model predictions on images not used for parameter estimation Predicted data were generated from the dynamical model

for the 10 images of image set 1 not used for parameter estimation (a b) and for all 15 images from image set 2 (c d) (a) Modell

simulations of the KLD for experiments on image set 1 (10 images not used for model parameter estimation) and dynamical model

and three different statistical models For the experimental data a split-half procedure was applied to compute KLD (b)

Corresponding PCF deviations for the same model-generated and experimental data on image set 1 (c) KLD measures for image set 2

(d) PCF deviation for image set 2

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 12

saccade lengths could not reproduce the pair correla-tions adequately

Implications for computational models of activevision

During active vision our visual system relies onfrequent gaze shifts to optimize retinal input Using asecond-order statistical analysis we demonstrated thatspatial correlations across scanpaths might provideimportant constraints for computational models of eyeguidance In our dynamical model spatial clustering atsmall scales is the result of two principles First thefixation map is driven by an activation function with asmall spatial extent (258) Second the time-scale ofactivation build-up in the fixation map is slowcompared to the build-up of activation in the attentionmap Both mechanisms permit refixations at positionsvery close to the current gaze position before the systemmoves on to new regions of visual space

Limitations of the current approach

The current work focused on spatial statistics of gazepatterns and we proposed and analyzed dynamicalmechanisms of eye guidance in scene viewing In ourmodel a Gaussian read-out mechanism for the staticempirical saliency map was implemented as a simpli-fication A more biologically plausible combination ofour model of eye guidance with a dynamical saliencymodel (Borji amp Itti 2013) is a natural extension of thecurrent framework and the development of such amodel is work in progress in our laboratories Clearlythe current modeling architecture is not limited to inputfrom static saliency maps

Another simplification is related to the timing ofsaccades (Nuthmann Smith Engbert amp Henderson2010) In the current version of our model weimplemented random timing and sampled fixationdurations randomly from a predefined distributionMore adequate models of fixation durations howeverwill need to include interactions of processing difficultybetween fovea and periphery (Laubrock Cajar ampEngbert 2013)

Supplemental information

This work includes a supplemental video animationof the model simulations Experimental data onfixation patterns and computer code for statisticalanalysis and model simulations will be made availablevia the Potsdam Mind Research Respository (PMR2httpreadpsychuni-potsdamde)

Keywords scene perception eye movements atten-tion saccades modeling spatial statistics

Acknowledgments

This work was supported by Bundesministerium furBildung und Forschung (BMBF) through the BernsteinComputational Neuroscience Programs Berlin (ProjectB3 FKZ 01GQ1001F and FKZ 01GQ1001B to R Eand F A W resp) and Tubingen (FKZ 01GQ1002 toFAW) and by Deutsche Forschungsgemeinschaft(grants EN 47113ndash1 and WI 21034ndash1 to R E and FA W resp)

Commercial relationships noneCorresponding author Ralf EngbertEmail ralfengbertuni-potsdamdeAddress Cognitive Science Program amp Department ofPsychology University of Potsdam Potsdam Ger-many

References

Baddeley A amp Turner R (2005) spatstat An Rpackage for analyzing spatial point patternsJournal of Statistical Software 12(6) 1ndash42 Avail-able from httpwwwjstatsoftorg

Barthelme S Trukenbrod H Engbert R ampWichmann F (2013) Modeling fixation locationsusing spatial point processes Journal of Vision13(12) 1 1ndash34 httpwwwjournalofvisionorgcontent13121 doi10116713121 [PubMed][Article]

Beck C amp Schlogl F (1993) Thermodynamics ofchaotic systems Cambridge UK Cambridge Uni-versity Press

Borji A amp Itti L (2013) State-of-the-art in visualattention modeling IEEE Transactions on PatternAnalysis and Machine Intelligence 35(1) 185ndash207

Bylinskii Z Judd T Durand F Oliva A ampTorralba A (2012) MIT Saliency BenchmarkAvailable at httpsaliencymitedu

Carandini M amp Heeger D J (2011) Normalizationas a canonical neural computation Nature ReviewsNeuroscience 13 51ndash62

Engbert R (2012) Computational modeling of col-licular integration of perceptual responses andattention in microsaccades The Journal of Neuro-science 32(23) 8035ndash8039

Engbert R amp Kliegl R (2003) Microsaccades

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 13

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 12: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

be implemented for attentional selection (Findlay ampGilchrist 2012) We addressed this problem byassuming a Gaussian read-out mechanism with localretrieval from the saliency map through a limitedaperture analogous to the limited extent of high-fidelity information uptake through the fovea We usedthe experimentally observed density of fixations as aproxy for visual saliency

First our results indicate that a very limitedattentional span (Gaussian with standard deviationparameter 498) of about twice the size of theactivation mechanism for tracking the gaze positions(228) is sufficient for saccade planning This atten-

tional span is efficient however since the combinationof fixation and attention maps in our model activelydrives the modelrsquos gaze position to new salient regionscomputed via normalization of activations (Carandiniamp Heeger 2011)

Second our model correctly predicted spatialclustering of gaze positions at small-length scales Thepair correlation function indicates that there is apronounced contribution by refixations very close tothe current gaze position This effect is compatible withthe distribution of saccade lengths however a statis-tical control model that generated data from statisti-cally independent probabilities of 2D density and

Figure 6 Model predictions on images not used for parameter estimation Predicted data were generated from the dynamical model

for the 10 images of image set 1 not used for parameter estimation (a b) and for all 15 images from image set 2 (c d) (a) Modell

simulations of the KLD for experiments on image set 1 (10 images not used for model parameter estimation) and dynamical model

and three different statistical models For the experimental data a split-half procedure was applied to compute KLD (b)

Corresponding PCF deviations for the same model-generated and experimental data on image set 1 (c) KLD measures for image set 2

(d) PCF deviation for image set 2

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 12

saccade lengths could not reproduce the pair correla-tions adequately

Implications for computational models of activevision

During active vision our visual system relies onfrequent gaze shifts to optimize retinal input Using asecond-order statistical analysis we demonstrated thatspatial correlations across scanpaths might provideimportant constraints for computational models of eyeguidance In our dynamical model spatial clustering atsmall scales is the result of two principles First thefixation map is driven by an activation function with asmall spatial extent (258) Second the time-scale ofactivation build-up in the fixation map is slowcompared to the build-up of activation in the attentionmap Both mechanisms permit refixations at positionsvery close to the current gaze position before the systemmoves on to new regions of visual space

Limitations of the current approach

The current work focused on spatial statistics of gazepatterns and we proposed and analyzed dynamicalmechanisms of eye guidance in scene viewing In ourmodel a Gaussian read-out mechanism for the staticempirical saliency map was implemented as a simpli-fication A more biologically plausible combination ofour model of eye guidance with a dynamical saliencymodel (Borji amp Itti 2013) is a natural extension of thecurrent framework and the development of such amodel is work in progress in our laboratories Clearlythe current modeling architecture is not limited to inputfrom static saliency maps

Another simplification is related to the timing ofsaccades (Nuthmann Smith Engbert amp Henderson2010) In the current version of our model weimplemented random timing and sampled fixationdurations randomly from a predefined distributionMore adequate models of fixation durations howeverwill need to include interactions of processing difficultybetween fovea and periphery (Laubrock Cajar ampEngbert 2013)

Supplemental information

This work includes a supplemental video animationof the model simulations Experimental data onfixation patterns and computer code for statisticalanalysis and model simulations will be made availablevia the Potsdam Mind Research Respository (PMR2httpreadpsychuni-potsdamde)

Keywords scene perception eye movements atten-tion saccades modeling spatial statistics

Acknowledgments

This work was supported by Bundesministerium furBildung und Forschung (BMBF) through the BernsteinComputational Neuroscience Programs Berlin (ProjectB3 FKZ 01GQ1001F and FKZ 01GQ1001B to R Eand F A W resp) and Tubingen (FKZ 01GQ1002 toFAW) and by Deutsche Forschungsgemeinschaft(grants EN 47113ndash1 and WI 21034ndash1 to R E and FA W resp)

Commercial relationships noneCorresponding author Ralf EngbertEmail ralfengbertuni-potsdamdeAddress Cognitive Science Program amp Department ofPsychology University of Potsdam Potsdam Ger-many

References

Baddeley A amp Turner R (2005) spatstat An Rpackage for analyzing spatial point patternsJournal of Statistical Software 12(6) 1ndash42 Avail-able from httpwwwjstatsoftorg

Barthelme S Trukenbrod H Engbert R ampWichmann F (2013) Modeling fixation locationsusing spatial point processes Journal of Vision13(12) 1 1ndash34 httpwwwjournalofvisionorgcontent13121 doi10116713121 [PubMed][Article]

Beck C amp Schlogl F (1993) Thermodynamics ofchaotic systems Cambridge UK Cambridge Uni-versity Press

Borji A amp Itti L (2013) State-of-the-art in visualattention modeling IEEE Transactions on PatternAnalysis and Machine Intelligence 35(1) 185ndash207

Bylinskii Z Judd T Durand F Oliva A ampTorralba A (2012) MIT Saliency BenchmarkAvailable at httpsaliencymitedu

Carandini M amp Heeger D J (2011) Normalizationas a canonical neural computation Nature ReviewsNeuroscience 13 51ndash62

Engbert R (2012) Computational modeling of col-licular integration of perceptual responses andattention in microsaccades The Journal of Neuro-science 32(23) 8035ndash8039

Engbert R amp Kliegl R (2003) Microsaccades

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 13

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 13: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

saccade lengths could not reproduce the pair correla-tions adequately

Implications for computational models of activevision

During active vision our visual system relies onfrequent gaze shifts to optimize retinal input Using asecond-order statistical analysis we demonstrated thatspatial correlations across scanpaths might provideimportant constraints for computational models of eyeguidance In our dynamical model spatial clustering atsmall scales is the result of two principles First thefixation map is driven by an activation function with asmall spatial extent (258) Second the time-scale ofactivation build-up in the fixation map is slowcompared to the build-up of activation in the attentionmap Both mechanisms permit refixations at positionsvery close to the current gaze position before the systemmoves on to new regions of visual space

Limitations of the current approach

The current work focused on spatial statistics of gazepatterns and we proposed and analyzed dynamicalmechanisms of eye guidance in scene viewing In ourmodel a Gaussian read-out mechanism for the staticempirical saliency map was implemented as a simpli-fication A more biologically plausible combination ofour model of eye guidance with a dynamical saliencymodel (Borji amp Itti 2013) is a natural extension of thecurrent framework and the development of such amodel is work in progress in our laboratories Clearlythe current modeling architecture is not limited to inputfrom static saliency maps

Another simplification is related to the timing ofsaccades (Nuthmann Smith Engbert amp Henderson2010) In the current version of our model weimplemented random timing and sampled fixationdurations randomly from a predefined distributionMore adequate models of fixation durations howeverwill need to include interactions of processing difficultybetween fovea and periphery (Laubrock Cajar ampEngbert 2013)

Supplemental information

This work includes a supplemental video animationof the model simulations Experimental data onfixation patterns and computer code for statisticalanalysis and model simulations will be made availablevia the Potsdam Mind Research Respository (PMR2httpreadpsychuni-potsdamde)

Keywords scene perception eye movements atten-tion saccades modeling spatial statistics

Acknowledgments

This work was supported by Bundesministerium furBildung und Forschung (BMBF) through the BernsteinComputational Neuroscience Programs Berlin (ProjectB3 FKZ 01GQ1001F and FKZ 01GQ1001B to R Eand F A W resp) and Tubingen (FKZ 01GQ1002 toFAW) and by Deutsche Forschungsgemeinschaft(grants EN 47113ndash1 and WI 21034ndash1 to R E and FA W resp)

Commercial relationships noneCorresponding author Ralf EngbertEmail ralfengbertuni-potsdamdeAddress Cognitive Science Program amp Department ofPsychology University of Potsdam Potsdam Ger-many

References

Baddeley A amp Turner R (2005) spatstat An Rpackage for analyzing spatial point patternsJournal of Statistical Software 12(6) 1ndash42 Avail-able from httpwwwjstatsoftorg

Barthelme S Trukenbrod H Engbert R ampWichmann F (2013) Modeling fixation locationsusing spatial point processes Journal of Vision13(12) 1 1ndash34 httpwwwjournalofvisionorgcontent13121 doi10116713121 [PubMed][Article]

Beck C amp Schlogl F (1993) Thermodynamics ofchaotic systems Cambridge UK Cambridge Uni-versity Press

Borji A amp Itti L (2013) State-of-the-art in visualattention modeling IEEE Transactions on PatternAnalysis and Machine Intelligence 35(1) 185ndash207

Bylinskii Z Judd T Durand F Oliva A ampTorralba A (2012) MIT Saliency BenchmarkAvailable at httpsaliencymitedu

Carandini M amp Heeger D J (2011) Normalizationas a canonical neural computation Nature ReviewsNeuroscience 13 51ndash62

Engbert R (2012) Computational modeling of col-licular integration of perceptual responses andattention in microsaccades The Journal of Neuro-science 32(23) 8035ndash8039

Engbert R amp Kliegl R (2003) Microsaccades

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 13

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 14: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

uncover the orientation of covert attention VisionResearch 43 1035ndash1045

Engbert R amp Mergenthaler K (2006) Microsaccadesare triggered by low retinal image slip Proceedingsof the National Academy of Sciences USA 1037192ndash7197

Engbert R Mergenthaler K Sinn P amp PikovskyA (2011) An integrated model of fixational eyemovements and microsaccades Proceedings of theNational Academy of Sciences USA 108 E765ndashE770

Findlay J M amp Gilchrist I D (2012) Visualattentionmdasha fresh look Psychologist 25(12) 900ndash902

Findlay J M amp Walker R (1999) A model ofsaccade generation based on parallel processingand competitive inhibition Behavioral and BrainSciences 22(4) 661ndash674

Freund H amp Grassberger P (1992) The red queenrsquoswalk Physica A 190 218ndash237

Illian J Penttinen A Stoyan H and Stoyan D(2008) Statistical analysis and modelling of spatialpoint patterns New York Oxford University Press

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2 1ndash11

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Jones L A amp Higgins G C (1947) Photographicgranularity and graininess III Some characteristicsof the visual system of importance in the evaluationof graininess and granularity Journal of the OpticalSociety of America 37 217ndash263

Judd T Durand F amp Torralba A (2012) Abenchmark of computational models of saliency topredict human fixations In MIT Tech Report noMIT-CSAIL-TR-2012-001 Available at httpdspacemiteduhandle1721168590

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5) 7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Killian N J Jutras M J amp Buffalo E A (2012) Amap of visual space in the primate entorhinalcortex Nature 491 761ndash764

Klein R M (2000) Inhibition of return Trends inCognitive Science 4 138ndash147

Laubrock J Cajar A amp Engbert R (2013) Control

of fixation duration during scene viewing byinteraction of foveal and peripheral processingJournal of Vision 13(12) 11 1ndash20 httpwwwjournalofvisionorgcontent131211 doi101167131211 [PubMed] [Article]

Law R Illian J Burslem D F R P Gratzer GGunatilleke C V S amp Gunatilleke I A U N(2009) Ecological information from spatial pat-terns of plants Insights from point process theoryJournal of Ecology 97 616ndash626

Le Meur O Le Callet P Barba D amp Thoreau D(2006) State-of-the-art in visual attention model-ing IEEE Transactions on Pattern Analysis andMachine Intelligence 28(5) 802ndash817

Levi D M (2008) Crowdingmdashan essential bottleneckfor object recognition A mini-review VisionResearch 48 635ndash654

Luce R D (1959) Individual choice behavior Atheoretical analysis New York Wiley

Ludwig C J Farrell S Ellis L A amp Gilchrist I D(2009) The mechanism underlying inhibition ofsaccadic return Cognitive Psychology 59(2) 180ndash202

Mitchell M (1998) An introduction to genetic algo-rithms Cambridge MA MIT Press

Nuthmann A Smith T J Engbert R amp Hender-son J M (2010) State-of-the-art in visual atten-tion modeling Psychological Review 117 382ndash405

R Core Team (2013) R A language and environmentfor statistical computing [Computer softwaremanual] Vienna Austria Retrieved from httpwwwR-projectorg

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network Computa-tion in Neural Systems 10(4) 341ndash350

Rosenholtz R Huang J amp Ehinger K A (2012)Rethinking the role of top-down attention in visioneffects attributable to a lossy representation inperipheral vision Frontiers in Psychology 3(13) 1ndash14

Scott D W (1992) Multivariate density estimationTheory practice and visualization New YorkWiley

Smith T J amp Henderson J M (2011) Looking backat Waldo Oculomotor inhibition of return does notprevent return fixations Journal of Vision 11(1)httpwwwjournalofvisionorgcontent1113doi1011671113 [PubMed] [Article]

Stensola H Stensola T Solstad T Frland KMoser M-B amp Moser E I (2012) Theentorhinal grid map is discretized Nature 492 72ndash78

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 14

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 15: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

Tatler B W Baddeley R J amp Vincent B T (2006)The long and the short of it Spatial statistics atfixation vary with saccade amplitude and taskVision Research 46(12) 1857ndash1862

Tatler B W amp Vincent B T (2009) The prominenceof behavioural biases in eye guidance VisualCognition 17(6ndash7) 1029ndash1054

Torralba A Oliva A Castelhano M S ampHenderson J M (2006) Contextual guidance ofeye movements and attention in real-world scenesThe role of global features in object searchPsychological Review 113 766ndash786

Trukenbrod H A amp Engbert R (2014) ICAT Acomputational model for the adaptive control offixation durations Psychonomic Bulletin amp Review21(4) 907ndash934

Tsotsos J K Culhane S M Wai W Y K Lai YDavis N amp Nuflo F (1995) Modeling visual-attention via selective tuning Artificial Intelligence78 507ndash545

Wilming N Harst S Schmidt N amp Konig P(2013) Saccadic momentum and facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9(1)e1002871

Appendix

Estimation of model parameters

Some of the free parameters of the model were setto fixed values to reduce the number of freeparameters and to facilitate parameter estimationFirst saccade timing was outside the primary scope ofthe current work Time intervals between two deci-sions for saccadic eye movements are drawn from agamma distribution of eighth order (Trukenbrod ampEngbert 2014) with a mean value of l frac14 275 msSecond we assumed that the build-up of activation isconsiderably faster in the attention map than in thefixation map by choosing R0 frac14 001 and R1 frac14 1 ieR1R0 100

Model parameters were estimated by minimizationof a loss function combining information on thedensities of gaze positions and of saccade lengths

Kethr0 r1x q gTHORN frac14X

i

ethpei psi THORN2 thorn

X

j

ethqej qsj THORN2

ethA11THORNwhere pe and ps are the experimental and simulateddistributions of pair distances between all data points

for a given image and qe and qs are the distributions ofsaccade lengths for experimental and simulated datarespectively The minimum of the objective function Kwas determined by a genetic algorithm approach(Mitchell 1998) within a predefined range (Table 1)Mean values and standard errors of the means werecomputed from five independent runs of the geneticalgorithm

Qualitative analysis of the model

The pair correlation function was the mostimportant statistical concept in model evaluation Inour model the strength of spatial correlation turnedout to be related to the value of the exponent c in thefixation map of the potential Equation (9) Weperformed numerical simulations with the value ofparameter c fixed at different values between 0 and 1to investigate the dependence of the spatial correla-tions on this parameter qualitatively (Figure A1)While cfrac14 1 produces negatively correlated scanpathsg(r) 1 at short pair distances r it is possible toproduce even stronger PCF value than in theexperimental data for c 03 Thus a singleparameter in our model can generate a broad range ofsecond-order statistics

Some notes on the pair correlation function

The pair correlation function can be used to examinethe second-order statistics of a point pattern We first

Figure A1 Pair correlation function obtained from simulations

for different values of c (blue lines) in comparison to the

experimentally observed PCF (red line) and the result for the

inhomogeneous point process (yellow line) All simulations

were carried out for image 1 of our data set

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 15

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 16: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

need to define a few terms A point process is aprobability distribution that generates random pointpatterns a sample from a point process is a set ofobserved locations (ie fixations in our case) There-fore taking two different samples from the same pointprocess will result in two different sets of locationsalthough the locations may be similar (Figure A2)

First-order statistics the intensity function The first-order statistics of a point process are given by itsintensity function k(x) The higher the value of k(x) themore likely we are to find points around location xFigure A2c shows the theoretical intensity function forthe point process generating the points in Figures A2aand A2b

One way to look at the first-order statistics of a pointprocess is via random variables that count how manypoints fall in a given region For example we coulddefine a variable cA that counts how many points fallwithin area A for a given realization of the pointprocess The expectation of cA (how many points fall inA on average) is given by the intensity function ie

EethcATHORN frac14Z

AkethxTHORNdx ethA12THORN

where the integral is computed over area A A slightlydifferent viewpoint is given by the density functionwhich is a normalized version of the intensity functiondefined as

kethxTHORN frac14 kethxTHORNZ

Xkethx0THORNdx0

ethA13THORN

where the integral in the denominator is over theobservation window X which in our case corresponds tothe monitor (we cannot observe points outside of theobservation window) The density function integrates

to 1 over the observation window and represents aprobability density If we now define a random variablezA that is equal to one when a (small) area A containsone point and 0 otherwise we obtain

pethzA frac14 1THORN frac14Z

AkethxTHORNdx frac14 kethxATHORNdA ethA14THORN

where xA is the center of area A and dA its area If A issmall then k(x) will be approximately constant over Aand the integral simplifies to k(xA) times the volume

Second-order properties The first-order propertiesinform us about how many points can be expected tofind in an area or in the normalized version whetherwe can expect to find a point at all Second-orderproperties tell us about interaction between areaswhether for example we are more likely to find a pointin area A if there is a point in area B

In the case of the point process (Figure A2) thepoints are generated independently and do not interactin any way so that knowing the location of one pointtells us nothing about where the other ones will be Asshown in the manuscript this is not so with fixationlocations which tend to cluster at certain distances

The second-order statistics of a point process capturesuch trends and one way to describe the second-orderstatistics is to use the pair correlation function Thepair correlation function is derived from the pairdensity function q(xA xB) which gives the probabilityof finding points at both location xA and location xBLet us consider two random variables zA and zB whichare equal to 1 if there are points in their respectiveareas A and B and 0 otherwise (again we assume thatthe areas are small) The probability that zA frac14 1 andthat zBfrac14 1 individually is given by the density functionEquation (A13) The probability that both are equal toone is given by the pair density function

Figure A2 First-order properties of point processes (a b) Two samples from a point process (c) The intensity of the point processk(x) which corresponds to the expected number of points to be found in a small circle around location x Dark regions indicate highintensity (density)

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 16

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18
Page 17: Spatial statistics and attentional dynamics in scene viewing...Spatial statistics Gaze positions can be interpreted as realizations from a spatial point process (Illian et al.,2008)

pethzA frac14 zB frac14 1THORN frac14Z

A

Z

Bqethxx0THORNdxdx0

frac14 qethxA xBTHORNdAdB ethA15THORN

The pair density function already answers ourquestion of whether observing a point in A makes itmore likely to see one in B and vice versa If points arecompletely independent then the resulting pair densityis given by

pethzA frac14 zB frac14 1THORN frac14 kethxATHORNkethxBTHORNdAdB ethA16THORNIf the pair density function gives us a different result

then an interaction is occurring Therefore if we takethe ratio of the pair density Equation (A15) to theproduct of the densities we obtain a measurement ofdeviation from statistical independence ie

cethxx0THORN frac14 qethxx0THORNkethxTHORNkethx0THORN

ethA17THORN

The resulting object is however a complicated four-dimensional (ie two dimensions for x and twodimensions for x0) function and in practice it ispreferable to use a summary measure which is the paircorrelation function expressing how often pairs ofpoints are found at a distance of e from each other Thepair correlation function is explained informally inFigure A3

More formally the pair correlation function is justan average of c(x x0) for all pairs x x0 that areseparated by a distance r ie

qethrTHORN frac14Z

x

Z

x 0Xjdethxx 0THORNfrac14rcethxx0THORNdxdx0 ethA18THORN

In the above equation the notation x0Xjdethxx 0THORN frac14 r

indicates that we are integrating over the set of allpoints x0 that are on a circle of radius r around x (butstill in the observation window X)

If we are to estimate q(r) from data we need anestimate of the intensity function as it appears as acorrection in Equation (A17) In addition since we

have only observed a discrete number of points theestimated pair density function can only be estimatedby smoothing which is why a kernel function needs tobe used We refer readers to (Illian et al 2008) fordetails on pair correlation functions

Figure A3 From the pair density function to the pair correlation

function The pair density function q(xA xB) describes theprobability of finding points at both xA and xB in a sample from

the point process As such it is a four-dimensional function and

hard to estimate and visualize The pair correlation function

(PCF) is a useful summary To compute the raw pcf we pick an

initial location x0 (circle) and look at the probability of finding a

point both at x0 and in locations at a distance from x0 (first

array of circles around x0) We do this for various distances

(other arrays of circles) to compute the probability of finding

pairs as a function of Finally we average over all possible

locations x0 to obtain the pair correlation function The pair

correlation function therefore expresses how likely we are to

find two points at a distance from each other

Journal of Vision (2015) 15(1)14 1ndash17 Engbert Trukenbrod Barthelme amp Wichmann 17

  • Introduction
  • Methods
  • e01
  • e02
  • e03
  • e04
  • e05
  • Results
  • f01
  • f02
  • e06
  • e07
  • e08
  • f03
  • e09
  • e10
  • f04
  • t01
  • f05
  • Discussion
  • f06
  • Baddeley1
  • Barthelme1
  • Beck1
  • Borji1
  • Bylinskii1
  • Carandini1
  • Engbert1
  • Engbert2
  • Engbert3
  • Engbert4
  • Findlay1
  • Findlay2
  • Freund1
  • Illian1
  • Itti1
  • Itti2
  • Jones1
  • Judd1
  • Kienzle1
  • Killian1
  • Klein1
  • Laubrock1
  • Law1
  • LeMeur1
  • Levi1
  • Luce1
  • Ludwig1
  • Mitchell1
  • Nuthmann1
  • RCoreTeam1
  • Reinagel1
  • Rosenholtz1
  • Scott1
  • Smith1
  • Stensola1
  • Tatler1
  • Tatler2
  • Torralba1
  • Trukenbrod1
  • Tsotsos1
  • Wilming1
  • Appendix
  • e11
  • e12
  • e13
  • e14
  • e15
  • e16
  • e17
  • e18