Bottom-up and top-down processing in visual perception
Thomas Serre
Brown University
Department of Cognitive & Linguistic Sciences, Brown Institute for Brain Science, Center for Vision Research
The ‘what’ problem: monkey electrophysiology
Willmore Prenger & Gallant ’10
[Figure 2 from Yamane, Carlson, Bowman, Wang & Connor, Nature Neuroscience 11:1354 (2008): neural tuning for the three-dimensional configuration of surface fragments. Panels show the top and bottom 50 evolved stimuli for a single IT neuron; response consistency across depth cues (shading, disparity, texture gradients), lighting direction, stereoscopic depth, xy position and size; sharp sensitivity to rotation out of the image plane; and a linear/nonlinear response model based on two Gaussian tuning functions projected onto a high-response stimulus surface.]
Yamane Carlson Bowman Wang & Connor ’08
The ‘what’ problem: monkey electrophysiology
Hung Kreiman et al ’05; Willmore Prenger & Gallant ’10
The ‘what’ problem: human psychophysics
Thorpe et al ’96; VanRullen & Thorpe ’01; Fei-Fei et al ’02 ’05; Evans & Treisman ’05; Serre Oliva & Poggio ’07
Vision as ‘knowing what is where’
Aristotle; Marr ’82
‘What’ and ‘where’ cortical pathways
ventral ‘what’ stream
dorsal ‘where’ stream
Ungerleider & Mishkin ’84

How does the visual system combine information about the identity and location of objects?
➡ Central thesis: visual attention (see also Van Der Velde and De Kamps ’01; Deco and Rolls ’04)
Part I: A ‘Bayesian’ theory of attention that solves the ‘what is where’ problem
• Predictions for neurophysiology and human eye movement data
• with Sharat Chikkerur, Cheston Tan, Tomaso Poggio

Part II: Readout of IT population activity
• Attention eliminates clutter
• with Ying Zhang, Ethan Meyers, Narcisse Bichot, Tomaso Poggio, Bob Desimone
Perception as Bayesian inference
Mumford ’92; Knill & Richards ’96; Dayan & Zemel ’99; Rao ’02 ’04; Kersten & Yuille ’03; Kersten et al ’04; Lee & Mumford ’03; Dean ’05; George & Hawkins ’05 ’09; Hinton ’07; Epshtein et al ’08; Murray & Kreutz-Delgado ’07

S: visual scene description
I: image measurements

P(S | I) ∝ P(I | S) P(S)

The scene description comprises object identities and object locations:

S = {O_1, O_2, ..., O_n, L_1, L_2, ..., L_n}
P(S, I) = P(O_1, L_1, O_2, L_2, ..., O_n, L_n, I)
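A minimal numeric sketch of this Bayes rule; the scene hypotheses and all probabilities below are invented purely for illustration:

```python
import numpy as np

# Toy sketch of P(S | I) ∝ P(I | S) P(S): the scene S is one of three
# hypotheses; all numbers below are made up for illustration.
scenes = ["car at left", "pedestrian at right", "empty street"]
prior = np.array([0.3, 0.2, 0.5])        # P(S)
likelihood = np.array([0.7, 0.4, 0.05])  # P(I | S) for one observed image I

posterior = likelihood * prior           # unnormalized P(S | I)
posterior /= posterior.sum()             # normalize over the scene hypotheses

for s, p in zip(scenes, posterior):
    print(f"P({s!r} | I) = {p:.3f}")
```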
Assumption #1: Attentional spotlight
Broadbent ’52 ’54; Treisman ’60; Treisman & Gelade ’80; Duncan & Desimone ’95; Wolfe ’97; and many others

• To recognize and localize objects in the scene, the visual system selects one object at a time, so inference runs over a single object identity and location: P(O, L, I)
(visual scene description: O, L; image measurements: I)
Assumption #2: ‘what’ and ‘where’ pathways
• Object location L and identity O are independent (ventral ‘what’ stream; dorsal ‘where’ stream)

P(O, L, I) = P(O) P(L) P(I | L, O)
Assumption #3: universal features
see Riesenhuber & Poggio ’99; Serre et al ’05 ’07

• Objects encoded by collections of ‘universal’ features (of intermediate complexity)
- Either present or absent
- Conditionally independent given the object and its location

Graphical model: the object O determines the position- and scale-tolerant features F^i; each retinotopic feature map X^i depends on F^i and the location L; the image measurements I depend on the X^i (plate over the N features).

Physiological counterparts indicated on the slide: V2/V4 (Serre et al ’05; David et al ’06; Cadieu et al ’07; Willmore et al ’10); IT (Serre et al ’07).
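The factorization implied by assumptions #1 through #3 can be written down as a runnable generative sketch. Everything below (feature counts, object classes, probability tables, noise model) is an invented toy parameterization, not the model actually fit in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES, N_LOCATIONS = 4, 5                 # assumed toy sizes
P_O = {"car": 0.5, "pedestrian": 0.5}          # P(O), hypothetical object priors
P_L = np.full(N_LOCATIONS, 1 / N_LOCATIONS)    # P(L), uniform location prior
# P(F^i = 1 | O): each object is a collection of binary 'universal' features,
# conditionally independent given the object (assumption #3); values invented.
P_F_GIVEN_O = {"car":        np.array([0.9, 0.8, 0.1, 0.2]),
               "pedestrian": np.array([0.1, 0.2, 0.9, 0.8])}

def sample_scene():
    """One draw from P(O) P(L) Π_i P(F^i|O) P(X^i|F^i,L) P(I|X)."""
    o = rng.choice(list(P_O), p=list(P_O.values()))   # object identity
    l = rng.choice(N_LOCATIONS, p=P_L)                # object location
    f = rng.random(N_FEATURES) < P_F_GIVEN_O[o]       # binary feature presences
    x = np.zeros((N_FEATURES, N_LOCATIONS))           # retinotopic feature maps
    x[f, l] = 1.0                                     # present features appear at L
    i = x + 0.1 * rng.standard_normal(x.shape)        # noisy image measurements
    return o, l, f, x, i

print(sample_scene()[:3])
```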
Bayesian inference and attention
(same graphical model: O, L, F^i, X^i, I)
Serre et al ’07
Predicting IT readout
TRAIN (size, position): 3.4°, center
TEST (size, position): 3.4° center; 1.7° center; 6.8° center; 3.4° at 2° horz.; 3.4° at 4° horz.
[Bar plot: classification performance (0 to 1) for IT populations vs. the model across these train/test conditions.]
Model: Serre et al ’05; experimental data: Hung* Kreiman* et al ’05
Bayesian inference and attention
feedforward (bottom-up) sweep → behavior
Serre Oliva & Poggio ’07
Bayesian inference and attention
• Goal of visual perception: to estimate the posterior probabilities of visual features, objects and their locations in an image
• Attention corresponds to conditioning on high-level latent variables representing particular features or locations (as well as on the sensory input), and doing inference over the other latent variables
Bayesian inference and attention
ventral / ‘what’: F^i, X^i; dorsal / ‘where’: L

$$P(X^i \mid I) \;=\; \frac{P(I \mid X^i)\,\sum_{F^i, L} P(X^i \mid F^i, L)\,P(L)\,P(F^i)}{\sum_{X^i} \Big[\, P(I \mid X^i)\,\sum_{F^i, L} P(X^i \mid F^i, L)\,P(L)\,P(F^i) \Big]}$$

• P(I | X^i): feedforward input (the bottom-up sweep)
• P(L): top-down spatial attention
• P(F^i): top-down feature-based attention
• numerator: attentional modulation of the feedforward response
• denominator (pooling over the competing units X^k): suppressive drive
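A self-contained toy evaluation of this posterior over discrete positions. The δ-style observation model (feature i appears exactly at L when present, uniformly otherwise) and all numbers are simplifying assumptions, not the talk's exact parameterization; the point is only that swapping P(L) from uniform to a spotlight re-weights the bottom-up evidence:

```python
import numpy as np

def posterior_x(likelihood, p_L, p_F1):
    """P(X^i = x | I) for one feature map, per the equation above.

    likelihood: P(I | X^i = x), bottom-up evidence at each position x
    p_L:        P(L), top-down spatial attention prior over positions
    p_F1:       P(F^i = 1), top-down feature-based attention for feature i
    Assuming P(X^i = x | F^i = 1, L) = delta(x, L) and a flat map when
    F^i = 0, the inner sum over (F^i, L) is p_F1 * p_L + (1 - p_F1) / |L|.
    """
    prior = p_F1 * p_L + (1.0 - p_F1) / len(p_L)  # sum over (F^i, L)
    unnorm = likelihood * prior                   # numerator
    return unnorm / unnorm.sum()                  # denominator: suppressive drive

# toy bottom-up evidence: two equally strong activations, positions 2 and 7
like = np.full(10, 0.1)
like[[2, 7]] = 1.0

uniform = np.full(10, 0.1)          # P(L = x) = 1/|L|: no spatial attention
spotlight = np.eye(10)[7]           # P(L = 7) ≈ 1: attend to position 7

print(posterior_x(like, uniform, p_F1=0.5).round(2))    # both peaks survive
print(posterior_x(like, spotlight, p_F1=0.5).round(2))  # attended peak wins
```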
Bayesian inference and attention
The same posterior, with its terms labeled E (excitatory drive), A (attention field) and S (suppressive drive), anticipating the normalization model of attention on the next slide.
[Slide reproduces the first page of Reynolds & Heeger, ‘The Normalization Model of Attention’, Neuron 61:168, January 2009: a model in which attention acts through divisive normalization, reconciling response-gain, contrast-gain, receptive-field-shrinkage, biased-competition and tuning-sharpening accounts of attentional modulation within a single computational framework.]
$$R(x, \theta) \;=\; \frac{A(x, \theta)\,E(x, \theta)}{S(x, \theta) + \sigma}$$

Multiplicative scaling of tuning curves by spatial attention
[Panels: McAdams and Maunsell ’99 data vs. model]
see also Reynolds & Heeger ’09
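A sketch of this equation on a 1-D orientation axis. To keep the attended/unattended comparison exactly multiplicative, the suppressive drive here pools the stimulus drive alone; in the full Reynolds & Heeger model, S pools the attention-modulated drive A·E, which is what produces contrast-gain rather than response-gain effects in other stimulus regimes. All tuning widths and gains are invented:

```python
import numpy as np

# R(x, theta) = A * E / (S + sigma) at one spatial position, orientation axis
theta = np.linspace(-90, 90, 181)
E = np.exp(-theta**2 / (2 * 30**2))   # stimulus (excitatory) drive: tuning curve
S = np.full_like(E, E.mean())         # suppressive drive: broad pool (simplified)
sigma = 0.1                           # normalization constant

R_unattended = 1.0 * E / (S + sigma)  # attention field A = 1
R_attended   = 2.0 * E / (S + sigma)  # spatial attention raises A to 2

# attended and unattended curves differ by a single gain factor, as in the
# multiplicative scaling reported by McAdams and Maunsell '99
print(np.allclose(R_attended / R_unattended, 2.0))  # True
```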
With no spatial bias the location prior is uniform, P(L = x) = 1/|L|; directing spatial attention to x concentrates it, P(L = x) ≈ 1. The posterior then becomes

$$P(X^i = x \mid I) \;\propto\; \sum_{F^i, L} P(X^i = x \mid F^i, L)\,P(I \mid X^i)\,P(F^i)\,P(L)$$
Contrast vs. response gain
Martinez-Trujillo and Treue ’02; McAdams and Maunsell ’99
see also Reynolds & Heeger ’09
(posterior P(X^i | I) as above)
Feature-based attention
[Plots: normalized response over time for the four stimulus/cue combinations: P stim/P cue, NP stim/P cue, P stim/NP cue, NP stim/NP cue (P = preferred, NP = non-preferred).]
Model prediction from the posterior above; neural data from Bichot et al ’05.
Learning to localize cars and pedestrians
• learning object priors
• learning location priors from global contextual cues (context; see Torralba, Oliva et al)
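One hedged way the two priors could be learned from labeled street scenes: count-based estimates of P(F^i = 1 | O) for the feature priors, and a per-class Gaussian over annotated object positions standing in for a Torralba/Oliva-style contextual location prior. The data below are synthetic and every parameter is an assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic training set: binary feature vectors and (x, y) object positions
feats = {"car":        rng.random((50, 8)) < 0.7,
         "pedestrian": rng.random((50, 8)) < 0.3}
locs  = {"car":        rng.normal([0.5, 0.7], 0.05, (50, 2)),
         "pedestrian": rng.normal([0.5, 0.6], 0.08, (50, 2))}

# feature priors P(F^i = 1 | O): simple relative-frequency estimates
p_f_given_o = {o: f.mean(axis=0) for o, f in feats.items()}

# location prior P(L | O, context): per-class Gaussian over training positions
loc_prior = {o: (l.mean(axis=0), l.std(axis=0)) for o, l in locs.items()}

print(p_f_given_o["car"].round(2))
print(loc_prior["pedestrian"])
```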
The experiment
• Eye movements as proxy for attention
• Dataset:
- 100 street-scenes images with cars & pedestrians and 20 without
• Experiment:
- 8 participants asked to count the number of cars/pedestrians
- block design for cars and pedestrians
- eye movements recorded using an infra-red eye tracker
[Bar plot: ROC area (0.5 to 1) for predicting the first three fixations on cars and on pedestrians, comparing uniform priors (bottom-up), feature priors, feature + contextual (spatial) priors, and humans.]
*similar (independent) results by Ehinger Hidalgo Torralba & Oliva ’10
Explains 92% of the inter-subject agreement!
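One common way to produce ROC-area numbers like these: treat fixated pixels as positives and all remaining pixels as negatives, then score how well the model's priority map ranks the former above the latter. This is a sketch of that convention on synthetic data, not necessarily the exact scoring used in the talk:

```python
import numpy as np

def fixation_roc_area(priority_map, fixations):
    """ROC area for a priority map against human fixations.

    Fixated pixels are positives, all other pixels negatives; the ROC area
    equals the probability that a random fixated pixel outranks a random
    non-fixated one (the Mann-Whitney statistic).
    """
    pos = priority_map[tuple(fixations.T)]
    neg = np.delete(priority_map.ravel(),
                    np.ravel_multi_index(tuple(fixations.T),
                                         priority_map.shape))
    wins = pos[:, None] > neg[None, :]
    ties = pos[:, None] == neg[None, :]
    return float((wins + 0.5 * ties).mean())

rng = np.random.default_rng(2)
m = rng.random((60, 80))                        # synthetic priority map
fix = np.array([[10, 20], [30, 40], [50, 60]])  # synthetic (row, col) fixations
m[tuple(fix.T)] += 0.5                          # make the map informative
print(round(fixation_roc_area(m, fix), 3))      # well above the 0.5 chance level
```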
Bottom-up saliency and free-viewing
(graphical model without the object variable O: uniform priors over features and locations)

Method                 ROC area
Bruce & Tsotsos ’06    72.8%
Itti et al ’01         72.7%
Proposed               77.9%

human eye data from Bruce & Tsotsos
Summary I
• Goal of vision:
- to solve the ‘what is where’ problem
• Key assumptions:
- ‘Attentional spotlight’ → recognition done sequentially, one object at a time
- ‘What’ and ‘where’ as independent pathways
• Attention as the inference process implemented by the interaction between ventral and dorsal areas
- Integrates bottom-up and top-down (feature-based and context-based) attentional mechanisms
- Seems consistent with known physiology
• Main attentional effects arise in the presence of clutter
- Spatial attention reduces uncertainty over location and improves object recognition performance over the first bottom-up (feedforward) pass
Part I: A ‘Bayesian’ theory of attention that solves the ‘what is where’ problem
• Predictions for neurophysiology and human eye movement data
• with Sharat Chikkerur, Cheston Tan, Tomaso Poggio

Part II: Readout of IT population activity
• Attention eliminates clutter
• with Ying Zhang, Ethan Meyers, Narcisse Bichot, Tomaso Poggio, Bob Desimone
The ‘readout’ approach
neuron 1, neuron 2, neuron 3, ..., neuron n → Pattern Classifier → Prediction (“Correct”)
IT readout improves with attention
• train readout classifier on the response to an isolated object
• test generalization in clutter: attended vs. unattended (attend away)
Zhang Meyers Bichot Serre Poggio Desimone unpublished data
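A sketch of this readout protocol on simulated population responses. The real analysis decodes recorded IT firing rates; here the class means, noise levels, and the way attention is modeled (as a shrinking of the distractor-driven component) are all assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n_neurons, n_train, n_test = 100, 200, 200
mu = rng.normal(0, 1, (2, n_neurons))  # mean population response per object

def trials(n, clutter=0.0):
    """Simulated trials: object-driven response plus a distractor component."""
    y = rng.integers(0, 2, n)
    X = mu[y] + rng.normal(0, 1, (n, n_neurons))
    X += clutter * rng.normal(0, 1, (n, n_neurons))  # clutter-driven activity
    return X, y

X_tr, y_tr = trials(n_train)                         # train on isolated objects
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# attention modeled as shrinking the distractor component of the response
for name, clutter in [("isolated", 0.0), ("attended", 0.3), ("unattended", 1.5)]:
    X_te, y_te = trials(n_test, clutter)
    print(name, round(clf.score(X_te, y_te), 2))     # readout generalization
```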
• Robert Desimone (MIT)
• Christof Koch (CalTech)
• Laurent Itti (USC)
• Tomaso Poggio (MIT)
• David Sheinberg (Brown)
Thank you!