What and where: A Bayesian inference theory of...

What and where:A Bayesian inference theory of

attentionSharat Chikkerur, Thomas Serre, Cheston Tan & Tomaso Poggio

CBCL, McGovern Institute for Brain Research, MIT

Outline Preliminaries

Perception & Bayesian inference Background & motivation Theory

Attention as inference Bayesian model

Computational model Model properties

Applications on real-world images Predicting human eye movements Improving object recognition

Mumford and Lee, “Hierarchical Bayesian Inference in the

Visual Cortex”, JOSA, 20(7), 2003

Recurrent feed-forward/feedback loops integrate bottom up

information with top down priors

Bottom-up signals : Data dependent

Top-down signals : Task dependent

Top down signals provide context information and help to

disambiguate bottom-up signals

Perception as Bayesianinference

Hegde J. and Felleman J., Reappraising the functional implications of the primate visual anatomical hierarchy, Neuroscientist, 13(5), 2007

Bottom up vs. top-down

Mathematical framework

Systemf() ‏

X(data) ‏

Y(state) ‏

Bayesian generative models

Statistical learning view:

Y = f(X) , X-data, Y-class

Generative model view:

X-random, Y-random

X~P(X|Y) ,

X(data) ‏

( ) ( ) ( )YPYXPXYP || !

( ) ( ) ( ) ( ) ( )( )

( )( ) ( )!

BBA,P=APAP

BPBAP=ABP

APABP=BPBAP=BA,P)|()|(

|| Recall,

For the given network,

( ) ( ) ( ) ( )ITITVV0VIT xPxxPxxP=x,x,xP ||0

( ) ( ) ( )( ) ( )ITVV0

ITVITV0IT0V

xxPxxP=xxPx,xxP=xx,xP

Inference:

( ) ( ) ( )( )

( ) ( )ITVV

ITVITVITV

xxPxxPxxP

xxPx,xxP=x,xxP

Bottom-up Top-down

Perception: bottom-up & top-down

λy(x)=P(e-|x)‏

π(x)=P(x|e+) ‏

π(y)=P(y|e+) ‏

λ(y)=P(e-|y) ‏

“Evidence”

“Prior”

b(x)α λ(x) π(x)‏

π(e+) ‏

λ(e-) ‏

∑Yλ(y)P(y|x) ‏

∑e+π(e+)P(x|e+) ‏

Belief propagation

George D. and Hawkins J., A hierarchical Bayesian model of invariant pattern recognition in the visual cortex, IJCNN, 2005

Biological plausibility

Y1 Y2 Y3

( ) ( ) ( ) ( )xëxëxë=xë yy1y 32

( ) ( ) ( )!x

uxPxë=uë |( ) ( ) ( )!u

uDuxP=xD |

( ) ( ) ( ) ( ) ( )xyPxëxëxDc=yD yyx

|1211 !

Y1 Y2 Y3

( ) ( ) ( ) ( )! xëxëxë=xë yyy 321

( ) ( ) ( ) ( )! !x

uDuuxPxë=uë 121

2 |( ) ( ) ( ) ( )!21

2121|u

uDuDuuxP=xDu

( ) ( ) ( ) ( ) ( )xyPxëxëxDc=yD yyx

|1211 !

Polytrees

AttentionBackground & motivation

Visual processing: what andwhere

Ventral (‘what’) stream: Processes shape information Responsible for object

recognition Progressive loss of location

information Dorsal (‘where’) stream:

Processes location and motioninformation

Progressive loss of forminformation•Form and location is processed concurrently and (almost) independently of each other

•How does the brain combine form and location information?

Dorsalstream

Ventralstream

Ventral stream: invariantrecognition

Figures from Serre et al, Hung et al.

Zoccolan Kouh Poggio DiCarlo 2007

Reynolds, Chelazzi & Desimone ‘99

Serre Oliva Poggio 2007

Psychophysics

•How does the brain recognize objects under clutter?

Attention is needed to recognize objects under clutter

Parallel vs. serial processing

•Filter theory (Broadbent)•Biased competition (Desimone)•Feature integration theory (Treisman)•Guided search (Wolfe)•Scanpath theory (Noton)

•Bayesian surprise (Itti)•Bottleneck (Tsotsos)

Computational Role

Effects•Contrast gain•Response gain•Modulation under spatial attention•Modulation under feature attention

Biology

•Pop-out•Serial vs. Parallel•Bottom-up vs. Top-down

Attention

• V1• V4• MT• LIP• FEF

Everybody knows what attention is…-William James, 1907

Bridging the gap Conceptual models (theories)

Provide justifications not implementations Computational models

Model behavior (eye-movements) Cannot model physiological effects

Phenomenological models Model specific physiological effects Cannot provide theory

Bridging the gap Phenomenological, predicts behavior, theory

A theoretical framework

Bayesian model

Kersten & Yuille ‘04

Assumption: visual system selects and localizes objects, one ata time

Bayesian model

Assumption: object location and identity are marginallyindependent of each other

Bayesian model

Assumption: Every object is generated using a set of Ncomplex features each of which may be present or absent

Bayesian model

Fi: Location/scale invariant features

Computational model

Feature-based attention: Where is object O?Feature-based attention: Where is object O?Spatial attention: What is at location L?Spatial attention: What is at location L?

LIP/FEFLIP/FEF

PFCPFC

Relation to biology

Model description

“What”“Where”

Feature-maps

Model properties: invariance

Model properties: crowding

Model: spatial attention

What is at location X?

Model: feature-based attention

Where is object X?

Model properties

Spatial Invariance

Spatial Attention

Feature Attention

Feature Popout

Parallel vs. Serial Search

Application I: Predicting eye-movements

Predicting eye movements Eye movements can be considered as a proxy for

attention Cues influencing eye-movements

Bottom-up image saliency Top-down feature biases Top-down spatial bias

Evaluation

Model can predict human eye-movements

Top-down spatial and feature attentionTop-down spatial and feature attention

Method ROC area (absolute)

Bruce and Tsotos ’06 72.8%

Itti et al ’01 72.7%

Proposed 77.9%

Method ROC area (Cars) ROC area (Pedestrian)

Itti et al. ’01 42.3% 42.3%

Torralba et al. 78.9% 77.1%

Proposed 80.4% 80.1%

Humans 87.8% 87.4%

Bottom-up attentionBottom-up attention

Application II: Improving recognition

Effect of clutter on detection

recognition without attention

recognition under attention

Recognition performance improves with attention

Chikkerur, Serre, Tan & Poggio (in submission, Vision Research)Chikkerur, Serre, Tan & Poggio (in submission, Vision Research)

Recognition performance improves with attention

Chikkerur, Serre, Tan & Poggio (in prep)Chikkerur, Serre, Tan & Poggio (in prep)

Summary Theory

Attention is part of the inference process thatsolves the problem of what is where.

Computational model We describe a computational model and

relate it to functional anatomy of attention. Attentional phenomena (pop-out,

multiplicative modulation, contrast response)are ‘predicted’ by the model.

Applications Predicting human eye movements. Improving object recognition

Thank you!

Relation to prior work

Feedback and its role Reconstruction

Mental Imagery

Murray J. F., Visual recognition, inference and coding using learned sparse overcomplete representations, PhD thesis, UCSD, 2005Hinton G. E, Osindero S. and Teh. Y, A fast learning algorithm for deep belief nets, Neural computation, vol. 18, 2006

Feedback and its role Visual attention/Segmentation

Murray J. F., Visual recognition, inference and coding using learned sparse overcomplete representations, PhD thesis, UCSD, 2005Fukushima K., A neural network model for selective attention in visual pattern recognition, Biological Cybernetics, vol. 55, 1986

‘Predicting’ physiological effects

Spatial attention

0 20 40 60 80 100 120 140 160 180

UnattendedAttended

ModelMcAdams and Maunsell ‘99

Feature-based attention

ModelBichot and Desimone ‘05

Contrast gain vs. Responsegain

Trujillo and Treue ‘02 Mc Adams and Maunsell’99

Attentional effects in MT: popout

MT: Feature based attention

MT: Multi-modal interaction

What and where: A Bayesian inference theory of...

Documents

SIFF programme spring10

ELEC 677: Recurrent Neural Network Applications & Recurrent Neural Network … · 2016-11-08 · Recurrent Neural Network Applications & Recurrent Neural Network Language Models Lecture

ArchNews Spring10-final

research.sanfordhealth.org/rare-disease-registry Recurrent ... · Recurrent Respiratory Papillomatosis Foundation Recurrent Respiratory Papillomatosis Foundation rrpf.org “While

Telecom connections spring10

1 Flowchart notation and loops Implementation of loops in C –while loops –do-while loops –for loops Auxiliary Statements used inside the loops –break –continue

Loops Wrap Up 10/21/13. Topics Sentinel Loops Nested Loops *Random Numbers

Cool loops, transequatorial loops and coronal waves

1.6 Loops academy.zariba.com 1. Lecture Content 1.While loops 2.Do-While loops 3.For loops 4.Foreach loops 5.Loop operators – break, continue 6.Nested

Workshop 2013 - TU Berlin...recurrent excitation-inhibition loops and (iii) ampli es the network response to oscillatory external inputs for a narrow band of frequencies [3]. Our results

CSCI 141 - Western Washington Universitywehrwes/courses/csci141_19f/lectur… · CSCI 141 Lecture 10: Modules, random, loops loops loops loops, range

NEH Spring10

RCS Loops -MODES 5, Loops FilledRCS Loops -MODES 5, Loops Filled B 3.4.7 B 3.4 REACTOR COOLANT SYSTEM (RCS) B 3.4.7 RCS Loops -MODES 5, Loops Filled BASES BACKGROUND In MODE 5 with

Vector loops and Parallel Loops - CilkPlus · N3419: Vector Loops and Parallel Loops 7 3 Parallel Loops 3.1 Background This portion presents the proposal to add a language construct

Alessio Micheli Intro to Learning in SD -1€¢ Recurrent neural networks: A different category of architecture, based on the addition of feedback loops connections in the network

Conditional loops while do while Counting loops for loops Arrays Recursion

A Transformation of Iterative Loops into Recursive Loops - UPV

Ground loops Why grounding is so important ?DefinitionGround loops :(Introduction)where ground loops affect ?Ground loops are created by :Solutions

1 Chapter 6 – Repetition 6.1 Do Loops 6.2 For...Next Loops 6.3 List Boxes and Loops

Nesting Loops (That’s loops inside of loops. Why not?)