
Page 1

What and where: A Bayesian inference theory of attention

Sharat Chikkerur, Thomas Serre, Cheston Tan & Tomaso Poggio

CBCL, McGovern Institute for Brain Research, MIT

Page 2

Outline

• Preliminaries
  – Perception & Bayesian inference
  – Background & motivation
• Theory
  – Attention as inference
  – Bayesian model
• Computational model
  – Model properties
• Applications on real-world images
  – Predicting human eye movements
  – Improving object recognition

Page 3

Lee and Mumford, “Hierarchical Bayesian inference in the visual cortex”, JOSA A, 20(7), 2003

• Recurrent feed-forward/feedback loops integrate bottom-up information with top-down priors
• Bottom-up signals: data dependent
• Top-down signals: task dependent
• Top-down signals provide context information and help disambiguate bottom-up signals

[Diagram: hierarchical generative model $x_{IT} \to x_{V4} \to x_{V2} \to x_{V1} \to x_0$]

Perception as Bayesian inference

Page 4

Hegde J. and Felleman D.J., Reappraising the functional implications of the primate visual anatomical hierarchy, Neuroscientist, 13(5), 2007

Bottom-up vs. top-down

Page 5

Bottom-up vs. top-down

Page 6

Mathematical framework

Page 7

[Diagram: a system $f(\cdot)$ relates hidden state $Y$ to observed data $X$]

Bayesian generative models

• Statistical learning view: $Y = f(X)$, with $X$ the data and $Y$ the class.
• Generative model view: $X$ and $Y$ are both random variables, with $X \sim P(X \mid Y)$.
• Inference inverts the generative model via Bayes' rule:

$P(Y \mid X) \propto P(X \mid Y)\, P(Y)$
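A quick worked instance of the rule (the numbers are invented for illustration): take $Y \in \{\text{face}, \text{non-face}\}$ with prior $P(\text{face}) = 0.2$, and suppose the observed data $X$ has likelihoods $P(X \mid \text{face}) = 0.9$ and $P(X \mid \text{non-face}) = 0.3$. Then

$P(\text{face} \mid X) = \dfrac{0.9 \times 0.2}{0.9 \times 0.2 + 0.3 \times 0.8} = \dfrac{0.18}{0.42} \approx 0.43,$

so even a strong likelihood only partially overcomes a low prior.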

Page 8

[Diagram: chain $x_{IT} \to x_V \to x_0$]

Recall:

$P(A,B) = P(A \mid B)\, P(B) = P(B \mid A)\, P(A)$

$P(A \mid B) = \dfrac{P(B \mid A)\, P(A)}{P(B)}$

For the given network,

$P(x_0, x_V, x_{IT}) = P(x_0 \mid x_V)\, P(x_V \mid x_{IT})\, P(x_{IT})$

Inference:

$P(x_V \mid x_0, x_{IT}) = \dfrac{P(x_0, x_V \mid x_{IT})}{P(x_0 \mid x_{IT})} \propto \underbrace{P(x_0 \mid x_V)}_{\text{bottom-up}}\; \underbrace{P(x_V \mid x_{IT})}_{\text{top-down}}$

Perception: bottom-up & top-down

Page 9

[Diagram: chain with top-down evidence $e^+$ feeding $X$, and bottom-up evidence $e^-$ below $Y$]

• Top-down ("prior") messages: $\pi(x) = P(x \mid e^+)$, $\pi(y) = P(y \mid e^+)$
• Bottom-up ("evidence") messages: $\lambda_Y(x) = P(e^- \mid x)$, $\lambda(y) = P(e^- \mid y)$
• Message updates: $\pi(x) = \sum_{e^+} \pi(e^+)\, P(x \mid e^+)$ and $\lambda(x) = \sum_y \lambda(y)\, P(y \mid x)$
• Belief: $b(x) \propto \lambda(x)\, \pi(x)$

Belief propagation
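The $\lambda/\pi$ recursions above are easy to run on a small discrete chain. Below is a minimal sketch in Python (NumPy) for the chain $e^+ \to X \to Y$; the conditional probability tables and message values are invented for illustration, not taken from the slides.

```python
import numpy as np

# A 3-node chain: E+ -> X -> Y, all binary.
# Rows index the parent state, columns the child state.
P_x_given_e = np.array([[0.9, 0.1],   # P(x | e+ = 0)
                        [0.2, 0.8]])  # P(x | e+ = 1)
P_y_given_x = np.array([[0.7, 0.3],
                        [0.1, 0.9]])

pi_e = np.array([0.5, 0.5])   # top-down prior message pi(e+)
lam_y = np.array([1.0, 0.0])  # bottom-up evidence: Y observed to be 0

# pi(x) = sum_{e+} pi(e+) P(x | e+)      (top-down message)
pi_x = pi_e @ P_x_given_e
# lambda(x) = sum_y lambda(y) P(y | x)   (bottom-up message)
lam_x = P_y_given_x @ lam_y

# Belief: b(x) proportional to lambda(x) * pi(x)
b_x = lam_x * pi_x
b_x /= b_x.sum()
print(b_x)  # posterior P(x | e+, e-)
```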

Page 10

George D. and Hawkins J., A hierarchical Bayesian model of invariant pattern recognition in the visual cortex, IJCNN, 2005

Biological plausibility

Page 11

[Diagram: node $X$ with parent $U$ and children $Y_1, Y_2, Y_3$]

$\lambda(x) = \lambda_{y_1}(x)\, \lambda_{y_2}(x)\, \lambda_{y_3}(x)$

$\lambda(u) = \sum_x \lambda(x)\, P(x \mid u)$

$\pi(x) = \sum_u P(x \mid u)\, \pi(u)$

$\pi(y_1) = c \sum_x P(y_1 \mid x)\, \pi(x)\, \lambda_{y_2}(x)\, \lambda_{y_3}(x)$

Trees
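Relative to the chain sketch above, a tree node only changes the $\lambda$ step: messages from several children combine by elementwise product before being passed up. A hedged NumPy sketch (all message values and tables invented for illustration):

```python
import numpy as np

# lambda messages arriving from three children of a binary node X
lam_children = [np.array([0.6, 0.4]),
                np.array([0.9, 0.1]),
                np.array([0.5, 0.5])]

# lambda(x) = prod_i lambda_{y_i}(x): elementwise product over children
lam_x = np.prod(lam_children, axis=0)

# message to the parent U: lambda(u) = sum_x lambda(x) P(x | u)
P_x_given_u = np.array([[0.8, 0.2],   # P(x | u = 0)
                        [0.3, 0.7]])  # P(x | u = 1)
lam_u = P_x_given_u @ lam_x
print(lam_u)
```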

Page 12

[Diagram: node $X$ with parents $U_1, U_2$ and children $Y_1, Y_2, Y_3$]

$\lambda(x) = \lambda_{y_1}(x)\, \lambda_{y_2}(x)\, \lambda_{y_3}(x)$

$\lambda(u_1) = \sum_x \lambda(x) \sum_{u_2} P(x \mid u_1, u_2)\, \pi(u_2)$

$\pi(x) = \sum_{u_1, u_2} P(x \mid u_1, u_2)\, \pi(u_1)\, \pi(u_2)$

$\pi(y_1) = c \sum_x P(y_1 \mid x)\, \pi(x)\, \lambda_{y_2}(x)\, \lambda_{y_3}(x)$

Polytrees

Page 13

Attention: background & motivation

Page 14

Visual processing: what and where

• Ventral ('what') stream:
  – Processes shape information
  – Responsible for object recognition
  – Progressive loss of location information
• Dorsal ('where') stream:
  – Processes location and motion information
  – Progressive loss of form information
• Form and location are processed concurrently and (almost) independently of each other
• How does the brain combine form and location information?

[Diagram: dorsal and ventral streams]

Page 15

Ventral stream: invariant recognition

[Figure panels: IT (Zoccolan, Kouh, Poggio & DiCarlo 2007), V4 (Reynolds, Chelazzi & Desimone '99), psychophysics (Serre, Oliva & Poggio 2007); figures from Serre et al., Hung et al.]

• How does the brain recognize objects under clutter?

Page 16

Attention is needed to recognize objects under clutter

Parallel vs. serial processing

Page 17

Attention

• Computational role: filter theory (Broadbent), biased competition (Desimone), feature integration theory (Treisman), guided search (Wolfe), scanpath theory (Noton), Bayesian surprise (Itti), bottleneck (Tsotsos)
• Effects: contrast gain, response gain, modulation under spatial attention, modulation under feature attention
• Biology: V1, V4, MT, LIP, FEF
• Phenomena: pop-out, serial vs. parallel search, bottom-up vs. top-down

"Everybody knows what attention is…" – William James, 1890

Page 18

Bridging the gap

• Conceptual models (theories): provide justifications, not implementations
• Computational models: model behavior (eye movements), but cannot model physiological effects
• Phenomenological models: model specific physiological effects, but cannot provide a theory
• Bridging the gap requires a model that is phenomenological, predicts behavior, and provides a theory

Page 19

A theoretical framework

Page 20

Bayesian model

Kersten & Yuille ‘04

Assumption: the visual system selects and localizes objects, one at a time

Page 21

Bayesian model

Assumption: object location and identity are marginally independent of each other

Page 22

Bayesian model

Assumption: every object is generated using a set of N complex features, each of which may be present or absent

Page 23

Bayesian model

$F_i$: location/scale-invariant features
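Combining the three assumptions, the joint distribution factorizes as sketched below. This is a reconstruction from the stated assumptions and the graphical model on the following slides, where $x_i$ denotes the retinotopic feature map for feature $i$ and $I$ the image; treat it as a sketch rather than the paper's exact notation:

$P(I, \{x_i\}, \{F_i\}, L, O) = P(O)\, P(L) \left[ \prod_{i=1}^{N} P(F_i \mid O)\, P(x_i \mid F_i, L) \right] P(I \mid \{x_i\})$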

Page 24

Computational model

Page 25

Feature-based attention: Where is object O?
Spatial attention: What is at location L?

[Diagram: model stages mapped to brain areas LIP/FEF, V2, V4, IT, PFC]

Relation to biology

Page 26

Model description

[Diagram: graphical model with "where" variable $L$ and "what" variables $F_i$, linked through feature maps $x_i$ (plate of size $N$) to the image]

Page 27

Model properties: invariance

[Diagram: graphical model, as above]

Page 28

Model properties: crowding

[Diagram: graphical model, as above]

Page 29

Model: spatial attention

What is at location X?

[Diagram: graphical model with location-specific feature maps $F_i^l$; the location variable $L$ is clamped to the queried location X]

Page 30

Model: feature-based attention

Where is object X?

[Diagram: graphical model with location-specific feature maps $F_i^l$; the features $F_i$ of the queried object X are clamped and the posterior over $L$ is read out]
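Both queries are ordinary posterior computations in the factorized model sketched earlier. Below is a minimal discrete sketch in Python with a single feature (N = 1); all variable sizes and probability tables are invented for illustration, not taken from the paper.

```python
import numpy as np

# Minimal discrete instance of the what/where model with one feature (N = 1).
n_loc, n_obj = 3, 2

P_O = np.array([0.5, 0.5])            # prior over object identity O
P_L = np.ones(n_loc) / n_loc          # prior over location L
P_F_given_O = np.array([[0.1, 0.9],   # P(F = 0 | O = 0), P(F = 1 | O = 0)
                        [0.8, 0.2]])  # P(F = 0 | O = 1), P(F = 1 | O = 1)

def lik_x(x, F, L):
    """Unnormalized likelihood of a feature-map activation pattern x.
    If the feature is present (F = 1), it drives activity at location L."""
    mean = np.zeros(n_loc)
    if F == 1:
        mean[L] = 1.0
    return np.exp(-0.5 * np.sum((x - mean) ** 2))

x_obs = np.array([0.1, 0.9, 0.0])     # observed feature map: activity at location 1

# Unnormalized joint over (O, F, L) given the observed feature map
joint = np.zeros((n_obj, 2, n_loc))
for o in range(n_obj):
    for f in range(2):
        for l in range(n_loc):
            joint[o, f, l] = P_O[o] * P_F_given_O[o, f] * P_L[l] * lik_x(x_obs, f, l)

# Feature-based attention ("where is object 0?"): clamp O = 0, read out P(L | O, x)
p_L = joint[0].sum(axis=0)
p_L /= p_L.sum()

# Spatial attention ("what is at location 1?"): clamp L = 1, read out P(O | L, x)
p_O = joint[:, :, 1].sum(axis=1)
p_O /= p_O.sum()

print("P(L | O=0, x) =", p_L)
print("P(O | L=1, x) =", p_O)
```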

Page 31

Model properties

Page 32

Spatial Invariance

Page 33

Spatial Attention

Page 34

Feature Attention

Page 35

Feature Pop-out

Page 36

Parallel vs. Serial Search

Page 37

Application I: Predicting eye movements

Page 38

Predicting eye movements

• Eye movements can be considered a proxy for attention
• Cues influencing eye movements:
  – Bottom-up image saliency
  – Top-down feature biases
  – Top-down spatial bias
• Evaluation

Page 39

Model can predict human eye movements

Bottom-up attention:

  Method                  ROC area (absolute)
  Bruce and Tsotsos '06   72.8%
  Itti et al. '01         72.7%
  Proposed                77.9%

Top-down spatial and feature attention:

  Method            ROC area (Cars)   ROC area (Pedestrian)
  Itti et al. '01   42.3%             42.3%
  Torralba et al.   78.9%             77.1%
  Proposed          80.4%             80.1%
  Humans            87.8%             87.4%
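For reference, the ROC-area metric here scores a saliency map by how well thresholding it separates fixated from non-fixated pixels. A hedged Python sketch of the computation (details such as map resolution and the choice of negative set vary across the cited papers):

```python
import numpy as np

def roc_area(saliency_map, fixations):
    """ROC area for a saliency map given human fixation coordinates.

    saliency_map: 2-D array of predicted salience.
    fixations: list of (row, col) fixated pixels (the positive set).
    """
    pos = np.array([saliency_map[r, c] for r, c in fixations])
    neg = saliency_map.ravel()  # all pixels as the negative set (one common choice)
    thresholds = np.unique(np.concatenate([pos, neg]))
    # true/false positive rates as the threshold sweeps over all values
    tpr = [(pos >= t).mean() for t in thresholds]
    fpr = [(neg >= t).mean() for t in thresholds]
    # integrate the ROC curve; fpr decreases with t, so flip the sign
    return -np.trapz(tpr, fpr)
```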

Page 40

Application II: Improving recognition

Page 41

Effect of clutter on detection

[Figure panels: recognition without attention vs. recognition under attention]

Page 42

Recognition performance improves with attention

Chikkerur, Serre, Tan & Poggio (in submission, Vision Research)

Page 43

Recognition performance improves with attention

Chikkerur, Serre, Tan & Poggio (in prep)

Page 44

Summary

• Theory: attention is part of the inference process that solves the problem of what is where.
• Computational model: we describe a computational model and relate it to the functional anatomy of attention. Attentional phenomena (pop-out, multiplicative modulation, contrast response) are 'predicted' by the model.
• Applications: predicting human eye movements; improving object recognition.

Page 45

Thank you!

Page 46

Page 47

Relation to prior work

Page 48

Feedback and its role

• Reconstruction
• Mental imagery

Murray J.F., Visual recognition, inference and coding using learned sparse overcomplete representations, PhD thesis, UCSD, 2005
Hinton G.E., Osindero S. and Teh Y., A fast learning algorithm for deep belief nets, Neural Computation, 18, 2006

Page 49

Feedback and its role

• Visual attention / segmentation

Murray J.F., Visual recognition, inference and coding using learned sparse overcomplete representations, PhD thesis, UCSD, 2005
Fukushima K., A neural network model for selective attention in visual pattern recognition, Biological Cybernetics, 55, 1986

Page 50

‘Predicting’ physiological effects

Page 51

Spatial attention

[Figure: orientation tuning, attended vs. unattended; model compared with McAdams and Maunsell '99]

Page 52

Feature-based attention

[Figure: model compared with Bichot and Desimone '05]

Page 53

Contrast gain vs. response gain

Martinez-Trujillo and Treue '02; McAdams and Maunsell '99

Page 54

Attentional effects in MT: pop-out

Page 55

MT: Feature-based attention

Page 56

MT: Multi-modal interaction