Adaptive Control of Gaze and Attention
Mary Hayhoe, University of Texas at Austin
Jelena Jovancevic, University of Rochester
Brian Sullivan, University of Texas at Austin
Selecting information from visual scenes
What controls the selection process?
Fundamental Constraints
Acuity is limited. High acuity only in central retina.
Attention is limited. Not all information in the image can be processed.
Visual Working Memory is limited. Only a limited amount of information can be retained across gaze positions.
[Diagram labels: target selection, saccade decision, saccade command, planning movements, inhibits SC, signals to muscles]
Neural Circuitry for Saccades
Image properties, eg contrast, edges, chromatic saliency, can account for some fixations when viewing images of scenes (eg Itti & Koch, 2001; Parkhurst & Niebur, 2003; Mannan et al, 1997).
Saliency and Attentional Capture
Saliency is computed from the image using feature maps (color, intensity, orientation) at different spatial scales, filtered with a center-surround mechanism, and then summed. Gaze goes to the peak. From Itti & Koch (2000).
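The pipeline described here (feature maps, center-surround filtering at several scales, summation, gaze to the peak) can be sketched in a few lines. This is a minimal illustration, not Itti & Koch's implementation: it assumes box blurs in place of Gaussian pyramids, uses only intensity and two color-opponent channels (orientation is omitted), and the function names and scale pairs are hypothetical choices.

```python
import numpy as np

def box_blur(img, radius):
    """Mean filter with edge padding (assumed stand-in for Gaussian smoothing)."""
    if radius == 0:
        return img.astype(float)
    k = 2 * radius + 1
    h, w = img.shape
    padded = np.pad(img.astype(float), radius, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def center_surround(feature, center_r, surround_r):
    """Absolute difference between a fine (center) and a coarse (surround) blur."""
    return np.abs(box_blur(feature, center_r) - box_blur(feature, surround_r))

def normalize(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def saliency_map(rgb):
    """Sum normalized center-surround responses over features and scales."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    features = [
        (r + g + b) / 3.0,    # intensity
        r - g,                # red/green opponency
        b - (r + g) / 2.0,    # blue/yellow opponency
    ]
    total = np.zeros(rgb.shape[:2])
    for fmap in features:
        for center_r, surround_r in [(1, 4), (2, 8)]:  # assumed scale pairs
            total += normalize(center_surround(fmap, center_r, surround_r))
    return total

def gaze_target(sal):
    """Return (row, col) of the saliency peak."""
    row, col = np.unravel_index(np.argmax(sal), sal.shape)
    return int(row), int(col)
```

Calling `gaze_target(saliency_map(image))` on an H x W x 3 array returns the location a purely bottom-up model of this kind would fixate first, regardless of what the observer is doing.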
Certain stimuli are thought to capture attention or gaze in a bottom-up manner, by interrupting ongoing visual tasks (eg sudden onsets, moving stimuli; Theeuwes et al, 2001).
This is conceptually similar to the idea of salience.
Attentional Capture
Limitations of Saliency Models
Important information may not be salient eg an irregularity in the sidewalk.
Salient information may not be important - eg retinal image transients from eye/body movements.
Saliency doesn’t account for many observed fixations, especially in natural behavior (see previous lecture). (Direct comparisons: Rothkopf et al, 2007; Stirk & Underwood, 2007)
Will this work in natural vision?
Foot placement
Obstacle avoidance
Heading
Viewing pictures of scenes is different from acting within scenes.
Need to Study Natural Behavior
Dynamic Environments
The Problem
Any selective perceptual system must choose what to select, and when to select it.
How is this done given that the natural world is unpredictable? (The “initial access” problem, Ullman, 1984)
Answer - it’s not all that unpredictable and we’re really good at learning it.
Is bottom up capture effective in natural environments?
Looming stimuli seem like good candidates for bottom-up attentional capture (Regan & Gray, 200; Franconeri & Simons, 2003).
Human Gaze Distribution when Walking
• Experimental Question: How sensitive are subjects to unexpected salient events?
• General Design: Subjects walked along a footpath in a virtual environment while avoiding pedestrians.
Do subjects detect unexpected potential collisions?
Virtual Walking Environment
Virtual Research V8 Head Mounted Display with 3rd Tech HiBall Wide Area motion tracker
V8 optics with ASL501 Video Based Eye Tracker (Left) and ASL 210 Limbus Tracker (Right)
Virtual Environment
Bird’s Eye view of the virtual walking environment.
Monument
• 1 - Normal Walking: “Avoid the pedestrians while walking at a normal pace and staying on the sidewalk.”
• 2 - Added Task: Identical to condition 1. Additional instruction: “Follow the yellow pedestrian.”
Normal walking
Follow leader
Experimental Protocol
Distribution of Fixations on Pedestrians Over Time
-Pedestrians fixated most when they first appear
-Fewer fixations on pedestrians in the leader trials
[Figure: probability of fixation (0 to 1) vs. time since appearance onscreen (sec; bins 0-1, 1-2, 2-3, 3-4, 4-5), for Normal Walking and Follow Leader.]
Pedestrians’ paths
Colliding pedestrian path
What Happens to Gaze in Response to an Unexpected Salient Event?
• The Unexpected Event: Pedestrians veered onto a collision course for 1 second (10% frequency). The change occurs during a saccade.
Does a potential collision evoke a fixation?
Fixation on Collider
No Fixation During Collider Period
Probability of Fixation During Collision Period
Pedestrians’ paths
Colliding pedestrian path
More fixations on colliders in normal walking.
[Figure: probability of fixation (0 to 1) for Controls and Colliders; Normal Walking (No Leader) condition.]
The small increase in probability of fixating the collider could be caused either by a weak effect of attentional capture or by active, top-down search of the peripheral visual field.
Why are colliders fixated?
Probability of Fixation During Collision Period
Pedestrians’ paths
Colliding pedestrian path
More fixations on colliders in normal walking.
No effect in Leader condition
[Figure: probability of fixation (0 to 1) for Controls and Colliders, in the Normal Walking (No Leader) and Follow Leader conditions.]
The small increase in probability of fixating the collider could be caused either by a weak effect of attentional capture or by active, top-down search of the peripheral visual field.
Failure of collider to attract attention with an added task (following) suggests that detections result from active search.
Why are colliders fixated?
Prior Fixation of Pedestrians Affects Probability of Collider Fixation
• Fixated pedestrians may be monitored in periphery, following the first fixation
• This may increase the probability of fixation of colliders
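The comparison implied here is between two conditional probabilities: P(collider fixated | pedestrian fixated earlier) and P(collider fixated | not fixated earlier). A minimal sketch of that tabulation follows; the record format and the function name are assumptions for illustration, and any data passed in would come from the experimenter's own coding of the eye-tracking video.

```python
def conditional_fixation_prob(encounters):
    """Estimate P(fixated during collision period | prior-fixation status).

    encounters: iterable of (previously_fixated, fixated_as_collider) booleans,
    one pair per collider encounter (hypothetical record format).
    Returns a dict keyed by prior-fixation status (True / False).
    """
    counts = {True: [0, 0], False: [0, 0]}  # status -> [fixated, total]
    for previously_fixated, fixated_as_collider in encounters:
        counts[previously_fixated][0] += int(fixated_as_collider)
        counts[previously_fixated][1] += 1
    return {
        status: (hits / total if total else float("nan"))
        for status, (hits, total) in counts.items()
    }
```

If peripheral monitoring follows a first fixation, the value for `True` should exceed the value for `False`.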
[Figure: two panels of conditional probabilities (axis 0 to 0.3). Panel categories: Not fixated vs. Fixated; No prior fixations vs. With prior fixations. Extracted axis label: Change in the dist. to the Leader (m).]
Other evidence for detection of colliders?
Do subjects slow down during collider period?
Subjects slow down, but only when they fixate the collider. This implies that fixation measures “detection”.
Slowing is greater if not previously fixated. Consistent with peripheral monitoring of previously fixated pedestrians.
[Figure: sum of pedestrian fixations following detection of a collider. Fixation durations (s), 0 to 1, for Not Fixated vs. Fixated colliders; No Leader and Leader conditions.]
Detecting a Collider Changes Fixation Strategy
Longer fixations on pedestrians following detection of a collider.
[Figure: time fixating normal pedestrians following detection of a collider, “Miss” vs. “Hit”; Normal Walking and Follow Leader conditions.]
[Figure: probability of fixation (0 to 1) for Colliders and Controls at Constant vs. Increased collider speed; No Leader condition.]
Colliders are fixated with equal probability whether or not they increase speed (25%) when they initiate the collision path.
Effect of collider speed
[Figure: probability of fixation (0 to 1) vs. number of pedestrians (1-2, 2-3, 3-4); No Leader and Leader.]
[Figure: probability of fixation (0 to 1) vs. degrees of rotation (0-5, 5-10, 10-15, 15-20, 20-25); No Leader and Leader.]
[Figure: probability of fixation (0 to 1) vs. distance to the observer (m; 3.5-4, 4-4.5, 4.5-5); No Leader and Leader.]
[Figure: probability of fixation (0 to 1) vs. pedestrian color (Purple, Red, Green, Pink); No Leader and Leader.]
No systematic effects of stimulus properties on fixation.
Summary
• Subjects fixate pedestrians more when they first appear in the field of view, perhaps to predict future path.
• A potential collision can evoke a fixation but the increase is modest.
• Potential collisions do not evoke fixations in the leader condition.
• Collider detection increases fixations on normal pedestrians.
To make a top-down system work, subjects need to learn the statistics of environmental events and distribute gaze/attention based on these expectations.
Subjects rely on active search to detect potentially hazardous events like collisions, rather than reacting to bottom-up, looming signals (attentional capture).
Possible reservation…
Perhaps looming robots are not similar enough to real pedestrians to evoke a bottom-up response.
Walking -Real World
• Experimental question: Do subjects learn to deploy gaze in response to the statistics of environmental events?
Experimental Setup
System components: Head mounted optics (76g), Color scene camera, Modified DVCR recorder, Eye Vision Software, PC Pentium 4, 2.8GHz processor
A subject wearing the ASL Mobile Eye
• Occasionally some pedestrians veered on a collision course with the subject (for approx. 1 sec)
• 3 types of pedestrians:
Trial 1:
- Rogue pedestrian - always collides
- Safe pedestrian - never collides
- Unpredictable pedestrian - collides 50% of time
Trial 2:
- Rogue becomes Safe
- Safe becomes Rogue
- Unpredictable - remains same
Experimental Design (ctd)
Fixation on Collider
Effect of Collision Probability
Probability of fixating increased with higher collision probability.
(Probability is computed during period in the field of view, not just collision interval.)
Detecting Collisions: proactive or reactive?
• Probability of fixating risky pedestrian similar, whether or not he/she actually collides on that trial.
Almost all of the fixations on the Rogue were made before the collision path onset (92%).
Thus gaze and attention are anticipatory.
Effect of Experience
[Figure: pedestrian fixations after conflicting experience (Trial 2). Probability of fixation (0 to 1) for Safe (previously Rogue) and Rogue (previously Safe).]
Safe and Rogue pedestrians interchange roles.
[Figure: pedestrian fixations with no prior experience (Trial 1). Probability of fixation (0 to 1) for Safe and Rogue.]
Learning to Adjust Gaze
• Changes in fixation behavior are fairly fast, happening over 4-5 encounters (fixations on the Rogue get longer, those on the Safe pedestrian shorter).
N=5
Shorter Latencies for Rogue Fixations
• Rogues are fixated earlier after they appear in the field of view. This change is also rapid.
Effect of Behavioral Relevance
Fixations on all pedestrians go down when pedestrians STOP instead of COLLIDING. STOPPING and COLLIDING should have comparable salience.
Note that the Safe pedestrians behave identically in both conditions - only the Rogue changes behavior.
• Fixation probability increases with probability of a collision path.
• Fixation probability similar whether or not the pedestrian collides on that encounter.
• Fixations are anticipatory.
• Changes in fixation behavior are fairly rapid (fixations on the Rogue get longer and earlier, and on the Safe pedestrian shorter and later).
Summary
Neural Substrate for Learning Gaze Patterns
Dopaminergic neurons in basal ganglia signal expected reward.
Neurons at all levels of saccadic eye movement circuitry are sensitive to reward. (eg Hikosaka et al, 2000; 2007; Platt & Glimcher, 1999; Sugrue et al, 2004; Stuphorn et al, 2000 etc)
This provides the neural substrate for learning gaze patterns in natural behavior, and for modelling these processes using Reinforcement Learning. (eg Sprague, Ballard, Robinson, 2007)
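As a rough illustration of how reward-driven learning could produce the gaze patterns seen in the walking experiment, the sketch below treats the choice of which pedestrian to fixate as a simple value-learning problem. It is not the Sprague, Ballard & Robinson model. The collision probabilities (1.0, 0.5, 0.0) come from the Rogue/Unpredictable/Safe design; the reward definition (a fixation is rewarded when the fixated pedestrian turns out to be on a collision course), the learning rate, the exploration rate, and all names are assumptions.

```python
import random

# Collision probabilities from the experimental design; all else is assumed.
COLLISION_PROB = {"rogue": 1.0, "unpredictable": 0.5, "safe": 0.0}

def learn_gaze_values(n_encounters=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy value learning over which pedestrian type to fixate."""
    rng = random.Random(seed)
    value = {name: 0.0 for name in COLLISION_PROB}
    for _ in range(n_encounters):
        if rng.random() < epsilon:
            target = rng.choice(list(value))      # occasional exploration
        else:
            target = max(value, key=value.get)    # fixate highest-valued type
        # Assumed reward: 1 if the fixated pedestrian veers onto a collision path.
        reward = 1.0 if rng.random() < COLLISION_PROB[target] else 0.0
        # Delta-rule update toward the obtained reward.
        value[target] += alpha * (reward - value[target])
    return value
```

With these assumptions the learned values come to order the pedestrians by how often they collide, so gaze is allocated before any collision path begins, which is the anticipatory pattern reported above.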
[Diagram labels: target selection, saccade decision, saccade command, planning movements, inhibits SC, signals to muscles]
Neural Circuitry for Saccades
Virtual Humanoid has a small library of simple visual behaviors:
- Sidewalk Following
- Picking Up Blocks
- Avoiding Obstacles
Each behavior uses a limited, task-relevant selection of visual information from scene.
Walter the Virtual Humanoid
Sprague, Ballard, & Robinson TAP (2007)
RL Modeling of Gaze Control
Walter learns where/when to direct gaze using a reinforcement learning algorithm.
Walter’s sequence of fixations
obstacles
sidewalk
litter
Subjects must learn the statistical structure of the world and allocate attention and gaze accordingly.
Control of gaze, and attention, is proactive, not reactive, and thus is model based.
Anticipatory use of gaze is probably necessary for much visually guided behavior, because of visuo-motor delays.
Subjects behave very similarly despite unconstrained environment and absence of instructions.
Need reinforcement learning models to account for control of attention and gaze in natural world.
Conclusions
• Task-based models can do a good job by learning scene statistics (Real walking: Jovancevic & Hayhoe, 2007)
• Another solution: attention may be attracted to deviations from expectations based on memory representation of scene.
How do subjects perceive unexpected events? (ctd)
• Hollingworth & Henderson (2002) argue that elaborate representations of scenes are built up in long-term memory.
• To detect a change, subjects may compare the current image with the learnt representation.
• If so, such representations might serve as a basis for attracting attention to changed regions of scenes (eg Brockmole & Henderson, 2005).
Thus subjects should be more sensitive to changes in familiar environments than to unfamiliar ones because the memory representation is well-defined.
Overview of the Experiment
• Question: If subjects become familiar with an environment, are changes more likely to attract attention? (cf Brockmole & Henderson, 2005).
• Design: Subjects walked along a footpath in a virtual environment including both stable & changing objects while avoiding pedestrians.
Virtual Environment
MONUMENT
Experimental Setup
Video Based Tracker
V8 optics with ASL501 Video Based Eye Tracker (Left)
Virtual Research V8 Head Mounted Display with 3rd Tech HiBall Wide Area motion tracker
Object Changes: Replaced, Disappearance, New Object, Moved Object
Stable Objects
Procedure
• Two groups, 19 subjects/group:
- Inexperienced Group: One familiarization trial
- Experienced Group: 19 familiarization laps before the changes occurred
• Total gaze duration on changed objects was much longer after experience in the environment.
• Fixation durations on stable objects were almost the same for the two groups.
[Figure: average gaze duration per object per lap (msec, 0 to 700) on Stable Objects and Changing Objects, for the Inexperienced and Experienced groups.]
Effects of Different Changes
[Figure: gaze duration (msec, 0 to 1100) by change type (Stable, Replaced, Disappeared, Moved, New), for the Inexperienced and Experienced groups.]
[Figure: distribution of gaze (axis 0 to 60) across Ground, Environment, Pedestrians, Changing Obj., and Stable Obj.; two panels.]
Distribution of gaze
Object fixations account for only a small percentage of gaze allocation.
Inexperienced Experienced
Change Blindness
• Probability of being aware of the changes was correlated with gaze duration on the changing objects (rho=0.59).
• Awareness of the changes was low, suggesting that fixations are a more sensitive indicator.
• Change blindness in the natural world may be fairly uncommon, because most scenes are familiar.
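For readers who want to reproduce this kind of statistic, the sketch below computes a rank correlation between per-subject gaze duration and awareness. It assumes that "rho" denotes Spearman's rank correlation, which the slide does not state; the function names are illustrative, and no values from the study are included.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

A call such as `spearman_rho(gaze_durations_ms, awareness_scores)` takes one entry per subject in each argument; tie handling matters here because awareness reports are often coarse (for example, yes/no).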
• Suggests we learn the structure of natural scenes over time, and that attention is attracted by deviations from the normal state.
• These results are consistent with Brockmole & Henderson (2005) and generalize the result to immersive environments and long time scales.
• Consistent with Predictive Coding models of cortical function.
Predictive Coding: Input is matched to stored representation.
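[Diagram: LGN and Cortex. The cortical estimate r is fed back through weights U as the prediction Ur and subtracted from the input I, giving the residual e = I - Ur, which is fed forward through U^T.]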
Rao & Ballard, 1999.
Top-down signal based on memory
Bottom-up input from retina
Difference signal reveals mis-match
Unmatched residual signal prompts a re-evaluation of image data and may thereby attract attention.
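The matching step can be made concrete with a small numerical sketch. It keeps only the residual computation e = I - Ur and a gradient update of r driven by U^T e. The priors, hierarchy, and learning of U in Rao & Ballard's full model are left out, and the basis matrix, step size, and iteration count below are arbitrary assumptions.

```python
import numpy as np

def match_to_memory(U, I, steps=200, rate=0.1):
    """Fit the estimate r to input I and return (r, residual).

    Each step computes the residual e = I - U r and moves r along U^T e,
    i.e. gradient descent on the squared prediction error.
    """
    r = np.zeros(U.shape[1])
    for _ in range(steps):
        e = I - U @ r            # difference signal (mis-match)
        r = r + rate * (U.T @ e)  # feedforward correction of the estimate
    return r, I - U @ r

# Stored representation: two basis vectors spanning "familiar" inputs (assumed).
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
familiar = U @ np.array([2.0, -1.0])            # fully explained by memory
changed = familiar + np.array([0.0, 0.0, 3.0])  # contains an unmodelled component

_, residual_familiar = match_to_memory(U, familiar)
_, residual_changed = match_to_memory(U, changed)
```

For the familiar input the residual shrinks toward zero, while the changed input leaves a residual concentrated on the component the stored representation cannot explain, which is the signal proposed to attract attention.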
• A mechanism that attracts attention and gaze based on mis-match with a model is similar to the idea of Bayesian “Surprise” (Itti & Baldi, 2005).
• One question is where the prior comes from. Itti & Baldi calculate surprise with respect to image changes over a short time scale. Here we suggest surprise is measured with respect to a memory representation.
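Bayesian surprise in Itti & Baldi's sense is the Kullback-Leibler divergence between the posterior and the prior over a set of hypotheses. The sketch below computes it for a discrete hypothesis set; the two-hypothesis example and its numbers are made up for illustration, and the prior is assumed to be non-zero wherever the posterior is.

```python
import math

def bayes_update(prior, likelihood):
    """Posterior over discrete hypotheses given per-hypothesis likelihoods."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

def surprise_bits(prior, posterior):
    """KL(posterior || prior) in bits; zero when the data leave beliefs unchanged."""
    return sum(q * math.log2(q / p) for q, p in zip(posterior, prior) if q > 0)

# Hypotheses: [scene as remembered, scene changed]; the prior stands for memory.
prior = [0.9, 0.1]
expected_view = bayes_update(prior, [0.8, 0.2])    # data fit the remembered scene
unexpected_view = bayes_update(prior, [0.1, 0.9])  # data fit a changed scene

low = surprise_bits(prior, expected_view)
high = surprise_bits(prior, unexpected_view)
```

The choice of `prior` is exactly the question raised here: a prior built from the last few frames and a prior built from long-term memory of the scene will assign different surprise to the same observation.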
“Surprise”
Conclusion
• Familiarity with the visual environment increases the probability that gaze will be attracted to changes in the scene.
• A mechanism whereby attention is attracted by deviations from a learnt representation may serve as a useful adjunct to task-driven fixations when unexpected events occur in natural visual environments.
Thank You
Behaviors Compete for Gaze/ Attentional Resources
The probability of fixation is lower for both Safe and Rogue pedestrians in both Leader conditions than in the baseline condition.
Note that all pedestrians are allocated fewer fixations, even the Safe ones.