Outline
( P 1 / 37) Ali Borji – Oct. 2009
Statement & Scope
Background
Related Research
• Saliency Model
• U-TREE
Biasing Saliency Model
• Revised Model
• Object detection
Learning to saccade
• S-Tree
• POMDP Cases
• Object Recognition
• Invariance Analysis
Conclusions
Future Works
Interactive Learning of Task-driven Visual Attention Control
Ali Borji borji@{ipm.ir,iai.uni-bonn.de}
6 Oct 2009
School of Cognitive Sciences, IPM, Tehran, Iran
Advisors: Dr. Majid Nili Ahmadabadi, Dr. Babak Nadjar Araabi
PhD Thesis Defence
Outline
• Statement and Scope
• Background
• Related Research
• Contributions
• Modification and Biasing Saliency Model
• Learning to Saccade
• Conclusions & Future Works
Statement and Scope
Top-down Visual Attention / Purposive Vision
What: What is attention? Bottom-up vs. top-down attention. Purposive vision vs. reconstructionist & active vision. Concurrent learning of task-driven visual attention control and physical actions.
Why: Very important in online tasks where reaction time is limited, e.g. robotics. May give insights into top-down attentional mechanisms in the brain.
How: Redundant information does not necessarily increase performance or recognition. There is a processing bottleneck inherent in physical implementations.
Reinforcement learning for learning attention and actions: bio-inspired, minimalist, time effects, closed loop, RL based, versatile.
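[Figure: the Agent interacting with the World. Vision (recognition, segmentation, ...) provides the State to a Learning Unit for decision making & attention control, which has a Critic and a Reward signal; the Agent outputs a Motor Action and a Perceptual Action (overt attention).]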
Statement and Scope
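[Figure: problem settings between an ENVIRONMENT (cameras, effectors) and an AGENT:
• Image classification: image → class; the right class is given during learning
• Vision for action (open loop): image → action; the right action is given
• Vision for action (closed loop): image → action, acting on the environment through effectors
• Vision for action (without supervision): only qualitative feedback]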
Background
Neurosciences & Psychology
• Visual attention
  • Space-based, object-based, feature-based
  • Covert, overt
  • Bottom-up, top-down
Machine vision & Robotics
• Vision is the most informative and challenging sensor
• Huge amount of sensory information; only a subset is relevant
• Robots should guarantee a short response time
• Applications: navigation, object recognition, HCI, …
Yarbus 1967
Eye movements as indicators of cognitive processes:
• Trace 1: examine at will
• Trace 2: estimate wealth
• Trace 3: estimate ages
• Trace 4: guess previous activity
• Trace 5: remember clothing
• Trace 6: remember position
• Trace 7: time since last visit
Saccadic Eye Movements
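Saliency-based Model (L. Itti, C. Koch, and E. Niebur, 1998)
Math:
$$\bar{C} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} \Big[\mathcal{N}\big(RG(c,s)\big) + \mathcal{N}\big(BY(c,s)\big)\Big]$$
$$\bar{I} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} \mathcal{N}\big(I(c,s)\big)$$
$$\bar{O} = \sum_{\theta \in \{0^\circ, 45^\circ, 90^\circ, 135^\circ\}} \mathcal{N}\Big(\bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} \mathcal{N}\big(O(c,s,\theta)\big)\Big)$$
$$S = \frac{1}{3}\Big(\mathcal{N}(\bar{I}) + \mathcal{N}(\bar{C}) + \mathcal{N}(\bar{O})\Big)$$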
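To make N(·) and the final average concrete, here is a minimal Python sketch. It follows the normalization described by Itti, Koch and Niebur (rescale a map to a fixed range [0, M], then multiply it by (M − m̄)², where m̄ is the mean of the other local maxima), but the 8-neighbour local-maximum rule and all function names are assumptions. It also assumes the conspicuity maps already share one resolution, so the across-scale ⊕ resampling is left out.

```python
from typing import List

Map = List[List[float]]


def rescale(m: Map, top: float = 1.0) -> Map:
    """Linearly map all values into [0, top]; a constant map becomes all zeros."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[top * (v - lo) / (hi - lo) for v in row] for row in m]


def local_maxima(m: Map) -> List[float]:
    """Values of pixels strictly greater than all of their 8-neighbours (assumed rule)."""
    h, w = len(m), len(m[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            neighbours = [m[j][i]
                          for j in range(max(0, y - 1), min(h, y + 2))
                          for i in range(max(0, x - 1), min(w, x + 2))
                          if (j, i) != (y, x)]
            if all(m[y][x] > v for v in neighbours):
                peaks.append(m[y][x])
    return peaks


def normalize(m: Map, top: float = 1.0) -> Map:
    """N(.): rescale to [0, top], then multiply by (top - mean of the other local maxima)^2."""
    r = rescale(m, top)
    others = local_maxima(r)
    if top in others:
        others.remove(top)  # drop one copy of the global maximum M
    m_bar = sum(others) / len(others) if others else 0.0
    k = (top - m_bar) ** 2
    return [[k * v for v in row] for row in r]


def saliency(i_bar: Map, c_bar: Map, o_bar: Map) -> Map:
    """S = (N(I) + N(C) + N(O)) / 3, assuming the three maps share one resolution."""
    maps = [normalize(m) for m in (i_bar, c_bar, o_bar)]
    return [[sum(vals) / 3 for vals in zip(*rows)] for rows in zip(*maps)]


# One isolated peak is kept; two equal peaks cancel each other out.
single = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
double = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]
s = saliency(single, double, single)
```

A map with one isolated peak passes through unchanged, while a map with two equal peaks is suppressed to zero, which is the intended effect of N(·): promote maps with a few strong peaks and demote maps with many comparable ones.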
Working Example
U-TREE (McCallum, 1996)
Incremental state-space discretization based on minimizing perceptual aliasing
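U-TREE splits a leaf when distinguishing on some feature would separate experiences with different future returns; McCallum's U-Tree used a Kolmogorov-Smirnov test for this decision. A minimal sketch of such a split test, assuming each candidate child is summarised by a list of sampled discounted returns and using the asymptotic critical value at significance level 0.05 (both illustrative choices; the function names are hypothetical):

```python
import math
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:  # advance both CDFs past the value x
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def should_split(returns_left: Sequence[float], returns_right: Sequence[float],
                 alpha: float = 0.05) -> bool:
    """Split a leaf if the two candidate children's return samples differ significantly."""
    n, m = len(returns_left), len(returns_right)
    if n == 0 or m == 0:
        return False
    # Asymptotic critical value: c(alpha) * sqrt((n + m) / (n * m)).
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(returns_left, returns_right) > critical
```

If the two children see the same return distribution, the split would not reduce aliasing and is rejected; a clear difference in returns means one leaf was standing in for two perceptually distinct situations.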
Biasing Saliency-based Model
Basic idea:
For detecting an object, some scales and features are more important than others. If computational resources are limited, unnecessary scales and features can be bypassed. In its current form, the saliency model does not allow a computational bottleneck to be implemented.
Step 1: Revising the basic saliency model
Step 2: Biasing it for object detection
[Figure: basic saliency model vs. our model, with the surround inhibition operation (fine scale, coarse scale, surround-inhibited map)]
Borji, Nili, Araabi, Machine vision & Applications, 2009
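The bottleneck idea can be sketched as a weighted sum in which any map with weight zero is never computed, so its cost is never paid. This is a minimal sketch under assumptions: the weights are flattened to one per map (the revised model has scale, dimension and channel weights), maps are produced lazily by hypothetical callables, and the cost values are made up for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Map = List[List[float]]


def biased_saliency(feature_maps: Dict[str, Callable[[], Map]],
                    weights: Dict[str, float],
                    costs: Dict[str, float]) -> Tuple[Optional[Map], float]:
    """Weighted sum of feature maps; maps with zero weight are never computed."""
    total: Optional[Map] = None
    spent = 0.0
    for name, compute in feature_maps.items():
        w = weights.get(name, 0.0)
        if w == 0.0:
            continue  # bypassed: no computation, no cost
        m = compute()  # the expensive part: building this map
        spent += costs[name]
        if total is None:
            total = [[w * v for v in row] for row in m]
        else:
            total = [[t + w * v for t, v in zip(t_row, row)]
                     for t_row, row in zip(total, m)]
    return total, spent


# Hypothetical maps and costs; only intensity and colour get non-zero weights.
maps = {
    "intensity": lambda: [[1.0, 0.0]],
    "color": lambda: [[0.0, 2.0]],
    "orientation": lambda: [[5.0, 5.0]],
}
sal, cost = biased_saliency(maps,
                            weights={"intensity": 1.0, "color": 0.5},
                            costs={"intensity": 1.0, "color": 3.0, "orientation": 10.0})
```

Here the orientation map is skipped entirely, so the total cost is 4 instead of 14; learning which weights can be zero is what lets the biased model trade detection accuracy against computation.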
Revised Saliency-based Model
[Figure: revised saliency model. Components: input image; feature decomposition into intensity (I), color (C: R/G, Y/B) and orientation (O); Gaussian pyramids; surround inhibition operation; feature maps; conspicuity maps; final saliency map; WTA with IOR; FOA on the output image. Offline-learned top-down weights (w) appear as scale weights, dimension weights and channel weights, used in across-scale addition, within-map addition and conspicuity-map addition.]
[Figure: offline learning of top-down biases from an image set and the xy position of the target object: a) without cost, b) with cost]
Convergence
[Figure: fitness value vs. generation number (0 to 40) for Bike, Crossing, Pedestrian, Coke and Triangle, in the without-cost case (fitness 0 to 140) and the with-cost case (fitness 0 to 3500)]
Scanpaths for detection of objects in natural scenes, one column per object
Convergence of the CLPSO algorithm in minimizing the fitness functions
Synthetic Search Arrays
α in objective function
[Figure: best individual (learned weights) for Bike, Crossing, Pedestrian, Coke and Triangle, without cost and with cost; weights over Col, Int, Ori, R/G, Y/B, Int, the four orientations (--, /, \, |) and scales s0 to s5]
Object Detection in Natural Scenes
[Figure: best individual (learned weights) for the synthetic search arrays Synthetic-1 to Synthetic-5, without cost and with cost, over the same weights]
Object Detection
Comparing detection rate and cost of our method with the basic saliency model

a) Without cost

| Target | Train: detection rate % | Train: avg. hit number | Train: fitness | # test images | Test: detection rate % | Test: avg. hit number | Basic model: detection rate % | Basic model: avg. hit number |
|---|---|---|---|---|---|---|---|---|
| bike | 92.3 (2.1) | 1.4 (0.4) | 22.3 (9.2) | 60 | 90.2 (2) | 1.6 (0.1) | 81.8 (0.6) | 2.2 (0.2) |
| crossing | 96.7 (1.2) | 1.5 (0.7) | 18.1 (5.4) | 35 | 93.8 (0.9) | 1.5 (0.2) | 78.2 (1.4) | 2.5 (0.3) |
| pedestrian | 98.2 (1) | 1.2 (0.2) | 14.4 (6.1) | 45 | 94.2 (1.1) | 1.3 (0.7) | 83.3 (1) | 1.7 (0.1) |
| coke | 95.2 (1.4) | 1.3 (0.3) | 13.2 (8) | 59 | 92.2 (2) | 1.5 (0.5) | 80.9 (0.4) | 1.9 (0.5) |
| triangle | 92.5 (2.3) | 1.7 (0.9) | 27.8 (11.4) | 32 | 91 (1.6) | 1.8 (0.4) | 76.5 (0.8) | 2.2 (0.2) |

b) With cost

| Target | Train: detection rate % | Train: avg. hit number | Train: fitness | # test images | Test: detection rate % | Test: avg. hit number | Computation cost |
|---|---|---|---|---|---|---|---|
| bike | 87.8 (1.2) | 1.7 (0.3) | 421.9 (25.6) | 60 | 85.8 (1.5) | 1.8 (0.2) | 25.5 (4.2) |
| crossing | 84.2 (1.9) | 1.5 (0.7) | 302.1 (47.1) | 35 | 78.2 (2.1) | 2 (0.3) | 40.5 (5.8) |
| pedestrian | 91.6 (1) | 1.5 (0.4) | 531.9 (54.3) | 45 | 90.6 (2) | 1.7 (0.4) | 35.6 (5.2) |
| coke | 87.3 (2.7) | 1.8 (0.6) | 730 (34.4) | 59 | 87.1 (1.1) | 1.9 (0.1) | 32.2 (7.3) |
| triangle | 89.6 (2.1) | 1.4 (0.2) | 512.2 (27.7) | 32 | 85.6 (1.2) | 2.1 (0.2) | 32.7 (6.1) |

• Total cost of the basic saliency model is 52, with an assumed cost vector
• Training on 10 random images of each object and testing on the rest
• Hit number is x if the object is detected at the x-th saccade
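The hit-number definition can be illustrated with a winner-take-all loop plus inhibition of return (IOR) over a saliency map. A minimal sketch, assuming a rectangular target box, a square IOR neighbourhood and a cap of five saccades; these are illustrative choices, not parameters from the experiments.

```python
from typing import List, Optional, Tuple


def hit_number(saliency: List[List[float]], target: Tuple[int, int, int, int],
               max_saccades: int = 5, ior_radius: int = 1) -> Optional[int]:
    """Return x if the x-th FOA lands inside target = (y0, x0, y1, x1), else None."""
    s = [row[:] for row in saliency]  # copy, because IOR edits the map
    h, w = len(s), len(s[0])
    y0, x0, y1, x1 = target
    for k in range(1, max_saccades + 1):
        # Winner-take-all: the most salient remaining location becomes the FOA.
        fy, fx = max(((y, x) for y in range(h) for x in range(w)),
                     key=lambda p: s[p[0]][p[1]])
        if y0 <= fy <= y1 and x0 <= fx <= x1:
            return k
        # Inhibition of return: suppress a square neighbourhood around the FOA.
        for y in range(max(0, fy - ior_radius), min(h, fy + ior_radius + 1)):
            for x in range(max(0, fx - ior_radius), min(w, fx + ior_radius + 1)):
                s[y][x] = float("-inf")
    return None


sal = [[9, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 5],
       [0, 0, 0, 0]]
hits = [hit_number(sal, (2, 3, 2, 3)), hit_number(sal, (0, 0, 0, 0))]
detection_rate = sum(h is not None for h in hits) / len(hits)
```

Detection rate and average hit number then follow by running `hit_number` over a test set and averaging over the images where it does not return `None`.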
Learning top-down Object-based attention
• There is a debate between object-based and spatial attention control
• Several models exist for learning object-based attention; the best known and most recent is by Sun et al.
• Problem with previous models in this area: they lack proper tools for modeling and taking task demands into account, being limited to static scenes and restricted control situations
• Since scene understanding depends strongly on actions and goals, an interactive approach seems more suitable
• Our model is the first to use RL for this purpose, combined with the idea of state-space discretization
[Figure: model architecture spanning early vision, higher vision, and decision making and learning. Labels: visual sensor, visual stimulus, bottom-up visual attention with learned weights (biases), FOA, attended object, object recognition / scene interpretation, recognized object, long-term memory, attention tree / state extractor, RL system with critic, state and action, motor action, reward, TD error, quasi-static learning]
[Figure: best individual (learned weights) for Bike, Crossing, Pedestrian, Coke and Triangle, without cost and with cost; weights over Col, Int, Ori, R/G, Y/B, Int, the four orientations and scales s0 to s5]
Sample objects in natural scenes. The best individual derived after minimization was applied to a test set.
Learned weights after CLPSO convergence over the first two traffic signs, averaged over five runs. s0 to s5 are scales in the image pyramid.
Early Vision Layer
• The object at the attended location is recognized by the hierarchical model of object recognition (HMAX).
• A binary SVM classifier is trained with positive samples of a class and negative samples from the other classes.
• The classifier learned offline in this way is later used for online object recognition.
Object recognition (OR) rates over test sets:
91.28% (± 2.8), 93% (± 2.75), 87% (± 3.2), 83.4% (± 4.2), 83% (± 4.2)
Higher Vision Layer
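The slides do not state which SVM solver or kernel was used. As a swapped-in technique, this sketch trains each binary one-vs-rest classifier as a linear SVM with Pegasos (stochastic sub-gradient descent on the regularized hinge loss). The toy 2-D vectors stand in for HMAX features, and all function names and hyperparameters are hypothetical.

```python
import random
from typing import List, Sequence


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    ys must be +1 / -1. A constant 1.0 is appended to every sample as a bias feature.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(xs[0]) + 1)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(xs)), len(xs)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            x = list(xs[i]) + [1.0]
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regulariser step)
            if margin < 1:  # hinge loss is active: step towards y * x
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, x)]
    return w


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed decision value; positive means 'belongs to the target class'."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def one_vs_rest(features, labels, target) -> List[float]:
    """Positives: samples of `target`; negatives: samples of all other classes."""
    ys = [1 if lab == target else -1 for lab in labels]
    return train_linear_svm(features, ys)


# Toy 2-D vectors standing in for HMAX feature vectors (hypothetical data).
features = [(2, 2), (3, 3), (2.5, 3), (-2, -2), (-3, -2), (-2, -3)]
labels = ["A", "A", "A", "B", "B", "B"]
w_a = one_vs_rest(features, labels, "A")
```

With one such classifier per object class trained offline, online recognition of the attended patch reduces to scoring its feature vector against each classifier.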
Task: simulated visual navigation.
• Map of the route, consisting of 11 positions. There are 44 states.
• The agent captures 360 × 270 RGB color images. Natural scenes containing a subset of the objects are presented to the agent (5 for each combination).
• The agent has three possible motor actions, Forward (F), Turn Left (L) and Turn Right (R), and can attend to one of n objects at a time (n = 5).
Navigation map in the experiment. A subset of the 5 objects is present at random locations in the scenes. The best actions are shown beside each state; in some states two actions are optimal.
Visual Navigation
[Figure: learned attention tree. Internal nodes attend to objects A to E and branch on presence (1) or absence (0); the 7 leaves hold actions (F, L, R) and cluster the 44 states as {1, 8, 12, 13, 18, 24, 28, 31, 36, 40, 44}, {26, 35}, {19}, {2, 5, 9, 10, 21, 25, 32, 33, 37, 41}, {3, 15, 16, 20, 27, 29, 30, 34, 38, 42}, {6, 14, 22} and {4, 7, 11, 17, 23, 39, 43}]
The algorithm generated 7 states with an average depth of 3. This means that instead of attending to all five objects simultaneously, serially attending to 3 objects on average suffices to solve the problem.
Learned attention tree for the map of Fig. 6 with pruning.
Forty-four states were clustered into 7 leaves, and a 100% correct policy was achieved.
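A learned attention tree replaces parallel inspection of all five objects with a serial policy: each internal node attends to one object and branches on whether it is present, until a leaf yields the motor action. A minimal sketch of that traversal, assuming binary presence tests; the classes and the small tree below are hypothetical, not the tree learned in the experiment.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass
class Leaf:
    action: str  # motor action stored in the leaf, e.g. "F", "L" or "R"


@dataclass
class Node:
    obj: str         # object the agent attends to at this node
    present: "Tree"  # subtree when the object is found (branch 1)
    absent: "Tree"   # subtree when it is not (branch 0)


Tree = Union[Leaf, Node]


def act(tree: Tree, visible: Set[str]) -> Tuple[str, int]:
    """Follow one root-to-leaf path; return (action, number of objects attended)."""
    attended = 0
    while isinstance(tree, Node):
        attended += 1  # one serial attention shift per internal node
        tree = tree.present if tree.obj in visible else tree.absent
    return tree.action, attended


# Hypothetical three-leaf tree, not the tree learned in the experiment.
tree = Node("A", Leaf("F"), Node("B", Leaf("R"), Leaf("L")))
```

The second value returned by `act` is the path length, so averaging it over the visited states gives the kind of "average depth" figure quoted above.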
Visual Navigation
• Interactive and concurrent learning of top-down and action-based attention control
• Learning attention while forming visual representations
• A spatial attention model
• Considering task demands
• Like RLVC, our method builds on the G and U-TREE algorithms
• Motivated by eye movements (saccades) for visual processing
Learning to Saccade
Borji, Nili, Araabi, ICRA 2009; Image & Vision Computing, under review
Overall Model
[Figure: overall model. Labels: world, visual stimulus, agent, attention loop with overt attention / eye movement, FOA, feature extractor, codebook database, feature code, attention tree, state, RL policy, motor action, reward distributor, reward (quasi-static learning), GIST extractor, GIST]
Relative: saccade directions can be selected relative to the current saccade when there is a cost associated with moving the sensory organs.
Absolute: alternatively, saccades can go to locations that are random with respect to each other.
Clustering SIFT features
SIFT features Q = (q1, q2, …, qm) of sample images are extracted and then clustered into n clusters (β1, β2, …, βn) using the standard k-means clustering algorithm.
The codebook of a SIFT feature k is then the SIFT class with minimum distance to this feature:
$$d = \arg\min_j \lVert k - \beta_j \rVert$$
Sample objects with derived SIFT features
S-Tree Algorithm
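The codebook step is k-means followed by a nearest-centre lookup. A minimal sketch, assuming Euclidean distance, initialisation from randomly sampled data points and a fixed number of iterations; toy 2-D points stand in for 128-D SIFT descriptors and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vec = Sequence[float]


def codebook_index(k: Vec, centers: Sequence[Vec]) -> int:
    """d = argmin_j ||k - beta_j||: index of the nearest cluster centre."""
    return min(range(len(centers)), key=lambda j: math.dist(k, centers[j]))


def kmeans(points: Sequence[Vec], n: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Standard (Lloyd) k-means with centres initialised from random samples."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), n)]
    for _ in range(iters):
        clusters: List[List[Vec]] = [[] for _ in range(n)]
        for p in points:  # assignment step
            clusters[codebook_index(p, centers)].append(p)
        for j, members in enumerate(clusters):  # update step
            if members:  # an empty cluster keeps its previous centre
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers


# Toy 2-D points standing in for 128-D SIFT descriptors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans(points, n=2)
```

Once the centres are learned offline, each SIFT feature seen during a saccade is replaced by its small integer codebook index, which is what the tree tests on.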
[Figure: S-Tree flowchart alternating RL updates (Q-learning update formula, TD error / Bellman residual) and tree updates. Internal nodes hold the direction or location of a saccade; leaves are the states in the Q-table, whose actions are the best saccade direction or location. If a state is aliased, a tree update splits it using the SIFT features p_i in the memory under its node; learning terminates when there is no further aliasing.]
S-Tree Algorithm
Memory under a node (codebooks, Δt, action):

| p1 | p2 | … | Δt | Action |
|---|---|---|---|---|
| 2 | 3 | x | .31 | Left |
| 4 | 1 | y | .12 | Right |
| … | | | | |
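The RL-update box is, in standard form, one-step Q-learning over the tree's leaves, Q(s, a) ← Q(s, a) + α[r + γ max_b Q(s′, b) − Q(s, a)], where the bracketed term is the TD error (Bellman residual). A minimal sketch, assuming tabular leaf states and illustrative values α = 0.1, γ = 0.9; `looks_aliased` is a hypothetical stand-in, since the slides do not give S-Tree's exact aliasing criterion.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, List, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(Q: QTable, s, a, r: float, s_next, actions: Sequence,
             alpha: float = 0.1, gamma: float = 0.9, terminal: bool = False) -> float:
    """One-step Q-learning update on leaf states; returns the TD error (Bellman residual)."""
    target = r if terminal else r + gamma * max(Q[(s_next, b)] for b in actions)
    delta = target - Q[(s, a)]
    Q[(s, a)] += alpha * delta
    return delta


def looks_aliased(td_errors: List[float], threshold: float = 0.5) -> bool:
    """Hypothetical criterion: treat a leaf as aliased if its mean |TD error| stays large."""
    return bool(td_errors) and sum(abs(d) for d in td_errors) / len(td_errors) > threshold


Q: QTable = defaultdict(float)
actions = ["left", "right"]  # hypothetical saccade directions
delta = q_update(Q, "leaf0", "left", r=1.0, s_next="leaf1", actions=actions)
```

Recording each returned `delta` in the memory under a leaf gives the data a tree update needs: a leaf whose residuals do not settle is a candidate for splitting.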
Visual Navigation Task
• Consider a discrete maze with walls
• An agent moves in the maze (penalty of 0 or −1 per move).
• The agent must reach the exit as fast as possible (reward of +10).
• The sensors return a picture of an object that depends on the cell.
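The maze setup maps directly onto a small environment step function. A minimal sketch, assuming a square grid, four compass moves, and that bumping into a wall or the border leaves the agent in place; the +10 exit reward and the 0 or −1 move penalty come from the slide, while the names, grid size and string image identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]
MOVES: Dict[str, Cell] = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(pos: Cell, action: str, walls: Set[Cell], exit_cell: Cell, size: int,
         move_penalty: float = -1.0) -> Tuple[Cell, float, bool]:
    """Apply one move; returns (new cell, reward, episode done)."""
    dy, dx = MOVES[action]
    nxt = (pos[0] + dy, pos[1] + dx)
    if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in walls:
        nxt = pos  # blocked by a wall or the border: stay in place (assumption)
    if nxt == exit_cell:
        return nxt, 10.0, True  # reaching the exit
    return nxt, move_penalty, False  # 0 or -1 per move, as on the slide


def observe(pos: Cell, images: Dict[Cell, str]) -> str:
    """The sensor returns the picture (here just a hypothetical identifier) bound to the cell."""
    return images[pos]
```

The agent never sees `pos` directly, only `observe(pos, images)`, which is why it has to learn which image features distinguish cells.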
[Figure: perceptual states (leaves) and average tree depth for four conditions: all distinct, 3 equal, 7 equal, and 6 equal & 7 equal]
Action Effects
The same objects were placed at different positions that have the same best actions
10×10 Visual Gridworld
[Figure: average reward and correct policy rate vs. tree-fixed iteration; average variance, tree size (perceptual states, leaves) and average tree depth vs. tree-update iteration]
Performances
Performance of S-Tree while solving the 10 × 10 visual gridworld
[Figure: correct policy rate vs. tree-fixed iteration]
Perturbation Analysis
a) This image is replaced by a new one after learning
b) Three different images at three different locations are changed each time after convergence
POMDP Cases
What if perceptual information is not enough?
Employ a short-term memory to make the environment Markov.
[Equation: annotated terms include the codebooks at q spatial locations, the j-th memory item of the i-th state (a pair of previous action and state), a short-term memory of depth n, the action elicited in this state, and the TD error in this state]
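The slide's equation was not recoverable, but its annotated terms suggest augmenting the current percept with the last n (previous action, state) pairs. A minimal sketch of that idea, assuming the augmented state is simply the tuple of current codebooks plus the remembered pairs; the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Hashable, Sequence, Tuple


class ShortTermMemory:
    """Remembers the last `depth` (previous action, state) pairs (hypothetical helper)."""

    def __init__(self, depth: int) -> None:
        self.items: Deque[Tuple[Hashable, Hashable]] = deque(maxlen=depth)

    def push(self, action: Hashable, state: Hashable) -> None:
        self.items.append((action, state))  # the oldest pair falls out when full

    def augmented_state(self, codebooks: Sequence[int]) -> Tuple:
        """Current codebooks at the q saccade locations, plus the remembered pairs."""
        return (tuple(codebooks), tuple(self.items))


stm = ShortTermMemory(depth=2)
for action, state in [("F", "s1"), ("L", "s2"), ("R", "s3")]:
    stm.push(action, state)
```

Two places that look identical now map to different augmented states whenever the recent history differs, which is how the memory can restore the Markov property.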
Object Recognition
The method can be applied to object recognition when the reward is defined by a matrix whose diagonal elements correspond to correct classification of an object into its class. In this case the TD error reduces to only r.
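A sketch of the object-recognition setting, assuming a hypothetical two-class reward matrix with +1 on the diagonal and −1 elsewhere. Because each classification ends the episode, the γ max Q(s′, ·) term drops out and the TD target is just r; the TD error r − Q(s, a) equals r exactly while Q(s, a) is still zero, which is one reading of the remark above. All names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def classification_reward(true_class: int, predicted: int, R: List[List[float]]) -> float:
    """Look up R[true][predicted]; diagonal entries reward correct classification."""
    return R[true_class][predicted]


def one_step_td(Q: Dict[Tuple[Hashable, int], float], s: Hashable, a: int, r: float,
                alpha: float = 0.1) -> float:
    """Classification ends the episode, so the TD target is just r."""
    q = Q.get((s, a), 0.0)
    delta = r - q  # no gamma * max_b Q(s', b) term: there is no next state
    Q[(s, a)] = q + alpha * delta
    return delta


# Hypothetical 2-class reward matrix: +1 on the diagonal, -1 elsewhere.
R = [[1.0, -1.0],
     [-1.0, 1.0]]
Q: Dict[Tuple[Hashable, int], float] = {}
delta = one_step_td(Q, "leaf3", 0, classification_reward(0, 0, R))
```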
Invariance Analysis
[Figure: assignment of views to states (state number vs. grid position) for views at 0, 30 and 60 degrees]
[Figure: assignment of views to states (state number vs. grid position) for views at 0, 5, 10, 15 and 20 degrees]
Different views of an object or scene were observed randomly at each grid position
State Reduction Index: how many views are grouped together
[Figure: assignment of views to states (state number vs. grid position 1 to 13, views at 0, 5, 10, 15 and 20 degrees) for three methods: nearest SIFT to saccade center, SIFT with highest magnitude, and RLVC; over natural scenes]
Invariance Analysis
Saliency for Scene Classification
Narrowest is the first salient region
Can we do successful scene classification by only looking at salient regions?
Are they stable regions?
Summary & Conclusions
• A bottom-up model allowing object detection and bottleneck implementation
• Top-down object-based attention over the basic saliency model!
• Closed-loop learning of image-to-action mappings (S-Tree)
• Interactive, task-driven, space-based, bio-inspired
• Scene classification by feature extraction at salient regions!
Summary
• Top-down, task-relevant attentional modulations can be derived interactively in an RL framework
• Attention is best learned in concert with physical actions and representations
• A spectrum of image classification problems
• Bottom-up attention is by nature a low-level mechanism, while top-down attention is more like a control or decision-making problem
Conclusions
Future Directions
• How can structure be encoded in a bottom-up attention model?
• Top-down bias for generating stable salient regions
• Context & spatial constraints among objects, as well as semantics
• How can complex scene analysis (e.g. on the LabelMe dataset) take advantage of attention?
• Approaches based on U-TREE fit solutions to the problem at hand. Such approaches should be extended to solve other, similar tasks (somewhere between classic and new AI)
• Remedy generalization: maximum generalization with minimum spatial processing
• Implementation on a simulated or real robot for tasks containing real-world natural scenes
Related Publications
• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Saliency Maps for Attentive and Action-based Scene Classification, in preparation.
• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Interactive Learning of Space-based Visual Attention Control and Physical Actions, to be submitted.
• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Interactive Learning of Task-driven Object-based Visual Attention Control, Image and Vision Computing, passed first review.
• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Cost-sensitive Learning of Top-down Modulation for Attentional Control, Machine Vision and Applications, in press.
• Mostafa Ajalooian, Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Hadi Moradi, Fast Hand Gesture Recognition based on Saliency Maps: An Application to Interactive Robotic Marionette Playing, IEEE ROMAN 2009, Japan, Sept. 2009.
• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Learning Sequential Visual Attention Control through State Space Discretization, IEEE ICRA 2009, Kobe, Japan, May 2009.
• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Interactive Learning of Top-down Attention Control and Motor Actions, IEEE IROS 2008 Workshop on From Motor to Interaction Learning in Robots.
• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Learning Object-based Attention Control, NIPS 2008 Workshop on Machine Learning Meets Human Learning.
Thanks for your attention
Invariance Analysis
Future Work
• Generalization problem
• Modeling context and spatial relations among visual features & objects
• Scaling up solutions to large-scale data like LabelMe
• Is there a method which could detect all objects in a scene with high accuracy?
• Implementation on a real or simulated robot
• Extending purposive solutions for embodied agents (like a U-Tree which considers the body form of a robot)
• Generalization over other tasks: how can the method be generalized so that knowledge derived from one task (here the learned attention tree) can be used for other tasks with similar perceptions?
• …
Reinforcement Learning
Historical question: "How do animals learn?"
RL answer: through their interactions with the environment, which give rise to positive or negative feedback (trial and error).
More precisely: by learning a percept-to-action mapping that maximizes, over time, an evaluation of its performance given by the environment.
Examples: a dog learns to sit by receiving sugar from its master. A robotic hand learns to grasp objects by receiving information about the quality of the grasp from the physical world.
Reinforcement Learning
Basic principles: the agent knows nothing about its environment; it only knows its percepts and actions. After each interaction, it receives numerical feedback, and it progressively improves its policy by trying new actions.
Advantages:
• No need for a physical model of the environment (although such a model can accelerate learning); therefore a general approach with a simple design
• Allows dynamic adaptation when the environment changes
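The trial-and-error principle above can be illustrated with a minimal sketch. Assumptions (not from the thesis): the environment is a hypothetical stateless one (a two-armed bandit) with a fixed reward per action, and ε-greedy selection is used as one common way of "trying new actions". The agent only sees its actions and the numerical feedback, and keeps a running average of the reward observed for each action.

```python
import random


def run_bandit(true_rewards, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy learning on a hypothetical stateless environment."""
    rng = random.Random(seed)
    n = len(true_rewards)
    q = [0.0] * n        # the agent starts knowing nothing: all estimates are zero
    counts = [0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n)                     # explore: try a random action
        else:
            a = max(range(n), key=lambda i: q[i])    # exploit: best current estimate
        r = true_rewards[a]                          # numerical feedback from the environment
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]               # incremental mean of observed rewards
    return q, counts


q, counts = run_bandit([0.2, 1.0])
```

Once exploration has tried the second action, its estimate exceeds the first one and the greedy choice switches to it, so it ends up being selected most of the time.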
Reinforcement Learning
Markov Decision Process (MDP) with Markovian probabilistic dynamics:
• S: finite set of states
• A: finite set of actions
• T(s, a, s′): probabilistic transition function, i.e. the probability of reaching s′ after taking action a in state s
• r(s, a): numerical reinforcement function
Reinforcement learning process:
• Input: a database of interactions
• Output: an optimal control policy
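The input/output view above (a database of interactions in, a control policy out) can be sketched with tabular Q-learning replayed over a fixed set of stored transitions. Q-learning is one standard choice; the slides do not say which algorithm is meant. The corridor environment, its states, actions, and rewards, and the parameter values are hypothetical.

```python
from collections import defaultdict


def batch_q_learning(transitions, actions, gamma=0.9, alpha=0.5, sweeps=200):
    """Tabular Q-learning replayed over a fixed database of interactions.

    transitions: list of (s, a, r, s_next, done) tuples
    """
    q = defaultdict(float)  # Q[(s, a)]; all estimates start at 0
    for _ in range(sweeps):
        for s, a, r, s_next, done in transitions:
            # bootstrap from the best action in the next state unless the episode ended
            future = 0.0 if done else max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
    states = {s for s, *_ in transitions}
    # greedy policy: the action with the highest learned value in each state
    policy = {s: max(actions, key=lambda b: q[(s, b)]) for s in states}
    return q, policy


# hypothetical 3-cell corridor: cell 2 is the goal, reached by moving "right" from cell 1
database = [
    (0, "right", 0.0, 1, False),
    (0, "left", 0.0, 0, False),
    (1, "right", 1.0, 2, True),
    (1, "left", 0.0, 0, False),
]
q, policy = batch_q_learning(database, actions=["left", "right"])
```

Because the transitions are deterministic, repeated sweeps converge to Q(1, right) = 1 and Q(0, right) = γ, and the learned policy moves right in both cells.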
Reinforcement Learning
Temporal credit assignment: we do not want to maximize the immediate rewards (the sequence of r_t), but the rewards accumulated over time, i.e. the discounted return
$$R_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k}$$
where γ ∈ [0, 1) is the discount factor giving the present value of future rewards (a reward perceived k time steps later is worth only γ^k of what it would be worth now).
• γ = 0: short-sighted agent that maximizes immediate rewards only
• γ → 1: far-sighted agent with an increasingly distant horizon
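The discounted return R_t = Σ_{k≥0} γ^k r_{t+k} can be computed for a finite (episodic) reward sequence with a single backward pass, using R_t = r_t + γ·R_{t+1}. This is a minimal sketch; the function name is hypothetical, and truncating the infinite sum at the end of the episode is an assumption.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], i.e. R_t for rewards r_t, r_{t+1}, ..."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    g = 0.0
    # backward pass: R_t = r_t + gamma * R_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


delayed = [0.0, 0.0, 10.0]                 # the reward arrives two steps later
print(discounted_return(delayed, 0.0))     # short-sighted agent ignores it
print(discounted_return(delayed, 0.9))     # far-sighted agent values it at about 8.1
```

The example shows the effect of γ: the same delayed reward is worth nothing to a short-sighted agent and most of its value to a far-sighted one.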
Traditional RL
[Figure: RL grid reward. Panels plot reward per episode (average reward and smoothed average reward, W = 2), and average reward and average variance against tree-fixed and tree-update iterations.]
Algorithm
Visual Representations
Exploiting Visual Features
Appearance-based vision (as opposed to geometry-based, model-based, or structure-based vision):
• Global appearance approaches: normalized images, eigen-patches, histograms (color, texture, …)
• Local appearance approaches: Harris corner detector, interest point detectors, local descriptors
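As an illustration of a local appearance approach, here is a minimal NumPy sketch of the Harris corner response R = det(M) − k·trace(M)², where M is the structure tensor of image gradients summed over a small neighbourhood. Simplifications (assumptions): central-difference gradients from `np.gradient` and an unweighted box window instead of the usual Gaussian weighting; k = 0.04 is a commonly used value, and the toy image is hypothetical.

```python
import numpy as np


def harris_response(img, k=0.04, win=1):
    """Harris corner response for a 2-D grayscale image (box-window simplification)."""
    img = np.asarray(img, dtype=float)
    iy, ix = np.gradient(img)          # derivatives along rows (y) and columns (x)
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy

    def box_sum(a):
        # sum of a over a (2*win+1) x (2*win+1) neighbourhood, zero-padded at the borders
        p = np.pad(a, win)
        out = np.zeros_like(a)
        size = 2 * win + 1
        for dy in range(size):
            for dx in range(size):
                out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
        return out

    sxx, syy, sxy = box_sum(ixx), box_sum(iyy), box_sum(ixy)
    det = sxx * syy - sxy ** 2         # large when gradients vary in two directions
    trace = sxx + syy
    return det - k * trace ** 2        # > 0 at corners, < 0 along edges, 0 in flat areas


# toy image: a bright square on a dark background
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
```

On the toy square, the response is positive at the corners, negative along the straight edges (gradient in only one direction), and zero inside the uniform regions.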
Related Works
Modeling visual attention (diagram): psychophysics, neurophysiology; filter models, connectionist models; bottom-up, top-down
• Navalpakkam 2005; Mozer et al. 2005; Frintrop et al. 2005; Paletta et al.; Gao et al.; Walther et al.; Sun et al.; Triesch et al.; this work
• Itti et al. 1998; Torralba et al. 2006
• Itti & Koch 2001
• Minut et al. 2001
• Sprague, N. 2003
• Cesar Bandera 1996
• Reichle et al. 2006: modeling eye movements of an expert reader
• Maljkovic & Nakayama 1994
• Della Libera & Chelazzi 2007
Saliency Concept
"Shifts in selective visual attention: towards the underlying neural circuitry", C. Koch and S. Ullman, 1985
[Diagram: feature maps (orientation, color, curvature, line end, movement) → saliency map → attention → central representation]
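The saliency concept can be made concrete with a simplified sketch in the spirit of Koch & Ullman (1985): feature maps are normalized and combined into one saliency map, a winner-take-all step selects the most salient location, and inhibition of return suppresses that location so attention shifts to the next one. The min-max normalization, plain averaging, square inhibition window, and the toy maps are simplifying assumptions, not the exact operators of Koch & Ullman or of the model used in this thesis.

```python
import numpy as np


def saliency_scanpath(feature_maps, n_fixations=3, ior_radius=1):
    """Combine feature maps into a saliency map and read out a sequence of fixations."""
    maps = []
    for fm in feature_maps:
        fm = np.asarray(fm, dtype=float)
        rng = fm.max() - fm.min()
        # min-max normalize so that no feature dominates merely because of its scale
        maps.append((fm - fm.min()) / rng if rng > 0 else np.zeros_like(fm))
    saliency = np.mean(maps, axis=0)

    s = saliency.copy()
    fixations = []
    for _ in range(n_fixations):
        # winner-take-all: the most salient remaining location is attended next
        r, c = np.unravel_index(np.argmax(s), s.shape)
        fixations.append((int(r), int(c)))
        # inhibition of return: suppress a neighbourhood around the attended location
        s[max(r - ior_radius, 0): r + ior_radius + 1,
          max(c - ior_radius, 0): c + ior_radius + 1] = -np.inf
    return saliency, fixations


# toy maps: a location salient in both features outranks one salient in a single feature
color = np.zeros((10, 10)); color[2, 2] = 1.0
orient = np.zeros((10, 10)); orient[7, 7] = 1.0; orient[2, 2] = 0.5
sal, fix = saliency_scanpath([color, orient], n_fixations=2)
```

In the toy example, location (2, 2) is attended first because it is salient in both maps; after it is inhibited, attention shifts to (7, 7).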
Eye Movements
Saccades:
• Quick "jumps" that connect fixations
• Duration typically between 30 and 120 ms
• Very fast (up to 700 degrees/second)
• Ballistic: the target of a saccade cannot be changed during the movement
• Vision is suppressed during saccades to allow stable perception of the surroundings
• Used to move the fovea to the next object/region of interest
Working Example (Jodogne & Piater, JAIR 2007)
a) The final classifier, which tests for the presence of the circled local-appearance features
b) The label of the perceptual class assigned to each empty cell
c) The computed optimal policy for this classification
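The pipeline in the working example (a classifier that tests for the presence of visual features, a perceptual class at each leaf, and a policy over those classes) can be sketched as a toy binary tree. This is an illustration only: the feature names, class labels, and actions are hypothetical, feature detection is reduced to set membership, and the tree is hand-built, whereas in the working example the tests are on local-appearance features and the classifier is the result of learning.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Node:
    feature: str      # hypothetical identifier of a local-appearance feature
    present: "Tree"   # subtree followed when the feature is detected
    absent: "Tree"    # subtree followed otherwise


# a leaf is simply the label of a perceptual class
Tree = Union[Node, str]


def classify(tree: Tree, percept: set) -> str:
    """Walk from the root to a leaf, testing one feature per internal node."""
    while isinstance(tree, Node):
        tree = tree.present if tree.feature in percept else tree.absent
    return tree


# hand-built toy classifier and a policy mapping each perceptual class to an action
tree = Node("f1", present=Node("f2", present="C1", absent="C2"), absent="C3")
policy = {"C1": "up", "C2": "left", "C3": "right"}

percept = {"f1", "f2"}          # features detected in the current image
action = policy[classify(tree, percept)]
```

Separating the two steps mirrors panels a) to c): the tree only decides which perceptual class a percept belongs to, and the policy is defined over those classes rather than over raw images.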
Cost Vector in Biasing
C = [3, 1, 4, 3, 3, 1, 4, 4, 4, 4, 6, 5, 4, 3, 2, 1]