Defence session

Outline

( P 1 / 37) Ali Borji – Oct. 2009

Statement & Scope

Background

Related Research

• Saliency Model

• U-TREE

Biasing Saliency Model

• Revised Model

• Object detection

Learning to saccade

• S-Tree

• POMDP Cases

• Object Recognition

• Invariance Analysis

Conclusions

Future Works

Interactive Learning of Task-driven Visual Attention Control

Ali Borji borji@{ipm.ir,iai.uni-bonn.de}

6 Oct. 2009

School of Cognitive Sciences, IPM, Tehran, Iran

Advisors: Dr. Majid Nili Ahmadabadi, Dr. Babak Nadjar Araabi

PhD Thesis Defence


Outline

• Statement and Scope

• Background

• Related Research

• Contributions

• Modification and Biasing Saliency Model

• Learning to Saccade

• Conclusions & Future Works


Statement and Scope

[Figure labels: Top-down Visual Attention, Purposive Vision]

What:

• What is attention?

• Bottom-up vs. top-down attention

• Purposive vision vs. reconstructionist & active vision

• Concurrent learning of task-driven visual attention control and physical actions

Why:

• Very important in online tasks where reaction time is limited, e.g. robotics

• May give insights into top-down attentional mechanisms in the brain

How:

• Redundant information does not necessarily increase performance/recognition

• There is a processing bottleneck inherent in physical implementations

• Reinforcement learning for learning attention and actions

• Bio-inspired, minimalist, time effects, closed loop, RL based, versatile

[Figure: Block diagram of the agent interacting with the world. Labels: World; Vision (Recognition, Segmentation, ...); State; Decision Making & Attention Control; Learning Unit; Critic; Reward; Motor Action; Perceptual Action (Overt attention); Agent.]


Statement and Scope

[Figure: Environment and agent connected through cameras and effectors, shown for four settings. Labels: Image Classification (Image, Class, Right Class during Learning); Vision for Action (Open Loop) (Action, Right Action); Vision for Action (Closed Loop) (Action); Vision for Action (without supervision) (Qualitative feedback).]


Background

Neurosciences & Psychology

• Visual Attention

• Space-based, object-based, feature-based

• Covert vs. overt

• Bottom-up vs. top-down

Machine vision & Robotics

• Vision is the most informative and challenging sensor

• Huge amount of sensory Information, only a subset is relevant

• Robots should guarantee a short response time

• Applications

• Navigation

• Object recognition,

• HCI, …


Yarbus 1967

Eye movements as indicators of cognitive processes:

• Trace 1: examine at will

• Trace 2: estimate wealth

• Trace 3: estimate ages

• Trace 4: guess previous activity

• Trace 5: remember clothing

• Trace 6: remember position

• Trace 7: time since last visit

Saccadic Eye Movements


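Saliency-based Model

L. Itti, C. Koch, and E. Niebur, 1998

Math:

$$\bar{I} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N\big(I(c,s)\big)$$

$$\bar{C} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} \big[ N\big(RG(c,s)\big) + N\big(BY(c,s)\big) \big]$$

$$\bar{O} = \sum_{\theta \in \{0^\circ, 45^\circ, 90^\circ, 135^\circ\}} N\Big( \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N\big(O(c,s,\theta)\big) \Big)$$

$$S = \frac{1}{3} \big( N(\bar{I}) + N(\bar{C}) + N(\bar{O}) \big)$$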
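The final step of the saliency model, which averages the normalized intensity, color, and orientation conspicuity maps into one saliency map S and then picks the most salient location, can be sketched as follows. This is a minimal illustration, not the authors' implementation: `normalize_map` is a simplified stand-in for the operator N(.) (it uses the global mean of the map rather than the mean of its local maxima), and all function names are hypothetical.

```python
def normalize_map(m, top=1.0):
    """Simplified stand-in for N(.): rescale the map to [0, top], then weight
    it by (top - mean)^2 so that maps with one dominant peak are promoted."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat map carries no conspicuous location.
        return [[0.0 for _ in row] for row in m]
    scaled = [[(v - lo) / (hi - lo) * top for v in row] for row in m]
    mean = sum(v for row in scaled for v in row) / len(flat)
    weight = (top - mean) ** 2
    return [[v * weight for v in row] for row in scaled]


def combine_conspicuity(i_map, c_map, o_map):
    """S = (N(I) + N(C) + N(O)) / 3, computed element by element."""
    maps = [normalize_map(m) for m in (i_map, c_map, o_map)]
    return [[sum(vals) / 3.0 for vals in zip(*rows)] for rows in zip(*maps)]


def focus_of_attention(s_map):
    """Winner-take-all: (row, col) of the largest saliency value
    (ties resolve to the largest row, then the largest column)."""
    best = max((v, r, c) for r, row in enumerate(s_map) for c, v in enumerate(row))
    return best[1], best[2]
```

With an intensity map that has a single peak and two flat maps for color and orientation, `focus_of_attention` returns the position of that peak, because flat maps contribute nothing after normalization.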


Working Example


U-TREE (McCallum 1996)

Incremental state-space discretization based on minimizing perceptual aliasing


Biasing Saliency-based Model

Basic idea:

For detection of an object, some scales and features are more important than others. If computational resources are limited, unnecessary scales and features can be bypassed. In its current form, the saliency model does not allow a computational bottleneck to be implemented.

Step 1: Revising the basic saliency model

Step 2: Biasing it for object detection

[Figure: Surround inhibition operation in the basic saliency model and in our model. Labels: fine scale, coarse scale, surround inhibited map.]

Borji, Nili, Araabi, Machine vision & Applications, 2009


Revised Saliency based Model

[Figure: Revised saliency model. Labels: Input Image; Gaussian pyramids; Feature decomposition into Intensity (I), Color (C; R/G, Y/B), and Orientation (O; 0° and 90° shown); Feature maps; Surround inhibition operation; Scale weights with across-scale addition; Dimension weights with within-maps addition; Channel weights with conspicuity maps addition; Conspicuity maps; Final Saliency map; WTA; FOA; IOR; Output Image; Offline learned top-down weights (w).]

[Figure: Offline learning of top-down biases from an image set and the xy position of the target object. a) Without cost. b) With cost.]


Convergence

[Figure: Fitness value vs. generation number (8 to 40) for the without-cost case and the with-cost case, with one curve each for Bike, Crossing, Pedestrian, Coke, and Triangle.]

Convergence of the CLPSO algorithm in minimization of fitness functions

Scanpaths for detection of objects in natural scenes, each column for one object


Synthetic Search Arrays

α in objective function


Object Detection in Natural Scenes

[Figure: Best individual (learned weight vector) for Bike, Crossing, Pedestrian, Coke, and Triangle, and for Synthetic-1 to Synthetic-5, each shown without cost and with cost. Horizontal axis: Col, Int, Ori, R/G, Y/B, Int, the four orientation channels, and scales s0 to s5. Vertical axis ticks: 2, 4, 6.]


Object Detection

Comparing detection rate and cost of our method with the basic saliency model

a) Without cost

| target | biased, train: detection rate % | biased, train: avg. hit number | biased, train: fitness | # test images | biased, test: detection rate % | biased, test: avg. hit number | basic: detection rate % | basic: avg. hit number |
|---|---|---|---|---|---|---|---|---|
| bike | 92.3(2.1) | 1.4(0.4) | 22.3(9.2) | 60 | 90.2(2) | 1.6(0.1) | 81.8(0.6) | 2.2(0.2) |
| crossing | 96.7(1.2) | 1.5(0.7) | 18.1(5.4) | 35 | 93.8(0.9) | 1.5(0.2) | 78.2(1.4) | 2.5(0.3) |
| pedestrian | 98.2(1) | 1.2(0.2) | 14.4(6.1) | 45 | 94.2(1.1) | 1.3(0.7) | 83.3(1) | 1.7(0.1) |
| coke | 95.2(1.4) | 1.3(0.3) | 13.2(8) | 59 | 92.2(2) | 1.5(0.5) | 80.9(0.4) | 1.9(0.5) |
| triangle | 92.5(2.3) | 1.7(0.9) | 27.8(11.4) | 32 | 91(1.6) | 1.8(0.4) | 76.5(0.8) | 2.2(0.2) |

b) With cost

| target | train: detection rate % | train: avg. hit number | train: fitness | # test images | test: detection rate % | test: avg. hit number | test: computation cost |
|---|---|---|---|---|---|---|---|
| bike | 87.8(1.2) | 1.7(0.3) | 421.9(25.6) | 60 | 85.8(1.5) | 1.8(0.2) | 25.5(4.2) |
| crossing | 84.2(1.9) | 1.5(0.7) | 302.1(47.1) | 35 | 78.2(2.1) | 2(0.3) | 40.5(5.8) |
| pedestrian | 91.6(1) | 1.5(0.4) | 531.9(54.3) | 45 | 90.6(2) | 1.7(0.4) | 35.6(5.2) |
| coke | 87.3(2.7) | 1.8(0.6) | 730(34.4) | 59 | 87.1(1.1) | 1.9(0.1) | 32.2(7.3) |
| triangle | 89.6(2.1) | 1.4(0.2) | 512.2(27.7) | 32 | 85.6(1.2) | 2.1(0.2) | 32.7(6.1) |

• Total cost of the basic saliency model is 52, with an assumed cost vector

• Training over 10 random images with each object and testing over the rest

• Hit number is x if the object is detected in the x-th saccade
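The two reported measures, detection rate and average hit number, can be computed from per-image hit numbers roughly as follows. This is a sketch under stated assumptions: a missed target is encoded as `None`, and the average hit number is taken over detected images only; the slides do not say how misses were handled, and the function name is hypothetical.

```python
def summarize_hits(hit_numbers):
    """hit_numbers holds one entry per test image: the 1-based index of the
    saccade at which the target was detected, or None if it was missed.
    Returns (detection rate in percent, average hit number)."""
    detected = [h for h in hit_numbers if h is not None]
    rate = 100.0 * len(detected) / len(hit_numbers)
    # Assumption: missed images are excluded from the average hit number.
    avg_hit = sum(detected) / len(detected) if detected else float("nan")
    return rate, avg_hit
```

For example, four test images with hit numbers 1, 2, miss, 1 give a detection rate of 75% and an average hit number of 4/3.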


Learning top-down Object-based attention

• There is a debate between object-based and spatial attention control

• There are some models for learning object-based attention. The best known recent one is by Sun et al.

• Problem with previous models in this area: they have not used proper tools for modeling task demands and taking them into account. They are limited to static scenes in restricted, controlled situations

• Since scene understanding depends heavily on actions and goals, an interactive approach seems more suitable

• Our model is the first to use RL for this purpose, together with the idea of state-space discretization

[Figure: Proposed model in three layers. Early Vision: Visual Sensor, Visual Stimulus, Bottom-up Visual Attention with learned weights (biases), FOA. Higher Vision: Object Recognition, Scene Interpretation, etc.; Recognized object; Attended object. Decision Making and Learning: Attention Tree / State Extractor, State and Action, RL System, Critic, TD Error, Long-term Memory. Other labels: Reward (Quasi-Static Learning), Motor Action, State.]



[Figure: Best individuals for Bike, Crossing, Pedestrian, Coke, and Triangle, without cost and with cost.]

Sample objects in natural scenes. Best individual derived after minimization was applied to a test set.

Learned weights after CLPSO convergence over first two traffic signs averaged over five runs. s0 to s5 are scales in the image pyramid.

Early Vision Layer


• The object at the attended location is recognized by the hierarchical model of object recognition (HMAX).

• A binary SVM classifier is trained with positive samples of a class and negative samples from other classes.

• The classifier learned offline in this way is later used for online object recognition.

OR rates over test sets:

91.28% (± 2.8), 93% (± 2.75), 87% (± 3.2), 83.4% (± 4.2), 83% (± 4.2)

Higher Vision Layer


Task: simulated visual navigation.

• Map of the route, consisting of 11 positions. There are 44 states.

• The agent captures 360 × 270 RGB color images. Natural scenes containing a subset of the objects are presented to the agent (5 for each combination).

• The agent has three possible motor actions, Forward (F), Turn Left (L), and Turn Right (R), and can attend to one of n objects at a time (n = 5).

Navigation map in the experiment. A subset of the 5 objects is present at random locations in the scenes. Best actions are shown beside each state. In some states two actions are optimal.

Visual Navigation


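[Figure: Learned attention tree. Internal nodes are labeled with objects A to E and have branches labeled 1 and 0; each of the seven leaves is labeled with a motor action (F, L, or R) and holds a set of states. Leaf state sets: {1, 8, 12, 13, 18, 24, 28, 31, 36, 40, 44}; {26, 35}; {19}; {2, 5, 9, 10, 21, 25, 32, 33, 37, 41}; {3, 15, 16, 20, 27, 29, 30, 34, 38, 42}; {6, 14, 22}; {4, 7, 11, 17, 23, 39, 43}.]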

The algorithm generated 7 states with an average depth of 3. This means that, instead of attending to five objects simultaneously, serial attention to 3 objects on average can solve the problem.

Learned attention tree for the map of Fig. 6 with pruning.

Forty-four states were clustered into 7 leaves. A 100% correct policy was achieved.

Visual Navigation


• Interactive and concurrent learning of top-down and action-based attention control

• Learning attentions when forming visual representations

• A spatial attention model

• Considering task demands

• Like RLVC, our method builds on the G and U-TREE algorithms

• Motivated by eye movements (saccades) for visual processing

Learning to Saccade

Borji, Nili, Araabi, ICRA, 2009; Image & Vision Computing, under review


Overall Model

[Figure: Overall model of the agent interacting with the world. Labels: World; Visual stimulus; GIST Extractor (GIST); FOA; Feature extractor; Codebook Database; Feature code; Attention tree; Attention loop; Overt attention / Eye movement; State; RL Policy; Motor action; Reward distributor; Reward (Quasi-static learning); Agent.]


Relative: Saccade directions can be selected with respect to the current saccade when there is a cost associated with moving sensory organs

Absolute: Or at arbitrary locations, independent of each other

Clustering SIFT features

SIFT features Q = (q1, q2, …, qm) are extracted from sample images and then clustered into n clusters (β1, β2, …, βn) using the standard K-means clustering algorithm.

The codebook entry of a SIFT feature k is then the SIFT class with minimum distance to this feature:

$$d = \arg\min_j \lVert k - \beta_j \rVert$$
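The codebook lookup, which maps a feature to the index of its nearest cluster center, can be sketched as below. This is an illustrative sketch only: it assumes the cluster centers have already been produced by K-means, uses Euclidean distance (the slide does not name the norm), and works on short toy vectors rather than 128-dimensional SIFT descriptors. The function name is hypothetical.

```python
import math


def nearest_code(feature, centers):
    """Return d = argmin_j ||feature - centers[j]||, i.e. the index of the
    cluster center closest to the feature (Euclidean distance assumed)."""
    return min(range(len(centers)), key=lambda j: math.dist(feature, centers[j]))


# Toy 2-D centers standing in for K-means cluster centers of SIFT descriptors.
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
code = nearest_code((9.0, 8.0), centers)
```

Only the assignment step is shown; the clustering itself could come from any standard K-means implementation.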

Sample objects with derived SIFT features

S-Tree Algorithm


S-Tree Algorithm

[Figure: S-Tree learning loop alternating between "RL – Update" and "Tree – Update". Labels: SIFT features pi; codebook; action; direction or location of saccade; best saccade direction or location; internal nodes; leaves (states in the Q-table); aliased state; TD error (Bellman residual); Q-learning update formula; "Aliasing? Yes"; "No further aliasing? Terminate".]

Memory under a node:

p1   p2   …   Δt    Action
2    3    x   .31   Left
4    1    y   .12   Right
…
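The "RL – Update" step and the aliasing check that triggers a "Tree – Update" can be sketched as follows. This is a minimal sketch assuming standard tabular Q-learning over tree leaves. The learning rate, the discount factor, and the spread-based test in `looks_aliased` are assumptions made for illustration; the slides do not state the criterion actually used to declare a leaf aliased.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on leaf s; returns the TD error
    (Bellman residual) so it can be stored in the memory under the leaf."""
    best_next = max(Q[(s_next, b)] for b in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


def looks_aliased(td_errors, threshold=0.5):
    """Hypothetical aliasing test: a leaf whose stored TD errors spread widely
    probably merges situations that need different values, so it is a
    candidate for splitting."""
    return max(td_errors) - min(td_errors) > threshold


Q = defaultdict(float)  # unseen (leaf, action) pairs start at 0
td = q_update(Q, "leaf0", "Left", 10.0, "leaf1", ["Left", "Right"])
```

In a full loop, the TD errors collected under each leaf would be passed to the aliasing test, and learning would stop once no leaf is flagged.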


Visual Navigation Task

• Consider a discrete maze with walls

• An agent moves in the maze (penalty of 0 or −1 per move).

• The agent must reach the exit as fast as possible (reward of +10).

• The sensors return a picture of an object that depends on the cell.


[Figure: Two bar charts over the cases "All distinct", "3 equal", "7 equal", and "6 equal & 7 equal": number of perceptual states (leaves), and average tree depth.]

Action Effects

Same objects were placed at different positions with same best actions


10×10 Visual Gridworld


Performances

[Figure: Five panels over iterations 1 to 9: average reward, average variance, tree size in perceptual states (leaves), average tree depth, and correct policy rate. Horizontal axis labels: tree-fixed iteration, tree-update iteration.]

Performance of S-Tree while solving the 10 by 10 visual gridworld


Perturbation Analysis

[Figure: Correct policy rate (0.65 to 1) over steps 1 to 10.]

a) Change this image into a new one after learning

b) Change three different images at three different locations each time after convergence


POMDP Cases

What if perceptual information is not enough?

Employ a short-term memory to make the environment Markov

[Figure: State augmented with a short-term memory of depth n. Annotations: codebooks at q spatial locations; j-th memory item of i-th state; pair of previous action, state; elicited action in this state; TD error in this state.]
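One way to picture the short-term memory is as a fixed-depth queue of previous (action, state) pairs that is attached to the current percept to form the state key. The sketch below is an assumption-laden illustration, not the structure used in the thesis: the class name, the method names, and the exact layout of the key are hypothetical.

```python
from collections import deque


class MemoryState:
    """Keeps the last `depth` (previous action, previous state) pairs."""

    def __init__(self, depth):
        # A deque with maxlen silently drops the oldest pair when full.
        self.history = deque(maxlen=depth)

    def push(self, prev_action, prev_state):
        self.history.append((prev_action, prev_state))

    def key(self, codebooks):
        """Combine the codebook indices seen at the attended locations with
        the remembered pairs into one hashable state key."""
        return (tuple(codebooks), tuple(self.history))


memory = MemoryState(depth=2)
memory.push("L", 3)
memory.push("F", 5)
memory.push("R", 1)  # the oldest pair ("L", 3) is dropped
state_key = memory.key([4, 7])
```

Two situations with identical percepts but different recent histories then map to different keys, which is what lets a tabular learner tell them apart.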


Object Recognition

The method can be applied to object recognition when the reward is defined as a matrix (shown on the slide) whose diagonal elements correspond to correct classification of an object into its respective class. The TD error then reduces to r alone.


Invariance Analysis

[Figure: Assignment of views to states, state number vs. grid position. One panel for views at 0, 30, and 60 degrees; one panel for views at 0, 5, 10, 15, and 20 degrees.]

Different views of an object or scene were observed in each grid position randomly


State Reduction Index: how many views are grouped

[Figure: Assignment of views to states over natural scenes, state number vs. grid position (1 to 13), in three panels. Labels: nearest SIFT to saccade center; SIFT with highest magnitude; RLVC.]

Invariance Analysis


Saliency for Scene Classification

Narrowest is the first salient region

Can we do successful scene classification by only looking at salient regions?

Are they stable regions?


Summary & Conclusions

• A bottom-up model allowing object detection and bottleneck implementation

• Top-down object-based attention over basic saliency model !

• Closed-loop learning of image-to-action mappings (S-Tree)

• Interactive, task-driven, Space based, Bioinspired

• Scene classification by feature extraction at salient regions !

Summary

• Top-down, task-relevant attentional modulations can be derived interactively in an RL framework

• Attention is best learned in concert with physical actions and representations

• A spectrum of image classification problems

• Bottom-up attention is by nature a low-level mechanism, while top-down attention is more like a control or decision-making problem

Conclusions


Future Directions

• How can structure be encoded in a bottom-up attention model?!

• Top-down bias for generating stable salient regions

• Context & Spatial Constraints among objects as well as semantics

• How can complex scene analysis take advantage of attention, e.g. on the LabelMe dataset?

• Approaches based on U-TREE fit solutions to the problem at hand. Such approaches should be extended to allow solving other similar tasks (somewhere between classic and new AI)

• Remedy generalization: maximum generalization with minimum spatial processing!

• Implementation on simulated or real robot on tasks containing real-world natural scenes


Related Publications

• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Saliency Maps for Attentive and Action-based Scene Classification, In Preparation.

• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Interactive Learning Of Space-based Visual Attention Control And Physical Actions, To Be Submitted.

• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Interactive Learning of Task-driven Object-based Visual Attention Control, Image and Vision Computing, Passed First Review.

• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Cost-sensitive Learning of Top-down Modulation for Attentional Control, Machine Vision and Applications, In Press.

• Mostafa Ajalooian, Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Hadi Moradi, Fast Hand Gesture Recognition based on Saliency Maps: An Application to Interactive Robotic Marionette Playing, IEEE ROMAN 2009, Japan, Sept. 2009.

• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Learning Sequential Visual Attention Control through State Space Discretization, IEEE ICRA 09, Kobe, Japan, May 2009.

• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Interactive Learning of Top-down Attention Control and Motor Actions, IEEE IROS 2008 workshop on From Motor to Interaction Learning in Robots.

• Ali Borji, Majid Nili Ahmadabadi, Babak Nadjar Araabi, Learning Object-based Attention Control, in NIPS 2008 workshop on Machine Learning Meets Human Learning.


Thanks for your attention


Invariancy Analysis


Future Work

• Generalization problem

• Modeling context and spatial relations among visual features & objects

• Scaling up solutions to large-scale data like LabelMe

• Is there a method which could detect all objects in a scene with high accuracy?

• Implementation on a real or simulated robot

• Extending purposive solutions for embodied agents (like a U-Tree which considers the body form of a robot)

• Generalization over other tasks: how can the method be generalized so that knowledge derived from one task (here, the learned attention tree) can be used for other tasks with similar perceptions?

…


Historical question: “How do animals learn?”

RL answer: Through their interactions with the environment, which give rise to positive or negative feedback (trial and error).

More precisely: By learning a percept-to-action mapping that maximizes, over time, an evaluation of the agent's performance given by the environment.

Examples: A dog learns to sit down by receiving sugar from its master. A robotic hand learns to grasp objects by receiving information about the quality of the grasp from the physical world.

Reinforcement Learning



Basic principles: The agent knows nothing about its environment. It only knows about its percepts and actions. After each interaction, it receives a numerical feedback. It progressively improves its policy by trying new actions.

Advantages: No need for a physical model of the environment (although one can accelerate learning). Therefore: general approach, simple design.

Allows dynamic adaptation when the environment changes



• S: finite set of states

• A: finite set of actions

• T(s, a, s’): probabilistic transition function

• r(s, a): numerical reinforcement function

Markov Decision Process (MDP)

Markovian Probabilistic Dynamics

Inputs: A database of interactions

Output: An optimal control policy

Reinforcement Learning Process



We don’t want to maximize immediate rewards (the sequence of rt ), but the rewards over time.

Temporal Credit Assignment

Markovian Probabilistic Dynamics

where γ ∈ [0, 1) is the discount factor giving the current value of future rewards (i.e., a reward received k units of time later is only worth γ^k times what it would be worth now)

γ = 0: short-sighted agent that maximizes immediate rewards

γ → 1: agent with an increasingly distant horizon
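The effect of the discount factor can be illustrated with a few lines of code. The formula on the slide is not recoverable from the text, so this sketch assumes the standard discounted sum, in which a reward received k steps later is weighted by γ to the power k; the function name is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward received k steps in the future is
    weighted by gamma**k (assumed standard definition)."""
    total = 0.0
    # Work backwards so each earlier step discounts everything after it once.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


rewards = [0.0, 0.0, 10.0]                   # a reward of 10 arrives two steps later
short_sighted = discounted_return(rewards, 0.0)
far_sighted = discounted_return(rewards, 0.9)
```

With γ = 0 the delayed reward is ignored entirely, while with γ = 0.9 it is still worth 0.9² × 10 to the agent now.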


[Figure: RL grid reward. Panels: average reward and smoothed average reward (W=2) per episode (5 to 30); average variance; average reward.]

Traditional RL


Algorithm


Visual Representations

Appearance-based vision: not geometry-based, model-based, or structure-based vision

Global appearance approaches: normalized images, eigen-patches, histograms (color, texture, …)

Local appearance approaches: Harris corner detector, interest point detector, local descriptor

Exploiting Visual Features


Related Works

[Figure: Taxonomy of related work on visual attention. Categories: Psychophysics; Neurophysiology; Modeling (Filter Models, Connectionist Models; Bottom-up, Top-down).]

Works placed in the taxonomy: Navalpakkam 2005; Mozer et al. 2005; Frintrop et al. 2005; Paletta et al.; Gao et al.; Walther et al.; Sun et al.; Triesch et al.; this work; Itti et al. 1998; Torralba et al. 2006; Itti & Koch 2001; Minut et al. 2001; Sprague 2003; Bandera 1996; Reichle et al. 2006 (modeling eye movements of an expert reader); Maljkovic and Nakayama 1994; Della Libera & Chelazzi 2007


Saliency Concept

“Shifts in selective visual attention: towards the underlying neural circuitry”, C. Koch and S. Ullman, 1985

[Figure labels: Feature Maps, Saliency, Attention, Central Representation]

Feature Maps

• Orientation

• Color

• Curvature

• Line end

• Movement


Eye Movements

Saccades:

• Quick “jumps” that connect fixations

• Duration is typically between 30 and 120 ms

• Very fast (up to 700 degrees/second)

• Saccades are ballistic, i.e., the target of a saccade cannot be changed during the movement.

• Vision is suppressed during saccades to allow stable perception of surroundings.

• Saccades are used to move the fovea to the next object/region of interest.


a) The final classifier that tests the presence of the circled local-appearance features

b) The label of the perceptual class that is assigned to each empty cell

c) The computed optimal policy for this classification

Jodogne & Piater, JAIR 2007

Working Example


Cost Vector in Biasing

C = [3, 1, 4, 3, 3, 1, 4, 4, 4, 4, 6, 5, 4, 3, 2, 1]
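The 16 entries of this cost vector sum to 52, the total cost quoted earlier for the basic saliency model. The sketch below checks that sum and shows one plausible way the cost of a biased configuration could be computed. It assumes an entry is paid only when the corresponding channel, feature, or scale is switched on; the slides do not state this rule or which entry belongs to which feature, and the function name is hypothetical.

```python
C = [3, 1, 4, 3, 3, 1, 4, 4, 4, 4, 6, 5, 4, 3, 2, 1]


def configuration_cost(active, costs=C):
    """Sum the costs of the entries switched on in a 0/1 mask
    (assumed rule: inactive entries cost nothing)."""
    if len(active) != len(costs):
        raise ValueError("mask and cost vector must have the same length")
    return sum(c for on, c in zip(active, costs) if on)


# Everything switched on, as in the unbiased basic model.
full_cost = configuration_cost([1] * len(C))
# Hypothetical biased configuration that keeps only the first and last entries.
sparse_cost = configuration_cost([1] + [0] * 14 + [1])
```

Under this reading, the computation costs reported in the with-cost table (roughly 25 to 40) correspond to configurations that switch off a sizeable part of the full model.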
