Standard Brain Model for Vision
The talk is given by Tomer Livne and Maria Zeldin
Overview
Introduction to biological basis of vision
Computer analogy to biology
Implementation
Discussion
Overview of biological vision
Hierarchical structure
From simple features to complex ones (Hubel & Wiesel)
Increased invariance
Following their experimental results, Hubel and Wiesel (1962, 1965) proposed a model in which neighbouring simple cells are combined into a complex cell.
The result is complex cells with phase independence.
The basic idea
Max vs. sum pooling
Electrophysiological results indicate that pooling may not be linear: the response of a complex cell is best described by the activity of its maximal afferent.
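The difference between the two pooling rules can be illustrated with a minimal sketch. The response values below are made up for illustration and are not measured data; the function names are hypothetical.

```python
def sum_pool(afferents):
    """Linear pooling: the complex cell adds up its simple-cell inputs."""
    return sum(afferents)


def max_pool(afferents):
    """Nonlinear pooling: the complex cell follows its strongest input."""
    return max(afferents)


# Hypothetical responses of three simple cells with the same preferred
# orientation at neighbouring positions (illustrative values only).
one_bar = [0.9, 0.1, 0.0]    # a single bar drives one simple cell
two_bars = [0.9, 0.8, 0.0]   # clutter: a second bar drives another cell

# Max pooling reports the same value with or without the second bar,
# while sum pooling changes with the amount of clutter.
print(max_pool(one_bar), max_pool(two_bars))
print(sum_pool(one_bar), sum_pool(two_bars))
```

With max pooling the output stays tied to the best-matching afferent, which is one reason it is attractive for building invariance without losing selectivity.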
From simple to complex cells:
A straightforward extension is to start with simple cells and end up with “higher-order hypercomplex cells”.
This is the basis of the whole hierarchy idea!
The hierarchy based on the brain model:
Hierarchical models of object recognition in cortex. Riesenhuber and Poggio. Nature Neuroscience, November 1999.
Clearer explanation of the hierarchy
[Figure: two simple cells respond to four orientations (-, |, \, /) with (1, 0, 0, 0) and (0.7, 0.7, 0, 0); max pooling over them gives the complex cell response (1, 0.7, 0, 0).]
Computer vision
Usual approach – image patching
Biologically motivated approach – hierarchy
Representing objects by invariant complex features
The IT (inferotemporal) area of the brain deals with object recognition. In this area there are cells that respond best to a specific object.
Recognize the same faces
In the previous task our brains did a very good job of recognizing the same face even though the scale, expression, and illumination were different.
They also did not classify different faces as the same even though the physical conditions were similar.
Motivation
The presented approach tries to implement the hierarchical idea described above in a computer system, in order to achieve similar robustness.
The models that we present deal with a more general problem: object classification.
Recognizing different transformations of an object can be seen as similar to the classification problem.
Can computers reach properties similar to biology?
Riesenhuber & Poggio (1999) demonstrate that they can, by comparing electrophysiological results from cells in the monkey brain with an implemented hierarchical model.
Training stage:
The monkey was trained to recognize a restricted set of views of unfamiliar target stimuli resembling paperclips. The experimenters then checked which IT cell responded best to all the views, and the cell that responded the most was picked for the study.
Test stage:
The cell responded best to the trained views. The second-best response was to new transformations of the trained object, and there was very little response to new objects (distractors).
Learning the results:
The hierarchy based on the brain model:
We saw this part
Now let's compare it to the model
Results of scrambling
Summary
Goal: brain-based object classification
Biological view of the problem
Implementation of a hierarchical structure
Comparing true results to model results
Models based on the hierarchical idea we already discussed
Riesenhuber & Poggio (1999)
Serre & Riesenhuber (2004)
Serre, Wolf, Bileschi, Riesenhuber, & Poggio (2007)
Mutch & Lowe (2006)
Modifications of the basic ideas
limitations and shortcomings
What’s next?
Method #1: Riesenhuber & Poggio, “Hierarchical models of object recognition in cortex”, Nature Neuroscience, 1999
It was later modified by Serre, Wolf, Bileschi, Riesenhuber, & Poggio, “Robust object recognition with cortex-like mechanisms”, 2007.
S1 – Gabor filters
16 different sizes (7X7, 9X9, …, 37X37)
4 orientations
A total of 64 S1-type detectors
Robust object recognition with cortex-like mechanisms. Serre, Wolf, Bileschi, Riesenhuber and Poggio. IEEE Transactions on Pattern Analysis and Machine Intelligence, March 2007.
A serial implementation of filtering
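As a rough sketch of how an S1 filter bank with 16 sizes and 4 orientations could be built, the code below generates one Gabor kernel per (size, orientation) pair. The specific orientations (0, 45, 90, 135 degrees), the aspect ratio `gamma`, and the way `sigma` and `wavelength` scale with filter size are assumptions chosen for illustration, not the parameter values used in the paper.

```python
import math


def gabor_kernel(size, theta, wavelength, sigma, gamma=0.3):
    """Return a size x size Gabor kernel as a list of rows.

    gamma (aspect ratio), sigma and wavelength are illustrative assumptions.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the filter prefers orientation theta.
            x0 = x * math.cos(theta) + y * math.sin(theta)
            y0 = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x0 ** 2 + (gamma * y0) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * x0 / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


sizes = list(range(7, 38, 2))                        # 7, 9, ..., 37: 16 sizes
orientations = [k * math.pi / 4 for k in range(4)]   # assumed: 0, 45, 90, 135 deg

# Assumption: sigma and wavelength simply grow in proportion to filter size.
bank = {
    (size, k): gabor_kernel(size, theta, wavelength=size / 2.0, sigma=size / 5.0)
    for size in sizes
    for k, theta in enumerate(orientations)
}
print(len(bank))  # 16 sizes x 4 orientations = 64
```

Each S1 detector would then be obtained by filtering the image with one of these kernels.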
C1 – MAX pooling
8 different sizes (8X8, 10X10, …, 22X22)
4 orientations
A total of 32 C1-type detectors
Used to define features during the learning stage
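A minimal sketch of the spatial part of C1 pooling, assuming each S1 output is a 2-D map for one orientation: the map is scanned with a square window and only the maximum in each window is kept. The half-window step between neighbouring windows is an assumption, and `c1_pool` is a hypothetical helper name.

```python
def c1_pool(s1_map, pool_size, step=None):
    """Max-pool one S1 orientation map over pool_size x pool_size windows."""
    if step is None:
        step = pool_size // 2   # assumption: neighbouring windows overlap by half
    rows, cols = len(s1_map), len(s1_map[0])
    pooled = []
    for top in range(0, rows - pool_size + 1, step):
        pooled_row = []
        for left in range(0, cols - pool_size + 1, step):
            window = [
                s1_map[r][c]
                for r in range(top, top + pool_size)
                for c in range(left, left + pool_size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


pool_sizes = list(range(8, 23, 2))   # 8, 10, ..., 22: 8 pooling sizes


def toy_map(row, col, size=16):
    """A blank S1 map with a single unit response at (row, col)."""
    return [[1.0 if (r, c) == (row, col) else 0.0 for c in range(size)]
            for r in range(size)]


# Shifting the stimulus by one pixel leaves the pooled output unchanged.
print(c1_pool(toy_map(5, 5), 8) == c1_pool(toy_map(6, 6), 8))  # True
```

The toy example shows the local position tolerance that the C1 layer is meant to provide.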
S2 – learned features
Holds N learned features
4 patch sizes (4X4, 8X8, 12X12, 16X16), indicating how many neighboring C1 cells are considered (this is done separately for each C1 scale)
For each image patch X, a Gaussian radial basis function that depends on the Euclidean distance is calculated from each of the stored features Pi (i = 1, …, N):
r = exp(-β ||X – Pi||²)
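The S2 response formula can be written directly in code. This is a minimal sketch that treats a patch and a stored feature as flat lists of C1 values; the value of `beta` and the sample numbers are assumptions for illustration.

```python
import math


def s2_response(patch, prototype, beta=1.0):
    """Gaussian RBF response r = exp(-beta * ||X - P||^2).

    patch and prototype are flat lists of equal length; beta is an
    illustrative sharpness value, not one taken from the paper.
    """
    squared_distance = sum((x - p) ** 2 for x, p in zip(patch, prototype))
    return math.exp(-beta * squared_distance)


prototype = [0.2, 0.4, 0.1, 0.9]      # a hypothetical stored feature Pi
near_patch = [0.2, 0.5, 0.1, 0.9]     # close to the stored feature
far_patch = [0.9, 0.0, 0.8, 0.1]      # very different from it

print(s2_response(prototype, prototype))   # identical patch gives 1.0
print(s2_response(near_patch, prototype) > s2_response(far_patch, prototype))  # True
```

The response is 1 for a perfect match and decays towards 0 as the patch moves away from the stored feature, with β controlling how quickly.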
C2 – max pooling
For each stored feature, keeps the best (closest) match
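A minimal sketch of the C2 step, assuming the S2 responses have already been computed for every stored feature at every tested patch. The response values are invented for illustration and `c2_vector` is a hypothetical name.

```python
def c2_vector(s2_responses):
    """Keep, for each stored feature, its best S2 response.

    s2_responses[i] holds the responses of stored feature i at every
    tested patch (all positions and scales flattened into one list).
    """
    return [max(responses) for responses in s2_responses]


# Three stored features (N = 3), each evaluated at three patches.
s2_responses = [
    [0.10, 0.85, 0.30],
    [0.05, 0.02, 0.40],
    [0.95, 0.60, 0.20],
]
features = c2_vector(s2_responses)
print(features)  # one value per stored feature
```

The result is a vector with one entry per stored feature, and where in the image the best match occurred is discarded.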
Classifier
Classification is based on both C1 and C2
Summary
4 layers of processing
2 types of operations (max, sum)
Output – N-dimensional vector
Model’s performance
Testing the model
Defining features
Flexibility of the design
Robustness to background
Ignoring unrelated data that is present
Training and test images contain both targets and distractors
Performed best with C2-type detectors
Simple detection – present/absent (no location information)
Approaches maximal performance with 1000-5000 features
Performance improves with increased training (more examples)
Object-specific features or a universal dictionary
A universal-dictionary-based system is good for small training sets (10,000 features)
An object-specific system is better when using large training sets (improves with practice – increased number of features [200 per image])
Object recognition without clutter
Scene understanding using a windowing strategy
Large inter-category variability
Training sets contain only positive (target) or negative (no target) images
2 classification systems: C1 and C2 based
C1 based system performs better (able to efficiently represent objects’ boundaries)
Texture-based objects
Again C1- and C2-based classifiers
C2 features are now evaluated only locally, not over all image locations
C2-based classification is better (the features are more invariant and complex)
Evaluated by correct labeling of pixels in the image
A unified system – looking at multiple processing levels
The hierarchical nature of the described system enables the use of multiple levels of features
Recognizing both shape- and texture-based objects in the same image
Two processing pathways
Scene understanding task
Complex scene understanding requires more than just detection of objects; location information for the detected objects is also required
Shape-based objects
C1-based classification, using a windowing approach, for both identification and localization
Local neighborhood suppression by the maximal detected result
Texture-based objects
C2-based classification
Texture boundaries pose a problem (solved by additionally segmenting the image and averaging the responses within each segment)
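The local neighborhood suppression used with the windowing approach can be sketched as a greedy procedure: the strongest window detection is kept and weaker detections close to it are discarded. The scores, window centers, suppression radius, and the function name are all hypothetical.

```python
def suppress_neighbors(detections, radius):
    """Greedy local suppression of window detections.

    detections: list of (score, x, y) tuples, one per window center.
    A detection is dropped if a stronger kept detection lies within
    `radius` pixels of it along both axes.
    """
    kept = []
    for score, x, y in sorted(detections, reverse=True):
        far_from_all_kept = all(
            max(abs(x - kept_x), abs(y - kept_y)) > radius
            for _, kept_x, kept_y in kept
        )
        if far_from_all_kept:
            kept.append((score, x, y))
    return kept


# Hypothetical classifier scores for four overlapping windows.
detections = [(0.9, 10, 10), (0.8, 12, 11), (0.7, 40, 40), (0.3, 41, 38)]
print(suppress_neighbors(detections, radius=5))
```

In this toy case the two clusters of overlapping windows are each reduced to their strongest member, giving one location per detected object.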
Model summary
Hierarchical design
Efficiency
Multiple processing pathways
Universality vs. specificity
Limitations
Method #2
Mutch & Lowe, “Multiclass Object Recognition with Sparse, Localized Features”, 2006.
Image scaling – 10 scales
Multiclass Object Recognition with Sparse, Localized Features. By Mutch & Lowe. IEEE 2006
S1 – Gabor filters
Single scale (11X11)
4 orientations, evaluated at all possible locations
C1 – local invariance
Max pooling using a 10X10 (size) X 2 (scale) filter
Each orientation is tested separately
Used to define features during the learning stage
Larger skips
S2 – intermediate features
4 filter sizes (4X4, 8X8, 12X12, 16X16), defined by the stored features
A universal feature set
The response to each filter (feature) is calculated as
R(X, P) = exp[-||X – P||² / (2σ²α)]
C2 – global invariance
A vector of size d holding the maximal response (anywhere in the image) to each feature
SVM classifier
Majority-voting-based decision
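A majority-voting decision over C2 vectors can be sketched as follows, assuming one binary classifier per pair of classes. To keep the example self-contained, a nearest-centroid rule stands in for each trained SVM; this stand-in, the class names, and the centroid values are assumptions and not part of the described model.

```python
from collections import Counter
from itertools import combinations


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pairwise_decide(class_a, class_b, c2_vector, centroids):
    """Stand-in for a trained binary SVM: pick the class with the nearer centroid."""
    dist_a = squared_distance(c2_vector, centroids[class_a])
    dist_b = squared_distance(c2_vector, centroids[class_b])
    return class_a if dist_a <= dist_b else class_b


def classify(c2_vector, centroids):
    """Run every pairwise classifier and return the class with the most votes."""
    votes = Counter()
    for class_a, class_b in combinations(sorted(centroids), 2):
        votes[pairwise_decide(class_a, class_b, c2_vector, centroids)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical 2-dimensional C2 centroids for three classes (d = 2).
centroids = {"face": [1.0, 0.0], "car": [0.0, 1.0], "plane": [1.0, 1.0]}
print(classify([0.9, 0.1], centroids))  # face
```

With three classes there are three pairwise decisions; here "face" wins two of them and is returned.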
An overall look at all the stages:
Summary
Similar assumptions
Differences in construction
Model performance and improvements
Testing classification
More biologically motivated improvements
Testing classification
101 categories (from Caltech101)
Training sets of 15 (or 30) images of each category
Learn random features (in both size and location), an equal number for each category
Construct C2 vectors
Train the SVM (on the improved model, also perform feature selection)
Test stage
Results of the test:
To get better results, some improvements were added to the model:
S2 – encodes only the dominant orientation at each location
Increased number of tested orientations (from 4 to 12)
Lateral inhibition – suppressing below-threshold filter outputs in the S1 & C1 layers
Limited S2 invariance – to preserve a certain amount of geometrical relations, S2 features are limited to certain places in the image (relative to the center of the object)
Select only good features for classification
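Two of these refinements, lateral inhibition and keeping only the dominant orientation, can be sketched for a single image location. The threshold rule used here (a fraction `h` of the way between the weakest and strongest orientation response) and the value `h = 0.5` are assumptions for illustration; four orientations are used to keep the example short.

```python
def lateral_inhibition(responses, h=0.5):
    """Zero out orientation responses that fall below a local threshold.

    responses: filter outputs at one location, one per orientation.
    Assumed rule: threshold = min + h * (max - min) over the orientations.
    """
    weakest, strongest = min(responses), max(responses)
    threshold = weakest + h * (strongest - weakest)
    return [r if r >= threshold else 0.0 for r in responses]


def dominant_orientation(responses):
    """Return (orientation index, response) of the strongest orientation."""
    best = max(range(len(responses)), key=lambda k: responses[k])
    return best, responses[best]


# Hypothetical responses of four orientation filters at one location.
responses = [0.2, 1.0, 0.6, 0.0]
print(lateral_inhibition(responses))    # [0.0, 1.0, 0.6, 0.0]
print(dominant_orientation(responses))  # (1, 1.0)
```

Both operations make the representation sparser: weak responses are removed, and a stored feature only needs to record one orientation per location.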
Running the previous test on the improved model led to the following results:
Refining the model
Testing detection/localization
Sliding window
Merging overlapping detections
Single/multiple scale test images
Summary
Efficiency
Improvements
Limitations
THE END
Thank you for listening!
A simple cell is an early visual neuron that responds best to a line of a specific size, orientation, and phase.
This cell responds best to 90 deg. phase.
This cell responds best to 180 deg. phase.
Image
Simple cell (phase sensitive)
Complex cell (phase insensitive)