8/3/2019 MIT6870_ORSU_lecture4: Explicit and implicit 3D object models
http://slidepdf.com/reader/full/mit6870orsulecture4-explicit-and-implicit-3d-object-models 1/69
Lecture 4: Explicit and implicit 3D object models
6.870 Object Recognition and Scene Understanding
http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm
Monday
Recognition of 3D objects
Presenter: Alec Rivers
Evaluator:
2D frontal face detection
Amazing how far they have gotten with so little...
People have the bad taste of not being
rotationally symmetric
Examples of un-collaborative subjects
Objects are not flat*
*In the old days, some toy makers and a few people working on face detection
suggested that flat objects could be a good approximation to real objects.
Solution to deal with 3D variations: "do not deal with it"
"Not"-dealing with rotations and pose:
Train a different model for each view.
The combined detector is invariant to pose variations without an explicit 3D model.
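A minimal sketch of the "one model per view" idea, assuming each view-specific detector returns a real-valued score and that the combined detector simply takes the maximum over views. The toy linear scorers, the view names, and the threshold are hypothetical placeholders, not the detectors used in the lecture.

```python
def make_view_detector(weights):
    """Build a toy linear scorer standing in for a trained view-specific model."""
    def score(features):
        return sum(w * x for w, x in zip(weights, features))
    return score

# Hypothetical bank of detectors, one per viewpoint.
view_detectors = {
    "frontal": make_view_detector([1.0, 0.0]),
    "profile": make_view_detector([0.0, 1.0]),
}

def combined_detector(features, detectors, threshold=0.5):
    """Return (detected, best_view, best_score), taking the max over all views."""
    best_view, best_score = max(
        ((view, det(features)) for view, det in detectors.items()),
        key=lambda pair: pair[1],
    )
    return best_score >= threshold, best_view, best_score

print(combined_detector([0.1, 0.9], view_detectors))
```

The combined detector fires whenever any single view model fires, so it has no explicit 3D model; its cost grows linearly with the number of views.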
[Figure: object classes vs. viewpoints]
Need to detect Nclasses * Nviews * Nstyles, in clutter.
Lots of variability within classes, and across viewpoints.
And why should we stop with pose?
Let's do the same with styles,
lighting conditions, etc, etc, etc...
So, how many classifiers?
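To make the product Nclasses * Nviews * Nstyles concrete, here is a small sketch. The specific counts are purely illustrative assumptions, not figures from the lecture.

```python
def num_classifiers(n_classes, n_views, n_styles):
    """Number of separate detectors if every (class, view, style) gets its own model."""
    return n_classes * n_views * n_styles

# Illustrative numbers only: 100 classes, 8 views, 5 styles.
print(num_classifiers(100, 8, 5))
```

Each additional nuisance factor (lighting, articulation, and so on) multiplies the count again, which is the scaling problem the slide is pointing at.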
Depth without objects
Random dot stereograms (Bela Julesz)
Julesz, 1971
3D is so important for humans that we decided to grow two eyes in front of the
face instead of having one looking to the
front and another to the back.
(this is not something that Julesz said... but he could have; maybe he did)
Objects 3D shape priors
by H Bülthoff Max-Planck-Institut für biologische Kybernetik in Tübingen
Video taken from http://www.michaelbach.de/ot/fcs_hollow-face/index.html
3D drives perception of important
object attributes
by Roger Shepard ("Turning the Tables")
Depth processing is automatic, and we cannot shut it down...
3D drives perception of important
object attributes
Frederick Kingdom, Ali Yoonessi and Elena Gheorghiu of McGill Vision Research unit.
The two Towers of Pisa
It is not all about objects
The 3D percept is driven by the scene, which imposes its rules on the objects
Class experiment
Class experiment
Experiment 1: draw a horse (the entire
body, not just the head) on a blank piece of
paper.
Do not look at your neighbor! You already
know what a horse looks like... no need to
cheat.
Class experiment
Experiment 2: draw a horse (the entire
body, not just the head) but this time
choose a viewpoint as weird as possible.
Anonymous participant
3D object categorization
Wait: object categorization in humans is not
invariant to 3D pose
3D object categorization
by Greg Robbins
Although we can categorize all three
pictures as views of a horse,
they do not look like equally typical views of
horses. And they do not seem to be
recognizable with the same
ease.
Observations about pose invariance
in humans
Two main families of effects have been observed:
- Canonical perspective
- Priming effects
Canonical Perspective
From Vision Science, Palmer
Experiment (Palmer, Rosch & Chase 81):
participants are shown views of an object
and are asked to rate "how much each one
looked like the objects they depict"
(scale: 1 = very much like, 7 = very unlike)
[Figure: example views, labeled 5 and 2]
Canonical Perspective
From Vision Science, Palmer
Examples of canonical perspective:
In a recognition task, reaction time
correlated with the ratings.
Canonical views are recognized faster
at the entry level.
Why?
Canonical Viewpoint
Frequency hypothesis
Maximal information hypothesis
Canonical Viewpoint
Frequency hypothesis: ease of recognition is
related to the number of times we have seen the
objects from each viewpoint.
For a computer, using its Google memory, a horse
looks like:
It is not a uniform sampling on viewpoints
(some artificial datasets might contain non natural statistics)
Canonical Viewpoint
Frequency hypothesis: ease of recognition is
related to the number of times we have seen the
objects from each viewpoint.
Can you think of some
examples in which this
hypothesis might be
wrong?
Canonical Viewpoint
Maximal information hypothesis: Some views
provide more information than others about the
objects.
From Vision Science, Palmer
Best views tend to show multiple sides of the
object.
Can you think of some examples in which this
hypothesis might be
wrong?
Canonical Viewpoint
Maximal information hypothesis:
Clocks are preferred as purely frontal
Canonical Viewpoint
Frequency hypothesis
Maximal information hypothesis
Probably both are correct.
Edelman & Bulthoff 92: created new objects to control familiarity.
1- When all viewpoints were presented with the same frequency, observers still
preferred specific viewpoints.
2- When only a few viewpoints were presented, recognition was better for previously
seen viewpoints.
Observations about pose invariance
in humans
Two main families of effects have been observed:
- Canonical perspective
- Priming effects
Priming effects
Priming paradigm: recognition of an object is
faster the second time that you see it.
Biederman & Gerhardstein 93
Priming effects
Same exemplars
Different exemplars
Biederman & Gerhardstein 93
Priming effects
Biederman & Gerhardstein 93
Object representations
Explicit 3D models: use volumetric representation. Have an explicit model of
the 3D geometry of the object.
Appealing but hard to get it to work...
Object representations
Implicit 3D models: matching the input
2D view to view-specific representations.
Not very appealing but somewhat easy to get it to work*...
* we all know what I mean by "work"
Object representations
Implicit 3D models: matching the input
2D view to view-specific representations.
The object is represented as a collection of 2D
views (maybe the most frequent views seen in the past).
Tarr & Pinker (89) show people are faster at
recognizing previously seen views, as if they were
storing them. People were also able to recognize
unseen views, so they also generalize to new
views. It is not just template matching.
Why do I explain all this?
As we build systems and develop
algorithms it is good to:
- Get inspiration from what others have thought
- Get intuitions about what can work, and how
things can fail.
Explicit 3D model
Object Recognition in the Geometric Era: a Retrospective, Joseph L. Mundy
Explicit 3D model
Not all explicit 3D models were disappointing.
For some object classes, with accurate geometric and appearance models, it is
possible to get remarkable results.
A Morphable Model for the Synthesis
of 3D Faces
Blanz & Vetter, Siggraph 99
A Morphable Model for the Synthesis
of 3D Faces
Blanz & Vetter, Siggraph 99
We have not yet achieved the same level of
description for other object classes
Implicit 3D models
Aspect Graphs
"The nodes of the graph represent object views that are adjacent to each other on the unit sphere of viewing directions but differ in some significant way. The most common view relationship in aspect graphs is based on the topological structure of the view, i.e., edges in the aspect graph arise from transitions in the graph structure relating vertices, edges and faces of the
projected object." Joseph L. Mundy
Aspect Graphs
Affine patches
Revisit invariants as a local description of
3D objects: Indeed, although smooth
surfaces are almost never planar in the
large, they are always planar in the small
3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial
Constraints. F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce, IJCV 2006
Affine patches
Two steps:
1. Detection of salient image regions
2. Extraction of a descriptor around the
detected locations
Affine patches
Two steps:
1. Detection of salient image regions
(Garding and Lindeberg, 96; Mikolajczyk and Schmid, 02)
a) an elliptical image region is deformed to maximize the isotropy of the corresponding brightness pattern.
b) its characteristic scale is determined as a local
extremum of the normalized Laplacian in scale space.
c) the Harris (1988) operator is used to refine the position of the ellipse's center.
The elliptical region obtained at convergence can be
shown to be covariant under affine transformations.
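Step (b) can be illustrated with a closed-form toy case rather than a real image. For an ideal unit-mass 2D Gaussian blob of variance t0, smoothing with a Gaussian of variance t gives a Gaussian of variance t0 + t, and the magnitude of the scale-normalized Laplacian at the blob center is t / (pi * (t0 + t)^2), which peaks at t = t0. The sketch below scans candidate scales and picks the extremum; the function names and the scale grid are assumptions made for illustration, and this is not the detector of the cited papers.

```python
import math

def normalized_laplacian_at_center(t, t0):
    """Magnitude of t * Laplacian at the center of a unit-mass 2D Gaussian blob
    of variance t0 after smoothing with a Gaussian of variance t (closed form)."""
    return t / (math.pi * (t0 + t) ** 2)

def characteristic_scale(t0, scales):
    """Pick the candidate scale with the strongest normalized Laplacian response."""
    return max(scales, key=lambda t: normalized_laplacian_at_center(t, t0))

scales = [0.5 * k for k in range(1, 101)]  # candidate variances 0.5 .. 50.0
best = characteristic_scale(16.0, scales)
print(best)
```

The selected scale tracks the blob size, which is what makes the detected region scale-covariant; on real images the response would be computed by filtering rather than from a formula.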
Affine patches
Affine patches
Affine patches
Affine patches
Affine patches
Each region is represented with the SIFT descriptor.
Affine patches
A coherent 3D interpretation of all
the matches is obtained using a
formulation derived from
structure-from-motion and
RANSAC to deal with outliers.
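The RANSAC idea can be sketched on a much simpler problem than multi-view geometry. The example below fits a 2D line to points contaminated with outliers; it is a swapped-in toy model, not the structure-from-motion formulation of the paper. The function name, threshold, iteration count, and data are all hypothetical.

```python
import random

def ransac_line(points, n_iters=200, threshold=0.5, seed=0):
    """Fit y = a*x + b to points containing outliers. Returns (a, b, inliers)."""
    rng = random.Random(seed)
    best = (0.0, 0.0, [])
    for _ in range(n_iters):
        # Hypothesize a model from a minimal sample (two points define a line).
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair, cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Count the points that agree with the hypothesis (vertical residual).
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) < threshold]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best

inlier_pts = [(float(x), 2.0 * x + 1.0) for x in range(14)]
outlier_pts = [(1.0, 20.0), (3.0, -7.0), (5.0, 30.0),
               (8.0, -4.0), (10.0, 40.0), (12.0, 0.0)]
a, b, inliers = ransac_line(inlier_pts + outlier_pts)
print(a, b, len(inliers))
```

In the matching setting the minimal sample would be a small set of patch correspondences and the model a camera/structure hypothesis, but the hypothesize-and-verify loop has the same shape. A final refit on the inliers, omitted here, is common.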
Affine patches
Patch-based single view detector
Car model
Screen model
Vidal-Naquet, Ullman (2003)
For a single view
First we collect a set of part templates from a set of training objects.
Vidal-Naquet, Ullman (2003)
Extended fragments
View-Invariant Recognition Using Corresponding Object Fragments
E. Bart, E. Byvatov, & S. Ullman
Extended fragments
View-Invariant Recognition Using Corresponding Object Fragments
E. Bart, E. Byvatov, & S. Ullman
Extended fragments
View-Invariant Recognition Using Corresponding Object Fragments
E. Bart, E. Byvatov, & S. Ullman
Extended fragments
Extended patches are extracted using short sequences.
Use Lucas-Kanade motion estimation to track patches across the sequence.
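A minimal sketch of the Lucas-Kanade idea, reduced to one dimension and a single linearization step. It assumes brightness constancy and a small shift, so that I2(x) is approximately I1(x) - d * I1'(x), and solves for d by least squares. The synthetic Gaussian "patch" and all names are illustrative assumptions; real patch tracking works on 2D windows, usually with iteration and image pyramids.

```python
import math

def lucas_kanade_1d(i1, i2):
    """Single-step least-squares estimate of the shift d with i2[x] ~ i1[x - d]."""
    num = 0.0
    den = 0.0
    for x in range(1, len(i1) - 1):
        grad = (i1[x + 1] - i1[x - 1]) / 2.0  # central-difference spatial gradient
        diff = i1[x] - i2[x]                  # sign chosen so d > 0 is a rightward shift
        num += grad * diff
        den += grad * grad
    return num / den

def bump(x, centre=50.0, sigma=8.0):
    """Smooth synthetic intensity profile standing in for an image patch."""
    return math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))

true_shift = 0.7
patch_a = [bump(x) for x in range(100)]
patch_b = [bump(x - true_shift) for x in range(100)]
estimated_shift = lucas_kanade_1d(patch_a, patch_b)
print(estimated_shift)
```

The estimate is only reliable for small displacements, which is consistent with using short sequences: consecutive frames differ little, so a patch can be followed frame to frame.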
Learning
Once a large pool of extended fragments is created, there
is a training stage to select the most informative fragments.
For each fragment evaluate: I(C, F), where C is the class label and F indicates whether the fragment is present or absent.
Select the fragment B with [equation lost in extraction]
In the subsequent rounds, use [equation lost in extraction]
All these operations are easy to compute. It is just counting.
If C and F are independent, then I(C,F) = 0
C: 1 0 1 1 0 0 0 0 0 1
F: 1 1 1 1 1 0 0 0 0 0
P(C=1, F=1) = 3 / 10
P(C=1, F=0) =
P(C=0, F=1) =
P(C=0, F=0) =
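The counting can be sketched directly from the ten C and F values listed above. The code assumes the standard definition of mutual information, I(C,F) = sum over (c,f) of P(c,f) * log2( P(c,f) / (P(c) * P(f)) ), measured in bits; the function names are made up for this sketch.

```python
import math
from collections import Counter

C = [1, 0, 1, 1, 0, 0, 0, 0, 0, 1]  # class label per training example
F = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # fragment present (1) or absent (0)

def joint_probabilities(c, f):
    """Estimate P(C=ci, F=fi) by counting co-occurrences."""
    n = len(c)
    counts = Counter(zip(c, f))
    return {(ci, fi): counts[(ci, fi)] / n for ci in (0, 1) for fi in (0, 1)}

def mutual_information(c, f):
    """Mutual information I(C,F) in bits, from the empirical joint distribution."""
    joint = joint_probabilities(c, f)
    p_c = {v: sum(p for (ci, _), p in joint.items() if ci == v) for v in (0, 1)}
    p_f = {v: sum(p for (_, fi), p in joint.items() if fi == v) for v in (0, 1)}
    return sum(p * math.log2(p / (p_c[ci] * p_f[fi]))
               for (ci, fi), p in joint.items() if p > 0)

joint = joint_probabilities(C, F)
mi = mutual_information(C, F)
print(joint)
print(mi)
```

For these ten examples the counts give P(C=1,F=1) = 3/10, P(C=1,F=0) = 1/10, P(C=0,F=1) = 2/10 and P(C=0,F=0) = 4/10, and a mutual information of roughly 0.12 bits, so this fragment carries only a little information about the class.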
Training without sequences
Challenges:
- We do not know which fragments are in correspondence (we cannot use motion
estimation due to strong transformation)
Fragments that are in correspondence will have
detections that are correlated across viewpoints.
The same approach can be used for
arbitrary transformations
Bart & Ullman
Shared features for Multi-view object
detection
View-invariant features
View-specific features
Training does not require having different views of the same object.
Torralba, Murphy, Freeman. PAMI 07
Shared features for Multi-view object detection
Sharing is not a tree. Depends also on 3D symmetries.
Torralba, Murphy, Freeman. PAMI 07
Multi-view object detection
Strong learner H response for car as function of assumed view angle
Torralba, Murphy, Freeman. PAMI 07
Voting schemes
Towards Multi-View Object Class Detection
Alexander Thomas
Vittorio Ferrari
Bastian Leibe
Tinne Tuytelaars
Bernt Schiele
Luc Van Gool
Viewpoint-Independent Object Class Detection using 3D Feature Maps
Training dataset: synthetic objects
Features
Voting scheme and detection
Each cluster casts votes for the
voting bins of the discrete poses
contained in its internal list.
Liebelt, Schmid, Schertler. CVPR 2008
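A minimal sketch of the voting step, assuming each feature cluster stores a list of discrete pose bins and that every matched cluster adds one vote to each bin in its list. The cluster names, pose labels, and the unweighted one-vote rule are hypothetical; the paper's actual codebook and vote weighting are not reproduced here.

```python
from collections import Counter

# Hypothetical codebook: each cluster id keeps a list of discrete pose bins.
cluster_pose_lists = {
    "wheel": ["side", "front-left", "front-right"],
    "grille": ["front", "front-left", "front-right"],
    "door": ["side"],
}

def vote_for_pose(matched_clusters, pose_lists):
    """Accumulate one vote per (matched cluster, pose bin in that cluster's list)."""
    votes = Counter()
    for cluster in matched_clusters:
        for pose in pose_lists.get(cluster, []):
            votes[pose] += 1
    return votes

votes = vote_for_pose(["wheel", "door", "wheel"], cluster_pose_lists)
best_pose, best_count = votes.most_common(1)[0]
print(best_pose, best_count)
```

The pose bin with the most votes is taken as the detection hypothesis; features shared across several poses spread their votes, while pose-specific features concentrate them.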
Monday