1 Ellen L. Walker
Object Recognition
Instance Recognition
Known, rigid object
Only variation is from relative position & orientation (and camera parameters)
“Cluttered image” = possibility of occlusion, irrelevant features
Generic (category-level) Recognition
Any object in a class (e.g. chair, cat)
Much harder – requires a ‘language’ to describe the classes of objects
Instance Recognition & Pose Determination
Instance recognition
Given an image, what object(s) exist in the image?
Assuming we have geometric features (e.g. sets of control points) for each
Assuming we have a method to extract images of the model features
Pose determination (sometimes simultaneous)
Given an object extracted from an image and its model, find the geometric transformation between the image and the model
This requires a mapping between extracted features and model features
Instance Recognition
Build database of objects of interest
Features
Reference images
Extract features (or isolate relevant portion) from scene
Determine object and its pose
Object(s) that best match features in the image
Transformation between ‘standard’ pose in database, and pose in the image
Rigid translation, rotation OR affine transform
What Kinds of Features?
Lines
Contours
3D Surfaces
Viewpoint invariant 2D features (e.g. SIFT)
Features extracted by machine learning (e.g. principal component features)
Geometric Alignment
OFFLINE (we don’t care how slow this is!)
Extract interest points from each database image (of isolated object)
Store resulting information (features and original locations) in an indexing structure (e.g. search tree)
ONLINE (processing time matters)
Extract features from new image
Compare to database features
Verify consistency of each group of N (e.g. 3) features found from the same image
Hough Transform for Verification
Each minimal set of matches votes for a transformation
Example: SIFT features (location, scale, orientation)
Each Hough cell represents the object center's location (x, y), scale (s), and planar (in-image) rotation (θ)
Each individual feature votes for the closest 16 bins (2 in each dimension) to its own (x, y, s, θ)
Every peak in the histogram is considered for a possible match
Entire object’s set of features is transformed and checked in the image. If enough found, then it’s a match.
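The voting step above can be sketched in Python. This is a minimal illustration, not Lowe's actual implementation; the function names and the (cx, cy, scale, rot) match format are assumptions, with each feature match already converted into a predicted object pose:

```python
# Sketch of Hough-style pose voting (hypothetical helper; assumes each
# feature match has been converted into a predicted pose (cx, cy, scale, rot)
# from the SIFT location, scale, and orientation).
from collections import defaultdict
from itertools import product
import math

def two_nearest_bins(v, width):
    # The bin containing v, plus the neighboring bin v leans toward.
    i = math.floor(v / width)
    return (i, i + 1) if v / width - i >= 0.5 else (i, i - 1)

def hough_pose_peak(matches, xy_bin=32.0, rot_bin=math.pi / 6, log_scale_bin=1.0):
    """Vote in a coarse 4-D (x, y, log-scale, rotation) space; each match
    votes for the 16 closest bins (2 per dimension).  Returns the peak
    cell and its vote count."""
    votes = defaultdict(int)
    for cx, cy, scale, rot in matches:
        bins = [two_nearest_bins(cx, xy_bin),
                two_nearest_bins(cy, xy_bin),
                two_nearest_bins(math.log2(scale), log_scale_bin),
                two_nearest_bins(rot, rot_bin)]
        for cell in product(*bins):   # 2 x 2 x 2 x 2 = 16 cells per match
            votes[cell] += 1
    return max(votes.items(), key=lambda kv: kv[1])
```

A real implementation would also handle rotation wrap-around at ±π, omitted here for brevity.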
Issues of Hough-Based Alignment
Too many correspondences
Imagine 10 points from the image, 5 points in the model
If all pairs are considered, we have 45 * 10 = 450 correspondence pairs to consider!
In general N image points, M model points yields (N choose 2)*(M choose 2), or (N*(N-1)*M*(M-1))/4 correspondences to consider!
Can we limit the pairs we consider?
Accidental peaks
Just like the regular Hough transform, some peaks can be "conspiracies of coincidences"
Therefore, we must verify all "reasonably large" peaks
Parameters of Hough-based Alignment
How coarse (big) are the Hough space bins?
If too coarse, unrelated features will “conspire” to form a peak
If too fine, matching features will spread out and the peak will be lost
The finer the binning, the more time & space it takes
Multiple votes per feature provide a compromise
How many features needed to create a “vote”?
Minimum to determine necessary bin?
More cuts down time, might lose good information
More Parameters
What is the minimum # votes to align?
What is the maximum total error for success (or what is the minimum number of points, and maximum error per point)?
Alignment by Optimization
Need to use features to find the transformation that fits the features.
Least squares optimization (see 6.1.1 for details)
x is the feature vector from the database,
f is the transformation,
p is the set of parameters of the transformation,
x’ is the set of features from the image
Iterative and robust methods are also discussed in 6.1
E = Σᵢ rᵢ² = Σᵢ ‖f(xᵢ; p) − x′ᵢ‖²
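A minimal numerical version of this least-squares alignment, assuming a 2-D affine model f(x; p) = A x + t so that the energy is linear in the parameters (the function name is illustrative):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2-D affine transform mapping src points onto dst.

    Minimizes E = sum_i ||f(x_i; p) - x'_i||^2 over the 6 affine
    parameters.  src, dst: (N, 2) arrays, N >= 3."""
    n = len(src)
    # Each point contributes two rows of the linear system A p = b.
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src          # rows for x': [x y 1 0 0 0]
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = src          # rows for y': [0 0 0 x y 1]
    A[1::2, 5] = 1.0
    b = np.asarray(dst, float).ravel()
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p.reshape(2, 3)      # [[a11 a12 tx], [a21 a22 ty]]
```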
Variations on Least Squares
Weighted Least Squares
In error equations, weight each point by reciprocal of its variance (estimate of uncertainty in the point’s location)
The less sure the location, the lower the weight
Iterative Methods (search) – see Optimization slides
RANSAC (Random Sample Consensus)
Choose k correspondences and compute a transformation.
Apply transformation to all correspondences, count inliers
Repeat many times. Result is transformation that yields the most inliers.
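The three RANSAC steps above can be sketched for the simplest case, a 2-D translation, where a minimal sample is k = 1 correspondence (the function name and correspondence format are assumptions):

```python
import random

def ransac_translation(corrs, iters=200, tol=2.0, seed=0):
    """RANSAC sketch for a 2-D translation.  `corrs` is a list of
    ((x, y), (x', y')) correspondence pairs; returns the translation
    with the most inliers and the inlier count."""
    rng = random.Random(seed)
    best_t, best_inliers = None, -1
    for _ in range(iters):
        (x, y), (xp, yp) = rng.choice(corrs)          # minimal sample (k = 1)
        dx, dy = xp - x, yp - y                       # candidate transform
        inliers = sum(abs(a + dx - ap) <= tol and abs(b + dy - bp) <= tol
                      for (a, b), (ap, bp) in corrs)  # count inliers
        if inliers > best_inliers:
            best_t, best_inliers = (dx, dy), inliers
    return best_t, best_inliers
```

For richer transforms (similarity, affine, projective) k grows to 2, 3, or 4 correspondences and the candidate transform is fit from the sample, but the sample-score-repeat loop is identical.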
Geometric Transformations (review)
In general, a geometric transformation is any operation on points that yields points
Linear transformations can be represented by matrix multiplication of homogeneous coordinates:
[ t11 t12 t13 ] [ x ]   [ x' ]
[ t21 t22 t23 ] [ y ] = [ y' ]
[ t31 t32 t33 ] [ 1 ]   [ s' ]
Result is (x'/s', y'/s')
Example transformations
Translation
Set diagonals to 1, right column to new location, all else 0
Translation adds (dx, dy) to (x, y); the right column of the matrix is [dx, dy, 1]ᵀ
Rotation
Set the upper-left 2x2 block to cos(theta), -sin(theta), sin(theta), cos(theta), last element to 1, all else 0
Scale
Set diagonals to 1 and lower right to 1 / scale factor
OR Set diagonals to scale factor, except lower right to 1
Projective transform
Any arbitrary 3x3 matrix!
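These matrices can be written out directly; a quick sketch with NumPy (helper names are illustrative), including the homogeneous divide that handles the 1/scale-in-lower-right form:

```python
import numpy as np

def translation(dx, dy):
    # Diagonals 1, right column = new location.
    return np.array([[1, 0, dx], [0, 1, dy], [0, 0, 1]], float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], float)

def scaling(k):
    # Scale factor on the diagonal, lower right fixed at 1.
    return np.array([[k, 0, 0], [0, k, 0], [0, 0, 1]], float)

def apply(T, x, y):
    xp, yp, sp = T @ np.array([x, y, 1.0])
    return xp / sp, yp / sp        # homogeneous divide by s'
```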
Combining Transformations
Rotation by θ about an arbitrary point (xc, yc)
1. Translate so that the arbitrary point becomes the origin:
            [ 1  0  -xc ]
   Temp1 =  [ 0  1  -yc ] x P
            [ 0  0   1  ]
2. Rotate by θ:
            [ cos θ  -sin θ  0 ]
   Temp2 =  [ sin θ   cos θ  0 ] x Temp1
            [   0       0    1 ]
3. Translate back to the original coordinates:
            [ 1  0  xc ]
   Temp3 =  [ 0  1  yc ] x Temp2
            [ 0  0   1 ]
More generally
If T1, T2, T3 are a series of matrices representing transformations, then
T3 x T2 x T1 x P performs T1, T2, then T3 on P
Order matters!
You can precompute a single transformation matrix as T = T3 x T2 x T1 , then P' = TP is the transformed point
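The rotation-about-a-point example can be precomputed exactly this way, as one matrix T = T3 x T2 x T1 (a self-contained sketch; the function name is illustrative):

```python
import numpy as np

def rotate_about(theta, xc, yc):
    """Compose T3 @ T2 @ T1: translate (xc, yc) to the origin, rotate by
    theta, translate back.  The rightmost matrix is applied first."""
    c, s = np.cos(theta), np.sin(theta)
    T1 = np.array([[1, 0, -xc], [0, 1, -yc], [0, 0, 1]], float)
    T2 = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], float)
    T3 = np.array([[1, 0, xc], [0, 1, yc], [0, 0, 1]], float)
    return T3 @ T2 @ T1   # order matters: T1 first, then T2, then T3
```

The center (xc, yc) is a fixed point of the combined transform, which is a quick sanity check on the composition order.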
Transformations and Invariants
Invariants are properties that are preserved through transformations
Angle between two vectors is invariant to translation, scaling and rotation (or any combination thereof)
Distance between two points is invariant to translation and rotation (or any combination thereof)
Angle and distance preserving transformations are called rigid transformations
These are the only logical transformations that can be performed on non-deformable physical objects.
Geometric Invariants
Given: known shape and known transformation
Use: measure that is invariant over the transformation
The value is measurable and constant over all transformed shapes
Examples:
Euclidean distance: invariant under translation & rotation
Angle between line segments: translation, rotation, scale
Cross-ratio: projective transformations (including perspective)
Note: invariants are good for locating objects, but give no transformation information for the transformations they are invariant to!
Cross Ratio: Invariant of Projection
Consider four rays “cut” by two lines
I = (A-C)(B-D) / ((A-D)(B-C))
(Figure: four rays from a common point, cut by two lines; each line meets the rays in points labeled A, B, C, D.)
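The invariance is easy to verify numerically. In this sketch the four collinear points are given by a 1-D coordinate along their line, and a 1-D projective map (the kind induced by cutting the same four rays with a different line) is applied; `projective` and its matrix `m` are illustrative names:

```python
def cross_ratio(a, b, c, d):
    """I = (A-C)(B-D) / ((A-D)(B-C)) for four collinear points given by
    a 1-D coordinate along their line."""
    return (a - c) * (b - d) / ((a - d) * (b - c))

def projective(t, m):
    # A 1-D projective map t -> (m00*t + m01) / (m10*t + m11).
    return (m[0][0] * t + m[0][1]) / (m[1][0] * t + m[1][1])
```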
Cross Ratio Examples
Two images of one object give two matching cross ratios!
Dual of cross ratio: four lines from a point instead of four points on a line
Any five non-collinear but coplanar points yield two cross-ratios (from sets of 4 lines)
Using Invariants for Recognition
Measure the invariant in one image (or on the object)
Find all possible instances of the invariant (e.g. all sets of 4 collinear points) in the (other) image
If any instance of the invariant matches the measured one, then you (might) have found the object
Research question: to what extent are invariants useful in noisy images?
Calibration Problem (Alignment to World Coordinates)
Given:
Set of control points:
Known locations in "standard orientation"
Known distances in world units, e.g. mm
"Easy" to find in images
Image including all control points
Find:
Rigid transformation from "standard orientation" and world units to image orientation and pixel units
This transformation is a 3x3 matrix
Calibration Solution
The transformation from image to world can be represented as a rotation followed by a scale, then a translation
Pworld = T x S x R x Pimage
This provides 2 equations per point
xworld = ximage*s*cos(theta) – yimage*s*sin(theta) + dx
yworld = ximage*s*sin(theta) + yimage*s*cos(theta)+ dy
Because we have 4 unknowns (s, theta, dx, dy), we can solve the equations given 2 points (4 values)
But, the relationship between sin(theta) and cos(theta) is nonlinear.
Getting Rotation Directly
Find the direction of the segment (P1, P2) in the image
Remember tan(theta) = (y2-y1) / (x2-x1)
Subtract the direction found from the (known) direction of the segment in "standard position"
This is theta - the rotation in the image
Fill in sin(theta) and cos(theta); now the equations are linear and the usual tools can be used to solve them.
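The whole procedure fits in a few lines. This sketch (function name illustrative) assumes the image-to-world transform is rotation, then scale, then translation, as on the previous slide, and recovers theta from segment directions and the scale from segment lengths before solving the now-linear translation equations:

```python
import math

def calibrate(p1_img, p2_img, p1_world, p2_world):
    """Recover (s, theta, dx, dy) from two control points."""
    # Rotation: difference of the segment directions (atan2 handles signs).
    ang_world = math.atan2(p2_world[1] - p1_world[1], p2_world[0] - p1_world[0])
    ang_img = math.atan2(p2_img[1] - p1_img[1], p2_img[0] - p1_img[0])
    theta = ang_world - ang_img
    # Scale: ratio of segment lengths.
    s = math.dist(p1_world, p2_world) / math.dist(p1_img, p2_img)
    # With sin/cos known, the remaining equations are linear in (dx, dy).
    c, sn = math.cos(theta), math.sin(theta)
    dx = p1_world[0] - s * (p1_img[0] * c - p1_img[1] * sn)
    dy = p1_world[1] - s * (p1_img[0] * sn + p1_img[1] * c)
    return s, theta, dx, dy
```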
Non-Rigid Transformations
Affine transformation has 6 independent parameters
Last row of matrix is fixed at 0 0 1
We ignore an arbitrary scale factor that can be applied
Allows shear (diagonal stretching of x and/or y axis)
At least 3 control points are needed to find the transform (3 points = 6 values)
Projective transformation has 8 independent parameters
Fix lower-right corner (overall scale) at 1
Ignore arbitrary scale factor that can be applied
Requires at least 4 control points (8 values)
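The parameter counting for the affine case can be checked directly: 3 control points give 6 equations for the 6 unknowns, so the transform is determined exactly (a sketch; the function name is illustrative):

```python
import numpy as np

def affine_from_3(src, dst):
    """Exactly 3 point pairs (6 values) determine the 6 affine parameters."""
    (x0, y0), (x1, y1), (x2, y2) = src
    A = np.array([[x0, y0, 1], [x1, y1, 1], [x2, y2, 1]], float)
    d = np.asarray(dst, float)
    # Solve for the x-row and y-row parameters separately (3 equations each).
    px = np.linalg.solve(A, d[:, 0])
    py = np.linalg.solve(A, d[:, 1])
    return np.array([px, py, [0, 0, 1]])   # last row fixed at 0 0 1
```

The projective case is analogous but needs 4 points and a slightly larger system, since the fixed lower-right 1 leaves 8 unknowns.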
Image Warping
Given an affine transformation (any 3x3 transformation)
Given an image with 3 control points specified (origin and two axis extrema)
Create a new image that maps the 3 control points to 3 corners of a pixel-aligned square
Technique:
The control points define an affine matrix
For each point in the new image, apply the transformation to find a point in the old image; copy its pixel value to the new image.
If the point is outside the borders of the old image, use a default pixel value, e.g. black
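The technique above is inverse warping, and a minimal sketch looks like this (nearest-neighbor sampling; the function name is illustrative, and T is assumed to map new-image coordinates to old-image coordinates):

```python
import numpy as np

def warp(src_img, T, out_shape, default=0):
    """For each pixel of the new image, apply T to find the old-image
    point and copy its nearest pixel; out-of-bounds points get `default`."""
    h, w = out_shape
    out = np.full((h, w), default, dtype=src_img.dtype)
    for yn in range(h):
        for xn in range(w):
            xo, yo, s = T @ np.array([xn, yn, 1.0])
            xi, yi = int(round(xo / s)), int(round(yo / s))
            if 0 <= yi < src_img.shape[0] and 0 <= xi < src_img.shape[1]:
                out[yn, xn] = src_img[yi, xi]
    return out
```

Mapping backwards (new pixel to old point) guarantees every output pixel gets exactly one value; forward mapping would leave holes.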
Which feature is which? (Finding correspondences)
Direct measurements can rule out some correspondences
Round hole vs. square hole
Big hole vs. small hole (relative to some other measurable distance)
Red dot vs. green dot
Invariant relationships between features can rule out others
Distance between 2 points (relative…)
Angle between segments defined by 3 points
Correspondences that cannot be ruled out must be considered (Too many correspondences?)
Structural Matching
Recast the problem as "consistent labeling"
A consistent labeling is an assignment of labels to parts that satisfies:
If Pi and Pj are related parts, then their labels f(Pi), f(Pj) are related in the same way
Example: if two segments are connected at a vertex in the model, then the respective matching segments in the image must also be connected at a vertex
Interpretation Tree
Each branch is a choice of feature-label match
Cut off branch (and all children) if a constraint is violated
                 (empty)
            /       |       \
         A=a       A=b       A=c
         /  \      /  \      /  \
      B=b  B=c  B=a  B=c  B=a  B=b
Constraints on Correspondences (review)
Unary constraints are direct measurements
Round hole vs. square hole
Big hole vs. small hole (relative to some other measurable distance)
Red dot vs. green dot
Binary constraints are measurements between 2 features
Distance between 2 points (relative…)
Angle between segments defined by 3 points
Higher order constraints might measure relationships among 3 or more features
Searching the Interpretation Tree
Depth-first search (recursive backtracking)
Straightforward, but could be time-consuming
Heuristic (e.g. best-first) search
Requires good guesses as to which branch to expand next
(Specifics are covered in Artificial Intelligence)
Parallel Relaxation
Each node gets all labels
Every constraint removes inconsistent labels
(Review neural net slides for details)
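Depth-first search of the interpretation tree can be sketched as recursive backtracking. This toy version (names illustrative) assumes a one-to-one labeling, as in the tree above, and a pairwise consistency test `ok(p1, l1, p2, l2)` standing in for the binary constraints:

```python
def consistent_labelings(parts, labels, ok):
    """Depth-first search of the interpretation tree: assign one label to
    each part, cutting off a branch (and all its children) as soon as a
    binary constraint fails."""
    results = []

    def extend(assignment):
        i = len(assignment)
        if i == len(parts):                      # a leaf: full labeling
            results.append(dict(zip(parts, assignment)))
            return
        for lab in labels:
            if lab not in assignment and all(
                    ok(parts[j], assignment[j], parts[i], lab)
                    for j in range(i)):          # check against earlier choices
                extend(assignment + [lab])       # descend; backtrack on return

    extend([])
    return results
```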
Dealing with Large Databases
Techniques from Information Retrieval
Study of finding items in large data sets efficiently
E.g. hashing vs. brute-force search
Example “Image Retrieval Using Visual Words”
Vocabulary Construction (offline)
Database Construction (offline)
Image Retrieval (online)
Vocabulary Construction
Extract affine covariant regions from image (300k)
Shape adapted regions around feature points
Compute SIFT descriptors from each region
Determine average covariance matrix for each descriptor (tracked from frame to frame)
How does this patch change over time?
Cluster regions using K-means clustering (thousands)
Each region center becomes a ‘word’
Eliminate too-frequent ‘words’ (stop words)
Database Construction
Determine word distributions for each document (image)
Word frequency =
(number of times this word occurs) / (number of words in doc)
Inverse document frequency =
log((number of documents) / (number of documents containing this word))
tf-idf measure
(word freq) * (inverse doc freq)
Each document is represented by a vector of tf-idf measures for each word
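The tf-idf vectors above can be computed directly (a sketch; function name illustrative, with each document given as a list of visual words):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Each document (a list of visual 'words') becomes a vector of
    tf-idf weights: (count / doc length) * log(N / docs containing word)."""
    vocab = sorted(set(w for d in docs for w in d))
    n = len(docs)
    df = {w: sum(w in d for d in docs) for w in vocab}   # document frequency
    vecs = []
    for d in docs:
        counts = Counter(d)
        vecs.append([(counts[w] / len(d)) * math.log(n / df[w])
                     for w in vocab])
    return vocab, vecs
```

Note that a word appearing in every document gets idf = log(1) = 0, so ubiquitous visual words are automatically down-weighted, consistent with dropping stop words.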
Image Retrieval
Extract regions, descriptors, and visual words
Compute tf-idf vector for the query image (or region)
Retrieve candidates with most similar tf-idf vectors
Brute force, or using an ‘inverse index’
(Optional) re-rank or verify all candidate matches (e.g. spatial consistency, validation of transformation)
(Optional) expand the result by submitting highly-ranked matches as new queries
(OK for <10k keyframes, <100k visual words)
Improvements
Vocabulary tree approach
Instead of ‘words’, create ‘vocabulary tree’
Hierarchical: each branch has several prototypes
In recognition, follow the branch with the closest prototype (recursively through the tree)
Very fast: 40k CD covers recognized in real time (30 frames/sec); 1M frames at 1 Hz (1 frame/sec)
More sophisticated data structures
K-D Trees
Other ideas from IR
Very active research field right now
Application: Location Recognition
Match image to location where it was taken
E.g. annotating Google Maps, organizing information on Flickr, star maps
Match via vanishing points (when architectural objects are prominent)
Find landmarks (the ones everyone photographs)
Identify automatically as part of indexing process
Issues:
large number of photos
Lots of ‘clutter’ (e.g. foliage) that doesn’t help recognition
Image Retrieval
Determine the tf-idf measure for the image (using words already included in the database)
Match to the tf-idf measures for images in the DB
Similarity measured by normalized dot product (more similar = higher)
Difference measured by Euclidean distance
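Both measures are one-liners over the tf-idf vectors (a sketch; function names illustrative):

```python
import math

def cosine_similarity(u, v):
    """Normalized dot product: 1.0 for identical directions, near 0 for
    unrelated tf-idf vectors (higher = more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def euclidean(u, v):
    # Difference measure: straight-line distance between the vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```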