Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
ABabcdfghiejklExploring Image Relationships Using
Functional MapsFAN WANG1, RAIF RUSTAMOV2, ADRIAN BUTSCHER3, JUSTIN SOLOMON4,
LEONIDAS J. GUIBAS5
Geometric Computing Group, Stanford University1 [email protected], [email protected], [email protected],
[email protected], [email protected]
ABSTRACT
Establishing correspondences between images is challenging,
especially when point-to-point matching is hard to obtain due
to large variability in object appearances. In this work, in-
stead of computing point-based correspondences between im-
ages, we find functional maps between them which can act
as information transporters between the images. Functional
maps are based on correspondences between local properties
or attributes defined over images, and can be found efficiently
by solving a linear system. We use functional maps to trans-
fer segmentation from a set of training images to a test image,
and obtain results that match or improve other state-of-the-
art methods.
System Framework
1 Problem Setup
1.1 Graph Construction
• Node: over-segmented superpixels by Normalized Cuts.
• Edge: the length of the shared boundary of the two
super-pixels normalized by the average perimeter. Only
exists for adjacent superpixels.
Image descriptors can be regarded as functions on the graph
f ∈ F(G,R).
1.2 Reduced Functional Space
Basis selection:
• Local bases: representing meaningful parts of a graph
and capturing local distortions.
• Global bases: any function can be well reconstructed by
only a few basis functions.
Laplacian eigenfunctions as basis Φ:
1. Compact;
2. Multi-scale;
3. Geometry-aware.
Each function can be represented as coefficient vector ~f =ΦT f .
2 Problem Formulation
2.1 Constraints on Functional Maps
Probe functions: f1, f2, . . . ∈ F(Gi,R), g1, g2, . . . ∈ F(Gj ,R);Coefficient vectors in Φi and Φj: Fi = [a1,a2, · · ·] and Fj =[b1,b2, · · ·];Functional map: Xij
~Fi = ~Fj.
We can recover C by solving the optimization problem
X∗
ij = argminXij
‖XijFi − Fj‖F
where ‖ · ‖F is the Frobenius norm.
Many natural relationships between images can be incorpo-
rated:
• visual descriptors of the graph nodes, such as color,
SIFT, shape descriptor, bag-of-visual-words, etc.
• point-to-point landmark correspondences between the t-
wo images.
• region-to-region correspondence.
We take color features and bag-of-visual-words features as
probe functions
2.2 Regularization
The map T commutes with Laplacian operator:
Lj(f ◦ T ) = (Lif) ◦ T,∀f ∈ F(Gi,R)
In functional space, applying TF followed by Lj yields
Lj (TF (f)) = Φj(ΦTj LjΦj)(Φ
Tj TFΦi)(Φ
Ti f) = Φj(LjXij
~f).
The Laplacian Li followed by TF yields
TF (Li (f)) = Φj(ΦTj TFΦi)(Φ
Ti LiΦi)(Φ
Ti f) = Φj(XijLi
~f).
The commutation means:
Φj(LjXij~f) = Φj(XijLi
~f),∀~f.
The regularization can be represented as XijLi = LjXij.
2.3 Optimization
The optimization problem is formulated as:
min. ‖XijFi − Fj‖F + λ ‖XijLi − LjXij‖F ,
When using Laplacian eigenfunctions as basis Φ, the regular-
izer becomes simpler:
min. ‖XijFi − Fj‖F + λ ‖XijΣi − ΣjXij‖F ,
The commutativity regularizer can be further rewritten as an
element-wise sum:
‖XijΣi − ΣjXij‖2
F =∑
u,v
xuv [Σj(u, u)− Σi(v, v)]2,
Xij will have significant values on the diagonal and near-
diagonal entries, and almost zero everywhere else.
Given a set of training images (b), their indicator functions (c),
and a test image (a), we obtain a set of indicator functions (d)
mapped from each image of (b), which are combined to the
final indicator function (e).
3 Transferring the Segmentation
Training: the ground truth segmentation is an indicator func-
tion fi,gt ;
Testing: the transferred segmentation is fj = ΦjXijΦTi fi,gt .
Merging the Results
Voting mechanism to combine the mapped indicator functions
from N training images:
g =1
∑Ni=1
wi
N∑
i=1
wiΦjXijΦTi fi,gt ,
The weight wi is determined based on its distance di to the
test image using the GIST descriptor:
wi = e−d2i /2σ2
.
4 Experimental Results
Data sets: PASCAL VOC challenge 2012
• real-world consumer images from Flickr;
• 20 object classes;
• pixel-level segmented images;
4.1 Parameter Selection
Performance changing with number of basis functions and
regularizer weight λ:
0 20 40 60 80 10035
40
45
50
55
Number of Basis
Acc
urac
y
100
101
102
103
30
35
40
45
50
λ
Acc
urac
y4.2 Results on PASCAL VOC 2012
class Best Ours
background 85.1 82.8
aero plane 65.4 62.8
bicycle 31.0 20.6
bird 51.3 48.4
boat 44.5 43.2
bottle 58.9 33.3
bus 60.8 61.3
car 61.5 48.1
cat 56.4 60.3
chair 22.6 14.6
cow 53.6 60.9
dining table 32.6 28.1
dog 47.4 55.4
horse 57.6 60.5
motor bike 57.9 51.7
person 51.9 35.4
potted plant 35.7 29.7
sheep 55.3 62.7
sofa 40.8 33.9
train 54.2 63.1
tv/monitor 47.8 43.2
Analysis
• better on natural objects, such as cat, cow, dog, horse,
sheep, etc. Not sensitive to non-rigid deformations;
• worse on bottle, car, person if objects in test images
rarely appear in training images;
• low accuracy on bicycle, bottle and chair because of
coarse superpixels