1
ABabcdfghiejkl Exploring Image Relationships Using Functional Maps FAN WANG 1 ,R AIF R USTAMOV 2 ,A DRIAN B UTSCHER 3 ,J USTIN S OLOMON 4 , L EONIDAS J. G UIBAS 5 Geometric Computing Group, Stanford University 1 [email protected], 2 [email protected], 3 [email protected], 4 [email protected], 5 [email protected] ABSTRACT Establishing correspondences between images is challenging, especially when point-to-point matching is hard to obtain due to large variability in object appearances. In this work, in- stead of computing point-based correspondences between im- ages, we find functional maps between them which can act as information transporters between the images. Functional maps are based on correspondences between local properties or attributes defined over images, and can be found efficiently by solving a linear system. We use functional maps to trans- fer segmentation from a set of training images to a test image, and obtain results that match or improve other state-of-the- art methods. System Framework 1 Problem Setup 1.1 Graph Construction Node: over-segmented superpixels by Normalized Cuts. Edge: the length of the shared boundary of the two super-pixels normalized by the average perimeter. Only exists for adjacent superpixels. Image descriptors can be regarded as functions on the graph f ∈F (G, R). 1.2 Reduced Functional Space Basis selection: Local bases: representing meaningful parts of a graph and capturing local distortions. Global bases: any function can be well reconstructed by only a few basis functions. Laplacian eigenfunctions as basis Φ: 1. Compact; 2. Multi-scale; 3. Geometry-aware. Each function can be represented as coefficient vector f = Φ T f . 2 Problem Formulation 2.1 Constraints on Functional Maps Probe functions: f 1 ,f 2 ,... ∈F (G i , R), g 1 ,g 2 ,... ∈F (G j , R); Coefficient vectors in Φ i and Φ j : F i =[a 1 , a 2 , ···] and F j = [b 1 , b 2 , ···]; Functional map: X ij F i = F j . We can recover C by solving the optimization problem X * ij = arg min Xij X ij F i F j F where ‖·‖ F is the Frobenius norm. Many natural relationships between images can be incorpo- rated: visual descriptors of the graph nodes, such as color, SIFT, shape descriptor, bag-of-visual-words, etc. point-to-point landmark correspondences between the t- wo images. region-to-region correspondence. We take color features and bag-of-visual-words features as probe functions 2.2 Regularization The map T commutes with Laplacian operator: L j (f T )=(L i f ) T, f ∈F (G i , R) In functional space, applying T F followed by L j yields L j (T F (f )) = Φ j T j L j Φ j )(Φ T j T F Φ i )(Φ T i f )=Φ j (L j X ij f ). The Laplacian L i followed by T F yields T F (L i (f )) = Φ j T j T F Φ i )(Φ T i L i Φ i )(Φ T i f )=Φ j (X ij L i f ). The commutation means: Φ j (L j X ij f )=Φ j (X ij L i f ), f. The regularization can be represented as X ij L i = L j X ij . 2.3 Optimization The optimization problem is formulated as: min. X ij F i F j F + λ X ij L i L j X ij F , When using Laplacian eigenfunctions as basis Φ, the regular- izer becomes simpler: min. X ij F i F j F + λ X ij Σ i Σ j X ij F , The commutativity regularizer can be further rewritten as an element-wise sum: X ij Σ i Σ j X ij 2 F = u,v x uv j (u, u) Σ i (v,v)] 2 , X ij will have significant values on the diagonal and near- diagonal entries, and almost zero everywhere else. Given a set of training images (b), their indicator functions (c), and a test image (a), we obtain a set of indicator functions (d) mapped from each image of (b), which are combined to the final indicator function (e). 3 Transferring the Segmentation Training: the ground truth segmentation is an indicator func- tion f i,gt ; Testing: the transferred segmentation is f j j X ij Φ T i f i,gt . Merging the Results Voting mechanism to combine the mapped indicator functions from N training images: g = 1 N i=1 w i N i=1 w i Φ j X ij Φ T i f i,gt , The weight w i is determined based on its distance d i to the test image using the GIST descriptor: w i = e -d 2 i /2σ 2 . 4 Experimental Results Data sets: PASCAL VOC challenge 2012 real-world consumer images from Flickr; 20 object classes; pixel-level segmented images; 4.1 Parameter Selection Performance changing with number of basis functions and regularizer weight λ: 0 20 40 60 80 100 35 40 45 50 55 Number of Basis Accuracy 10 0 10 1 10 2 10 3 30 35 40 45 50 λ Accuracy 4.2 Results on PASCAL VOC 2012 class Best Ours background 85.1 82.8 aero plane 65.4 62.8 bicycle 31.0 20.6 bird 51.3 48.4 boat 44.5 43.2 bottle 58.9 33.3 bus 60.8 61.3 car 61.5 48.1 cat 56.4 60.3 chair 22.6 14.6 cow 53.6 60.9 dining table 32.6 28.1 dog 47.4 55.4 horse 57.6 60.5 motor bike 57.9 51.7 person 51.9 35.4 potted plant 35.7 29.7 sheep 55.3 62.7 sofa 40.8 33.9 train 54.2 63.1 tv/monitor 47.8 43.2 Analysis better on natural objects, such as cat, cow, dog, horse, sheep, etc. Not sensitive to non-rigid deformations; worse on bottle, car, person if objects in test images rarely appear in training images; low accuracy on bicycle, bottle and chair because of coarse superpixels

ABabcdfghiejkl Exploring Image Relationships UsingFunctional … · 2013-04-16 · ABabcdfghiejklExploring Image Relationships Using Functional Maps FAN WANG1, RAIF RUSTAMOV2, ADRIAN

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ABabcdfghiejkl Exploring Image Relationships UsingFunctional … · 2013-04-16 · ABabcdfghiejklExploring Image Relationships Using Functional Maps FAN WANG1, RAIF RUSTAMOV2, ADRIAN

ABabcdfghiejklExploring Image Relationships Using

Functional MapsFAN WANG1, RAIF RUSTAMOV2, ADRIAN BUTSCHER3, JUSTIN SOLOMON4,

LEONIDAS J. GUIBAS5

Geometric Computing Group, Stanford University1 [email protected], [email protected], [email protected],

[email protected], [email protected]

ABSTRACT

Establishing correspondences between images is challenging,

especially when point-to-point matching is hard to obtain due

to large variability in object appearances. In this work, in-

stead of computing point-based correspondences between im-

ages, we find functional maps between them which can act

as information transporters between the images. Functional

maps are based on correspondences between local properties

or attributes defined over images, and can be found efficiently

by solving a linear system. We use functional maps to trans-

fer segmentation from a set of training images to a test image,

and obtain results that match or improve other state-of-the-

art methods.

System Framework

1 Problem Setup

1.1 Graph Construction

• Node: over-segmented superpixels by Normalized Cuts.

• Edge: the length of the shared boundary of the two

super-pixels normalized by the average perimeter. Only

exists for adjacent superpixels.

Image descriptors can be regarded as functions on the graph

f ∈ F(G,R).

1.2 Reduced Functional Space

Basis selection:

• Local bases: representing meaningful parts of a graph

and capturing local distortions.

• Global bases: any function can be well reconstructed by

only a few basis functions.

Laplacian eigenfunctions as basis Φ:

1. Compact;

2. Multi-scale;

3. Geometry-aware.

Each function can be represented as coefficient vector ~f =ΦT f .

2 Problem Formulation

2.1 Constraints on Functional Maps

Probe functions: f1, f2, . . . ∈ F(Gi,R), g1, g2, . . . ∈ F(Gj ,R);Coefficient vectors in Φi and Φj: Fi = [a1,a2, · · ·] and Fj =[b1,b2, · · ·];Functional map: Xij

~Fi = ~Fj.

We can recover C by solving the optimization problem

X∗

ij = argminXij

‖XijFi − Fj‖F

where ‖ · ‖F is the Frobenius norm.

Many natural relationships between images can be incorpo-

rated:

• visual descriptors of the graph nodes, such as color,

SIFT, shape descriptor, bag-of-visual-words, etc.

• point-to-point landmark correspondences between the t-

wo images.

• region-to-region correspondence.

We take color features and bag-of-visual-words features as

probe functions

2.2 Regularization

The map T commutes with Laplacian operator:

Lj(f ◦ T ) = (Lif) ◦ T,∀f ∈ F(Gi,R)

In functional space, applying TF followed by Lj yields

Lj (TF (f)) = Φj(ΦTj LjΦj)(Φ

Tj TFΦi)(Φ

Ti f) = Φj(LjXij

~f).

The Laplacian Li followed by TF yields

TF (Li (f)) = Φj(ΦTj TFΦi)(Φ

Ti LiΦi)(Φ

Ti f) = Φj(XijLi

~f).

The commutation means:

Φj(LjXij~f) = Φj(XijLi

~f),∀~f.

The regularization can be represented as XijLi = LjXij.

2.3 Optimization

The optimization problem is formulated as:

min. ‖XijFi − Fj‖F + λ ‖XijLi − LjXij‖F ,

When using Laplacian eigenfunctions as basis Φ, the regular-

izer becomes simpler:

min. ‖XijFi − Fj‖F + λ ‖XijΣi − ΣjXij‖F ,

The commutativity regularizer can be further rewritten as an

element-wise sum:

‖XijΣi − ΣjXij‖2

F =∑

u,v

xuv [Σj(u, u)− Σi(v, v)]2,

Xij will have significant values on the diagonal and near-

diagonal entries, and almost zero everywhere else.

Given a set of training images (b), their indicator functions (c),

and a test image (a), we obtain a set of indicator functions (d)

mapped from each image of (b), which are combined to the

final indicator function (e).

3 Transferring the Segmentation

Training: the ground truth segmentation is an indicator func-

tion fi,gt ;

Testing: the transferred segmentation is fj = ΦjXijΦTi fi,gt .

Merging the Results

Voting mechanism to combine the mapped indicator functions

from N training images:

g =1

∑Ni=1

wi

N∑

i=1

wiΦjXijΦTi fi,gt ,

The weight wi is determined based on its distance di to the

test image using the GIST descriptor:

wi = e−d2i /2σ2

.

4 Experimental Results

Data sets: PASCAL VOC challenge 2012

• real-world consumer images from Flickr;

• 20 object classes;

• pixel-level segmented images;

4.1 Parameter Selection

Performance changing with number of basis functions and

regularizer weight λ:

0 20 40 60 80 10035

40

45

50

55

Number of Basis

Acc

urac

y

100

101

102

103

30

35

40

45

50

λ

Acc

urac

y4.2 Results on PASCAL VOC 2012

class Best Ours

background 85.1 82.8

aero plane 65.4 62.8

bicycle 31.0 20.6

bird 51.3 48.4

boat 44.5 43.2

bottle 58.9 33.3

bus 60.8 61.3

car 61.5 48.1

cat 56.4 60.3

chair 22.6 14.6

cow 53.6 60.9

dining table 32.6 28.1

dog 47.4 55.4

horse 57.6 60.5

motor bike 57.9 51.7

person 51.9 35.4

potted plant 35.7 29.7

sheep 55.3 62.7

sofa 40.8 33.9

train 54.2 63.1

tv/monitor 47.8 43.2

Analysis

• better on natural objects, such as cat, cow, dog, horse,

sheep, etc. Not sensitive to non-rigid deformations;

• worse on bottle, car, person if objects in test images

rarely appear in training images;

• low accuracy on bicycle, bottle and chair because of

coarse superpixels