QUESTION 1
HOUGH TRANSFORM
Transform a point in Cartesian coordinates to all lines (line parameters) that pass through the point, expressed in polar coordinates.
$H : (x, y) \to \{(r, \theta) \mid r(\theta) = x\cos\theta + y\sin\theta\}$
$y = -\frac{\cos\theta}{\sin\theta}\,x + \frac{r}{\sin\theta} \;\to\; r(\theta) = x\cos\theta + y\sin\theta$
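A minimal voting sketch of the line transform, in Python. The slides contain no code, so the function name `hough_lines`, the bin sizes, and the toy points are assumptions made for illustration. Each point votes for every sampled (r, θ) pair it is consistent with, and collinear points pile their votes into one bin.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, r_step=1.0):
    """Vote in (r, theta) space: each point adds one vote per sampled theta."""
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta            # theta sampled in [0, pi)
            r = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(r / r_step), t)] += 1       # quantize r into bins
    return votes

# Ten points on the horizontal line y = 2 (theta = 90 degrees, r = 2).
points = [(float(x), 2.0) for x in range(0, 100, 10)]
votes = hough_lines(points)
(r_bin, theta_bin), count = votes.most_common(1)[0]
print(r_bin, theta_bin, count)
```

With one-degree steps in θ and unit steps in r, the strongest bin for this toy input is the one at r = 2, θ = 90 degrees, which receives a vote from every point.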
GENERALIZED HOUGH TRANSFORM
Map a feature $\Phi(\vec{x})$ to points in the 2D Hough space $S = \{\vec{a}\}$.
$\vec{a}$ is the reference origin of the shape.
For points $\vec{x}$ on the boundary, store $\vec{r} = \vec{a} - \vec{x}$ as a function of the gradient $\Phi(\vec{x})$: the R-Table.
Rotation and scale: 4D Hough space.
D.H. Ballard, "Generalizing the Hough Transform to Detect Arbitrary Shapes", Pattern Recognition, Vol.13, No.2, p.111-122, 1981
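A toy sketch of the R-Table idea in Python, assuming the gradient direction has already been quantized into integer bins. The function names, the bin labels, and the four-point template are hypothetical. Training stores the offset from each boundary point to the reference origin, and detection lets each edge point vote for the origin positions its gradient bin allows.

```python
from collections import Counter, defaultdict

def build_r_table(boundary, reference):
    """boundary: list of ((x, y), gradient_bin); reference: origin (ax, ay)."""
    ax, ay = reference
    table = defaultdict(list)
    for (x, y), phi in boundary:
        table[phi].append((ax - x, ay - y))      # r = a - x, indexed by gradient
    return table

def vote_centers(edges, table):
    """Each edge point votes for a = x + r for every r stored under its bin."""
    votes = Counter()
    for (x, y), phi in edges:
        for rx, ry in table.get(phi, []):
            votes[(x + rx, y + ry)] += 1
    return votes

# Template: four corners of a square, each with a made-up gradient bin.
template = [((0, 0), 0), ((2, 0), 1), ((2, 2), 2), ((0, 2), 3)]
table = build_r_table(template, reference=(1, 1))

# The same shape translated by (5, 3).
edges = [((x + 5, y + 3), phi) for (x, y), phi in template]
center, count = vote_centers(edges, table).most_common(1)[0]
print(center, count)
```

Handling rotation and scale would add two more accumulator dimensions, which is the 4D Hough space mentioned on the slide; this sketch covers translation only.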
IMPLICIT SHAPE MODEL
R-Table as a function of an image patch
Transform an image patch to object centers
B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV 2004
QUESTION 2
BAG OF VISUAL WORDS
Feature Extraction
Clustering
Encoding
Spatial Binning
Kernel SVM k(x, y) = ⟨Ψ(x),Ψ(y)⟩
BAG OF VISUAL WORDS
Feature Extraction
Local image feature
Robust to typical image transformations
Dense SIFT: SIFT at every location vs. at key points
Interest points might not correspond to the foreground
Clustering (Dictionary)
Visual words should be distinctive and diverse
Common visual words (e.g. a wheel) will form a cluster
K-means clustering
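For intuition, a plain Lloyd's-algorithm sketch of k-means in Python. It is not how vl_kmeans() is implemented. Initialising the centroids with the first k points is an assumption made to keep the example deterministic, and the 2D toy points stand in for 128-dimensional SIFT descriptors.

```python
import math

def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        # Assignment step: attach every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
dictionary = kmeans(points, k=2)
print(dictionary)
```

The returned centroids play the role of the visual words: one near the cluster around the origin and one near the cluster around (10, 10).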
BAG OF VISUAL WORDS
Encoding
Maps local features to visual words
Hard quantization: assign each feature to its nearest neighbor (NN) in the dictionary
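Hard quantization can be sketched in a few lines of Python. The function name `quantize` and the toy 2D dictionary are assumptions; real descriptors would be SIFT vectors. Each feature increments the bin of its nearest visual word, and the counts are normalised into a distribution.

```python
import math

def quantize(features, dictionary):
    """Return the normalised histogram of nearest visual words."""
    hist = [0] * len(dictionary)
    for f in features:
        nearest = min(range(len(dictionary)),
                      key=lambda j: math.dist(f, dictionary[j]))
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

dictionary = [(0.0, 0.0), (10.0, 10.0)]           # two toy visual words
features = [(1, 1), (0, 2), (2, 0), (9, 9)]       # three near word 0, one near word 1
histogram = quantize(features, dictionary)
print(histogram)
```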
Spatial Pyramid
Encode spatial information
Divide the image into subsections
For each subsection, create a visual word (VW) histogram
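A sketch of the spatial pyramid step in Python, assuming each key point already carries a visual word index. The function name, the choice of 2^level by 2^level grids, and the unweighted concatenation are assumptions for illustration; the starter code may bin and weight the levels differently.

```python
def spatial_pyramid(keypoints, words, width, height, n_words, levels=2):
    """Concatenate one visual-word histogram per grid cell, for each level."""
    descriptor = []
    for level in range(levels):
        cells = 2 ** level                         # cells per image side
        hists = [[0] * n_words for _ in range(cells * cells)]
        for (x, y), w in zip(keypoints, words):
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hists[cy * cells + cx][w] += 1
        for h in hists:
            descriptor.extend(h)
    return descriptor

keypoints = [(0.5, 0.5), (3.5, 0.5), (3.5, 3.5)]
words = [0, 1, 1]
pyramid = spatial_pyramid(keypoints, words, width=4, height=4, n_words=2)
print(pyramid)
```

With two levels and two words the descriptor has 2 × (1 + 4) = 10 entries: the whole-image histogram first, then the four quadrant histograms in row-major order.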
SIMILARITY METRIC FOR DISTRIBUTIONS
Train a linear SVM on features (distributions): which distance metric?
How similar is a distribution $P$ to a distribution $Q$? The $f$-divergence!
$f : (0, \infty) \to \mathbb{R}$, convex
$D_f(P \| Q) = \int f\!\left(\frac{dP}{dQ}\right) dQ$
Kullback-Leibler divergence: $f(x) = x \log x$
$\chi^2$ divergence: $f(x) = (x - 1)^2$
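For discrete distributions the integral becomes a sum, $D_f(P \| Q) = \sum_i q_i\, f(p_i / q_i)$. The Python sketch below evaluates it for the two choices of $f$ on this slide. The function names are hypothetical, and the code assumes every $q_i > 0$.

```python
import math

def f_divergence(p, q, f):
    """Discrete D_f(P||Q) = sum_i q_i * f(p_i / q_i), assuming all q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def f_kl(t):
    """f(x) = x log x, with the convention 0 log 0 = 0."""
    return t * math.log(t) if t > 0 else 0.0

def f_chi2(t):
    """f(x) = (x - 1)^2."""
    return (t - 1.0) ** 2

p = [0.5, 0.5]
q = [0.25, 0.75]
kl_value = f_divergence(p, q, f_kl)
chi2_value = f_divergence(p, q, f_chi2)
print(kl_value, chi2_value)
```

Both generators satisfy f(1) = 0, so the divergence of a distribution from itself is zero.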
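FOR A DISCRETE CASE
Intersection kernel: $k(x, y) = \sum_i \min(x_i, y_i)$
$\chi^2$ kernel: $k(x, y) = \sum_i \frac{1}{2} \frac{(x_i - y_i)^2}{x_i + y_i}$
(This expression is the $\chi^2$ distance. For histograms that each sum to 1, the additive $\chi^2$ kernel $\sum_i \frac{2 x_i y_i}{x_i + y_i}$ equals one minus it.)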
In Problem 2, we will use the $\chi^2$ kernel as the similarity metric.
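The discrete histogram comparisons can be sketched in plain Python; the function names are hypothetical. The sketch evaluates the intersection kernel, the $\chi^2$ distance $\sum_i \frac{1}{2}(x_i - y_i)^2 / (x_i + y_i)$, and the additive $\chi^2$ kernel $\sum_i 2 x_i y_i / (x_i + y_i)$. For histograms that each sum to 1, the last two add up to 1.

```python
def intersection_kernel(x, y):
    """k(x, y) = sum_i min(x_i, y_i)."""
    return sum(min(xi, yi) for xi, yi in zip(x, y))

def chi2_distance(x, y):
    """sum_i 0.5 * (x_i - y_i)^2 / (x_i + y_i), skipping empty bins."""
    return sum(0.5 * (xi - yi) ** 2 / (xi + yi)
               for xi, yi in zip(x, y) if xi + yi > 0)

def chi2_kernel(x, y):
    """Additive chi-squared kernel: sum_i 2 x_i y_i / (x_i + y_i)."""
    return sum(2.0 * xi * yi / (xi + yi)
               for xi, yi in zip(x, y) if xi + yi > 0)

x = [0.5, 0.5, 0.0]
y = [0.25, 0.25, 0.5]
print(intersection_kernel(x, y), chi2_distance(x, y), chi2_kernel(x, y))
```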
HOMOGENEOUS KERNEL AND FINITE APPROXIMATION
$K(x, y)$ is expensive and nonlinear!
But in the feature space $\Psi(\cdot)$ it becomes linear ($\infty$ dimensions)
Feature map using the homogeneous kernel
Generic histogram kernel is $K(x, y) = \sum_i k(x_i, y_i)$
Homogeneous if $k(cx, cy) = c\,k(x, y)$: intersection, $\chi^2$, Hellinger's, Jensen-Shannon's
Find a finite feature map $\hat{\Psi}(\cdot)$ such that $k(x, y) = \langle \Psi(x), \Psi(y) \rangle \approx \langle \hat{\Psi}(x), \hat{\Psi}(y) \rangle$
Map distributions using $\hat{\Psi}(\cdot)$; then it becomes a linear classification problem in the feature space!
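HOMOGENEOUS KERNEL
Use the homogeneity: $k(x, y) = \sqrt{xy}\, k\!\left(\sqrt{x/y}, \sqrt{y/x}\right) = \sqrt{xy}\, \mathcal{K}(\log(y/x))$
Ex, intersection kernel: $k(x, y) = \min(x, y) = \sqrt{xy}\, \mathcal{K}\!\left(\log \frac{y}{x}\right)$, with $\mathcal{K}(\omega) = e^{-|\omega|/2}$
Define $\kappa(\lambda)$ as the Fourier transform of $\mathcal{K}(\omega)$: $\Psi(x) = e^{-i\lambda \log x} \sqrt{x\,\kappa(\lambda)}$
Continuous, infinite-dimensional feature
Sample at points $\lambda = -nL, (-n+1)L, \ldots, nL$: $\hat{\Psi}(x) \in \mathbb{R}^{2n+1}$
$k(x, y) \approx \langle \hat{\Psi}(x), \hat{\Psi}(y) \rangle$
In Problem 2, we use vl_homkermap() with $n = 1$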
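A pure-Python sketch of the sampled feature map for the additive $\chi^2$ kernel $k(x, y) = 2xy/(x + y)$. Its signature is $\mathcal{K}(\omega) = \mathrm{sech}(\omega/2)$ and its spectrum is $\kappa(\lambda) = \mathrm{sech}(\pi\lambda)$. This is not the vl_homkermap() implementation: the function names are hypothetical, and $n = 3$ and $L = 0.5$ are arbitrary values chosen for illustration. The complex samples at $\pm jL$ are folded into real cosine and sine pairs, which gives the $2n + 1$ real dimensions.

```python
import math

def chi2_spectrum(lam):
    """kappa(lambda) = sech(pi * lambda) for the additive chi-squared kernel."""
    return 1.0 / math.cosh(math.pi * lam)

def hom_feature_map(x, n=3, L=0.5):
    """Map a scalar x >= 0 to 2n + 1 real features."""
    if x <= 0:
        return [0.0] * (2 * n + 1)
    feats = [math.sqrt(x * L * chi2_spectrum(0.0))]        # sample at lambda = 0
    log_x = math.log(x)
    for j in range(1, n + 1):                              # samples at +/- jL
        amp = math.sqrt(2.0 * x * L * chi2_spectrum(j * L))
        feats.append(amp * math.cos(j * L * log_x))
        feats.append(amp * math.sin(j * L * log_x))
    return feats

def approx_kernel(xs, ys, n=3, L=0.5):
    """Linear inner product in the mapped space, summed over histogram bins."""
    total = 0.0
    for xi, yi in zip(xs, ys):
        fx, fy = hom_feature_map(xi, n, L), hom_feature_map(yi, n, L)
        total += sum(a * b for a, b in zip(fx, fy))
    return total

def exact_chi2_kernel(xs, ys):
    return sum(2.0 * xi * yi / (xi + yi) for xi, yi in zip(xs, ys) if xi + yi > 0)

x = [0.2, 0.8]
y = [0.6, 0.4]
print(exact_chi2_kernel(x, y), approx_kernel(x, y))
```

For this toy pair the linear inner product should land within a few percent of the exact kernel value. A linear SVM trained on the mapped histograms then approximates a $\chi^2$ kernel SVM.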
BAG OF VISUAL WORDS
Install VLFeat, an extensive computer vision library: http://www.vlfeat.org/
Set up the path correctly in the starter code p2.m
Vary the options and see how they affect the performance
Code extract_dense_sift.m
Image feature extraction
Dense SIFT features
Use vl_dsift()
Randomly select 10,000 features
Code create_dictionary.m
Create a dictionary of visual words
Use k-means to find the cluster centroids
Use vl_kmeans()
Code create_histograms.m
Represent images as histograms
Spatial pyramid