1
Pascal Grand Challenge
Felix Vilensky, 19/6/2011
2
Outline
• Pascal VOC challenge framework.
• Successful detection methods:
  o Object Detection with Discriminatively Trained Part Based Models (P. Felzenszwalb et al.) – the "UoC/TTI" method.
  o Multiple Kernels for Object Detection (A. Vedaldi et al.) – the "Oxford/MSR India" method.
• A successful classification method:
  o Image Classification using Super-Vector Coding of Local Image Descriptors (Xi Zhou et al.) – the NEC/UIUC method.
• Discussion of bias in datasets.
• 2010 winners overview.
3
Pascal VOC Challenge Framework
"The PASCAL Visual Object Classes (VOC) Challenge"
Mark Everingham · Luc Van Gool · Christopher K. I. Williams · John Winn · Andrew Zisserman
4
Pascal VOC Challenge
• Classification task.
• Detection task.
• Pixel-level segmentation.
• "Person layout" detection.
• Action classification in still images.
5
Classification Task
[Example image: contains at least one bus; classification confidence 100%]
6
Detection Task
The predicted bounding box should overlap by at least 50% with the ground truth (see the sketch below)!
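As a concrete illustration, here is a minimal sketch of the VOC overlap criterion (intersection over union); the function name and box format are my own:

```python
def overlap(box_p, box_gt):
    """VOC overlap: intersection area over union area; boxes are (x1, y1, x2, y2)."""
    ix = max(0.0, min(box_p[2], box_gt[2]) - max(box_p[0], box_gt[0]))
    iy = max(0.0, min(box_p[3], box_gt[3]) - max(box_p[1], box_gt[1]))
    inter = ix * iy
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    return inter / (area_p + area_gt - inter)

# A detection counts as correct when overlap(predicted, ground_truth) >= 0.5.
```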
7
Detection "Near Misses"
These detections did not fulfill the bounding-box overlap criterion.
8
Pascal VOC Challenge-The Object Classes
9
Pascal VOC Challenge-The Object Classes
Images retrieved from the Flickr website.
10
Pixel-Level Segmentation
[Figure panels: input image, class segmentation, object segmentation]
11
Person Layout
12
Action Classification
• Classification among 9 action classes.
[Example images: "speaking on the phone" and "playing the guitar", each classified with 100% confidence]
13
Annotation
• Class.
• Bounding box.
• Viewpoint.
• Truncation.
• Difficult (for classification/detection).
14
Annotation Example
15
Evaluation
• Precision/recall curves.
• Interpolated precision.
• AP (average precision) – a way to compare different methods.

$\text{Recall} = \frac{\#\text{True Positives}}{\#\text{False Negatives} + \#\text{True Positives}}$

$\text{Precision} = \frac{\#\text{True Positives}}{\#\text{False Positives} + \#\text{True Positives}}$
16
Evaluation – Precision/Recall Curves (1)
• A practical tradeoff between precision and recall.
• Interpolated precision (reproduced in the sketch below):

$P_{\text{interp}}(r) = \max_{\tilde{r} \ge r} p(\tilde{r})$

Rank      1    2    3    4    5    6    7    8    9    10
g.t.      Yes  No   Yes  No   Yes  No   No   No   No   No
Precision 1/1  1/2  2/3  2/4  3/5  3/6  3/7  3/8  3/9  3/10
Recall    0.2  0.2  0.4  0.4  0.6  0.6  0.6  0.6  0.6  0.6
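A minimal sketch reproducing the table above in code (variable names are my own; n_positives = 5 is implied by the recall column):

```python
import numpy as np

# Ranked detections from the table above; True marks a correct detection (g.t. "Yes").
gt = np.array([True, False, True, False, True, False, False, False, False, False])
n_positives = 5  # total ground-truth objects, implied by recall topping out at 3/5

tp = np.cumsum(gt)                           # true positives up to each rank
precision = tp / np.arange(1, len(gt) + 1)   # 1/1, 1/2, 2/3, 2/4, 3/5, ...
recall = tp / n_positives                    # 0.2, 0.2, 0.4, 0.4, 0.6, ...

def p_interp(r):
    """Interpolated precision: maximum precision over all recalls >= r."""
    mask = recall >= r
    return precision[mask].max() if mask.any() else 0.0
```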
17
Evaluation-Precision\Recall Curves(2)
18
Evaluation – Average Precision (AP)

$AP = \frac{1}{11} \sum_{r \in \{0,\, 0.1,\, \ldots,\, 1\}} P_{\text{interp}}(r)$

AP determines who is best (computed below for the example above).
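Continuing the sketch above, the 11-point AP is one line; for the example ranked list it comes out at roughly 0.50:

```python
# 11-point interpolated AP over recall levels 0, 0.1, ..., 1.0
ap = np.mean([p_interp(i / 10) for i in range(11)])
print(round(ap, 2))  # ~0.5 for the ranked list above
```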
19
Successful Detection Methods
20
UoC/TTI Method Overview (P. Felzenszwalb et al.)
• Joint winner of the 2009 Pascal VOC challenge with the Oxford method.
• "Lifetime achievement" award in 2010.
• Mixture of deformable part models.
• Each component has a global template + deformable parts:
  o HOG feature templates.
• Fully trained from bounding boxes alone.
21
UoC/TTI Method – HOG Features (1)
• [-1 0 1] and its transpose → gradients.
• Gradient orientation is discretized into one of p values.
• Pixel-level features → cells of size k.
• 8-pixel cells (k = 8).
• 18 contrast-sensitive bins + 9 contrast-insensitive bins = 27 bins in total!

Contrast sensitive: $B_1(x, y) = \operatorname{round}\!\left(\frac{p\,\theta(x, y)}{2\pi}\right) \bmod p$

Contrast insensitive: $B_2(x, y) = \operatorname{round}\!\left(\frac{p\,\theta(x, y)}{\pi}\right) \bmod p$

$F_b(x, y) = \begin{cases} r(x, y) & \text{if } b = B(x, y) \\ 0 & \text{otherwise} \end{cases}$

Soft binning is used in practice (hard assignment is sketched below).
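A rough sketch of the two binning rules, hard assignment only (the method itself uses soft binning; function names here are my own):

```python
import numpy as np

def orientation_bins(gx, gy):
    """Discretize gradient orientations from [-1 0 1] filter responses.
    Returns 18 contrast-sensitive bins, 9 contrast-insensitive bins,
    and the gradient magnitude r used to vote into the histograms."""
    theta = np.arctan2(gy, gx)                                   # in (-pi, pi]
    b_sens = np.round(18 * theta / (2 * np.pi)).astype(int) % 18
    b_insens = np.round(9 * (theta % np.pi) / np.pi).astype(int) % 9
    r = np.hypot(gx, gy)                                         # magnitude
    return b_sens, b_insens, r
```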
22
UoC/TTI Method – HOG Features (2)
[Figure: the 27 orientation bins accumulated per cell]
23
UoC/TTI Method – HOG Features (3)
• Normalization.
• Truncation.
• 27 bins × 4 normalization factors = a 4×27 matrix.
• Dimensionality reduction to 31 (see the sketch below).

$N_{\delta,\gamma}(i, j) = \left(\lVert C(i,j)\rVert^2 + \lVert C(i+\delta,j)\rVert^2 + \lVert C(i,j+\gamma)\rVert^2 + \lVert C(i+\delta,j+\gamma)\rVert^2\right)^{1/2}, \quad \delta, \gamma \in \{-1, 1\}$

The 31 output features:
• $V_1, \ldots, V_4$: each is the sum over the 27 bins for one of the 4 normalization factors.
• $V_5, \ldots, V_{31}$: each is the sum over the 4 normalization factors for one of the 27 bins.
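A minimal sketch of this normalize–truncate–project step, assuming per-cell 27-bin histograms as input and omitting the scaling constants of the published feature:

```python
import numpy as np

def hog31(cells, alpha=0.2, eps=1e-7):
    """cells: (H, W, 27) per-cell orientation histograms -> (H, W, 31) features."""
    H, W, _ = cells.shape
    energy = (cells ** 2).sum(axis=2)                 # per-cell gradient energy
    out = np.zeros((H, W, 31))
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            v = np.empty((4, 27))
            for a, (d, g) in enumerate([(-1, -1), (-1, 1), (1, -1), (1, 1)]):
                n = np.sqrt(energy[i, j] + energy[i + d, j] +
                            energy[i, j + g] + energy[i + d, j + g] + eps)
                v[a] = np.minimum(cells[i, j] / n, alpha)  # normalize + truncate
            out[i, j, :4] = v.sum(axis=1)   # V1..V4: sums over the 27 bins
            out[i, j, 4:] = v.sum(axis=0)   # V5..V31: sums over the 4 factors
    return out
```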
24
UoC/TTI Method – Deformable Part Models
• Coarse root filter.
• High-resolution deformable parts.
• Part = (anchor position, deformation cost, resolution level).
25
UoC/TTI Method – Mixture Models (1)
• Diversity within a rich object category.
• Different views of the same object.
• A mixture of deformable part models for each class.
• Each deformable part model in the mixture is called a component.
26
UoC/TTI Method – Object Hypothesis
Slide taken from the method's presentation
27
UoC/TTI Method – Models (1)
A 6-component person model.
28
UoC/TTI Method – Models (2)
A 6-component bicycle model.
29
UoC/TTI Method – Score of a Hypothesis
Slide taken from method's presentation
30
UoC/TTI Method – Matching (1)
• "Sliding window" approach.
• High-scoring root locations define detections.
• Matching is done for each component separately (a toy version is sketched below).

$\text{score}(p_0) = \max_{p_1, \ldots, p_n} \text{score}(p_0, p_1, \ldots, p_n)$

($p_0$: root location; $p_1, \ldots, p_n$: best part locations)
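A naive sketch of scoring one root location: the root filter response plus, per part, the best displaced part response minus a quadratic deformation cost. Names and the brute-force search are my own; the actual method computes these maxima efficiently with generalized distance transforms:

```python
import numpy as np

def score_root(root_resp, part_resps, anchors, defo=0.1, radius=4):
    """root_resp: scalar root filter response at p0.
    part_resps: list of 2-D part filter response maps.
    anchors: the (y, x) anchor position of each part in its map."""
    total = root_resp
    for resp, (ay, ax) in zip(part_resps, anchors):
        best = -np.inf
        for dy in range(-radius, radius + 1):       # brute-force displacement
            for dx in range(-radius, radius + 1):
                y, x = ay + dy, ax + dx
                if 0 <= y < resp.shape[0] and 0 <= x < resp.shape[1]:
                    best = max(best, resp[y, x] - defo * (dy ** 2 + dx ** 2))
        total += best
    return total
```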
31
UoC/TTI Method – Matching(2)
32
UoC/TTI Method – Post Processing & Context Rescoring
Slide taken from method's presentation
33
UoC/TTI Method – Training & Data Mining
• Weakly labeled data in the training set.
• Latent SVM (LSVM) training with $z = (c, p_0, \ldots, p_{n_c})$ as the latent value.
• Training and data mining in 4 stages (a toy sketch follows below):
  1. Optimize z.
  2. Optimize β.
  3. Add hard negative examples.
  4. Remove easy negative examples.
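A toy sketch of the alternating optimization (without the hard-negative mining stages): each positive is a bag of candidate feature vectors standing in for latent placements. All names and the use of scikit-learn are my own assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC

def lsvm_train(pos_bags, neg_feats, iters=4):
    """pos_bags: list of (m_i, d) arrays of candidate latent placements.
    neg_feats: (n, d) array of fixed negative feature vectors."""
    clf = LinearSVC()
    pos = np.array([bag[0] for bag in pos_bags])       # initial latent choice
    for _ in range(iters):
        X = np.vstack([pos, neg_feats])
        y = np.array([1] * len(pos) + [-1] * len(neg_feats))
        clf.fit(X, y)                                  # optimize beta
        w = clf.coef_.ravel()
        pos = np.array([bag[np.argmax(bag @ w)] for bag in pos_bags])  # optimize z
    return clf
```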
34
UoC/TTI Method – Results(1)
35
UoC/TTI Method – Results(2)
36
Oxford Method Overview (A. Vedaldi et al.)
Pipeline: regions with different scales and aspect ratios → 6 feature channels → 3-level spatial pyramid → cascade of 3 SVM classifiers with 3 different kernels → post-processing.
37
Oxford Method – Feature Channels
• Bag of visual words – SIFT descriptors are extracted and quantized into a vocabulary of 64 words.
• Dense words (PhowGray, PhowColor) – another set of SIFT descriptors, quantized into 300 visual words.
• Histogram of oriented edges (Phog180, Phog360) – similar to the HOG descriptor used by the "UoC/TTI" method, with 8 orientation bins.
• Self-similarity features (SSIM).
38
Oxford Method – Spatial Pyramids
39
Oxford Method – Feature Vector
Chart taken from the method's presentation
40
Oxford Method – Discriminant Function (1)

$C(h^R) = \sum_{i=1}^{M} \alpha_i y_i K(h^R, h_i)$

• $h_i$, $i = 1, \ldots, M$, are the histogram collections acting as support vectors for an SVM; $y_i \in \{-1, 1\}$.
• $K$ is a positive-definite kernel.
• $h^R$ is the collection of normalized feature histograms $\{h^R_{fl}\}$, where $f$ is the feature channel and $l$ is the level of the spatial pyramid.
41
Oxford Method – Discriminant Function (2)
• The kernel of the discriminant function is a linear combination of histogram kernels:

$K(h^R, h_i) = \sum_{fl} d_{fl}\, K_{fl}(h^R_{fl}, h_{i,fl}), \quad d_{fl} \ge 0$

• The SVM parameters and the kernel weights $d_{fl}$ (18 in total: 6 channels × 3 pyramid levels) are learned using MKL (Multiple Kernel Learning).
• The discriminant function is used to rank candidate regions R by the likelihood of containing an instance of the object of interest.
42
Oxford Method – Cascade Solution (1)
• Exhaustive search for the best candidate regions R requires a number of operations that is O(MBN):
  o N – the number of regions (~10^5).
  o M – the number of support vectors in C(h^R) (~10^3).
  o B – the dimensionality of the histograms (~10^4).
• To reduce this complexity, a cascade solution is applied:
  o The first stage uses a "cheap" linear kernel to evaluate C(h^R).
  o The second uses a more expensive and powerful quasi-linear kernel.
  o The third uses the most powerful non-linear kernel.
• Each stage evaluates the discriminant function on a smaller number of candidate regions.
43
Oxford Method – Cascade Solution (2)

Type          Evaluation complexity
Linear        O(N)
Quasi-linear  O(BN)
Non-linear    O(MBN)

Stage 1 (linear) → Stage 2 (quasi-linear) → Stage 3 (non-linear)
44
Oxford Method – Cascade Solution(3)
Chart taken from the method's presentation
45
Oxford Method – The Kernels
• All the aforementioned kernels are of the following form (see the sketch below):

$K(h, h') = f\!\left(\sum_{b=1}^{B} g(h_b, h'_b)\right)$

where $f: \mathbb{R} \to \mathbb{R}$, $g: \mathbb{R}^2 \to \mathbb{R}$, and $b$ is a histogram bin index.

• For linear kernels both f and g are linear; for quasi-linear kernels only f is linear.
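A small sketch of the three kernel types in this form, using the χ² family that is standard for histogram comparison (the exact kernel choices here are an assumption; histograms are assumed L1-normalized):

```python
import numpy as np

def chi2_dist(h1, h2, eps=1e-10):
    """Sum over bins of g(h_b, h'_b) = (h_b - h'_b)^2 / (h_b + h'_b)."""
    return np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def k_linear(h1, h2):                # f and g both linear
    return float(np.dot(h1, h2))

def k_quasi_linear(h1, h2):          # f linear, g non-linear (chi^2 kernel)
    return 1.0 - 0.5 * chi2_dist(h1, h2)

def k_nonlinear(h1, h2, gamma=1.0):  # f = exp, non-linear (exp-chi^2 kernel)
    return float(np.exp(-gamma * chi2_dist(h1, h2)))
```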
46
Oxford Method – Post-Processing
• The output of the last stage is a ranked list of 100 candidate regions per image.
• Many of these regions correspond to multiple detections of the same object.
• Non-maxima suppression is used (sketched below).
• At most 10 regions per image remain.
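A minimal sketch of greedy non-maxima suppression, reusing the overlap() helper sketched on the detection-task slide (the 0.5 suppression threshold is an assumption):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5, max_keep=10):
    """Keep the highest-scoring box, drop boxes overlapping it too much, repeat."""
    order = np.argsort(scores)[::-1]            # indices by descending score
    keep = []
    while order.size and len(keep) < max_keep:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        ious = np.array([overlap(boxes[j], boxes[i]) for j in rest])
        order = rest[ious <= iou_thresh]        # suppress heavy overlaps
    return keep
```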
47
Oxford Method – Training/Retraining (1)
• Jittered/flipped instances are used as positive samples.
• Training images are partitioned into two subsets.
• The classifiers are tested on each subset in turn, adding new hard negative samples for retraining.
[Flow chart: training data → classifier training → testing → errors (overlap < 20%) added back to the training data]
48
Oxford Method – Results(1)
49
Oxford Method – Results(2)
50
Oxford Method – Results(3)
Training and testing on VOC2007.
Training and testing on VOC2008.
Training on VOC2008 and testing on VOC2007.
Training and testing on VOC2009.
51
Oxford Method – Summary
52
A Successful Classification Method
53
NEC/UIUC Method Overview (Xi Zhou, Kai Yu et al.)
• A winner of the 2009 Pascal VOC classification challenge.
• A framework for classification is proposed:
Descriptor coding: super-vector coding (the important part!) → spatial pyramid pooling → classification: linear SVM.
54
NEC/UIUC Method – Notation
• $X$ – descriptor vector.
• $\varphi(X)$ – coding function.
• $f(X)$ – unknown function on local features.
• $\hat{f}(X)$ – approximating function.
• $Y$ – set of descriptor vectors.
55
NEC/UIUC Method – Descriptor Coding (1)

Vector quantization coding:

$\hat{f}(X) = W^T \varphi(X), \quad W = [W_1, W_2, \ldots, W_K]^T$

$\varphi(X)$ is the code of $X$.
56
NEC/UIUC Method – Descriptor Coding (2)

Super-vector coding (a sketch follows below):

$\hat{f}(X) = W^T \varphi(X) = \sum_k C_k(X)\, W_k^T X$

$W = [W_1^T, W_2^T, \ldots, W_K^T]^T$

$\varphi(X) = [C_1(X) X^T, C_2(X) X^T, \ldots, C_K(X) X^T]^T$

$C_k(X) = 1$ if $X$ belongs to cluster $k$, otherwise $C_k(X) = 0$.
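A minimal sketch of this hard-assignment super-vector code for one descriptor (function and variable names are my own):

```python
import numpy as np

def sv_code(x, centers):
    """Super-vector code phi(x): the K*d vector whose k-th d-block equals x
    for x's nearest codebook center and is zero elsewhere."""
    K, d = centers.shape
    k = int(np.argmin(np.linalg.norm(centers - x, axis=1)))  # C_k(x) = 1
    phi = np.zeros(K * d)
    phi[k * d:(k + 1) * d] = x
    return phi

# Example: a 3-word codebook for 2-D descriptors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(sv_code(np.array([0.9, 1.2]), codebook))  # non-zero only in block k=1
```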
57
NEC/UIUC Method – Spatial Pooling

Pyramid levels: 1×1, 2×2, 3×1. Each cell is pooled as:

$\Phi_k(Y) = \frac{1}{N} \sum_{X \in Y} \frac{1}{\sqrt{p_k}}\, \varphi_k(X)$

N – the size of the set of local descriptors. Y – the set of local descriptors.

$s(Y) = [\Phi^1_{11}(Y), \Phi^2_{11}(Y), \Phi^2_{12}(Y), \Phi^2_{21}(Y), \Phi^2_{22}(Y), \Phi^3_{11}(Y), \Phi^3_{12}(Y), \Phi^3_{13}(Y)]$

$s(Y)$ is fed to the linear SVM classifier.
58
NEC/UIUC Method – Results (1)
• Comparison of non-linear coding methods.
• Comparison with other methods.
• Impact of codebook size (tested on the validation set).
• Images and visualization of patch-level scores.

SIFT: 128-dimensional vectors over a grid with a spacing of 4 pixels, at three patch sizes (16×16, 25×25, and 31×31).
PCA: dimensionality reduction to 80.
59
NEC/UIUC Method – Results (2): |C| = 512
60
NEC/UIUC Method – Results (3): |C| = 2048
61
NEC/UIUC Method – Results(4)
62
Bias in Datasets
"Unbiased Look at Dataset Bias"
Antonio Torralba (Massachusetts Institute of Technology), Alexei A. Efros (Carnegie Mellon University)
63
Name The Dataset
• People were asked to guess, based on three images, the dataset the images were taken from.
• People who worked in the field got more than 75% correct.
64
Name The Dataset – The Dataset Classifier
• 4 classifiers were trained to play the "Name The Dataset" game.
• Each classifier used a different image descriptor:
  o 32×32 thumbnail (grayscale and color).
  o Gist.
  o Bag of HOG visual words.
• 1000 images were randomly sampled from the training portions of 12 datasets.
• The classifiers were tested on 300 random images from each of the test sets, repeated 20 times.
65
Name The Dataset – The Dataset Classifier
• The best classifier performs at 39% (chance is about 8%)!!!
[Figures: confusion table; recognition performance vs. number of training examples per class]
66
Name The Dataset – The Dataset Classifier
[Figure: car images from different datasets]
• Performance is 61% on car images from 5 different datasets (chance is 20%).
67
Cross-Dataset Generalization (1)
• Training on one dataset while testing on another.
• Dalal & Triggs detector (HOG + linear SVM) for the detection task.
• Bag-of-words approach with a Gaussian-kernel SVM for the classification task.
• The "car" and "person" objects are used.
• Each classifier (for each dataset) was trained with 500 positive images and 2000 negative ones.
• Each detector (for each dataset) was trained with 100 positive images and 1000 negative ones.
• Classification was tested with 50 positive and 1000 negative examples.
• Detection was tested with 10 positive and 20,000 negative examples.
• Each classifier/detector was run 20 times and the results were averaged.
68
Cross-Dataset Generalization (2)
69
Cross-Dataset Generalization (3)
Logarithmic dependence on the number of training samples.
70
Types of Dataset Biases
• Selection bias.
• Capture bias.
• Label bias.
• Negative set bias – what the dataset considers to be "the rest of the world".
71
Negative Set Bias – Experiment (1)
• Evaluation of the relative bias in the negative sets of different datasets.
• Detectors are trained on positives and negatives of a single dataset.
• They are tested on positives from the same dataset and on negatives from all 6 datasets combined.
• Each detector was trained with 100 positives and 1000 negatives.
• For testing, multiple runs of 10 positive examples against 20,000 negatives were performed.
72
Negative Set Bias – Experiment (2)
73
Negative Set Bias – Experiment (3)
• A large negative training set is important for discriminating objects with similar contexts in images.
74
Dataset's Market Value (1)
• A measure of the improvement in performance when adding training data from another dataset.
• $AP_j^i(n)$ is obtained when training on $n$ samples of dataset $i$ and testing on dataset $j$.

$AP_j^j(n) = AP_j^i(n / \alpha)$

• $\alpha$ is the shift in the number of training samples between different datasets needed to achieve the same average precision.
75
Dataset's Market Value (2)
This table shows the sample value ("market value") of a "car" sample across datasets.
A sample from another dataset is worth less than a sample from the original dataset!!!
76

Bias in Datasets – Summary
• Datasets, though gathered from the internet, have distinguishable features of their own.
• Methods performing well on a certain dataset can do much worse on another.
• The negative set is at least as important as the positive samples in the dataset.
• Every dataset has its own "market value".
77
2010 Winners Overview
78
Pascal VOC 2010 – Winners

Classification winner: NUSPSL_KERNELREGFUSING
Qiang Chen¹, Zheng Song¹, Si Liu¹, Xiangyu Chen¹, Xiaotong Yuan¹, Tat-Seng Chua¹, Shuicheng Yan¹, Yang Hua², Zhongyang Huang², Shengmei Shen² (¹National University of Singapore; ²Panasonic Singapore Laboratories)

Detection winner: NLPR_HOGLBP_MC_LCEGCHLC
Yinan Yu, Junge Zhang, Yongzhen Huang, Shuai Zheng, Weiqiang Ren, Chong Wang, Kaiqi Huang, Tieniu Tan (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)

Honourable mentions:
• MITUCLA_HIERARCHY – Long Zhu, Yuanhao Chen, William Freeman, Alan Yuille, Antonio Torralba (MIT, UCLA)
• NUS_HOGLBP_CTX_CLS_RESCORE_V2 – Zheng Song, Qiang Chen, Shuicheng Yan (National University of Singapore)
• UVA_GROUPLOC/UVA_DETMONKEY – Jasper Uijlings, Koen van de Sande, Theo Gevers, Arnold Smeulders, Remko Scha (University of Amsterdam)
79
NUS-SPL Classification Method
80
NLPR Detection Method
81
Thank You….