1
Pascal Grand Challenge
Felix Vilensky, 19/6/2011
2
Outline
• Pascal VOC challenge framework.
• Successful detection methods:
  o Object Detection with Discriminatively Trained Part Based Models (P. Felzenszwalb et al.) – the "UoC/TTI" method.
  o Multiple Kernels for Object Detection (A. Vedaldi et al.) – the "Oxford/MSR India" method.
• A successful classification method:
  o Image Classification using Super-Vector Coding of Local Image Descriptors (Xi Zhou et al.) – the NEC/UIUC method.
• Discussion of bias in datasets.
• 2010 winners overview.
3
Pascal VOC Challenge Framework
"The PASCAL Visual Object Classes (VOC) Challenge"
Mark Everingham · Luc Van Gool · Christopher K. I. Williams · John Winn · Andrew Zisserman
4
Pascal VOC Challenge
• Classification task.
• Detection task.
• Pixel-level segmentation.
• "Person layout" detection.
• Action classification in still images.
5
Classification Task
[Example image: contains at least one bus; classification confidence 100%]
6
Detection Task
The predicted bounding box should overlap by at least 50% with the ground truth (see the sketch below)!
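As a concrete illustration, here is a minimal sketch of the VOC overlap criterion (intersection over union); the function name and box format are my own:

```python
def overlap(box_p, box_gt):
    """VOC overlap: intersection area over union area; boxes are (x1, y1, x2, y2)."""
    ix = max(0.0, min(box_p[2], box_gt[2]) - max(box_p[0], box_gt[0]))
    iy = max(0.0, min(box_p[3], box_gt[3]) - max(box_p[1], box_gt[1]))
    inter = ix * iy
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    return inter / (area_p + area_gt - inter)

# A detection counts as correct when overlap(predicted, ground_truth) >= 0.5.
```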
7
Detection "Near Misses"
These detections did not fulfill the bounding-box overlap criterion.
8
Pascal VOC Challenge-The Object Classes
9
Pascal VOC Challenge-The Object Classes
Images retrieved from the Flickr website.
10
Pixel-Level Segmentation
[Figure panels: input image, class segmentation, object segmentation]
11
Person Layout
12
Action Classification
• Classification among 9 action classes.
[Example images: "speaking on the phone" and "playing the guitar", each classified with 100% confidence]
13
Annotation
• Class.
• Bounding box.
• Viewpoint.
• Truncation.
• Difficult (for classification/detection).
14
Annotation Example
15
Evaluation
• Precision/recall curves.
• Interpolated precision.
• AP (average precision) – a way to compare different methods.

$\text{Recall} = \frac{\#\text{True Positives}}{\#\text{False Negatives} + \#\text{True Positives}}$

$\text{Precision} = \frac{\#\text{True Positives}}{\#\text{False Positives} + \#\text{True Positives}}$
16
Evaluation – Precision/Recall Curves (1)
• A practical tradeoff between precision and recall.
• Interpolated precision (reproduced in the sketch below):

$P_{\text{interp}}(r) = \max_{\tilde{r} \ge r} p(\tilde{r})$

Rank      1    2    3    4    5    6    7    8    9    10
g.t.      Yes  No   Yes  No   Yes  No   No   No   No   No
Precision 1/1  1/2  2/3  2/4  3/5  3/6  3/7  3/8  3/9  3/10
Recall    0.2  0.2  0.4  0.4  0.6  0.6  0.6  0.6  0.6  0.6
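A minimal sketch reproducing the table above in code (variable names are my own; n_positives = 5 is implied by the recall column):

```python
import numpy as np

# Ranked detections from the table above; True marks a correct detection (g.t. "Yes").
gt = np.array([True, False, True, False, True, False, False, False, False, False])
n_positives = 5  # total ground-truth objects, implied by recall topping out at 3/5

tp = np.cumsum(gt)                           # true positives up to each rank
precision = tp / np.arange(1, len(gt) + 1)   # 1/1, 1/2, 2/3, 2/4, 3/5, ...
recall = tp / n_positives                    # 0.2, 0.2, 0.4, 0.4, 0.6, ...

def p_interp(r):
    """Interpolated precision: maximum precision over all recalls >= r."""
    mask = recall >= r
    return precision[mask].max() if mask.any() else 0.0
```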
17
Evaluation-Precision\Recall Curves(2)
18
Evaluation – Average Precision (AP)

$AP = \frac{1}{11} \sum_{r \in \{0,\, 0.1,\, \ldots,\, 1\}} P_{\text{interp}}(r)$

AP determines who is best (computed below for the example above).
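Continuing the sketch above, the 11-point AP is one line; for the example ranked list it comes out at roughly 0.50:

```python
# 11-point interpolated AP over recall levels 0, 0.1, ..., 1.0
ap = np.mean([p_interp(i / 10) for i in range(11)])
print(round(ap, 2))  # ~0.5 for the ranked list above
```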
19
Successful Detection Methods
20
UoC/TTI Method Overview (P. Felzenszwalb et al.)
• Joint winner of the 2009 Pascal VOC challenge with the Oxford method.
• "Lifetime achievement" award in 2010.
• Mixture of deformable part models.
• Each component has a global template + deformable parts:
  o HOG feature templates.
• Fully trained from bounding boxes alone.
21
UoC/TTI Method – HOG Features (1)
• [-1 0 1] and its transpose → gradients.
• Gradient orientation is discretized into one of p values.
• Pixel-level features → cells of size k.
• 8-pixel cells (k = 8).
• 18 contrast-sensitive bins + 9 contrast-insensitive bins = 27 bins in total!

Contrast sensitive: $B_1(x, y) = \operatorname{round}\!\left(\frac{p\,\theta(x, y)}{2\pi}\right) \bmod p$

Contrast insensitive: $B_2(x, y) = \operatorname{round}\!\left(\frac{p\,\theta(x, y)}{\pi}\right) \bmod p$

$F_b(x, y) = \begin{cases} r(x, y) & \text{if } b = B(x, y) \\ 0 & \text{otherwise} \end{cases}$

Soft binning is used in practice (hard assignment is sketched below).
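A rough sketch of the two binning rules, hard assignment only (the method itself uses soft binning; function names here are my own):

```python
import numpy as np

def orientation_bins(gx, gy):
    """Discretize gradient orientations from [-1 0 1] filter responses.
    Returns 18 contrast-sensitive bins, 9 contrast-insensitive bins,
    and the gradient magnitude r used to vote into the histograms."""
    theta = np.arctan2(gy, gx)                                   # in (-pi, pi]
    b_sens = np.round(18 * theta / (2 * np.pi)).astype(int) % 18
    b_insens = np.round(9 * (theta % np.pi) / np.pi).astype(int) % 9
    r = np.hypot(gx, gy)                                         # magnitude
    return b_sens, b_insens, r
```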
22
UoC/TTI Method – HOG Features (2)
[Figure: the 27 orientation bins accumulated per cell]
23
UoC/TTI Method – HOG Features (3)
• Normalization.
• Truncation.
• 27 bins × 4 normalization factors = a 4×27 matrix.
• Dimensionality reduction to 31 (see the sketch below).

$N_{\delta,\gamma}(i, j) = \left(\lVert C(i,j)\rVert^2 + \lVert C(i+\delta,j)\rVert^2 + \lVert C(i,j+\gamma)\rVert^2 + \lVert C(i+\delta,j+\gamma)\rVert^2\right)^{1/2}, \quad \delta, \gamma \in \{-1, 1\}$

The 31 output features:
• $V_1, \ldots, V_4$: each is the sum over the 27 bins for one of the 4 normalization factors.
• $V_5, \ldots, V_{31}$: each is the sum over the 4 normalization factors for one of the 27 bins.
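A minimal sketch of this normalize–truncate–project step, assuming per-cell 27-bin histograms as input and omitting the scaling constants of the published feature:

```python
import numpy as np

def hog31(cells, alpha=0.2, eps=1e-7):
    """cells: (H, W, 27) per-cell orientation histograms -> (H, W, 31) features."""
    H, W, _ = cells.shape
    energy = (cells ** 2).sum(axis=2)                 # per-cell gradient energy
    out = np.zeros((H, W, 31))
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            v = np.empty((4, 27))
            for a, (d, g) in enumerate([(-1, -1), (-1, 1), (1, -1), (1, 1)]):
                n = np.sqrt(energy[i, j] + energy[i + d, j] +
                            energy[i, j + g] + energy[i + d, j + g] + eps)
                v[a] = np.minimum(cells[i, j] / n, alpha)  # normalize + truncate
            out[i, j, :4] = v.sum(axis=1)   # V1..V4: sums over the 27 bins
            out[i, j, 4:] = v.sum(axis=0)   # V5..V31: sums over the 4 factors
    return out
```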
24
UoC/TTI Method – Deformable Part Models
• Coarse root filter.
• High-resolution deformable parts.
• Part = (anchor position, deformation cost, resolution level).
25
UoC/TTI Method – Mixture Models (1)
• Diversity within a rich object category.
• Different views of the same object.
• A mixture of deformable part models for each class.
• Each deformable part model in the mixture is called a component.
26
UoC/TTI Method – Object Hypothesis
Slide taken from the method's presentation
27
UoC/TTI Method – Models (1)
A 6-component person model.
28
UoC/TTI Method – Models (2)
A 6-component bicycle model.
29
UoC/TTI Method – Score of a Hypothesis
Slide taken from method's presentation
30
UoC/TTI Method – Matching (1)
• "Sliding window" approach.
• High-scoring root locations define detections.
• Matching is done for each component separately (a toy version is sketched below).

$\text{score}(p_0) = \max_{p_1, \ldots, p_n} \text{score}(p_0, p_1, \ldots, p_n)$

($p_0$: root location; $p_1, \ldots, p_n$: best part locations)
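A naive sketch of scoring one root location: the root filter response plus, per part, the best displaced part response minus a quadratic deformation cost. Names and the brute-force search are my own; the actual method computes these maxima efficiently with generalized distance transforms:

```python
import numpy as np

def score_root(root_resp, part_resps, anchors, defo=0.1, radius=4):
    """root_resp: scalar root filter response at p0.
    part_resps: list of 2-D part filter response maps.
    anchors: the (y, x) anchor position of each part in its map."""
    total = root_resp
    for resp, (ay, ax) in zip(part_resps, anchors):
        best = -np.inf
        for dy in range(-radius, radius + 1):       # brute-force displacement
            for dx in range(-radius, radius + 1):
                y, x = ay + dy, ax + dx
                if 0 <= y < resp.shape[0] and 0 <= x < resp.shape[1]:
                    best = max(best, resp[y, x] - defo * (dy ** 2 + dx ** 2))
        total += best
    return total
```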
31
UoC/TTI Method – Matching(2)
32
UoC/TTI Method – Post Processing & Context Rescoring
Slide taken from method's presentation
33
UoC/TTI Method – Training & Data Mining
• Weakly labeled data in the training set.
• Latent SVM (LSVM) training with $z = (c, p_0, \ldots, p_{n_c})$ as the latent value.
• Training and data mining in 4 stages (a toy sketch follows below):
  1. Optimize z.
  2. Optimize β.
  3. Add hard negative examples.
  4. Remove easy negative examples.
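A toy sketch of the alternating optimization (without the hard-negative mining stages): each positive is a bag of candidate feature vectors standing in for latent placements. All names and the use of scikit-learn are my own assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC

def lsvm_train(pos_bags, neg_feats, iters=4):
    """pos_bags: list of (m_i, d) arrays of candidate latent placements.
    neg_feats: (n, d) array of fixed negative feature vectors."""
    clf = LinearSVC()
    pos = np.array([bag[0] for bag in pos_bags])       # initial latent choice
    for _ in range(iters):
        X = np.vstack([pos, neg_feats])
        y = np.array([1] * len(pos) + [-1] * len(neg_feats))
        clf.fit(X, y)                                  # optimize beta
        w = clf.coef_.ravel()
        pos = np.array([bag[np.argmax(bag @ w)] for bag in pos_bags])  # optimize z
    return clf
```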
34
UoC/TTI Method – Results(1)
35
UoC/TTI Method – Results(2)
36
Oxford Method Overview (A. Vedaldi et al.)
Pipeline: regions with different scales and aspect ratios → 6 feature channels → 3-level spatial pyramid → cascade of 3 SVM classifiers with 3 different kernels → post-processing.
37
Oxford Method – Feature Channels
• Bag of visual words – SIFT descriptors are extracted and quantized into a vocabulary of 64 words.
• Dense words (PhowGray, PhowColor) – another set of SIFT descriptors, quantized into 300 visual words.
• Histogram of oriented edges (Phog180, Phog360) – similar to the HOG descriptor used by the "UoC/TTI" method, with 8 orientation bins.
• Self-similarity features (SSIM).
38
Oxford Method – Spatial Pyramids
39
Oxford Method – Feature Vector
Chart taken from the method's presentation
40
Oxford Method – Discriminant Function (1)

$C(h^R) = \sum_{i=1}^{M} \alpha_i y_i K(h^R, h_i)$

• $h_i$, $i = 1, \ldots, M$, are the histogram collections acting as support vectors for an SVM; $y_i \in \{-1, 1\}$.
• $K$ is a positive-definite kernel.
• $h^R$ is the collection of normalized feature histograms $\{h^R_{fl}\}$, where $f$ is the feature channel and $l$ is the level of the spatial pyramid.
41
Oxford Method – Discriminant Function (2)
• The kernel of the discriminant function is a linear combination of histogram kernels:

$K(h^R, h_i) = \sum_{fl} d_{fl}\, K_{fl}(h^R_{fl}, h_{i,fl}), \quad d_{fl} \ge 0$

• The SVM parameters and the kernel weights $d_{fl}$ (18 in total: 6 channels × 3 pyramid levels) are learned using MKL (Multiple Kernel Learning).
• The discriminant function is used to rank candidate regions R by the likelihood of containing an instance of the object of interest.
42
Oxford Method – Cascade Solution (1)
• Exhaustive search for the best candidate regions R requires a number of operations that is O(MBN):
  o N – the number of regions (~10^5).
  o M – the number of support vectors in C(h^R) (~10^3).
  o B – the dimensionality of the histograms (~10^4).
• To reduce this complexity, a cascade solution is applied:
  o The first stage uses a "cheap" linear kernel to evaluate C(h^R).
  o The second uses a more expensive and powerful quasi-linear kernel.
  o The third uses the most powerful non-linear kernel.
• Each stage evaluates the discriminant function on a smaller number of candidate regions.
43
Oxford Method – Cascade Solution (2)

Type          Evaluation complexity
Linear        O(N)
Quasi-linear  O(BN)
Non-linear    O(MBN)

Stage 1 (linear) → Stage 2 (quasi-linear) → Stage 3 (non-linear)
44
Oxford Method – Cascade Solution(3)
Chart taken from the method's presentation
45
Oxford Method – The Kernels
• All the aforementioned kernels are of the following form (see the sketch below):

$K(h, h') = f\!\left(\sum_{b=1}^{B} g(h_b, h'_b)\right)$

where $f: \mathbb{R} \to \mathbb{R}$, $g: \mathbb{R}^2 \to \mathbb{R}$, and $b$ is a histogram bin index.

• For linear kernels both f and g are linear; for quasi-linear kernels only f is linear.
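A small sketch of the three kernel types in this form, using the χ² family that is standard for histogram comparison (the exact kernel choices here are an assumption; histograms are assumed L1-normalized):

```python
import numpy as np

def chi2_dist(h1, h2, eps=1e-10):
    """Sum over bins of g(h_b, h'_b) = (h_b - h'_b)^2 / (h_b + h'_b)."""
    return np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def k_linear(h1, h2):                # f and g both linear
    return float(np.dot(h1, h2))

def k_quasi_linear(h1, h2):          # f linear, g non-linear (chi^2 kernel)
    return 1.0 - 0.5 * chi2_dist(h1, h2)

def k_nonlinear(h1, h2, gamma=1.0):  # f = exp, non-linear (exp-chi^2 kernel)
    return float(np.exp(-gamma * chi2_dist(h1, h2)))
```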
46
Oxford Method – Post-Processing
• The output of the last stage is a ranked list of 100 candidate regions per image.
• Many of these regions correspond to multiple detections of the same object.
• Non-maxima suppression is used (sketched below).
• At most 10 regions per image remain.
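A minimal sketch of greedy non-maxima suppression, reusing the overlap() helper sketched on the detection-task slide (the 0.5 suppression threshold is an assumption):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5, max_keep=10):
    """Keep the highest-scoring box, drop boxes overlapping it too much, repeat."""
    order = np.argsort(scores)[::-1]            # indices by descending score
    keep = []
    while order.size and len(keep) < max_keep:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        ious = np.array([overlap(boxes[j], boxes[i]) for j in rest])
        order = rest[ious <= iou_thresh]        # suppress heavy overlaps
    return keep
```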
47
Oxford Method – Training/Retraining (1)
• Jittered/flipped instances are used as positive samples.
• Training images are partitioned into two subsets.
• The classifiers are tested on each subset in turn, adding new hard negative samples for retraining.
[Flow chart: training data → classifier training → testing → errors (overlap < 20%) added back to the training data]
48
Oxford Method – Results(1)
49
Oxford Method – Results(2)
50
Oxford Method – Results(3)
Training and testing on VOC2007.
Training and testing on VOC2008.
Training on VOC2008 and testing on VOC2007.
Training and testing on VOC2009.
51
Oxford Method – Summary
52
A Successful Classification Method
53
NEC/UIUC Method Overview (Xi Zhou, Kai Yu et al.)
• A winner of the 2009 Pascal VOC classification challenge.
• A framework for classification is proposed:
Descriptor coding: super-vector coding (the important part!) → spatial pyramid pooling → classification: linear SVM.
54
NEC/UIUC Method – Notation
• $X$ – descriptor vector.
• $\varphi(X)$ – coding function.
• $f(X)$ – unknown function on local features.
• $\hat{f}(X)$ – approximating function.
• $Y$ – set of descriptor vectors.
55
NEC/UIUC Method – Descriptor Coding (1)

Vector quantization coding:

$\hat{f}(X) = W^T \varphi(X), \quad W = [W_1, W_2, \ldots, W_K]^T$

$\varphi(X)$ is the code of $X$.
56
NEC/UIUC Method – Descriptor Coding (2)

Super-vector coding (a sketch follows below):

$\hat{f}(X) = W^T \varphi(X) = \sum_k C_k(X)\, W_k^T X$

$W = [W_1^T, W_2^T, \ldots, W_K^T]^T$

$\varphi(X) = [C_1(X) X^T, C_2(X) X^T, \ldots, C_K(X) X^T]^T$

$C_k(X) = 1$ if $X$ belongs to cluster $k$, otherwise $C_k(X) = 0$.
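A minimal sketch of this hard-assignment super-vector code for one descriptor (function and variable names are my own):

```python
import numpy as np

def sv_code(x, centers):
    """Super-vector code phi(x): the K*d vector whose k-th d-block equals x
    for x's nearest codebook center and is zero elsewhere."""
    K, d = centers.shape
    k = int(np.argmin(np.linalg.norm(centers - x, axis=1)))  # C_k(x) = 1
    phi = np.zeros(K * d)
    phi[k * d:(k + 1) * d] = x
    return phi

# Example: a 3-word codebook for 2-D descriptors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(sv_code(np.array([0.9, 1.2]), codebook))  # non-zero only in block k=1
```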
57
NEC/UIUC Method – Spatial Pooling

Pyramid levels: 1×1, 2×2, 3×1. Each cell is pooled as:

$\Phi_k(Y) = \frac{1}{N} \sum_{X \in Y} \frac{1}{\sqrt{p_k}}\, \varphi_k(X)$

N – the size of the set of local descriptors. Y – the set of local descriptors.

$s(Y) = [\Phi^1_{11}(Y), \Phi^2_{11}(Y), \Phi^2_{12}(Y), \Phi^2_{21}(Y), \Phi^2_{22}(Y), \Phi^3_{11}(Y), \Phi^3_{12}(Y), \Phi^3_{13}(Y)]$

$s(Y)$ is fed to the linear SVM classifier.
58
NEC/UIUC Method – Results (1)
• Comparison of non-linear coding methods.
• Comparison with other methods.
• Impact of codebook size (tested on the validation set).
• Images and visualization of patch-level scores.

SIFT: 128-dimensional vectors over a grid with a spacing of 4 pixels, at three patch sizes (16×16, 25×25, and 31×31).
PCA: dimensionality reduction to 80.
59
NEC/UIUC Method – Results (2): |C| = 512
60
NEC/UIUC Method – Results (3): |C| = 2048
61
NEC/UIUC Method – Results(4)
62
Bias in Datasets
"Unbiased Look at Dataset Bias"
Antonio Torralba (Massachusetts Institute of Technology), Alexei A. Efros (Carnegie Mellon University)
63
Name The Dataset
• People were asked to guess, based on three images, the dataset the images were taken from.
• People who worked in the field got more than 75% correct.
64
Name The Dataset – The Dataset Classifier
• 4 classifiers were trained to play the "Name The Dataset" game.
• Each classifier used a different image descriptor:
  o 32×32 thumbnail (grayscale and color).
  o Gist.
  o Bag of HOG visual words.
• 1000 images were randomly sampled from the training portions of 12 datasets.
• The classifiers were tested on 300 random images from each of the test sets, repeated 20 times.
65
Name The Dataset – The Dataset Classifier
• The best classifier performs at 39% (chance is about 8%)!!!
[Figures: confusion table; recognition performance vs. number of training examples per class]
66
Name The Dataset – The Dataset Classifier
[Figure: car images from different datasets]
• Performance is 61% on car images from 5 different datasets (chance is 20%).
67
Cross-Dataset Generalization (1)
• Training on one dataset while testing on another.
• Dalal & Triggs detector (HOG + linear SVM) for the detection task.
• Bag-of-words approach with a Gaussian-kernel SVM for the classification task.
• The "car" and "person" objects are used.
• Each classifier (for each dataset) was trained with 500 positive images and 2000 negative ones.
• Each detector (for each dataset) was trained with 100 positive images and 1000 negative ones.
• Classification was tested with 50 positive and 1000 negative examples.
• Detection was tested with 10 positive and 20,000 negative examples.
• Each classifier/detector was run 20 times and the results were averaged.
68
Cross-Dataset Generalization (2)
69
Cross-Dataset Generalization (3)
Logarithmic dependence on the number of training samples.
70
Types of Dataset Biases
• Selection bias.
• Capture bias.
• Label bias.
• Negative set bias – what the dataset considers to be "the rest of the world".
71
Negative Set Bias – Experiment (1)
• Evaluation of the relative bias in the negative sets of different datasets.
• Detectors are trained on positives and negatives of a single dataset.
• They are tested on positives from the same dataset and on negatives from all 6 datasets combined.
• Each detector was trained with 100 positives and 1000 negatives.
• For testing, multiple runs of 10 positive examples against 20,000 negatives were performed.
72
Negative Set Bias – Experiment (2)
73
Negative Set Bias – Experiment (3)
• A large negative training set is important for discriminating objects with similar contexts in images.
74
Dataset's Market Value (1)
• A measure of the improvement in performance when adding training data from another dataset.
• $AP_j^i(n)$ is obtained when training on $n$ samples of dataset $i$ and testing on dataset $j$.

$AP_j^j(n) = AP_j^i(n / \alpha)$

• $\alpha$ is the shift in the number of training samples between different datasets needed to achieve the same average precision.
75
Dataset's Market Value (2)
This table shows the sample value ("market value") of a "car" sample across datasets.
A sample from another dataset is worth less than a sample from the original dataset!!!
76

Bias in Datasets – Summary
• Datasets, though gathered from the internet, have distinguishable features of their own.
• Methods performing well on a certain dataset can do much worse on another.
• The negative set is at least as important as the positive samples in the dataset.
• Every dataset has its own "market value".
77
2010 Winners Overview
78
Pascal VOC 2010 – Winners

Classification winner: NUSPSL_KERNELREGFUSING
Qiang Chen¹, Zheng Song¹, Si Liu¹, Xiangyu Chen¹, Xiaotong Yuan¹, Tat-Seng Chua¹, Shuicheng Yan¹, Yang Hua², Zhongyang Huang², Shengmei Shen² (¹National University of Singapore; ²Panasonic Singapore Laboratories)

Detection winner: NLPR_HOGLBP_MC_LCEGCHLC
Yinan Yu, Junge Zhang, Yongzhen Huang, Shuai Zheng, Weiqiang Ren, Chong Wang, Kaiqi Huang, Tieniu Tan (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)

Honourable mentions:
• MITUCLA_HIERARCHY – Long Zhu, Yuanhao Chen, William Freeman, Alan Yuille, Antonio Torralba (MIT, UCLA)
• NUS_HOGLBP_CTX_CLS_RESCORE_V2 – Zheng Song, Qiang Chen, Shuicheng Yan (National University of Singapore)
• UVA_GROUPLOC/UVA_DETMONKEY – Jasper Uijlings, Koen van de Sande, Theo Gevers, Arnold Smeulders, Remko Scha (University of Amsterdam)
79
NUS-SPL Classification Method
80
NLPR Detection Method
81
Thank You….