Does one size really fit all? Evaluating classifiers in Bag-of-Visual-Words classification
Christian Hentschel, Harald Sack
Hasso Plattner Institute
Agenda
1. Content-based Image Classification – Motivation
2. Bag-of-Visual-Words
3. Bag-of-Visual-Words Classification
■ Classifier Evaluation
■ Model Visualization
4. Conclusion
Content-based Image Classification
Christian Hentschel, 09-18-2014
Content-based Image Classification (2)
Training:
■ Positive images (that depict a concept)
■ Negative images (that don’t)
Classification:
■ Test whether an image depicts the concept (or not)
Bag-of-Visual-Words
■ Origin: text classification
□ e.g. task: classify forum posts into “insult” (positive) and “not insult” (negative)
D1: "haha...at least get your insults straight you idiot!!...."
D2: "You're one of my favorite commenters."
Vocabulary: { “idiot”: 1, “favorite”: 2, “to”: 3, “you”: 4, “at”: 5, “least”: 6, “commenter”: 7, …}
D1 → [1, 2, 1, 1, 2, 0, 0,…]
D2 → [1, 1, 1, 1, 0, 1, 1,…]
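The text example above can be sketched in a few lines: map each post to a vector with one count per vocabulary entry. This is a minimal sketch, not the pipeline used in the talk. The tokenizer (lowercasing, splitting on non-letters, stripping a trailing plural "s") is an assumption, so the resulting vectors differ from the illustrative ones on the slide.

```python
import re
from collections import Counter

# Vocabulary from the slide, in index order (the slide numbers entries from 1).
VOCABULARY = ["idiot", "favorite", "to", "you", "at", "least", "commenter"]


def tokenize(text):
    """Lowercase, split on non-letters, strip a trailing plural 's' (a crude assumption)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def bag_of_words(text, vocabulary=VOCABULARY):
    """Return one count per vocabulary entry; words outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


d1 = "haha...at least get your insults straight you idiot!!...."
d2 = "You're one of my favorite commenters."

print(bag_of_words(d1))  # [1, 0, 0, 1, 1, 1, 0]
print(bag_of_words(d2))  # [0, 1, 0, 1, 0, 0, 1]
```

Word order is discarded entirely; only the counts survive, which is what makes the representation a "bag".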
Bag-of-Visual-Words (2)
■ Learn a decision rule (e.g. linear SVM)
□ i.e. learn feature weights
[Figure: learned feature weights. Adapted from A. Mueller, https://github.com/amueller/ml-berlin-tutorial]
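To illustrate what "learning feature weights" means, the sketch below trains a linear decision rule sign(w·x + b) on toy count vectors. It uses a perceptron as a simple stand-in, not the linear SVM named on the slide; both produce one weight per feature. The data and the 3-word vocabulary are hypothetical.

```python
def train_perceptron(samples, labels, epochs=10):
    """Learn weights w and bias b so that sign(w.x + b) matches labels in {+1, -1}."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified or on the boundary: move towards y
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias


def predict(weights, bias, x):
    """Apply the linear decision rule to one feature vector."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


# Hypothetical count vectors; +1 = "insult", -1 = "not insult".
X = [[1, 0, 1], [2, 0, 0], [0, 1, 0], [0, 2, 1]]
y = [1, 1, -1, -1]

w, b = train_perceptron(X, y)
print(w, b)                           # [1.0, -1.0, 1.0] 0.0
print([predict(w, b, x) for x in X])  # [1, 1, -1, -1]
```

A positive weight means the feature pushes the decision towards the positive class, a negative weight towards the negative class, which is what a feature-weight plot displays.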
Bag-of-Visual-Words (3)
■ Examples of Visual Words [Schmid, 2013]
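For images, the role of the vocabulary is played by a codebook of visual words, typically obtained by clustering local descriptors. A minimal sketch of the quantization step follows, assuming the codebook is already given: each descriptor is assigned to its nearest visual word and the assignments are counted. The 2-D descriptors and the 3-word codebook are hypothetical; real local descriptors are much higher-dimensional.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bovw_histogram(descriptors, codebook):
    """Count how many descriptors fall onto each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1
    return hist


# Hypothetical codebook and descriptors, for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
descriptors = [[0.1, 0.2], [0.9, 1.1], [1.2, 0.8], [0.1, 1.9]]

print(bovw_histogram(descriptors, codebook))  # [1, 2, 1]
```

The resulting histogram is the image's Bag-of-Visual-Words vector, the direct analogue of the word-count vector in the text case.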
Bag-of-Visual-Words Classification
■ De-facto standard: kernel-based Support Vector Machines
□ Decision rule:
□ Kernel function:
□ Distance metric:
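The slide's formulas are not part of this text. As a hedged illustration, the sketch below uses common textbook forms, which are assumptions here: a decision value f(x) = Σ coef_i · K(sv_i, x) + b, an exponential kernel K(x, y) = exp(−γ · d(x, y)), and the chi-square distance d(x, y) = Σ (x_j − y_j)² / (x_j + y_j) (some definitions add a factor of 1/2). Support vectors, coefficients and γ are hypothetical.

```python
import math


def chi2_distance(x, y):
    """Chi-square distance between two non-negative histograms; empty bins are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)


def chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-square kernel."""
    return math.exp(-gamma * chi2_distance(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel based on squared Euclidean distance."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def decision_value(x, support_vectors, dual_coefs, bias, kernel):
    """f(x) = sum_i coef_i * K(sv_i, x) + bias, with coef_i = alpha_i * y_i."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical support vectors and coefficients, for illustration only.
support_vectors = [[3, 0, 1], [0, 2, 2]]
dual_coefs = [1.0, -1.0]
x = [3, 0, 1]

f = decision_value(x, support_vectors, dual_coefs, 0.0, chi2_kernel)
print("positive" if f > 0 else "negative")  # positive
```

Note that `decision_value` evaluates the kernel once per support vector, so classification cost grows with the number of support vectors.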
Bag-of-Visual-Words Classification (2)
■ Testing different classification models
□ Metric: Average Precision (AP, area under the precision-recall curve)
■ Test dataset
□ Caltech-101
– 100 + 1 object classes
– 31–800 images per class
■ Tested classifiers:
□ Naïve Bayes, k-NN, Logistic Regression
□ SVM: linear SVM, RBF kernel SVM, Chi2-kernel SVM
□ Ensemble methods: Random Forest, AdaBoost
□ Hyperparameters optimized by grid search using cross-validation (CV)
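Average Precision can be sketched as follows. This version uses the common non-interpolated form (the mean of the precision values at the rank of each positive example) as an approximation of the area under the precision-recall curve; whether the talk used this or an interpolated variant is not stated, so treat it as an assumption. The labels and scores are toy values.

```python
def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example (label 1)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_class):
    """per_class: list of (labels, scores) pairs, one per object class."""
    aps = [average_precision(labels, scores) for labels, scores in per_class]
    return sum(aps) / len(aps)


# Toy rankings: positives at ranks 1 and 3, then a perfect ranking.
ap_a = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])  # (1/1 + 2/3) / 2
ap_b = average_precision([1, 1, 0], [0.9, 0.8, 0.1])          # 1.0
print(round(ap_a, 4), ap_b)
```

Averaging the per-class AP values over all classes gives the mean AP (mAP) used to compare classifiers.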
Bag-of-Visual-Words Classification – Results
■ Mean AP scores over all classes:
□ Naïve Bayes: 0.48
□ k-NN: 0.52
□ Logistic Regression: 0.55
□ Linear SVM: 0.55
□ RBF kernel SVM: 0.59
□ Random Forest: 0.61
□ AdaBoost: 0.63
□ Chi2-kernel SVM: 0.67
Bag-of-Visual-Words Classification – Results (2)
■ Difference in mAP between best (Chi2-SVM) and worst (Naïve Bayes): 0.19
□ Poor performance of Naïve Bayes and k-NN, but fast training
■ Superior performance of kernel-based SVM, but:
□ Choice of kernel function (Chi2 vs. Gaussian RBF) is crucial:
– Ensemble methods outperform Gaussian RBF
– Gaussian RBF only slightly better than linear SVM
□ Increased evaluation time:
– Complex kernel function must be evaluated between each support vector (SV) and a test example
– Ensemble methods reduce classification time
Bag-of-Visual-Words Classification – Results (3)
■ Correlation between training set size and Average Precision
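One way to quantify such a relationship is a correlation coefficient over the per-class pairs (number of training images, AP). The sketch below assumes Pearson's r; the talk does not say which measure was used. The numbers are placeholders for illustration only and are not results from the talk.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder values, NOT measurements: one entry per hypothetical class.
train_sizes = [30, 60, 120, 400, 800]
ap_scores = [0.40, 0.50, 0.55, 0.70, 0.90]

r = pearson(train_sizes, ap_scores)
print(r > 0)  # True: larger classes score higher in this toy data
```

Classes that deviate strongly from the overall trend are the natural candidates for closer inspection.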
Bag-of-Visual-Words Classification – Results (4)
■ Outliers:
□ “minaret”
□ “leopards”
Bag-of-Visual-Words Classification – Model Visualization
■ Visualize impact of individual image regions on classification result
□ Use ensemble methods
– No kernel function
– AdaBoost: direct indicator of feature importance (mean decrease in impurity)
■ Mapping: Local Region → Descriptor → BoVW Vector → Feature Weights
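The mapping from feature weights back to image regions can be sketched as a heat map: every local region contributes the importance of the visual word it was assigned to. This is a minimal sketch under assumptions, not the authors' implementation. The region format (row, col, radius, word index), the square patches, and all values are hypothetical; the importances are taken as given, however they were obtained.

```python
def importance_heatmap(height, width, regions, word_importance):
    """Accumulate, per pixel, the importance of the visual word assigned to each region.

    regions: list of (row, col, radius, word_index) tuples, one per local region.
    Regions are drawn as square patches clipped to the image bounds.
    """
    heat = [[0.0] * width for _ in range(height)]
    for row, col, radius, word in regions:
        weight = word_importance[word]
        for r in range(max(0, row - radius), min(height, row + radius + 1)):
            for c in range(max(0, col - radius), min(width, col + radius + 1)):
                heat[r][c] += weight
    return heat


# Hypothetical importances for a 3-word vocabulary and two regions in a 4x4 image.
word_importance = [0.1, 0.7, 0.2]
regions = [(1, 1, 1, 1), (2, 2, 1, 2)]

heat = importance_heatmap(4, 4, regions, word_importance)
for line in heat:
    print(line)
```

Pixels covered by several regions accumulate their importances, so areas the classifier relies on most appear brightest.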
Conclusion
■ Kernel-based SVMs are the best choice when aiming for accuracy
□ Kernel function is crucial
□ Evaluation time cost is high
■ Ensemble methods are the runner-up
□ Fast evaluation
□ Offer intuitive visualization of model parameters
■ Visual analytics reveal deficiencies in datasets
□ Improperly chosen training data affects classification results