
5/30/2006 EE 148, Spring 2006 1

Visual Categorization with Bags of Keypoints

Gabriella Csurka Christopher R. Dance

Lixin Fan Jutta Willamowski

Cedric Bray

Presented by Yun-hsueh Liu


What is Generic Visual Categorization?

Categorization: distinguish different classes

Generic Visual Categorization:
  Generic: copes with many object types simultaneously and is readily extended to new object types.
  Handles variation in view, imaging, lighting and occlusion, as well as typical object and scene variations.


Previous Work in Computational Vision

Single Category Detection
  Decide whether a member of one visual category is present in a given image (faces, cars, targets).

Content-Based Image Retrieval
  Retrieve images on the basis of low-level image features, such as colors or textures.

Recognition
  Distinguish between images of structurally distinct objects within one class (say, different cell phones).


Bag-of-Keypoints Approach

Interest Point Detection -> Key Patch Extraction -> Feature Descriptors -> Bag of Keypoints -> Multi-class Classifier


SIFT Descriptors


[Figure: the Key Patch Extraction and Feature Descriptors stages of the pipeline]


Bag of Keypoints (1)

Construction of a vocabulary
  K-means clustering finds the “centroids” (run on all the descriptors found in all the training images).
  Define a “vocabulary” as the set of “centroids”, where every centroid represents a “word”.
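
As a rough illustration of the vocabulary step, the sketch below runs plain k-means (Lloyd's algorithm) in NumPy. The function name build_vocabulary, the random initialisation, the fixed iteration count and the toy 2-D points standing in for real descriptors are all assumptions made for this example; they are not details taken from the paper.

```python
import numpy as np

def build_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Cluster descriptors of shape (n, d) into k centroids with plain k-means."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct descriptors chosen at random.
    start = rng.choice(len(descriptors), size=k, replace=False)
    centroids = descriptors[start].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every descriptor to every centroid.
        d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

# Toy stand-in for descriptors: two well-separated 2-D blobs (assumption).
rng = np.random.default_rng(1)
toy = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
vocabulary = build_vocabulary(toy, k=2)  # each row is one visual "word"
```

In practice the descriptors would be much higher-dimensional and k much larger, so a library implementation of k-means would normally replace this loop.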



Bag of Keypoints (2)

Histogram
  Counts the number of occurrences of each visual word in each image.
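
The histogram step can be sketched as follows, assuming each descriptor is assigned to its nearest centroid under Euclidean distance. The function name bag_of_keypoints and the small hand-written vocabulary and descriptors are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def bag_of_keypoints(descriptors, vocabulary):
    """Return the histogram of visual-word counts for one image."""
    # Squared distance from each descriptor to each vocabulary centroid.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest visual word
    return np.bincount(words, minlength=len(vocabulary))

# Hypothetical 3-word vocabulary and the descriptors of one image.
vocabulary = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
image_descriptors = np.array([[0.1, -0.2], [4.8, 5.1], [5.2, 4.9], [0.3, 0.1]])
histogram = bag_of_keypoints(image_descriptors, vocabulary)
print(histogram)  # [2 2 0]
```

The resulting vector has one entry per visual word and is the input handed to the classifier.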



Multi-class Classifier

In this paper, classification is based on conventional machine learning approaches:
  Naïve Bayes
  Support Vector Machine (SVM)



Multi-class classifier – Naïve Bayes (1)

Let V = {v_t}, t = 1,…,N, be a visual vocabulary, in which each v_t represents a visual word (a cluster center) in the feature space.

Let I = {I_i} be a set of labeled images.

Let C_j, j = 1,…,M, denote the classes.

N(t,i) = number of times v_t occurs in image I_i (the keypoint histogram)

Scoring approach: we want to determine P(C_j | I_i), where

P(C_j | I_i) \propto P(C_j) P(I_i | C_j) = P(C_j) \prod_{t=1}^{N} P(v_t | C_j)^{N(t,i)}    (*)


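Multi-class Classifier – Naïve Bayes (2)

Goal: find the class C_j for which P(C_j | I_i) in (*) has its maximum value.

To avoid zero probabilities, use Laplace smoothing:

P(v_t | C_j) = \frac{1 + \sum_{I_i \in C_j} N(t,i)}{N + \sum_{s=1}^{N} \sum_{I_i \in C_j} N(s,i)}

where N(t,i) counts the occurrences of visual word v_t in image I_i and N is the vocabulary size.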
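
A minimal sketch of a Naïve Bayes classifier over keypoint histograms with Laplace smoothing is given below. It works in log space to avoid numerical underflow, which is an implementation choice of this example. The function names, the toy histograms and the assumption that every class has at least one training image are illustrative only.

```python
import numpy as np

def train_naive_bayes(histograms, labels, n_classes):
    """histograms: (n_images, N) word counts; labels: (n_images,) class indices."""
    n_words = histograms.shape[1]
    log_prior = np.zeros(n_classes)
    log_word = np.zeros((n_classes, n_words))
    for j in range(n_classes):
        in_class = histograms[labels == j]  # assumed non-empty for every class
        log_prior[j] = np.log(len(in_class) / len(histograms))
        counts = in_class.sum(axis=0)
        # Laplace smoothing: add 1 to every word count and N to the total.
        log_word[j] = np.log((1.0 + counts) / (n_words + counts.sum()))
    return log_prior, log_word

def predict_naive_bayes(histograms, log_prior, log_word):
    # Score of class j: log P(C_j) + sum over words of count * log P(word | C_j).
    scores = log_prior[None, :] + histograms @ log_word.T
    return scores.argmax(axis=1)

# Toy training set: 4 images, 3 visual words, 2 classes (made-up numbers).
train_h = np.array([[5, 0, 1], [4, 1, 0], [0, 6, 1], [1, 5, 0]])
train_y = np.array([0, 0, 1, 1])
log_prior, log_word = train_naive_bayes(train_h, train_y, n_classes=2)
predicted = predict_naive_bayes(np.array([[3, 0, 0], [0, 2, 1]]), log_prior, log_word)
print(predicted)  # [0 1]
```

Because of the smoothing, a word that never occurs in a class still receives a small non-zero probability, so a single unseen word cannot force a class score to zero.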


Multi-class classifier – Support Vector Machine (SVM)

Input: the keypoint histogram of each image

Multi-class one-against-all approach

A linear SVM gives better performance than a quadratic or cubic SVM

Goal: find hyperplanes that separate the multi-class data with maximum margin
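
The one-against-all scheme can be sketched as follows: one binary problem per class, then a prediction by the largest decision value. The sketch does not train an SVM; the hyperplanes in the example are hand-picked placeholders, and the function names are hypothetical. A real system would obtain each hyperplane from an SVM solver.

```python
import numpy as np

def one_vs_all_targets(labels, n_classes):
    """Binary targets for the M binary problems: +1 for the class, -1 for the rest."""
    targets = -np.ones((n_classes, len(labels)))
    targets[labels, np.arange(len(labels))] = 1.0
    return targets

def one_vs_all_predict(histograms, weights, biases):
    """weights: (M, N) hyperplane normals; biases: (M,) offsets."""
    decision = histograms @ weights.T + biases  # one decision value per class
    return decision.argmax(axis=1)              # class with the largest value wins

labels = np.array([0, 2, 1])
targets = one_vs_all_targets(labels, n_classes=3)

# Hand-picked (not trained) hyperplanes, purely for illustration.
weights = np.array([[1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]])
biases = np.zeros(3)
test_h = np.array([[4.0, 1.0, 0.0], [0.0, 1.0, 5.0]])
predicted = one_vs_all_predict(test_h, weights, biases)
print(predicted)  # [0 2]
```

Row j of targets is the label vector that the j-th binary classifier would be trained on.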


Multi-class classifier – SVM (2)


Evaluation of Multi-class Classifiers

Three performance measures:

The confusion matrix
  Each column of the matrix represents the instances in a predicted class.
  Each row represents the instances in an actual class.

The overall error rate = Pr(output class ≠ true class)

The mean rank
  The mean position of the correct label when the labels output by the multi-class classifier are sorted by classifier score.
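
The three measures can be computed as in the sketch below, assuming the classifier outputs a score per class for every image. Rows of the matrix index the actual class and columns the predicted class, as on this slide. The function names and the toy scores are assumptions made for this example.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows index the actual class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (true_labels, predicted_labels), 1)
    return cm

def overall_error_rate(true_labels, predicted_labels):
    """Fraction of images whose output class differs from the true class."""
    return float(np.mean(true_labels != predicted_labels))

def mean_rank(true_labels, scores):
    """scores: (n_images, M). Rank 1 means the correct label has the top score."""
    correct = scores[np.arange(len(true_labels)), true_labels]
    ranks = 1 + (scores > correct[:, None]).sum(axis=1)
    return float(ranks.mean())

# Toy classifier scores for 3 images and 3 classes (made-up numbers).
scores = np.array([[0.9, 0.1, 0.0], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]])
true_labels = np.array([0, 1, 0])
predicted_labels = scores.argmax(axis=1)

cm = confusion_matrix(true_labels, predicted_labels, n_classes=3)
error = overall_error_rate(true_labels, predicted_labels)
rank = mean_rank(true_labels, scores)
```

In this toy case one image out of three is misclassified, and its correct label is ranked second, so the mean rank is above 1.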


n-Fold Cross Validation

What is a “fold”?
  Randomly break the dataset into n partitions; each partition is one fold.

Example: suppose n = 10
  Train on 2, 3,…,10; test on 1 -> result 1
  Train on 1, 3,…,10; test on 2 -> result 2
  …
  Answer = average of result 1, result 2, …
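
The procedure above can be sketched in a few lines. The helper names n_fold_indices and cross_validate, the evaluate callback and the dummy scoring function are hypothetical; in a real experiment evaluate would train a classifier on the training indices and return its error rate on the test indices.

```python
import numpy as np

def n_fold_indices(n_samples, n_folds, seed=0):
    """Randomly split the sample indices into n_folds disjoint partitions."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return np.array_split(order, n_folds)

def cross_validate(evaluate, n_samples, n_folds=10, seed=0):
    """evaluate(train_idx, test_idx) returns one number per fold (assumed callback)."""
    folds = n_fold_indices(n_samples, n_folds, seed)
    results = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, test on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        results.append(evaluate(train_idx, test_idx))
    return float(np.mean(results))  # average of result 1, result 2, ...

# Dummy evaluation: the fraction of the data held out in each fold.
folds = n_fold_indices(100, 10)
average = cross_validate(lambda train, test: len(test) / 100, n_samples=100)
```

Every sample is used for testing exactly once and for training n - 1 times.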


Experiment on Naïve Bayes – effect of k

The overall error rate is presented as a function of the number of clusters k.

Result
  The error rate decreases as k increases.
  Selected value: k = 1000
  Beyond this point, the error rate decreases only slowly.


Experiment on Naïve Bayes – Confusion Matrix

            faces  buildings  trees  cars  phones  bikes  books
faces          76          4      2     3       4      4     13
buildings       2         44      5     0       5      1      3
trees           3          2     80     0       0      5      0
cars            4          1      0    75       3      1      4
phones          9         15      1    16      70     14     11
bikes           2         15     12     0       8     73      0
books           4         19      0     6       7      2     69
error rate     24         56     20    25      27     27     31
mean rank    1.49       1.88   1.33  1.33    1.63   1.57   1.57


Experiment on SVM – Confusion Matrix

            faces  buildings  trees  cars  phones  bikes  books
faces          98         14     10    10      34      0     13
buildings       1         63      3     0       3      1      6
trees           1         10     81     1       0      6      0
cars            0          1      1    85       5      0      5
phones          0          5      4     3      55      2      3
bikes           0          4      1     0       1     91      0
books           0          3      0     1       2      0     73
error rate      2         27     19    15      45      9     27
mean rank    1.04       1.77   1.28  1.30    1.83   1.09   1.39


Interpretation of Results

The confusion matrix
  In general, the SVM makes more correct predictions than Naïve Bayes does.

The overall error rate
  In general, Naïve Bayes has a higher error rate than the SVM.

The mean rank
  In general, the SVM has a lower (better) mean rank than Naïve Bayes.


Why do we have errors?
  There are objects from more than 2 classes in one image.
  The data set is not totally clean (noise).
  Each image is given only one training label.


Conclusion

Bag-of-Keypoints is a new and efficient generic visual categorizer.

Evaluated on a seven-category database, the method was shown to be robust to:
  the choice of the number of clusters
  background clutter
  multiple objects

Any Questions?

Thank you for listening to my presentation!! :)
