Tamara Berg Object Recognition – BoF models


1

Tamara Berg
Object Recognition – BoF models

790-133 Recognizing People, Objects, & Actions

2

Topic Presentations

• Hopefully you have met your topic presentation group members?

• Group 1 – see me to run through slides this week or Monday at the latest (I’m traveling Thurs/Friday). Send me links to 2-3 papers for the class to read.

• Sign up for class google group (790-133). To find the group go to groups.google.com and search for 790-133 (sorted by date). Use this to post/answer questions related to the class.

3

Object → Bag of ‘features’

Bag-of-features models

source: Svetlana Lazebnik

4

Exchangeability

• De Finetti’s theorem of exchangeability (bag of words theorem): the joint probability distribution underlying the data is invariant to permutation of the observations.

5

Origin 2: Bag-of-words models

US Presidential Speeches Tag Cloud
http://chir.ag/phernalia/preztags/

• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)

source: Svetlana Lazebnik

6

Bag of words for text

· Represent documents as a “bag of words”

7

Example

• Doc1 = “the quick brown fox jumped”
• Doc2 = “brown quick jumped fox the”

Would a bag of words model represent these two documents differently?
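The answer is no: a bag of words keeps only word frequencies, so any permutation of the same words produces the identical representation. A minimal Python sketch of the two documents above:

```python
from collections import Counter

# The two example documents from the slide.
doc1 = "the quick brown fox jumped"
doc2 = "brown quick jumped fox the"

# A bag of words is just a multiset of tokens: order is discarded.
bow1 = Counter(doc1.split())
bow2 = Counter(doc2.split())

print(bow1 == bow2)  # True: both map to the same bag
```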

8

Bag of words for images

· Represent images as a “bag of features”

9

Bag of features: outline
1. Extract features

source: Svetlana Lazebnik

10

Bag of features: outline
1. Extract features
2. Learn “visual vocabulary”

source: Svetlana Lazebnik

11

Bag of features: outline
1. Extract features
2. Learn “visual vocabulary”
3. Represent images by frequencies of “visual words”

source: Svetlana Lazebnik

12

2. Learning the visual vocabulary

Clustering

Slide credit: Josef Sivic

13

2. Learning the visual vocabulary

Clustering

Slide credit: Josef Sivic

Visual vocabulary

14

K-means clustering (reminder)

• Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
  • Assign each data point to the nearest center
  • Recompute each cluster center as the mean of all points assigned to it

D(X, M) = Σ_k Σ_{point i in cluster k} ‖x_i − m_k‖²

source: Svetlana Lazebnik
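The algorithm above can be sketched in a few lines of NumPy. This is a minimal illustration, not an optimized implementation: it runs a fixed number of iterations rather than testing convergence, and initializes centers from the data points.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Minimal k-means sketch: minimizes
    D(X, M) = sum_k sum_{points i in cluster k} ||x_i - m_k||^2."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centers (here: K data points)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each data point to the nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each cluster center as the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```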

15

Example visual vocabulary

Fei-Fei et al. 2005

Image Representation

• For a query image:
– Extract features
– Associate each feature with the nearest cluster center (visual word)
– Accumulate visual word frequencies over the image

Visual vocabulary

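The three steps above (extract features, associate each with the nearest visual word, accumulate frequencies) can be sketched as follows; `descriptors` and `vocabulary` are hypothetical NumPy arrays standing in for real feature extraction and a learned codebook.

```python
import numpy as np

def bof_histogram(descriptors, vocabulary):
    """Build a bag-of-features histogram for one image.
    descriptors: (n, d) local feature descriptors from the query image.
    vocabulary:  (k, d) cluster centers (visual words)."""
    # Associate each feature with the nearest cluster center
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    # Accumulate visual word frequencies over the image
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()
```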

17

3. Image representation

[Figure: histogram of codeword frequencies (y-axis: frequency, x-axis: codewords)]

source: Svetlana Lazebnik

18

4. Image classification

[Figure: histogram of codeword frequencies (y-axis: frequency, x-axis: codewords)]

source: Svetlana Lazebnik

Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?

CAR

Image Categorization

Choose from many categories
What is this? (helicopter)

Image Categorization

Choose from many categories
What is this?

• SVM/NB – Csurka et al. (Caltech 4/7)
• Nearest Neighbor – Berg et al. (Caltech 101)
• Kernel + SVM – Grauman et al. (Caltech 101)
• Multiple Kernel Learning + SVMs – Varma et al. (Caltech 101)
• …

21

Visual Categorization with Bags of KeypointsGabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray

22

Data

• Images in 7 classes: faces, buildings, trees, cars, phones, bikes, books

• Caltech 4 dataset: faces, airplanes, cars (rear and side), motorbikes, background

23

Method

Steps:
– Detect and describe image patches.
– Assign patch descriptors to a set of predetermined clusters (a visual vocabulary).
– Construct a bag of keypoints, which counts the number of patches assigned to each cluster.
– Apply a classifier (SVM or Naïve Bayes), treating the bag of keypoints as the feature vector.
– Determine which category or categories to assign to the image.

24

Bag-of-Keypoints Approach

[Pipeline: Interest Point Detection → Key Patch Extraction → Feature Descriptors → Bag of Keypoints → Multi-class Classifier]

Slide credit: Yun-hsueh Liu

25

SIFT Descriptors


Slide credit: Yun-hsueh Liu

26

Bag of Keypoints (1)

• Construction of a vocabulary
– K-means clustering finds “centroids” (over all the descriptors from all the training images)
– Define a “vocabulary” as the set of “centroids”, where every centroid represents a “word”.


Slide credit: Yun-hsueh Liu

27

Bag of Keypoints (2)

• Histogram– Counts the number of occurrences of different visual words in each image


Slide credit: Yun-hsueh Liu

28

Multi-class Classifier

• In this paper, classification is based on conventional machine learning approaches:
– Support Vector Machine (SVM)
– Naïve Bayes


Slide credit: Yun-hsueh Liu

29

SVM

Reminder: Linear SVM

[Figure: linearly separable data in (x1, x2) with decision boundary wᵀx + b = 0, margin boundaries wᵀx + b = ±1, and support vectors on the margin]

Slide credit: Jinwei Gu

g(x) = wᵀx + b

minimize ½‖w‖²  s.t.  y_i (wᵀx_i + b) ≥ 1

31

Nonlinear SVMs: The Kernel Trick

With this mapping, our discriminant function becomes:

g(x) = wᵀφ(x) + b = Σ_{i ∈ SV} α_i φ(x_i)ᵀ φ(x) + b

No need to know this mapping explicitly, because we only use the dot product of feature vectors in both the training and test.

A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space:

K(x_i, x_j) = φ(x_i)ᵀ φ(x_j)

Slide credit: Jinwei Gu

32

Nonlinear SVMs: The Kernel Trick

Examples of commonly used kernel functions:

Linear kernel: K(x_i, x_j) = x_iᵀ x_j

Polynomial kernel: K(x_i, x_j) = (1 + x_iᵀ x_j)^p

Gaussian (Radial Basis Function, RBF) kernel: K(x_i, x_j) = exp(−‖x_i − x_j‖² / 2σ²)

Sigmoid: K(x_i, x_j) = tanh(β₀ x_iᵀ x_j + β₁)

Slide credit: Jinwei Gu
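The kernels listed above can be sketched directly in NumPy. The parameter names (`p`, `sigma`, `beta0`, `beta1`) are illustrative defaults, not values from the slides.

```python
import numpy as np

# x and y are 1-D feature vectors of equal length.
def linear_kernel(x, y):
    return x @ y                                   # K = x.T y

def polynomial_kernel(x, y, p=2):
    return (1 + x @ y) ** p                        # K = (1 + x.T y)^p

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def sigmoid_kernel(x, y, beta0=1.0, beta1=0.0):
    return np.tanh(beta0 * (x @ y) + beta1)
```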

34

SVM for image classification

• Train k binary 1-vs-all SVMs (one per class)
• For a test instance, evaluate with each classifier
• Assign the instance to the class with the largest SVM output
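A minimal sketch of this 1-vs-all scheme. The binary solver here is a toy subgradient-descent linear SVM standing in for a real solver; all function names and hyperparameters are hypothetical, not from the paper.

```python
import numpy as np

def train_binary_svm(X, y, lam=0.01, epochs=100, lr=0.1, seed=0):
    """Tiny linear SVM via subgradient descent on the hinge loss.
    y must be in {-1, +1}. A sketch, not a production solver."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) < 1:          # inside margin: hinge gradient
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                                   # outside margin: only regularizer
                w = (1 - lr * lam) * w
    return w, b

def train_one_vs_all(X, labels, k):
    """Train k binary 1-vs-all SVMs, one per class."""
    return [train_binary_svm(X, np.where(labels == c, 1.0, -1.0)) for c in range(k)]

def predict(X, models):
    """Assign each instance to the class with the largest SVM output."""
    scores = np.stack([X @ w + b for w, b in models], axis=1)
    return scores.argmax(axis=1)
```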

35

Naïve Bayes

36

Naïve Bayes Model

C – class, F – features

We only specify (parameters):
P(C) – prior over class labels
P(F_i | C) – how each feature depends on the class

37

Slide from Dan Klein

Example:

38

Slide from Dan Klein

39

Slide from Dan Klein

40

Percentage of documents in training set labeled as spam/ham

Slide from Dan Klein

41

In the documents labeled as spam, occurrence percentage of each word (e.g. # times “the” occurred/# total words).

Slide from Dan Klein

42

In the documents labeled as ham, occurrence percentage of each word (e.g. # times “the” occurred/# total words).

Slide from Dan Klein

43

Classification

The class that maximizes: c* = argmax_c P(c) Π_i P(f_i | c)


48

Classification

• In practice:
– Multiplying lots of small probabilities can result in floating point underflow.
– Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities.
– Since log is a monotonic function, the class with the highest score does not change.
– So, what we usually compute in practice is: c* = argmax_c [ log P(c) + Σ_i log P(f_i | c) ]
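A quick numeric demonstration of the underflow problem and the log fix, using made-up per-word likelihoods:

```python
import math

# 100 word likelihoods of 1e-5 each: their product is 1e-500,
# far below the smallest representable double.
probs = [1e-5] * 100

product = 1.0
for p in probs:
    product *= p
print(product)      # underflows to 0.0

# Summing logs keeps the score finite and preserves the argmax.
log_score = sum(math.log(p) for p in probs)
print(log_score)    # ~ -1151.3
```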

49

Naïve Bayes on images

50

Naïve Bayes

C – class, F – features

We only specify (parameters):
P(C) – prior over class labels
P(F_i | C) – how each feature depends on the class

51

Naive Bayes Parameters

Problem: Categorize images as one of k object classes using a Naïve Bayes classifier:
– Classes: object categories (face, car, bicycle, etc.)
– Features: images represented as a histogram of visual words; the features are visual words.
– P(C): treated as uniform.
– P(F | C): learned from training data (images labeled with category). Probability of a visual word given an image category.

52

Multi-class classifier –Naïve Bayes (1)

• Let V = {v_i}, i = 1, …, N, be a visual vocabulary, in which each v_i represents a visual word (cluster center) from the feature space.

• A set of labeled images I = {I_t}.

• Denote C_j our classes, where j = 1, …, M.

• N(t, i) = number of times v_i occurs in image I_t

• Compute P(C_j | I_t):

Slide credit: Yun-hsueh Liu

53

Multi-class Classifier –Naïve Bayes (2)

• Goal - Find maximum probability class Cj:

• In order to avoid zero probability, use Laplace smoothing:

Slide credit: Yun-hsueh Liu
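A sketch of the Laplace-smoothed Naïve Bayes over visual-word histograms. This illustrates the formulas above under a uniform class prior; it is not the paper's code, and the function names are hypothetical.

```python
import numpy as np

def train_nb(histograms, labels, k):
    """Estimate log P(v_i | C_j) with Laplace smoothing:
    P(v_i | C_j) = (1 + count of v_i in class j) / (N + total counts in class j),
    where N is the vocabulary size, so no word has zero probability."""
    n_words = histograms.shape[1]
    log_pw = np.empty((k, n_words))
    for j in range(k):
        counts = histograms[labels == j].sum(axis=0)
        log_pw[j] = np.log((1 + counts) / (n_words + counts.sum()))
    return log_pw

def classify_nb(hist, log_pw):
    """argmax_j sum_i N(i) * log P(v_i | C_j), uniform prior over classes."""
    return int(np.argmax(log_pw @ hist))
```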

Results

55

Results

56

Results

57

Results

Results on Dataset 2

58

Results

59

Results

60

Results

Thoughts?

• Pros?

• Cons?

62

Related BoF models
pLSA, LDA, …

63

pLSA

[pLSA graphical model: document, topic, word]

64

pLSA

67

pLSA on images

68

Discovering objects and their location in imagesJosef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman

Documents – images
Words – visual words (vector-quantized SIFT descriptors)
Topics – object categories

Images are modeled as a mixture of topics (objects).
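A minimal pLSA EM sketch on a toy count matrix. This is an illustration of the model P(w|d) = Σ_z P(w|z) P(z|d), not the authors' implementation; refinements such as tempered EM are omitted.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """counts: (n_docs, n_words) matrix of visual-word counts.
    Returns P(w|z) of shape (n_topics, n_words) and P(z|d) of
    shape (n_docs, n_topics), fit by plain EM."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words)); p_w_z /= p_w_z.sum(1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z | d, w) for every (d, w) pair
        r = p_z_d[:, :, None] * p_w_z[None, :, :]          # shape (d, z, w)
        r /= r.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both distributions from expected counts
        cz = counts[:, None, :] * r                        # expected counts (d, z, w)
        p_w_z = cz.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = cz.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d
```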

69

Goals

They investigate three areas:
– (i) topic discovery, where categories are discovered by pLSA clustering on all available images.
– (ii) classification of unseen images, where topics corresponding to object categories are learnt on one set of images, and then used to determine the object categories present in another set.
– (iii) object detection, where you want to determine the location and approximate segmentation of object(s) in each image.

70

(i) Topic Discovery

Most likely words for 4 learnt topics (face, motorbike, airplane, car)

71

(ii) Image Classification

Confusion table for unseen test images against pLSA trained on images containing four object categories, but no background images.

72

(ii) Image Classification

Confusion table for unseen test images against pLSA trained on images containing four object categories, and background images. Performance is not quite as good.

73

(iii) Topic Segmentation

74

(iii) Topic Segmentation

75

(iii) Topic Segmentation
