
Prof. Didier Stricker

University of Kaiserslautern

http://ags.cs.uni-kl.de/

DFKI – Deutsches Forschungszentrum für Künstliche Intelligenz

http://av.dfki.de

2D Image Processing

Introduction to object recognition & SVM


Overview

• Basic recognition tasks

• A statistical learning approach

• Traditional recognition pipeline

• Bags of features

• Classifiers

• Support Vector Machine

Image classification

• outdoor/indoor

• city/forest/factory/etc.


Image tagging

• street

• people

• building

• mountain

• …


Object detection

• find pedestrians


Activity recognition

• walking

• shopping

• rolling a cart

• sitting

• talking

• …

Image parsing

[Figure: street scene segmented into labeled regions: mountain, building, tree, banner, market, people, street lamp, sky]

Image description

“This is a busy street in an Asian city. Mountains and a large palace or fortress loom in the background. In the foreground, we see colorful souvenir stalls and people walking around and shopping. One person in the lower left is pushing an empty cart, and a couple of people in the middle are sitting, possibly posing for a photograph.”

Image classification


The statistical learning framework

Apply a prediction function to a feature representation of the image to get the desired output:

f( ) = “apple”

f( ) = “tomato”

f( ) = “cow”

The statistical learning framework

y = f(x)   (output = prediction function applied to the image feature)

Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set.

Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x).
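To make training and testing concrete, here is a minimal sketch (not from the slides): the features are synthetic, and the "model" is a simple nearest-class-mean rule standing in for a generic prediction function f.

```python
# Minimal sketch of the framework: learn a prediction function f from labeled
# training data, then apply it to never-before-seen test data.
import numpy as np

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)

# Training: estimate f (here: one mean feature vector per class).
means = {c: X_train[y_train == c].mean(axis=0) for c in (0, 1)}

def f(x):
    # Prediction: output the label of the closest class mean.
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

# Testing: apply f to unseen examples and measure the prediction error.
X_test = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(3, 1, (10, 2))])
y_test = np.array([0] * 10 + [1] * 10)
accuracy = np.mean([f(x) == y for x, y in zip(X_test, y_test)])
print("test accuracy:", accuracy)
```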

Steps

[Diagram: Training: training images + training labels → image features → training → learned model. Testing: test image → image features → learned model → prediction.]

Slide credit: D. Hoiem

Traditional recognition pipeline

Image pixels → hand-designed feature extraction → trainable classifier → object class

• Features are not learned

• Trainable classifier is often generic (e.g. SVM)

Bags of features


Traditional features: Bags-of-features

1. Extract local features

2. Learn “visual vocabulary”

3. Quantize local features using visual vocabulary

4. Represent images by frequencies of “visual words”
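A minimal sketch of this pipeline; the `extract_descriptors` stub below is a hypothetical placeholder for a real local-descriptor extractor (e.g. SIFT on sampled patches), and the vocabulary size is illustrative.

```python
# Bag-of-features sketch: learn a visual vocabulary with k-means, then
# represent each image as a histogram of visual-word frequencies.
import numpy as np
from sklearn.cluster import KMeans

def extract_descriptors(image):
    # Placeholder: return an (n_patches, descriptor_dim) array of local features.
    rng = np.random.default_rng(abs(hash(image)) % 2**32)
    return rng.normal(size=(50, 128))

images = ["img_a", "img_b", "img_c"]

# 1-2. Extract local features from all images and learn the visual vocabulary.
all_desc = np.vstack([extract_descriptors(im) for im in images])
vocab = KMeans(n_clusters=10, n_init=10, random_state=0).fit(all_desc)

# 3-4. Quantize each image's descriptors and build a word-frequency histogram.
def bag_of_features(image):
    words = vocab.predict(extract_descriptors(image))
    hist = np.bincount(words, minlength=vocab.n_clusters)
    return hist / hist.sum()

print(bag_of_features("img_a"))
```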

1. Local feature extraction

• Sample patches and extract descriptors

Bags of features: Motivation

• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)

• Classification to determine document categories

Traditional recognition pipeline

Image pixels → hand-designed feature extraction → trainable classifier → object class

Classifiers: Nearest neighbor

f(x) = label of the training example nearest to x

• All we need is a distance function for our inputs

• No training required!

[Figure: a test example among training examples from class 1 and class 2]

K-nearest neighbor classifier

• For a new point, find the k closest points from the training data

• Vote for the class label using the labels of those k points (e.g. k = 5)
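A minimal k-NN sketch, assuming Euclidean distance (not from the slides):

```python
# k-nearest neighbor classification: no training phase, just store the data
# and vote among the k closest training points at test time.
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    # Euclidean distance from the query point to every training point.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # Majority vote over the labels of the k nearest neighbors.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9]), k=3))  # -> 1
```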

K-nearest neighbor classifier

Which classifier is more robust to outliers?

Credit: Andrej Karpathy, http://cs231n.github.io/classification/

Linear classifiers

Find a linear function to separate the classes:

f(x) = sgn(w · x + b)
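For concreteness, a tiny sketch of this decision rule with an assumed (not learned) weight vector and bias; the values are purely illustrative.

```python
# Linear classifier decision rule f(x) = sgn(w.x + b).
import numpy as np

w = np.array([1.0, -2.0])   # normal vector of the decision hyperplane
b = 0.5                     # bias / offset

def f(x):
    return int(np.sign(w @ x + b))

print(f(np.array([3.0, 1.0])))   # w.x + b = 3 - 2 + 0.5 = 1.5 -> +1
print(f(np.array([0.0, 1.0])))   # w.x + b = -2 + 0.5 = -1.5 -> -1
```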

Visualizing linear classifiers

Source: Andrej Karpathy, http://cs231n.github.io/linear-classify/

Nearest neighbor vs. linear classifiers

• NN pros:
  - Simple to implement
  - Decision boundaries not necessarily linear
  - Works for any number of classes
  - Nonparametric method

• NN cons:
  - Need a good distance function
  - Slow at test time

• Linear pros:
  - Low-dimensional parametric representation
  - Very fast at test time

• Linear cons:
  - Works for two classes
  - How to train the linear function?
  - What if the data is not linearly separable?


Maximum Margin

Linear classifiers can lead to many equally valid choices for the decision boundary. Are these really “equally valid”?

Max Margin

How can we pick which is best? Maximize the size of the margin.

[Figure: the same data separated with a large margin vs. a small margin]

Support Vectors

Support vectors are the input points (vectors) that lie on the margin hyperplanes closest to the decision boundary:

1. They are vectors.

2. They “support” the decision hyperplane.

Support Vectors

Define this as a decision problem. The decision hyperplane:

w · x + b = 0

No fancy math, just the equation of a hyperplane.

Recall: Equation of a plane

Any line lying in the plane is perpendicular to the plane's normal. The plane can be written in the form

w · x + b = 0

where w is the normal and b is the negative distance of the plane to the origin (up to the scale of w).

Support Vectors

Define this as a decision problem. The decision hyperplane:

w · x + b = 0

Decision function:

f(x) = sgn(w · x + b)

Support Vectors

The decision hyperplane:

w · x + b = 0

Margin hyperplanes:

w · x + b = +1 and w · x + b = −1

Support Vectors: scale invariance

The decision hyperplane w · x + b = 0 is unchanged if we rescale (w, b) to (c·w, c·b) for any c > 0. We choose the scale so that the margin hyperplanes are w · x + b = ±1.

This scaling does not change the decision hyperplane or the support-vector hyperplanes, but it eliminates a variable from the optimization.

What are we optimizing?

We will represent the size of the margin in terms of w. This will allow us to simultaneously:

• identify a decision boundary

• maximize the margin

How do we represent the size of the margin in terms of w?

1. There must be at least one point that lies on each support (margin) hyperplane.

   Proof outline: if not, we could define a larger-margin support hyperplane that does touch the nearest point(s).

How do we represent the size of the margin in terms of w?

1. There must be at least one point that lies on each support (margin) hyperplane.

2. Thus, for some point x1 on the positive margin hyperplane: w · x1 + b = +1

3. And for some point x2 on the negative margin hyperplane: w · x2 + b = −1

The vector w is perpendicular to the decision hyperplane (if the dot product of two vectors equals zero, the two vectors are perpendicular).

How do we represent the size of the margin in terms of w?

The margin is the projection of x1 − x2 onto w, the normal of the hyperplane.

Recall: Vector projection

The projection of a vector a onto a unit vector u has length a · u.

Projecting x1 − x2 onto the unit normal w/||w|| gives the size of the margin:

margin = (x1 − x2) · w / ||w|| = ((w · x1 + b) − (w · x2 + b)) / ||w|| = (1 − (−1)) / ||w|| = 2 / ||w||

Maximizing the margin

Goal: maximize the margin 2/||w||, subject to linear separability of the data by the decision boundary:

yi (w · xi + b) ≥ 1 for all training examples (xi, yi), with yi ∈ {−1, +1}
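Putting the preceding slides together, the objective can be restated compactly (a summary in the slides' notation; maximizing 2/||w|| is equivalent to minimizing ½||w||²):

\[
\max_{w,\,b}\ \frac{2}{\lVert w\rVert}
\;\;\Longleftrightarrow\;\;
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^2
\quad\text{subject to}\quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{for all } i.
\]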

Max Margin Loss Function

This is a constrained optimization problem, so we use Lagrange multipliers together with the Karush-Kuhn-Tucker (KKT) conditions and optimize the “Primal”:

L(w, b, α) = ½ ||w||² − Σi αi [ yi (w · xi + b) − 1 ],   with αi ≥ 0

Partial derivative w.r.t. b:

∂L/∂b = −Σi αi yi = 0   ⇒   Σi αi yi = 0

Max Margin Loss Function

Partial derivative w.r.t. w:

∂L/∂w = w − Σi αi yi xi = 0   ⇒   w = Σi αi yi xi

Now we have to find the αi: substitute these conditions back into the loss function.

Max Margin Loss Function

Construct the “dual”. With w = Σi αi yi xi and Σi αi yi = 0, substituting back gives:

W(α) = Σi αi − ½ Σi Σj αi αj yi yj (xi · xj),   subject to αi ≥ 0 and Σi αi yi = 0

Dual formulation of the error

Solve this quadratic program to identify the Lagrange multipliers αi and thus the weights w. There exist (rather) fast approaches to quadratic program solving in C, C++, Python, Java, and R.
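As a toy illustration (not the course's code), the dual above can be handed to a general-purpose solver; here is a minimal sketch with SciPy on a four-point dataset.

```python
# Sketch: solve the SVM dual as a small quadratic program with SciPy.
# Real libraries (e.g. LIBSVM) use specialized, much faster solvers.
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable 2D dataset, labels in {-1, +1}.
X = np.array([[2.0, 2.0], [2.5, 3.0], [0.0, 0.0], [-0.5, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Dual: maximize sum(a) - 1/2 * sum_ij a_i a_j y_i y_j (x_i . x_j);
# we minimize its negative.
K = (y[:, None] * X) @ (y[:, None] * X).T      # = y_i y_j (x_i . x_j)

def neg_dual(a):
    return 0.5 * a @ K @ a - a.sum()

cons = {"type": "eq", "fun": lambda a: a @ y}  # sum_i a_i y_i = 0
bounds = [(0, None)] * len(y)                  # a_i >= 0
res = minimize(neg_dual, np.zeros(len(y)), bounds=bounds, constraints=cons)

alpha = res.x
w = (alpha * y) @ X                            # w = sum_i a_i y_i x_i
sv = np.argmax(alpha)                          # a support vector (a_i > 0)
b = y[sv] - w @ X[sv]                          # from y_sv (w . x_sv + b) = 1
print("alpha:", np.round(alpha, 3), "w:", np.round(w, 3), "b:", round(b, 3))
```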

Quadratic Programming

General form: minimize f(x) = ½ xᵀ Q x + cᵀ x subject to linear constraints.

• If Q is positive semi-definite, then f(x) is convex.

• If f(x) is convex, then there is a single global optimum (no local optima).

Karush-Kuhn-Tucker Conditions

In constrained optimization, at the optimal solution: constraint × Lagrange multiplier = 0, i.e.

αi [ yi (w · xi + b) − 1 ] = 0 for all i

Only points on the margin (support) hyperplanes contribute to the solution!

Support Vector Expansion

• When αi is non-zero, xi is a support vector.

• When αi is zero, xi is not a support vector.

New decision function:

f(x) = sgn(w · x + b) = sgn( Σi αi yi (xi · x) + b )
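To see the support vector expansion in practice, a short sketch with scikit-learn's SVC (an illustration; the slides do not prescribe a library):

```python
# Fit a linear SVM and verify that w equals the weighted sum of the support
# vectors, sum_i alpha_i y_i x_i (sklearn stores alpha_i * y_i in dual_coef_
# for the support vectors only).
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.0], [-0.5, -1.0], [0.5, -0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)    # large C ~ hard margin

w_from_expansion = clf.dual_coef_ @ clf.support_vectors_
print("support vectors:\n", clf.support_vectors_)
print("w (sklearn):   ", clf.coef_)
print("w (expansion): ", w_from_expansion)     # the two should match
```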

Nonlinear SVMs

• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:

Φ: x → φ(x)

Nonlinear SVMs

• Linearly separable dataset in 1D

• Non-separable dataset in 1D

• We can map the data to a higher-dimensional space, e.g. x → (x, x²), where it becomes separable (see the sketch below)

Slide credit: Andrew Moore
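A small sketch of the 1D example above, assuming the lifting map x → (x, x²) and scikit-learn:

```python
# 1D dataset that is not linearly separable (positives on both sides of the
# negatives), lifted to 2D via x -> (x, x^2), where a line separates it.
import numpy as np
from sklearn.svm import SVC

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([ 1,    1,   -1,  -1,  -1,   1,   1])   # not separable in 1D

phi = np.column_stack([x, x**2])                     # lifting transformation
clf = SVC(kernel="linear", C=10.0).fit(phi, y)
print("training accuracy in lifted space:", clf.score(phi, y))  # 1.0
```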

The kernel trick

• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable.

• The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that

K(x, y) = φ(x) · φ(y)

(to be valid, the kernel function must satisfy Mercer’s condition)

The kernel trick

• Linear SVM decision function:

f(x) = sgn(w · x + b) = sgn( Σi αi yi (xi · x) + b )

(the xi with αi ≠ 0 are the support vectors; the αi yi are the learned weights)

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

The kernel trick

• Linear SVM decision function:

f(x) = sgn( Σi αi yi (xi · x) + b )

• Kernel SVM decision function:

f(x) = sgn( Σi αi yi φ(xi) · φ(x) + b ) = sgn( Σi αi yi K(xi, x) + b )

• This gives a nonlinear decision boundary in the original feature space

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Polynomial kernel: K(x, y) = (c + x · y)^d
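A quick numerical check of the kernel trick for the degree-2 polynomial kernel with c = 1; the explicit feature map below is the standard quadratic lifting, written out here only as an illustration.

```python
# Kernel trick check: K(x, y) = (1 + x.y)^2 equals phi(x).phi(y) for the
# explicit quadratic map phi(x) = (1, sqrt(2)x1, sqrt(2)x2, x1^2, sqrt(2)x1x2, x2^2).
import numpy as np

def phi(v):
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([0.3, -1.2])
y = np.array([2.0, 0.5])

print((1 + x @ y) ** 2)    # kernel evaluation, no explicit lifting
print(phi(x) @ phi(y))     # explicit lifting, same value
```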

Gaussian kernel

• Also known as the radial basis function (RBF) kernel:

K(x, y) = exp( −(1/(2σ²)) ||x − y||² )

[Figure: K(x, y) as a function of ||x − y||, a bell-shaped curve]
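A brief sketch (assuming scikit-learn) of an RBF-kernel SVM separating a class surrounded by a ring of the other class, which no linear boundary can do:

```python
# RBF (Gaussian) kernel SVM on data that no line can separate.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
inner = rng.normal(scale=0.5, size=(50, 2))                         # class 0: blob at origin
angles = rng.uniform(0, 2 * np.pi, size=50)
outer = np.column_stack([3 * np.cos(angles), 3 * np.sin(angles)])   # class 1: ring
X = np.vstack([inner, outer])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)    # gamma ~ 1 / (2 sigma^2)
print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```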

Gaussian kernel

[Figure: nonlinear decision boundary of a Gaussian-kernel SVM with the support vectors (SVs) highlighted]

SVMs: Pros and cons

• Pros:
  - Kernel-based framework is very powerful and flexible
  - Training is convex optimization; the globally optimal solution can be found
  - Amenable to theoretical analysis
  - SVMs work very well in practice, even with very small training sample sizes

• Cons:
  - No “direct” multi-class SVM; must combine two-class SVMs (e.g., one-vs-others)
  - Computation and memory costs (especially for nonlinear SVMs)

Generalization

• Generalization refers to the ability to correctly classify never-before-seen examples

• It can be controlled by turning “knobs” that affect the complexity of the model

[Figure: training set (labels known) vs. test set (labels unknown)]

Diagnosing generalization ability

• Training error: how does the model perform on the data on which it was trained?

• Test error: how does it perform on never-before-seen data?

[Figure: error vs. model complexity (low to high); training error and test error curves, with underfitting at low complexity and overfitting at high complexity]

Source: D. Hoiem

Underfitting and overfitting

• Underfitting: training and test error are both high
  - The model does an equally poor job on the training and the test set
  - Either the training procedure is ineffective or the model is too “simple” to represent the data

• Overfitting: training error is low but test error is high
  - The model fits irrelevant characteristics (noise) in the training data
  - The model is too complex or the amount of training data is insufficient

[Figure: underfitting vs. good generalization vs. overfitting fits to the same data]

Effect of training set size

[Figure: test error vs. model complexity (low to high), plotted for many training examples and for few training examples]

Source: D. Hoiem

Validation

• Split the data into training, validation, and test subsets

• Use the training set to optimize model parameters

• Use the validation set to choose the best model

• Use the test set only to evaluate performance

[Figure: error vs. model complexity showing training set loss, validation set loss, and test set loss, with the stopping point chosen from the validation loss]
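A minimal sketch of this protocol (the split ratios and the choice of the SVM's C as the "complexity knob" are illustrative):

```python
# Train / validation / test protocol: fit on train, pick the model on
# validation, report performance once on the held-out test set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, :2].sum(axis=1) > 0).astype(int)

X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=0)

# Model selection on the validation set.
best_model, best_score = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = SVC(kernel="rbf", C=C).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_model, best_score = model, score

# Final, single evaluation on the untouched test set.
print("validation accuracy:", best_score)
print("test accuracy:", best_model.score(X_test, y_test))
```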

Histogram of Oriented Gradients (HoG)

[Dalal and Triggs, CVPR 2005]

[Figure: an image and its “HoGified” visualization at different resolutions, e.g. 10x10 cells vs. 20x20 cells]

Histogram of Oriented Gradients (HoG)

Histogram of Oriented Gradients (HoG)

Many examples on the Internet, e.g. https://www.youtube.com/watch?v=b_DSTuxdGDw

[Dalal and Triggs, CVPR 2005]
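For reference, a small sketch of computing a HoG descriptor with scikit-image; the cell and block parameters below are illustrative, not the values used in the lecture.

```python
# Compute a HoG descriptor for a grayscale image with scikit-image.
import numpy as np
from skimage.feature import hog

image = np.random.rand(128, 64)   # stand-in for a grayscale detection window

features, hog_image = hog(
    image,
    orientations=9,               # gradient orientation bins per cell
    pixels_per_cell=(8, 8),       # cell size
    cells_per_block=(2, 2),       # blocks used for local contrast normalization
    visualize=True,
)
print(features.shape)             # flat descriptor, ready for a linear SVM
```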


Resources

• Demo software: http://cs.stanford.edu/people/karpathy/svmjs/demo

• A useful website: www.kernel-machines.org

• Software: LIBSVM: www.csie.ntu.edu.tw/~cjlin/libsvm/

• SVMLight: svmlight.joachims.org/


Thank you!
