slides: Machine Learning for Computer Vision

8/2/2019 slides: Machine Learning for Computer Vision


Machine Learning in

Computer Vision

Fei-Fei Li

What is (computer) vision?

• When we “see” something, what does it involve?

• Take a picture with a camera, it is just a bunch of colored dots (pixels)

• Want to make computers understand images

• Looks easy, but not really…

Image (or video) Sensing device Interpreting device Interpretations

Corn / mature corn in a cornfield / plant / blue sky in the background

What is it related to?

Computer Vision

Neuroscience

Machine learning

Speech Information retrieval

Computer Science

Information Engineering

Physics

Biology

Robotics

Cognitive sciences

Psychology

What about this?

A picture is worth a thousand words.

Confucius, or Printers’ Ink Ad (1921)

horizontal lines

vertical

blue on the top

porous

oblique

shadow to the left

textured

large green patches

building court yard

clear sky

campus

autumn leaves

people

day time

bicycles

talking outdoor

Today: machine learning methods

for object recognition

outline

• Intro to object categorization

• Brief overview

– Generative

– Discriminative

• Generative models

• Discriminative models

How many object categories are there?

Biederman 1987

Challenges 1: view point variation

Michelangelo 1475-1564

Challenges 2: illumination

slide credit: S. Ullman

Challenges 3: occlusion

Magritte, 1957

Challenges 4: scale


Challenges 5: deformation

Xu, Beihong 1943


Challenges 6: background clutter

Klimt, 1913

History: single object recognition

• Lowe, et al. 1999, 2003

• Mahamud and Hebert, 2000

• Ferrari, Tuytelaars, and Van Gool, 2004

• Rothganger, Lazebnik, and Ponce, 2004

• Moreels and Perona, 2005

• …


Challenges 7: intra-class variation

Object categorization: the statistical viewpoint

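$p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$

• Bayes rule:

$$\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})} = \frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})} \cdot \frac{p(\text{zebra})}{p(\text{no zebra})}$$

posterior ratio = likelihood ratio · prior ratio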
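
A small numeric sketch of the Bayes-rule ratio above may help. The function name and all probabilities are made up for illustration; they do not come from the slides.

```python
def posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra):
    """Return p(zebra|image) / p(no zebra|image) via Bayes rule."""
    likelihood_ratio = lik_zebra / lik_no_zebra
    prior_ratio = prior_zebra / (1.0 - prior_zebra)
    return likelihood_ratio * prior_ratio


# Hypothetical numbers: the image is 20x more likely under "zebra",
# but zebras are assumed to appear in only 1% of images.
ratio = posterior_ratio(lik_zebra=0.02, lik_no_zebra=0.001, prior_zebra=0.01)

# Decide "zebra" only when the posterior ratio exceeds 1.
decision = "zebra" if ratio > 1.0 else "no zebra"
print(ratio, decision)
```

With these assumed numbers the strong likelihood ratio is outweighed by the small prior, so the decision is "no zebra".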


• Discriminative methods model the posterior

• Generative methods model the likelihood and the prior

Discriminative

• Direct modeling of $\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}$

[Figure: decision boundary separating zebra from non-zebra images]

Generative

• Model $p(\text{image} \mid \text{zebra})$ and $p(\text{image} \mid \text{no zebra})$

[Figure: likelihoods for two example images]

Image     p(image | zebra)    p(image | no zebra)
first     Low                 Middle
second    High                Middle to Low

Three main issues

• Representation

– How to represent an object category

• Learning

– How to form the classifier, given training data

• Recognition

– How the classifier is to be used on novel data

Representation

– Generative / discriminative / hybrid

Representation

– Appearance only or location and appearance

Representation

– Invariances

• View point

• Illumination

• Occlusion

• Scale

• Deformation

• Clutter

• etc.

Representation

– Part-based or global w/ sub-window

– Use set of features or each pixel in image

Learning

– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference, hence the current interest in machine learning

– Methods of training: generative vs. discriminative

Learning

[Figure: class densities p(x|C1) and p(x|C2), and posterior probabilities p(C1|x) and p(C2|x), each plotted against x from 0 to 1]

– What are you maximizing? Likelihood (Gen.) or performance on train/validation set (Disc.)

– Level of supervision

• Manual segmentation; bounding box; image labels; noisy labels

Learning

Contains a motorbike

– Batch/incremental (on category and image level; user feedback)

Learning

– Training images:

• Issue of overfitting

• Negative images for discriminative methods

– Priors

Recognition

– Scale / orientation range to search over

– Speed


Bag-of-words models

Object → Bag of ‘words’

Analogy to documents

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

[Figure: bag-of-words pipeline. Learning: feature detection & representation → codewords dictionary → image representation → category models (and/or) classifiers. Recognition: feature detection & representation → image representation → category decision.]

Representation

1. Feature detection and representation

Detect patches
[Mikolajczyk and Schmid ’02]
[Matas et al. ’02]
[Sivic et al. ’03]

Normalize patch

Compute descriptor
[Lowe ’99]

Slide credit: Josef Sivic

2. Codewords dictionary formation

Vector quantization

Slide credit: Josef Sivic

Fei-Fei et al. 2005
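
One common way to carry out the vector quantization step is k-means clustering over patch descriptors, with the cluster centers serving as codewords. The sketch below assumes plain k-means on synthetic 8-dimensional descriptors; the function names, the toy data, and the choice of k-means itself are illustrative assumptions, not the setup of any paper cited on these slides.

```python
import numpy as np


def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Plain k-means: returns k cluster centers (the 'codewords')."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct, randomly chosen descriptors.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[idx].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every descriptor to every center.
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def quantize(descriptors, centers):
    """Map each descriptor to the index of its nearest codeword."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Synthetic stand-in for patch descriptors: two well-separated clusters.
rng = np.random.default_rng(1)
toy = np.vstack([rng.normal(0.0, 0.1, (50, 8)), rng.normal(5.0, 0.1, (50, 8))])
codebook = build_codebook(toy, k=2)
codes = quantize(toy, codebook)
```

In practice the descriptors would come from the detection and description step, and k would be in the hundreds or thousands rather than 2.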

3. Image representation

[Figure: histogram of codeword frequency (x axis: codewords; y axis: frequency)]
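
The image representation is a histogram of codeword frequencies. A minimal sketch, assuming each patch has already been assigned an integer codeword index (the function name and the example indices are hypothetical):

```python
import numpy as np


def bow_histogram(codeword_ids, vocab_size, normalize=True):
    """Count how often each codeword occurs in one image."""
    counts = np.bincount(codeword_ids, minlength=vocab_size).astype(float)
    if normalize and counts.sum() > 0:
        counts /= counts.sum()  # frequencies instead of raw counts
    return counts


# Hypothetical codeword indices for the six patches of one image.
patch_codewords = np.array([0, 2, 2, 3, 2, 0])
hist = bow_histogram(patch_codewords, vocab_size=5)
```

Note that the histogram discards where in the image each patch was found; only the counts remain.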

Learning and Recognition

[Figure: the same pipeline, with the learning and recognition stages highlighted: category models (and/or) classifiers → category decision]

2 case studies

1. Naïve Bayes classifier

– Csurka et al. 2004

2. Hierarchical Bayesian text models (pLSA and LDA)

– Background: Hofmann 2001, Blei et al. 2004

– Object categorization: Sivic et al. 2005, Sudderth et al. 2005

– Natural scene categorization: Fei-Fei et al. 2005

First, some notation

• $w_n$: each patch in an image

– $w_n = [0, 0, \ldots, 1, \ldots, 0, 0]^T$

• $\mathbf{w}$: a collection of all N patches in an image

– $\mathbf{w} = [w_1, w_2, \ldots, w_N]$

• $d_j$: the jth image in an image collection

• c: category of the image

• z: theme or topic of the patch

Case #1: the Naïve Bayes model

$$c^* = \arg\max_c \, p(c \mid \mathbf{w}) \propto p(c)\, p(\mathbf{w} \mid c) = p(c) \prod_{n=1}^{N} p(w_n \mid c)$$

• $c^*$: object class decision

• $p(c)$: prior prob. of the object classes

• $p(\mathbf{w} \mid c)$: image likelihood given the class

Csurka et al. 2004
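
The Naïve Bayes decision rule can be sketched on codeword count histograms as follows. Working in log space, Laplace smoothing (`alpha`), the function names, and the toy counts are all assumptions added for illustration; they are not taken from Csurka et al.

```python
import numpy as np


def fit_naive_bayes(histograms, labels, n_classes, alpha=1.0):
    """Estimate log p(c) and log p(codeword | c) from count histograms.

    Assumes every class has at least one training image.
    """
    histograms = np.asarray(histograms, dtype=float)
    labels = np.asarray(labels)
    log_prior = np.zeros(n_classes)
    log_lik = np.zeros((n_classes, histograms.shape[1]))
    for c in range(n_classes):
        rows = histograms[labels == c]
        log_prior[c] = np.log(len(rows) / len(histograms))
        counts = rows.sum(axis=0) + alpha  # Laplace smoothing (assumption)
        log_lik[c] = np.log(counts / counts.sum())
    return log_prior, log_lik


def predict(hist, log_prior, log_lik):
    """c* = argmax_c [ log p(c) + sum_n log p(w_n | c) ]."""
    scores = log_prior + log_lik @ np.asarray(hist, dtype=float)
    return int(np.argmax(scores))


# Toy codeword counts for four training images, two classes.
train = [[5, 1, 0], [4, 2, 0], [0, 1, 5], [1, 0, 6]]
labels = [0, 0, 1, 1]
log_prior, log_lik = fit_naive_bayes(train, labels, n_classes=2)
pred_a = predict([3, 1, 0], log_prior, log_lik)
pred_b = predict([0, 0, 4], log_prior, log_lik)
```

Summing log-probabilities weighted by the histogram counts is the same as taking the product over the N patches in the formula, since patches that share a codeword contribute identical factors.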

Case #2: Hierarchical Bayesian text models

Probabilistic Latent Semantic Analysis (pLSA): Hofmann, 2001

Latent Dirichlet Allocation (LDA): Blei et al., 2001

Case #2: Hierarchical Bayesian text models

Probabilistic Latent Semantic Analysis (pLSA)

“face”

Sivic et al. ICCV 2005
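
For reference, the standard pLSA decomposition can be written in the notation introduced earlier (codeword $w_i$, image $d_j$, theme $z_k$); the number of themes $K$ is a symbol introduced here. Each image is modeled as a mixture of themes, and each theme as a distribution over codewords:

```latex
p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)
```

A plausible reading of the “face” label is that an image is assigned to the theme with the largest $p(z_k \mid d_j)$; that decision rule is an interpretation, not something stated on the slide.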

Case #2: Hierarchical Bayesian text models

Latent Dirichlet Allocation (LDA)

Fei-Fei et al. ICCV 2005

“beach”

Another application

• Human action classification

Invariance issues

• Scale and rotation

– Implicit

– Detectors and descriptors

Kadir and Brady, 2003

• Occlusion

– Implicit in the models

– Codeword distribution: small variations

– (In theory) Theme (z) distribution: different occlusion patterns

• Translation

– Encode (relative) location information

Sudderth et al. 2005

• View point (in theory)

– Codewords: detector and descriptor

– Theme distributions: different view points

Fergus et al. 2005

Model properties

• Intuitive

– Analogy to documents

• (Could use) generative models

– Convenient for weakly- or un-supervised training

– Prior information

– Hierarchical Bayesian framework (Sivic et al., 2005; Sudderth et al., 2005)

• Learning and recognition relatively fast

– Compared to other methods

Weakness of the model

• No rigorous geometric information of the object components

• It’s intuitive to most of us that objects are made of parts, but the model carries no such information

• Not extensively tested yet for

– View point invariance

– Scale invariance

• Segmentation and localization unclear

part-based models

Slides courtesy of Rob Fergus for “part-based models”

One-shot learning of object categories

Fei-Fei et al. ’03, ’04, ’06

P. Bruegel, 1562

model representation

X (location): (x,y) coords. of region center

A (appearance): 11x11 patch → normalize → projection onto PCA basis

The Generative Model

[Figure: graphical model with observed variables, a hidden variable, and parameters. Shape model: $\mu_X$, $\Gamma_X$. Appearance model: $\mu_A$, $\Gamma_A$.]

ML/MAP: point estimate of θ, where θ = {µX, ΓX, µA, ΓA}

Weber et al. ’98 ’00, Fergus et al. ’03

Bayesian: priors are placed on the parameters

Parameters to estimate: {mX, βX, aX, BX, mA, βA, aA, BA}, i.e. parameters of the Normal-Wishart distribution (the prior distribution)

Fei-Fei et al. ’03, ’04, ’06

1. human vision

2. model representation

3. learning & inferences

4. evaluation & dataset & application

One-shot learning of object categories

Fei-Fei et al. 2003, 2004, 2006

learning & inferences

No labeling; no segmentation; no alignment

Bayesian

Variational EM (Attias, Jordan, Hinton etc.)

[Figure: variational EM loop: random initialization → E-step → M-step → new estimate of p(θ|train), with prior knowledge of p(θ) as an additional input]

evaluation & dataset

One-shot learning of object categories

Fei-Fei et al. 2004, 2006a, 2006b

evaluation & dataset: Caltech 101 Dataset

Part 3: discriminative methods

Discriminative methods

Object detection and recognition is formulated as a classification problem.

The image is partitioned into a set of overlapping windows, and a decision is taken at each window about whether it contains a target object or not.

Where are the screens?

[Figure: bag of image patches in some feature space, with a decision boundary separating “computer screen” from “background”]
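
The window-by-window decision can be sketched as below. The window size, stride, toy image, and the brightness-threshold "classifier" are hypothetical stand-ins chosen so the example is self-contained; a real detector would plug a trained classifier into `classify`.

```python
def sliding_windows(height, width, win, stride):
    """Yield (top, left) corners of overlapping win x win windows."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield top, left


def detect(image, classify, win=4, stride=2):
    """Run a binary classifier on every window; return the accepted corners."""
    h, w = len(image), len(image[0])
    hits = []
    for top, left in sliding_windows(h, w, win, stride):
        patch = [row[left:left + win] for row in image[top:top + win]]
        if classify(patch):
            hits.append((top, left))
    return hits


def mostly_bright(patch):
    """Stand-in classifier: accept a window whose mean intensity is > 0.5."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values) > 0.5


# Toy 8x8 image with a bright 4x4 "object" whose corner is at (2, 2).
image = [[1.0 if 2 <= r < 6 and 2 <= c < 6 else 0.0 for c in range(8)]
         for r in range(8)]
hits = detect(image, mostly_bright)
```

A stride smaller than the window size is what makes the windows overlap.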

Discriminative vs. generative

• Generative model (The artist)

• Discriminative model (The lousy painter)

• Classification function

[Figure: three plots, one per item above, each against x = data]

Discriminative methods

Nearest neighbor
Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; …

Neural networks
LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; …

Support Vector Machines and Kernels
Guyon, Vapnik; Heisele, Serre, Poggio, 2001

Conditional Random Fields
McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003

Formulation

• Formulation: binary classification

Features x = $x_1, x_2, x_3, \ldots, x_N$ (training data) | $x_{N+1}, x_{N+2}, \ldots, x_{N+M}$ (test data)

Labels y = known for training data (e.g. -1, -1, …) | ?, ?, ? for test data

Training data: each image patch is labeled as containing the object or background

• Classification function $\hat{y} = F(x)$, where $F$ belongs to some family of functions

• Minimize misclassification error
(Not that simple: we need some guarantees that there will be generalization)

Overview of section

• Object detection with classifiers

• Boosting

– Gentle boosting

– Weak detectors

– Object model

– Object detection

• Multiclass object detection

Why boosting?

• A simple algorithm for learning robust classifiers

– Freund & Schapire, 1995

– Friedman, Hastie, Tibshirani, 1998

• Provides efficient algorithm for sparse visual feature selection

– Tieu & Viola, 2000

– Viola & Jones, 2003

• Easy to implement; does not require external optimization tools.

Boosting

• Defines a classifier using an additive model:

$$H(x) = \alpha_1 h_1(x) + \alpha_2 h_2(x) + \alpha_3 h_3(x) + \ldots$$

where $H$ is the strong classifier, each $h_i$ is a weak classifier, each $\alpha_i$ is a weight, and $x$ is the features vector.

• We need to define a family of weak classifiers: each $h_i$ is chosen from a family of weak classifiers

Boosting

• It is a sequential procedure:

Each data point has a class label $y_t \in \{+1, -1\}$ and a weight $w_t$.

Toy example

Weak learners from the family of lines

h => p(error) = 0.5: it is at chance

Toy example

This one seems to be the best.

This is a ‘weak classifier’: it performs slightly better than chance.

Toy example

We set a new problem for which the previous weak classifier performs at chance again.

We update the weights: $w_t \leftarrow w_t \exp\{-y_t H_t\}$

Toy example

The strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers.
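
The sequential procedure can be illustrated with a small sketch. It uses discrete AdaBoost with axis-aligned threshold "stumps" as the family of lines, which is a substitution for illustration: the section outline names gentle boosting, whose weak-learner fit differs. All function names and the six-point data set are invented.

```python
import numpy as np


def stump_predict(X, dim, thr, sign):
    """Weak classifier: `sign` where X[:, dim] > thr, `-sign` elsewhere."""
    return np.where(X[:, dim] > thr, sign, -sign)


def fit_stump(X, y, w):
    """Pick the threshold stump with the lowest weighted error."""
    best = None
    for dim in range(X.shape[1]):
        for thr in np.unique(X[:, dim]):
            for sign in (+1, -1):
                err = w[stump_predict(X, dim, thr, sign) != y].sum()
                if best is None or err < best[0]:
                    best = (err, dim, thr, sign)
    return best


def adaboost(X, y, n_rounds):
    w = np.full(len(y), 1.0 / len(y))  # start with uniform weights
    ensemble = []
    for _ in range(n_rounds):
        err, dim, thr, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak classifier
        h = stump_predict(X, dim, thr, sign)
        # Misclassified points gain weight, so the next stump focuses on them.
        w = w * np.exp(-alpha * y * h)
        w /= w.sum()
        ensemble.append((alpha, dim, thr, sign))
    return ensemble


def strong_classify(X, ensemble):
    """Sign of the weighted sum of weak classifier outputs."""
    H = sum(a * stump_predict(X, d, t, s) for a, d, t, s in ensemble)
    return np.sign(H)


# 1-D toy problem that no single threshold can solve: positives in the middle.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, +1, +1, -1, -1])
one_round = strong_classify(X, adaboost(X, y, n_rounds=1))
three_rounds = strong_classify(X, adaboost(X, y, n_rounds=3))
```

On this toy set a single stump misclassifies two of the six points, while the combination of three stumps labels all six correctly, mirroring how the strong classifier ends up non-linear even though each weak classifier is a single threshold.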

From images to features: Weak detectors

We will now define a family of visual features that can be used as weak classifiers (“weak detectors”).

Each takes an image as input and outputs a binary response. The output is a weak detector.

Weak detectors

Textures of textures
Tieu and Viola, CVPR 2000

Every combination of three filters generates a different feature.

This gives thousands of features. Boosting selects a sparse subset, so computations at test time are very efficient. Boosting also avoids overfitting to some extent.

Weak detectors

Haar filters and integral image
Viola and Jones, ICCV 2001

The average intensity in the block is computed with four sums, independently of the block size.
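
A minimal sketch of the integral-image idea: after one pass of cumulative sums, any block sum needs only four lookups, whatever the block size. The zero-padded layout, the function names, and the particular two-rectangle feature are choices made for this illustration.

```python
import numpy as np


def integral_image(img):
    """ii[r, c] holds the sum of img[:r, :c] (extra leading row/column of zeros)."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def block_sum(ii, top, left, height, width):
    """Sum over img[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]


def block_mean(ii, top, left, height, width):
    """Average intensity of the block."""
    return block_sum(ii, top, left, height, width) / (height * width)


def haar_two_rect(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    return (block_sum(ii, top, left, height, half)
            - block_sum(ii, top, left + half, height, half))


img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
inner = block_sum(ii, 1, 1, 2, 2)          # same as img[1:3, 1:3].sum()
feature = haar_two_rect(ii, 0, 0, 4, 4)    # left two columns minus right two
```

Because each rectangle costs a constant number of lookups, a Haar-like feature made of two or three rectangles is equally cheap at every scale.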

Weak detectors

Other weak detectors:

• Carmichael, Hebert 2004

• Yuille, Snow, Nitzberg, 1998

• Amit, Geman 1998

• Papageorgiou, Poggio, 2000

• Heisele, Serre, Poggio, 2001

• Agarwal, Awan, Roth, 2004

• Schneiderman, Kanade 2004

• …

Weak detectors

Part based: similar to part-based generative models. We create weak detectors by using parts and voting for the object center location.

Car model Screen model

These features are used for the detector on the course web site.

Weak detectors

First we collect a set of part templates from a set of training objects.

Vidal-Naquet, Ullman (2003)

Weak detectors

We now define a family of “weak detectors” as:

Better than chance

Weak detectors

We can do a better job using filtered images

Still a weak detector, but better than before

Training

First we evaluate all the N features on all the training images.

Then, we sample the feature outputs on the object center and at random locations in the background:

Representation and object model

Selected features for the screen detector

[Figure: selected features 1, 2, 3, 4, …, 10]

Lousy painter

Representation and object model

Selected features for the car detector

[Figure: selected features 1, 2, 3, 4, …, 10, …, 100]

Overview of section

• Object detection with classifiers

• Boosting

– Gentle boosting

– Weak detectors

– Object model

– Object detection

• Multiclass object detection

Example: screen detection

[Figure: feature output, thresholded output, and strong classifier at iteration 1]

Weak ‘detector’: produces many false alarms.

Second weak ‘detector’: produces a different set of false alarms.

[Figure: thresholded output and strong classifier as features are added, ending with the classification]

applications

Document Analysis

Digit recognition, AT&T labs: http://www.research.att.com/~yann/

Medical Imaging

Robotics

Toys and robots

Finger prints

Surveillance

Security

Searching the web
