slides: Machine Learning for Computer Vision

Preview:

Citation preview

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 1/135

Machine Learning in

Computer Vision

Fei-Fei Li

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 2/135

W h at is ( com p u t er ) v i sion ?

• When we “see” something, what does itinvolve?

• Take a picture with a camera, it is just abunch of colored dots (pixels)

• Want to make computers understandimages

• Looks easy, but not really…

Image (or video) Sensing device Interpreting device Interpretations

Corn/mature corn jn a cornfield/ plant/blue sky inthe background

Etc.

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 3/135

What is it related to?

Computer Vision

Neuroscience

Machine learning

Speech Information retrieval

Maths

ComputerScience

InformationEngineering

Physics

Biology

Robotics

Cognitivesciences

Psychology

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 4/135

Quiz?

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 5/135

What about this?

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 6/135

A picture is worth a thousand words.--- Confucius

or Printers’ Ink  Ad (1921)

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 7/135

horizontal lines

vertical

blue on the top

porous

oblique

white

shadow to the left

textured

large green patches

A picture is worth a thousand words.--- Confucius

or Printers’ Ink  Ad (1921)

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 8/135

A picture is worth a thousand words.--- Confucius

or Printers’ Ink  Ad (1921)

building court yard

clear sky

campus

trees

autumn leaves

people

day time

bicycles

talking outdoor

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 9/135

Today: machine learning methods

for object recognition

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 10/135

outline

• Intro to object categorization

• Brief overview – Generative

 – Discriminative• Generative models

• Discriminative models

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 11/135

How many object categories are there?

Biederman 1987

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 12/135

Challenges 1: view point variation

Michelangelo 1475-1564

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 13/135

Challenges 2: illumination

slide credit: S. Ullman

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 14/135

Challenges 3: occlusion

Magritte, 1957

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 15/135

Challenges 4: scale

Ch ll d f i

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 16/135

Challenges 5: deformation

Xu, Beihong 1943

Ch ll 6 b k d l tt

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 17/135

Challenges 6: background clutter

Klimt, 1913

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 18/135

History: single object recognition

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 19/135

History: single object recognition

• Lowe, et al. 1999, 2003

• Mahamud and Herbert, 2000

• Ferrari, Tuytelaars, and Van Gool, 2004

• Rothganger, Lazebnik, and Ponce, 2004

• Moreels and Perona, 2005• …

Ch ll 7 i t l i ti

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 20/135

Challenges 7: intra-class variation

Object categorization:Object categorization:

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 21/135

Object categorization:Object categorization:

the statistical viewpointthe statistical viewpoint

)|( image zebra p

)( e zebra|imagno pvs.

• Bayes rule:

)(

)(

)|(

)|(

)|(

)|(

 zebrano p

 zebra p

 zebranoimage p

 zebraimage p

image zebrano p

image zebra p

⋅=

posterior ratio likelihood ratio prior ratio

Object categorization:Object categorization:

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 22/135

Object categorization:Object categorization:

the statistical viewpointthe statistical viewpoint

)(

)(

)|(

)|(

)|(

)|(

 zebrano p

 zebra p

 zebranoimage p

 zebraimage p

image zebrano p

image zebra p⋅=

posterior ratio likelihood ratio prior ratio

• Discriminative methods model posterior 

• Generative methods model likelihood and

prior 

Di i i ti

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 23/135

Discriminative

• Direct modeling of

Zebra

Non-zebra

Decisionboundary

)|(

)|(

image zebrano p

image zebra p

Generative

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 24/135

• Model and

Generative

)|( zebraimage p )|( zebranoimage p

Low Middle

High MiddleLow

)|( zebranoimage p)|( zebraimage p

Th i i

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 25/135

Three main issuesThree main issues

• Representation

 – How to represent an object category

• Learning

 – How to form the classifier, given training data

• Recognition – How the classifier is to be used on novel data

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 26/135

Representation

 – Generative / discriminative / hybrid

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 27/135

Representation

 – Generative / discriminative / hybrid

 – Appearance only orlocation andappearance

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 28/135

Representation

 – Generative / discriminative / hybrid

 – Appearance only orlocation andappearance

 – Invariances• View point

• Illumination

• Occlusion• Scale

• Deformation

• Clutter• etc.

R i

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 29/135

Representation

 – Generative / discriminative / hybrid

 – Appearance only orlocation andappearance

 – invariances

 – Part-based or global

w/sub-window

R t ti

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 30/135

Representation

 – Generative / discriminative / hybrid

 – Appearance only orlocation andappearance

 – invariances

 – Parts or global w/sub-

window – Use set of features or

each pixel in image

L i

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 31/135

 – Unclear how to model categories, so welearn what distinguishes them rather thanmanually specify the difference -- hencecurrent interest in machine learning

Learning

L i

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 32/135

 – Unclear how to model categories, so welearn what distinguishes them rather thanmanually specify the difference -- hencecurrent interest in machine learning)

 – Methods of training: generative vs.discriminative

Learning

0 0.2 0.4 0.6 0.8 10

1

2

3

4

5

  c   l  a  s  s

   d  e  n

  s   i   t   i  e  s

 p( x|C 1)

 p( x|C 2)

 x0 0.2 0.4 0.6 0.8 1

0

0.2

0.4

0.6

0.8

1

  p  o  s   t  e  r   i  o  r  p  r  o   b  a   b   i   l   i   t   i  e  s

 x

 p(C 1| x) p(C 

2| x)

L i

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 33/135

 – Unclear how to model categories, so welearn what distinguishes them rather thanmanually specify the difference -- hencecurrent interest in machine learning)

 – What are you maximizing? Likelihood(Gen.) or performances on train/validationset (Disc.)

 – Level of supervision• Manual segmentation; bounding box; image

labels; noisy labels

Learning

Contains a motorbike

L i

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 34/135

 – Unclear how to model categories, so welearn what distinguishes them rather thanmanually specify the difference -- hencecurrent interest in machine learning)

 – What are you maximizing? Likelihood(Gen.) or performances on train/validationset (Disc.)

 – Level of supervision• Manual segmentation; bounding box; image

labels; noisy labels

 – Batch/incremental (on category and imagelevel; user-feedback )

Learning

Learning

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 35/135

 – Unclear how to model categories, so welearn what distinguishes them rather thanmanually specify the difference -- hencecurrent interest in machine learning)

 – What are you maximizing? Likelihood(Gen.) or performances on train/validationset (Disc.)

 – Level of supervision• Manual segmentation; bounding box; image

labels; noisy labels

 – Batch/incremental (on category and imagelevel; user-feedback )

 – Training images:• Issue of overfitting

• Negative images for discriminative methods

Learning

Learning

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 36/135

 – Unclear how to model categories, so welearn what distinguishes them rather thanmanually specify the difference -- hencecurrent interest in machine learning)

 – What are you maximizing? Likelihood(Gen.) or performances on train/validationset (Disc.)

 – Level of supervision• Manual segmentation; bounding box; image

labels; noisy labels

 – Batch/incremental (on category and imagelevel; user-feedback )

 – Training images:• Issue of overfitting

• Negative images for discriminative methods

 – Priors

Learning

Recognition

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 37/135

 – Scale / orientation range to search over

 – Speed

Recognition

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 38/135

Bag-of-words models

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 39/135

ObjectObject Bag of Bag of ‘‘wordswords’’

Analogy to documentsAnalogy to documents

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 40/135

Analogy to documentsAnalogy to documents

Of all the sensory impressions proceeding tothe brain, the visual experiences are thedominant ones. Our perception of the worldaround us is based essentially on themessages that reach the brain from our eyes.For a long time it was thought that the retinal

image was transmitted point by point to visualcenters in the brain; the cerebral cortex was amovie screen, so to speak, upon which theimage in the eye was projected. Through thediscoveries of Hubel and Wiesel we now

know that behind the origin of the visualperception in the brain there is a considerablymore complicated course of events. Byfollowing the visual impulses along their pathto the various cell layers of the optical cortex,Hubel and Wiesel have been able to

demonstrate that the message about the image falling on the retina undergoes a step- 

wise analysis in a system of nerve cells 

stored in columns. In this system each cell 

has its specific function and is responsible for 

a specific detail in the pattern of the retinal 

image.

sensory, brain,visual, perception,

retinal, cerebral cortex,

eye, cell, optical

nerve, imageHubel, Wiesel

China is forecasting a trade surplus of $90bn(£51bn) to $100bn this year, a threefoldincrease on 2004's $32bn. The CommerceMinistry said the surplus would be created bya predicted 30% jump in exports to $750bn,compared with a 18% rise in imports to

$660bn. The figures are likely to furtherannoy the US, which has long argued thatChina's exports are unfairly helped by adeliberately undervalued yuan. Beijingagrees the surplus is too high, but says the

yuan is only one factor. Bank of Chinagovernor Zhou Xiaochuan said the countryalso needed to do more to boost domesticdemand so more goods stayed within thecountry. China increased the value of theyuan against the dollar by 2.1% in July and

permitted it to trade within a narrow band, butthe US wants the yuan to be allowed to tradefreely. However, Beijing has made it clear thatit will take its time and tread carefully beforeallowing the yuan to rise further in value.

China, trade,surplus, commerce,

exports, imports, US,

yuan, bank, domestic,

foreign, increase,trade, value

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 41/135

learninglearning recognitionrecognition

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 42/135

categorycategory

decisiondecision

feature detection

& representation

codewords dictionarycodewords dictionary

image representation

category modelscategory models

(and/or) classifiers(and/or) classifiers

RepresentationRepresentation

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 43/135

feature detection

& representation

codewords dictionarycodewords dictionary

image representation

1.1.

2.2.

3.3.

1.Feature detection and representation1.Feature detection and representation

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 44/135

eatu e detect o a d ep ese tat op

1.Feature1.Feature detectiondetection andand representationrepresentation

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 45/135

pp

Normalize

patch

Detect patches

[Mikojaczyk and Schmid ’02]

[Matas et al. ’02]

[Sivic et al. ’03]

Compute

SIFT

descriptor 

[Lowe’99]

Slide credit: Josef Sivic

1.Feature1.Feature detectiondetection andand representationrepresentation

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 46/135

pp

2. Codewords dictionary formation2. Codewords dictionary formation

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 47/135

yy

2. Codewords dictionary formation2. Codewords dictionary formation

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 48/135

y

Vector quantization

Slide credit: Josef Sivic

2. Codewords dictionary formation2. Codewords dictionary formation

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 49/135

y

Fei-Fei et al. 2005

3. Image representation3. Image representation

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 50/135

…..

     f    r    e    q    u    e

    n    c    y

codewords

RepresentationRepresentation

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 51/135

feature detection

& representation

codewords dictionarycodewords dictionary

image representation

1.1.

2.2.

3.3.

Learning and RecognitionLearning and Recognition

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 52/135

categorycategory

decisiondecision

codewords dictionarycodewords dictionary

category modelscategory models

(and/or) classifiers(and/or) classifiers

2 case studies2 case studies

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 53/135

1. Naïve Bayes classifier

 – Csurka et al. 2004

2. Hierarchical Bayesian text models(pLSA and LDA)

 – Background: Hoffman 2001, Blei et al. 2004 – Object categorization: Sivic et al. 2005, Sudderth et

al. 2005

 – Natural scene categorization: Fei-Fei et al. 2005

First, some notationsFirst, some notations

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 54/135

• wn: each patch in an image

 – wn = [0,0,…1,…,0,0]T

• w: a collection of all N patches in an image

 – w = [w1,w2,…,wN]

• d j: the jth image in an image collection

• c: category of the image

• z: theme or topic of the patch

Case #1: the NaCase #1: the Na ï  ï veve BayesBayes modelmodel

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 55/135

w

N

c

)|()( cw pc p

Prior prob. ofthe object classes

Image likelihoodgiven the class

Csurka et al. 2004

∏==

 N 

n

n cw pc p1

)|()(

Object classdecision

∝)|( wc pc

c maxarg=∗

Case #2: Hierarchical BayesianCase #2: Hierarchical Bayesian

text modelstext models

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 56/135

Hoffman, 2001

text modelstext models

wN

d z

D

w

N

c z

D

π

Blei et al., 2001

Probabilistic Latent Semantic Analysis (pLSA)

Latent Dirichlet Allocation (LDA)

Case #2: Hierarchical BayesianCase #2: Hierarchical Bayesiantext modelstext models

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 57/135

wN

d z

D

text modelstext models

Probabilistic Latent Semantic Analysis (pLSA)

“face”

Sivic et al. ICCV 2005

Case #2: Hierarchical BayesianCase #2: Hierarchical Bayesiantext modelstext models

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 58/135

w

N

c z

D

π

Latent Dirichlet Allocation (LDA)

text modelstext models

Fei-Fei et al. ICCV 2005

“beach”

Another application

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 59/135

Another application

• Human action classification

Invariance issuesInvariance issues

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 60/135

• Scale and rotation – Implicit

 – Detectors and descriptors

Kadir and Brady. 2003

Invariance issuesInvariance issues

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 61/135

• Scale and rotation• Occlusion

 – Implicit in the models

 – Codeword distribution: small variations

 – (In theory) Theme (z) distribution: different

occlusion patterns

Invariance issuesInvariance issues

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 62/135

• Scale and rotation• Occlusion

• Translation – Encode (relative) location information

Sudderth et al. 2005

Invariance issuesInvariance issues

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 63/135

• Scale and rotation• Occlusion

• Translation• View point (in theory)

 – Codewords: detectorand descriptor

 – Theme distributions:

different view points

Fergus et al. 2005

Model propertiesModel properties

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 64/135

Of all the sensory impressions proceeding tothe brain, the visual experiences are thedominant ones. Our perception of the worldaround us is based essentially on themessages that reach the brain from our eyes.

For a long time it was thought that the retinalimage was transmitted point by point to visualcenters in the brain; the cerebral cortex was amovie screen, so to speak, upon which theimage in the eye was projected. Through thediscoveries of Hubel and Wiesel we now

know that behind the origin of the visualperception in the brain there is a considerablymore complicated course of events. Byfollowing the visual impulses along their pathto the various cell layers of the optical cortex,Hubel and Wiesel have been able to

demonstrate that the message about the image falling on the retina undergoes a step- 

wise analysis in a system of nerve cells 

stored in columns. In this system each cell 

has its specific function and is responsible for 

a specific detail in the pattern of the retinal image.

sensory, brain,

visual, perception,

retinal, cerebral cortex,

eye, cell, optical

nerve, image

Hubel, Wiesel

• Intuitive

 – Analogy to documents

Model propertiesModel properties

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 65/135

• Intuitive

• (Could use)generative models

 – Convenient for weakly-

or un-supervisedtraining

 – Prior information

 – Hierarchical BayesianframeworkSivic et al., 2005, Sudderth et al., 2005

Model propertiesModel properties

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 66/135

• Intuitive

• (Could use)generative models

• Learning and

recognition relativelyfast

 – Compare to other

methods

Weakness of the modelWeakness of the model

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 67/135

• No rigorous geometric informationof the object components

• It’s intuitive to most of us thatobjects are made of parts – no

such information• Not extensively tested yet for – View point invariance

 – Scale invariance

• Segmentation and localization

unclear

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 68/135

part-based models

Slides courtesy to Rob Fergus for “part-based models”

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 69/135

One-shot learning

of object categoriesFei-Fei et al. ‘03, ‘04, ‘06 

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 70/135

One-shot learning

of object categories

P. Brueg e l , 15 62 

Fei-Fei et al. ‘03, ‘04, ‘06 

model representation

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 71/135

One-shot learning

of object categoriesFei-Fei et al. ‘03, ‘04, ‘06 

X (location)

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 72/135

(x,y) coords. of region center

A (appearance)

Projection ontoPCA basis

c1

c2

c10

…..

normalize

 1 1 x 1 1 

 p a t c h

X (location)The Generative Model

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 73/135

(x,y) coords. of region center

A (appearance)

Projection ontoPCA basis

c1

c2

c10

…..

normalize

 1 1 x 1 1 

 p a t c h

X A

h

ΓXμX

I

ΓAμA

The Generative Model

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 74/135

X A

h

ΓXμX

I

ΓAμA

observed variables

hidden variable

parameters

The Generative Model

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 75/135

X A

h

ΓXμX

I

ΓAμA

θn

θ1

θ2

where θ = {µX, ΓX, µA, ΓA}

ML/MAP

Weber et al. ’98 ’00, Fergus et al. ’03 

The Generative Model

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 76/135

θn

θ1

θ2

where θ = {µX, ΓX, µA, ΓA}

ML/MAPΓXμX

ΓAμA

shape model

appearance model

The Generative Model

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 77/135

X A

h

I

ΓXμX ΓAμA

θn

θ1

θ2

ML/MAP

The Generative Model

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 78/135

X A

h

I

P

ΓXμX ΓAμA

a0X

B0X

m0X

β0X

a0A

B0A

m0A

β0A

Bayesian

Parameters to estimate: {mX, βX, aX, BX, mA, βA, aA, BA}i.e. parameters of Normal-Wishart distribution

θn

θ1

θ2

Fei-Fei et al. ‘03, ‘04, ‘06 

The Generative Model

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 79/135

X A

h

I

P

ΓXμX ΓAμA

a0X

B0X

m0X

β0X

a0A

B0A

m0A

β0A

parameters

priors

Fei-Fei et al. ‘03, ‘04, ‘06 

The Generative Model

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 80/135

X A

h

I

P

ΓXμX ΓAμA

a0X

B0X

m0X

β0X

a0A

B0A

m0A

β0A

Prior distribution

Fei-Fei et al. ‘03, ‘04, ‘06 

priors

1. human vision

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 81/135

2. modelrepresentation

3. learning& inferences

4. evaluation

& dataset& application

One-shot learning

of object categories

learning & inferences

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 82/135

One-shot learning

of object categoriesFei-Fei et al. 2003, 2004, 2006 

No labeling No segmentation No alignment

learning & inferences

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 83/135

One-shot learning

of object categoriesFei-Fei et al. 2003, 2004, 2006 

Bayesian

θn

θ1

θ2

RandominitializationVariational EMVariational EM

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 84/135

E-Step

new estimate

of p(θ|train)

M-Step

prior knowledge of p(θ)Attias, Jordan, Hinton etc.

evaluation & dataset

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 85/135

One-shot learning

of object categoriesFei-Fei et al. 2004, 2006a, 2006b 

evaluation & dataset -- Caltech 101 Dataset

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 86/135

One-shot learning

of object categoriesFei-Fei et al. 2004, 2006a, 2006b 

evaluation & dataset -- Caltech 101 Dataset

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 87/135

One-shot learning

of object categoriesFei-Fei et al. 2004, 2006a, 2006b 

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 88/135

Part 3: discriminative methods

Discriminative methodsObject detection and recognition is formulated as a classification problem.

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 89/135

Bag of image patches

Decision

boundary

… and a decision is taken at each window about if it contains a target object or not.

Computer screen

Background

In some feature space

Where are the screens?

The image is partitioned into a set of overlapping windows

Discriminative vs. generativeGenerati e model

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 90/135

(The lousy painter)

0 10 20 30 40 50 60 70

0

0.05

0.1

x = data

• Generative model

0 10 20 30 40 50 60 700

0.5

1

x = data

• Discriminative model

0 10 20 30 40 50 60 70 80

-1

1

x = data

• Classification function

(The artist )

Discriminative methods

Nearest neighbor Neural networks

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 91/135

106

examples

Nearest neighbor

Shakhnarovich, Viola, Darrell 2003Berg, Berg, Malik 2005…

Neural networks

LeCun, Bottou, Bengio, Haffner 1998Rowley, Baluja, Kanade 1998…

Conditional Random FieldsSupport Vector Machines and Kernels

McCallum, Freitag, Pereira 2000Kumar, Hebert 2003

Guyon, Vapnik

Heisele, Serre, Poggio, 2001

F l ti bi l ifi ti

Formulation

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 92/135

• Formulation: binary classification

+1-1

x1 x2 x3 xN

… xN+1 xN+2 xN+M

-1 -1 ? ? ?

Training data: each image patch is labeledas containing the object or background

Test data

Features x =

Labels y =

Where belongs to some family of functions

• Classification function

• Minimize misclassification error(Not that simple: we need some guarantees that there will be generalization)

Overview of section

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 93/135

• Object detection with classifiers

• Boosting

 – Gentle boosting

 – Weak detectors – Object model

 – Object detection

• Multiclass object detection

A i l l ith f l i b t l ifi

Why boosting?

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 94/135

• A simple algorithm for learning robust classifiers – Freund & Shapire, 1995 – Friedman, Hastie, Tibshhirani, 1998

• Provides efficient algorithm for sparse visualfeature selection – Tieu & Viola, 2000  – Viola & Jones, 2003 

• Easy to implement, not requires externaloptimization tools.

D fi l ifi i dditi d l

Boosting

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 95/135

• Defines a classifier using an additive model:

Strongclassifier

Weak classifier

WeightFeaturesvector

D fi l ifi i dditi d l

Boosting

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 96/135

• Defines a classifier using an additive model:

• We need to define a family of weak classifiers

Strongclassifier

Weak classifier

WeightFeaturesvector

from a family of weak classifiers

Boosting

• It is a sequential procedure:

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 97/135

Each data point has

a class label:

wt =1

and a weight:

+1 ( )

-1 ( )yt =

• It is a sequential procedure:

xt=1

xt=2

xt

Toy exampleWeak learners from the family of lines

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 98/135

Weak learners from the family of lines

h => p(error) = 0.5 it is at chance

Each data point has

a class label:

wt =1

and a weight:

+1 ( )

-1 ( )yt =

Toy example

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 99/135

This one seems to be the best

Each data point has

a class label:

wt =1

and a weight:

+1 ( )

-1 ( )yt =

This is a ‘weak classifier ’: It performs slightly better than chance.

Toy example

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 100/135

We set a new problem for which the previous weak classifier performs at chance again

Each data point has

a class label:

wt wt exp{-yt Ht}

We update the weights:

+1 ( )

-1 ( )yt =

Toy example

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 101/135

We set a new problem for which the previous weak classifier performs at chance again

Each data point has

a class label:

wt wt exp{-yt Ht}

We update the weights:

+1 ( )

-1 ( )yt =

Toy example

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 102/135

We set a new problem for which the previous weak classifier performs at chance again

Each data point has

a class label:

wt wt exp{-yt Ht}

We update the weights:

+1 ( )

-1 ( )yt =

Toy example

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 103/135

We set a new problem for which the previous weak classifier performs at chance again

Each data point has

a class label:

wt wt exp{-yt Ht}

We update the weights:

+1 ( )

-1 ( )yt =

Toy example

f1 f2

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 104/135

The strong (non- linear) classifier is built as the combination ofall the weak (linear) classifiers.

f3

f4

From images to features:Weak detectors

W ill d fi f il f i l

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 105/135

We will now define a family of visualfeatures that can be used as weak

classifiers (“weak detectors”)

Takes image as input and the output is binary response.The output is a weak detector.

Weak detectors

Textures of textures

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 106/135

Textures of texturesTieu and Viola, CVPR 2000

Every combination of three filtersgenerates a different feature

This gives thousands of features. Boosting selects a sparse subset, so computationson test time are very efficient. Boosting also avoids overfitting to some extend.

Weak detectors

Haar filters and integral image

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 107/135

Haar filters and integral imageViola and Jones, ICCV 2001

The average intensity in theblock is computed with foursums independently of theblock size.

Weak detectors

Other weak detectors:

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 108/135

Other weak detectors:

• Carmichael, Hebert 2004

• Yuille, Snow, Nitzbert, 1998• Amit, Geman 1998

• Papageorgiou, Poggio, 2000

• Heisele, Serre, Poggio, 2001

• Agarwal, Awan, Roth, 2004

• Schneiderman, Kanade 2004• …

Weak detectors

Part based: similar to part-based generative

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 109/135

p gmodels. We create weak detectors by

using parts and voting for the object centerlocation

Car model Screen model

These features are used for the detector on the course web site.

Weak detectors

First we collect a set of part templates from a set of trainingobjects

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 110/135

objects.

Vidal-Naquet, Ullman (2003)

Weak detectors

We now define a family of “weak detectors” as:

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 111/135

= =

Better than chance

*

Weak detectors

We can do a better job using filtered images

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 112/135

Still a weak detectorbut better than before

* *= ==

TrainingFirst we evaluate all the N features on all the training images.

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 113/135

Then, we sample the feature outputs on the object center and at randomlocations in the background:

Representation and object model

Selected features for the screen detector

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 114/135

4 101 2 3

100

Lousy painter 

Representation and object modelSelected features for the car detector

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 115/135

1 2 3 4 10 100

… …

Overview of section

• Object detection with classifiers

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 116/135

Object detection with classifiers

• Boosting – Gentle boosting

 – Weak detectors – Object model

 – Object detection

• Multiclass object detection

Example: screen detectionFeatureoutput

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 117/135

Example: screen detectionFeatureoutput

Thresholdedoutput

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 118/135

Weak ‘detector’

Produces many false alarms.

Example: screen detectionFeatureoutput

Thresholdedoutput

Strong classifierat iteration 1

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 119/135

Example: screen detectionFeatureoutput

Thresholdedoutput

Strongclassifier

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 120/135

Second weak ‘detector’

Produces a different set offalse alarms.

Example: screen detectionFeatureoutput

Thresholdedoutput

Strongclassifier

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 121/135

+

Strong classifierat iteration 2

Example: screen detectionFeatureoutput

Thresholdedoutput

Strongclassifier

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 122/135

+

 …

Strong classifierat iteration 10

Example: screen detectionFeatureoutput

Thresholdedoutput

Strongclassifier

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 123/135

+

 …

Addingfeatures

Final

classification

Strong classifierat iteration 200

applications

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 124/135

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 125/135

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 126/135

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 127/135

Document Analysis

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 128/135

Digit recognition, AT&T labshttp://www.research.att.com/~yann / 

Medical Imaging

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 129/135

Robotics

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 130/135

Toys and robots

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 131/135

Finger prints

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 132/135

Surveillance

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 133/135

Security

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 134/135

Searching the web

8/2/2019 slides: Machine Learning for Computer Vision

http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 135/135

Recommended