Visual Object Recognition

Rob Fergus

Courant Institute, New York University

http://cs.nyu.edu/~fergus/icml_tutorial/

Agenda

• Introduction

• Bag-of-words models

• Visual words with spatial location

• Part-based models

• Discriminative methods

• Segmentation and recognition

• Recognition-based image retrieval

• Datasets & Conclusions

Li Fei-Fei, Princeton

Rob Fergus, NYU

Antonio Torralba, MIT

Recognizing and Learning Object Categories: Year 2007

http://people.csail.mit.edu/torralba/shortCourseRLOC

So what does object recognition involve?

Classification: are there street-lights in the image?

Detection: localize the street-lights in the image
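The difference between the two tasks shows up in the output type: classification returns a yes/no answer for the whole image, while detection returns locations. The following is a minimal sketch under stated assumptions, not a method from the slides: the function names `detect` and `classify`, the single-scale sliding window, and the mean-brightness score (a placeholder for a learned classifier) are all hypothetical.

```python
import numpy as np

def detect(image, window=4, stride=2, threshold=0.8):
    """Slide a square window over a grayscale image and return boxes
    (row, col, height, width, score) whose score passes the threshold.
    The score is plain mean brightness, a placeholder (assumption)
    standing in for a real learned classifier."""
    boxes = []
    rows, cols = image.shape
    for r in range(0, rows - window + 1, stride):
        for c in range(0, cols - window + 1, stride):
            score = float(image[r:r + window, c:c + window].mean())
            if score >= threshold:
                boxes.append((r, c, window, window, score))
    return boxes

def classify(image, **kwargs):
    """Classification only answers whether the object is present at all."""
    return len(detect(image, **kwargs)) > 0

# Toy 8x8 "image" with one bright 4x4 blob standing in for a street-light.
image = np.zeros((8, 8))
image[2:6, 4:8] = 1.0

present = classify(image)   # a single boolean for the whole image
boxes = detect(image)       # a list of localized boxes
```

Here classification is derived from detection for brevity; in practice a classifier need not localize anything to decide presence.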

Object categorization

mountain

building

tree

banner

vendor

people

street lamp

Scene and context categorization

• outdoor

• city

• …

Application: Assisted driving

[Figure: pedestrian (Ped) and car (Car) detections; axes in meters]

Lane detection

Pedestrian and car detection

• Collision warning systems with adaptive cruise control

• Lane departure warning systems

• Rear object detection systems

Application: Computational photography

Application: Improving online search

Query: STREET

Organizing photo collections

Challenges 1: viewpoint variation

Michelangelo 1475-1564

Challenges 2: scale

Challenges 3: illumination

slide credit: S. Ullman

Challenges 4: background clutter

Bruegel, 1564

Challenges 5: occlusion

http://lh5.ggpht.com/_wJc6t2hDl2M/RrL7Gh6sS7I/AAAAAAAAAYY/n3xaHc2opls/DSC00633.JPG

Challenges 6: deformation

http://img.timeinc.net/time/asia/magazine/2007/1112/racehorse_1112.jpg

History: single object recognition

Object 1 Object 2

Object 3

David Lowe [1985]

Single object recognition history: Geometric methods

Rothwell et al. [1992]

Single object recognition history: Appearance-based methods

• Murase & Nayar 1995

• Schmid & Mohr 1997

• Lowe et al. 1999, 2003

• Mahamud and Hebert, 2000

• Ferrari et al. 2004

• Rothganger et al. 2004

• Moreels and Perona, 2005

• …

Instance 1 Instance 2

Instance 3

Challenges 7: intra-class variation

Shoe class

History: early object categorization

• Fischler, Elschlager, 1973

• Turk and Pentland, 1991

• Belhumeur, Hespanha, & Kriegman, 1997

• Rowley & Kanade, 1998

• Schneiderman & Kanade 2004

• Viola and Jones, 2000

• Heisele et al., 2001

• Amit and Geman, 1999

• LeCun et al. 1998

• Belongie and Malik, 2002

• DeCoste and Scholkopf, 2002

• Simard et al. 2003

• Poggio et al. 1993

• Agarwal and Roth, 2002

• Schneiderman & Kanade, 2004

• …

Three main issues

• Representation
  – How to represent an object category

• Learning
  – How to form the classifier, given training data

• Recognition
  – How the classifier is to be used on novel data
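The three issues map onto three separate functions in even the smallest recognition pipeline. The sketch below is a toy illustration under stated assumptions, not a model from the slides: the representation is a global intensity histogram (appearance only), learning stores one mean feature vector per category, and recognition picks the nearest mean. The names `represent`, `learn`, and `recognize` are hypothetical.

```python
import numpy as np

def represent(image, bins=4):
    """Representation: a normalized intensity histogram (appearance only,
    no location information). Pixel values are assumed to lie in [0, 1]."""
    hist, _ = np.histogram(image, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

def learn(images, labels):
    """Learning: store the mean feature vector of each category."""
    feats = np.array([represent(im) for im in images])
    labels = np.array(labels)
    categories = sorted(set(labels.tolist()))
    return {c: feats[labels == c].mean(axis=0) for c in categories}

def recognize(model, image):
    """Recognition: assign a novel image to the nearest category mean."""
    f = represent(image)
    return min(model, key=lambda c: np.linalg.norm(f - model[c]))

# Synthetic training data (assumption): "dark" and "bright" image categories.
rng = np.random.default_rng(0)
dark = [rng.uniform(0.0, 0.4, size=(8, 8)) for _ in range(5)]
bright = [rng.uniform(0.6, 1.0, size=(8, 8)) for _ in range(5)]

model = learn(dark + bright, ["dark"] * 5 + ["bright"] * 5)
prediction = recognize(model, rng.uniform(0.6, 1.0, size=(8, 8)))
```

Swapping any one function (for example, replacing the histogram with local features) leaves the other two stages structurally unchanged, which is one reason the three issues are usually discussed separately.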

Representation

– Generative / discriminative / hybrid

– Appearance only or location and appearance

– Invariances
  • Viewpoint
  • Illumination
  • Occlusion
  • Scale
  • Deformation
  • Clutter
  • etc.

– Part-based or global with sub-window

– Use set of features or each pixel in image

Learning

– Unclear how to model categories, so learn rather than manually specify

– Methods of training: generative vs. discriminative

– Level of supervision
  • Manual segmentation; bounding box; image labels; noisy labels

[Figure: training image labeled "Contains a motorbike"]

– Training images:
  • Issue of over-fitting (typically limited training data)
  • Negative images for discriminative methods

– Priors
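The generative vs. discriminative distinction can be stated compactly with Bayes' rule. The notation here is an assumption for illustration and does not appear on the slides: x is the image (or its features), c is the event that the category is present, and c-bar is its absence.

```latex
\frac{p(c \mid x)}{p(\bar{c} \mid x)}
  = \underbrace{\frac{p(x \mid c)}{p(x \mid \bar{c})}}_{\text{likelihood ratio}}
    \cdot
    \underbrace{\frac{p(c)}{p(\bar{c})}}_{\text{prior ratio}}
```

Roughly, a discriminative method models the left-hand side (or just the decision boundary it implies) directly, which is why it needs negative training images. A generative method models the class-conditional likelihoods and the priors on the right-hand side and derives the decision from them.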

Recognition

– Scale / orientation range to search over

– Speed

– Context

– Context enables pruning of detector output (Hoiem, Efros, Hebert, 2006)
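One way context can prune detector output is through scene geometry: a detection is discarded when its size is implausible for where it sits in the image. The sketch below is a simplified stand-in in that spirit, not the method of the work cited on the slide. It assumes a flat ground plane, a known horizon row and camera height, and objects standing on the ground; the function name, parameters, and height range are hypothetical.

```python
def prune_by_context(detections, horizon_row, camera_height_m,
                     height_range_m=(1.4, 2.0)):
    """Keep detections whose implied real-world height is plausible.

    Each detection is (top_row, bottom_row, score) in image coordinates
    with rows increasing downward. For an object standing on the ground
    plane, world height is approximately
        camera_height * (bottom - top) / (bottom - horizon).
    """
    lo, hi = height_range_m
    kept = []
    for top, bottom, score in detections:
        if bottom <= horizon_row:
            continue  # base is at or above the horizon: cannot be on the ground
        height_m = camera_height_m * (bottom - top) / (bottom - horizon_row)
        if lo <= height_m <= hi:
            kept.append((top, bottom, score))
    return kept

# Hypothetical pedestrian detections: (top_row, bottom_row, score).
detections = [
    (150, 300, 0.9),  # implied height 1.8 m: plausible
    (100, 110, 0.8),  # base above the horizon: rejected
    (250, 300, 0.7),  # implied height 0.6 m: too small for its position
]
kept = prune_by_context(detections, horizon_row=200, camera_height_m=1.2)
```

The same constraint also narrows the scale range that needs to be searched at each image row, which bears on the speed item above.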
