On Visual Recognition


Computer Vision Group, University of California Berkeley


Jitendra Malik UC Berkeley


From Pixels to Perception

[Figure: tiger scene. Region labels: tiger, grass, water, sand; scene tags: outdoor, wildlife; part labels: head, eye, legs, tail, mouth, back, shadow.]


Object Category Recognition


Defining Categories

• What is a “visual category”?

– Not semantic

– Working hypothesis: two instances of the same category must have “correspondence” (i.e. one can be morphed into the other), e.g. four-legged animals

– Biederman’s estimate of 30,000 basic visual categories


Facts from Biological Vision

• Timing

• Abstraction/Generalization

• Taxonomy and Partonomy


Detection can be very fast

• On a task of judging animal vs. no animal, humans can make mostly correct saccades in 150 ms (Kirchner & Thorpe, 2006)

– Comparable to the cumulative synaptic delays along the retina → LGN → V1 → V2 → V4 → IT pathway

– Doesn’t rule out feedback, but shows that feed-forward processing alone is very powerful


[Figure: accuracy (corrected for guessing) vs. exposure time (0–200 ms) for three tasks: detection (object vs. texture), categorization (car vs. object), identification (jeep vs. car).]

As Soon as You Know It Is There, You Know What It Is

Grill-Spector & Kanwisher, Psychological Science, 2005


Abstraction/Generalization

• Configurations of oriented contours

• Considerable tolerance for small deformations


Attneave’s Cat (1954): line drawings convey most of the information


Taxonomy and Partonomy

• Taxonomy: e.g. cats are in the family Felidae, which in turn is in the class Mammalia

– Recognition can be at multiple levels of categorization, or be identification at the level of specific individuals, as in faces

• Partonomy: Objects have parts, they have subparts and so on. The human body contains the head, which in turn contains the eyes.

• These notions apply equally well to scenes and to activities.

• Psychologists have argued that there is a “basic-level” at which categorization is fastest (Eleanor Rosch et al).

• In a partonomy, each level contributes useful information for recognition.
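A taxonomy or partonomy is just a tree of categories or parts. A minimal sketch, with the part names chosen for illustration (the slides mention only head and eyes):

```python
# A tiny partonomy: each part maps to its subparts.
# Part names below "head" are illustrative assumptions.
partonomy = {
    "human body": {
        "head": {"eyes": {}, "mouth": {}},
        "torso": {},
        "legs": {},
    }
}

def walk(tree, depth=0):
    """Yield (depth, part) pairs; each level of the partonomy
    contributes information useful for recognition."""
    for part, subparts in tree.items():
        yield depth, part
        yield from walk(subparts, depth + 1)
```

The same nested-dict structure serves for a taxonomy (species inside genus inside family) or for a scene partonomy.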


Matching with Exemplars

• Use exemplars as templates

• Correspond features between query and exemplar

• Evaluate similarity score

[Figure: a query image matched against a database of templates.]


Best matching template is a helicopter
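The pipeline above (templates, feature correspondence, similarity score) can be sketched as nearest-exemplar classification. The χ²-style descriptor distance and the greedy per-feature matching are simplifying assumptions; the original work uses full bipartite matching:

```python
import numpy as np

def match_cost(query_desc, exemplar_desc):
    """Correspondence cost between two descriptor sets.
    Greedy nearest-feature matching under a chi-square-like
    distance (a simplification of bipartite matching)."""
    q = query_desc[:, None, :].astype(float)
    e = exemplar_desc[None, :, :].astype(float)
    d = 0.5 * ((q - e) ** 2 / (q + e + 1e-9)).sum(axis=2)
    return d.min(axis=1).sum()  # cheapest partner per query feature

def classify(query_desc, exemplars):
    """Label of the best-matching template; `exemplars` maps
    label -> descriptor array (a hypothetical layout)."""
    return min(exemplars, key=lambda lbl: match_cost(query_desc, exemplars[lbl]))
```

A query whose descriptors best match the helicopter exemplar's descriptors is labeled "helicopter".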


3D objects using multiple 2D views

View selection algorithm from Belongie, Malik & Puzicha (2001)


Error vs. Number of Views


Three Big Ideas

• Correspondence based on local shape/appearance descriptors

• Deformable Template Matching

• Machine learning for finding discriminative features


Comparing Pointsets


Shape Context

Count the number of points inside each bin, e.g.:

Count = 4

Count = 10

...

Compact representation of distribution of points relative to each point

(Belongie, Malik & Puzicha, 2001)
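A minimal version of the bin counting described above. The descriptor is the log-polar histogram of Belongie, Malik & Puzicha (2001); the specific bin counts and radius limits here are assumptions:

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """For each point, histogram the positions of all other points
    in log-polar bins (log-spaced radii, uniform angles)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[None, :, :] - points[:, None, :]   # pairwise offsets
    dist = np.linalg.norm(diff, axis=2)
    dist = dist / dist[dist > 0].mean()              # scale invariance
    theta = np.arctan2(diff[..., 1], diff[..., 0])   # angles in [-pi, pi]
    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_r + 1)
    descriptors = np.zeros((n, n_r, n_theta), dtype=int)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r_bin = np.searchsorted(r_edges, dist[i, j]) - 1
            if r_bin < 0 or r_bin >= n_r:
                continue  # point falls outside the log-polar diagram
            t_bin = int((theta[i, j] + np.pi) / (2 * np.pi) * n_theta) % n_theta
            descriptors[i, r_bin, t_bin] += 1
    return descriptors
```

Normalizing distances by their mean makes the histogram invariant to scale; the log-spaced radii make it most sensitive to nearby points.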



Geometric Blur (Local Appearance Descriptor)

[Figure: Geometric Blur descriptor pipeline. Compute sparse channels from the image, extract a patch in each channel, then apply a spatially varying blur and sub-sample (idealized signal).]

Descriptor is robust to small affine distortions

Berg & Malik '01
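A 1-D sketch of the spatially varying blur: smoothing grows with distance from the feature point, so the descriptor stays sharp near the point and forgiving of small distortions far away. The Gaussian form and the growth rate `alpha` are assumptions, not the paper's exact kernel:

```python
import numpy as np

def geometric_blur_1d(signal, center, alpha=0.5):
    """Blur `signal` with a Gaussian whose width grows linearly
    with distance from `center` (spatially varying blur)."""
    n = len(signal)
    xs = np.arange(n)
    out = np.empty(n)
    for i in range(n):
        sigma = max(alpha * abs(i - center), 1e-6)  # ~no blur at center
        w = np.exp(-0.5 * ((xs - i) / sigma) ** 2)
        out[i] = (w * signal).sum() / w.sum()
    return out
```

This is why the descriptor tolerates small affine distortions: a feature displaced far from the center lands in a heavily blurred region, so its exact position matters little.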


Three Big Ideas

• Correspondence based on local shape/appearance descriptors

• Deformable Template Matching

• Machine learning for finding discriminative features


Modeling shape variation in a category

• D’Arcy Thompson, On Growth and Form (1917)

– Studied transformations between the shapes of organisms


Matching Example

[Figure: model shape matched to target shape.]


Handwritten Digit Recognition

• MNIST 60 000:

– linear: 12.0%

– 40 PCA + quadratic: 3.3%

– 1000 RBF + linear: 3.6%

– K-NN: 5%

– K-NN (deskewed): 2.4%

– K-NN (tangent distance): 1.1%

– SVM: 1.1%

– LeNet 5: 0.95%

• MNIST 600 000 (distortions):

– LeNet 5: 0.8%

– SVM: 0.8%

– Boosted LeNet 4: 0.7%

• MNIST 20 000:

– K-NN, shape context matching: 0.63%


EZ-Gimpy Results

• 171 of 192 images correctly identified: 92%

[Figure: EZ-Gimpy CAPTCHA images read correctly, e.g. horse, smile, canvas, spade, join, here.]


Three Big Ideas

• Correspondence based on local shape/appearance descriptors

• Deformable Template Matching

• Machine learning for finding discriminative features


Discriminative learning (Frome, Singer & Malik, 2006)

weights on patch features in training images → distance functions from training images to any other images → browsing, retrieval, classification



triplets

• Learn from relative similarity over triplets: image i, image j, image k

• Want: image i closer to image j than to image k, i.e. d_ij < d_ik

• Image-to-image distances are based on feature-to-image distances

• Compare image-to-image distances


focal image version

[Figure: focal image i with two patch features; feature-to-image distances (0.3, 0.4) to image j and (0.8, 0.2) to image k. The triplet vector is x_ijk = (0.8, 0.2) - (0.3, 0.4) = (0.5, -0.2), so that d_ik - d_ij = w · x_ijk.]


large-margin formulation

• slack variables, as in a soft-margin SVM

• w constrained to be positive

• L2 regularization
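The formulation above can be sketched as projected subgradient descent on a triplet hinge loss, where each row of X is a vector x_ijk such that d_ik - d_ij = w · x_ijk. The learning rate, iteration count, and regularization constant below are assumptions, not values from the paper:

```python
import numpy as np

def loss_and_grad(w, X, lam=0.1, margin=1.0):
    """Hinge loss over triplet rows x_ijk plus L2 regularization;
    slack corresponds to triplets violating the margin."""
    scores = X @ w                        # d_ik - d_ij per triplet
    active = (margin - scores) > 0        # margin-violating triplets
    loss = np.maximum(0.0, margin - scores).sum() + lam * (w @ w)
    grad = -X.T @ active.astype(float) + 2 * lam * w
    return loss, grad

def learn_weights(X, n_iter=200, lr=0.05, lam=0.1):
    """Subgradient descent with projection onto w >= 0."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, g = loss_and_grad(w, X, lam)
        w = np.maximum(0.0, w - lr * g)   # keep weights positive
    return w
```

The projection step enforces the positivity constraint on w, so the learned distance functions stay valid (non-negative combinations of feature distances).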


Caltech-101 [Fei-Fei et al. 04]

• 102 classes, 31-300 images/class


Retrieval example

[Figure: query image and its retrieval results.]


Caltech 101 classification results

(see Manik Varma’s talks for the best results yet)


15 training/class, 63.2%


Conclusion

• Correspondence based on local shape/appearance descriptors

• Deformable Template Matching

• Machine learning for finding discriminative features

• Integrating Perceptual Organization and Recognition
