
Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 1

Object Recognition

Outline:

• Introduction

• Representation: Concept

• Representation: Features

• Learning & Recognition

• Segmentation & Recognition


Credits: major sources of material, including figures and slides were:

• Riesenhuber & Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 1999.

• B. Mel. SeeMore. Neural Computation, 1997.

• Ullman, Vidal-Naquet, Sali. Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 2002.

• David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. of Computer Vision, 2004.

• and various resources on the WWW


Why is it difficult?

Because appearance drastically varies with:

• position/pose/scale
• lighting/shadows
• articulation/expression
• partial occlusion

need invariant recognition!


The “Classical View”

Historically:

Image → Segmentation → Feature Extraction → Recognition

Problem: Bottom-up segmentation only works in a very limited range of situations! This architecture is fundamentally flawed!

Two ways out: 1) “direct” recognition, 2) integration of segmentation & recognition


Ventral Stream

V1 (edges, bars) → V2 → V4 → IT (objects, faces)

→ larger RFs, higher “complexity”, higher invariance →

D. van Essen (V2), K. Tanaka (IT)


Basic Models

seminal work by Fukushima, newer version by Riesenhuber and Poggio


Questions

• what are the intermediate features?

• how/why are they being learned?

• how is invariance computation implemented?
  • what nonlinearities; at what level (dendrites?)

• how is invariance learned?
  • temporal continuity; role of eye movements

• basic model is feedforward, what do feedback connections do?
  • attention/segmentation/Bayesian inference?


Representation: Concept

• 3-d models: won’t talk about

• view-based:
  • holistic descriptions of a view
  • invariant features/histogram techniques
  • spatial constellation of localized features


Holistic Descriptions I: Templates

Idea:
• compare image (regions) directly to template
• image patches, object template are represented as high-dimensional vectors
• simple comparison metrics (Euclidean distance, normalized correlation, ...)

Problem:
• such metrics are not robust w.r.t. even small changes in position/aspect/scale or deformations, so invariance is difficult to achieve
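A minimal sketch of the two comparison metrics named above, assuming a toy 1-D "patch" (the vectors and variable names are made up for illustration). It suggests why normalized correlation tolerates a lighting change but not a small shift:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalized_correlation(a, b):
    """Correlation of the mean-subtracted vectors, in [-1, 1]."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0

# Hypothetical 1-D patch: a thin bright bar on a dark background.
template = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
brighter = [2 * v + 1 for v in template]   # lighting change only
shifted = template[-2:] + template[:-2]    # same bar, moved by 2 pixels

same_score = normalized_correlation(template, brighter)
shift_score = normalized_correlation(template, shifted)
shift_dist = euclidean(template, shifted)
```

Here `same_score` stays at 1 while `shift_score` drops below zero, although both inputs show the same bar.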


Holistic Descriptions II: Eigenspace Approach

Somewhat better: “Eigenspace” approaches
• perform Principal Component Analysis (PCA) on training images (e.g. “Eigenfaces”)
• compare images by projecting on subset of the PCs

Turk & Pentland (1992), Murase & Nayar (1995)
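A rough sketch of the eigenspace idea, assuming tiny 4-pixel "images" and only the single leading principal component, found here by power iteration (a stand-in for a full PCA; all data and names are illustrative):

```python
import math

def top_component(rows, iters=100):
    """Mean and leading principal component via power iteration."""
    n, dim = len(rows), len(rows[0])
    mu = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    v = [1.0 / (i + 1) for i in range(dim)]      # arbitrary non-zero start
    for _ in range(iters):
        proj = [sum(c * w for c, w in zip(row, v)) for row in centered]
        v = [sum(p * row[j] for p, row in zip(proj, centered)) for j in range(dim)]
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return mu, v

def project(image, mu, pc):
    """Coordinate of an image along the principal component."""
    return sum((x - m) * w for x, m, w in zip(image, mu, pc))

# Hypothetical training images that vary along one direction only.
images = [[1, 1, 0, 0], [2, 2, 0, 0], [3, 3, 0, 0], [4, 4, 0, 0]]
mu, pc = top_component(images)
coords = [project(img, mu, pc) for img in images]

# Compare a new image to the training set in the low-dimensional space.
query = [2.9, 3.1, 0.0, 0.0]
q = project(query, mu, pc)
nearest = min(range(len(coords)), key=lambda i: abs(coords[i] - q))
```

The query is matched to the third training image by comparing one coefficient instead of all pixels.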


Assessment

• quite successful for segmented and carefully aligned images (e.g., eyes and nose are at the same pixel coordinates in all images)

• but similar problems as above:
  • not well-suited for clutter
  • problems with occlusions
  • some notable extensions trying to deal with this (e.g., Leonardis, 1996, 1997)


Feature Histograms

Idea: reach invariance by computing invariant features

Examples: Mel (1997), Schiele & Crowley (1997, 2000)

histogram pooling: throw occurrences of simple feature from all image regions together into one “bin”
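Histogram pooling can be sketched as follows, assuming a made-up feature map in which each cell holds the label of a detected simple feature (the labels and layouts are hypothetical):

```python
from collections import Counter

def feature_histogram(feature_map):
    """Pool feature labels from all positions into one count per feature type."""
    return Counter(label for row in feature_map for label in row if label != ".")

# "|" vertical edge, "-" horizontal edge, "r" red blob, "." nothing detected.
view_a = [["|", "-", "."],
          [".", "r", "."],
          [".", ".", "|"]]
view_b = [[".", ".", "|"],   # same features, different positions
          ["|", ".", "-"],
          [".", "r", "."]]

hist_a = feature_histogram(view_a)
hist_b = feature_histogram(view_b)
```

Both views give the same histogram because position is discarded, which is where the invariance comes from.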


Assessment:
• works very well for segmented images with only one object, but...

Problem:
• histograms of simple features over the whole image lead to a “superposition catastrophe”; they lack a “binding” mechanism

• consider several objects in scene: histogram contains all their features; no representation of which features came from same object

• system breaks down for clutter or complex backgrounds
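A toy illustration of the superposition problem, assuming hypothetical object models stored as feature-count histograms (the objects and features are invented for the example):

```python
from collections import Counter

# Hypothetical stored models: histograms of single, segmented objects.
models = {
    "red_square": Counter({"red": 4, "corner": 4}),
    "blue_circle": Counter({"blue": 4, "curve": 4}),
    "red_circle": Counter({"red": 4, "curve": 4}),
    "blue_square": Counter({"blue": 4, "corner": 4}),
}

def explained(model, scene):
    """True if the scene histogram contains every feature the model needs."""
    return all(scene[f] >= n for f, n in model.items())

# A scene with two objects: its histogram is simply the sum of both.
scene = models["red_square"] + models["blue_circle"]
candidates = sorted(name for name, m in models.items() if explained(m, scene))
```

All four models are consistent with the scene histogram, since nothing records which color was bound to which shape.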


[Figure: B. Mel (1997)]


Training and test images, performance: [figure panels A, B, C, D, E]


Feature Constellations

Observation: holistic templates and histogram techniques can’t handle cluttered scenes well

Idea: How about constellations of features? E.g. face is constellation of eyes, nose, mouth, etc.

Elastic Matching Techniques: Fischler & Elschlager (1973), Lades et al. (1993)

“Elastic Graph Matching” (EGM)

Tremendously successful for:
• face finding/recognition
• object recognition
• gesture recognition
• cluttered scene analysis


Representation: Features

Only discuss local features:

• image patches

• wavelet basis, e.g., Haar, Gabor

• complex features, e.g., SIFT (= Scale Invariant Feature Transform)


Image Patches

likelihood ratio:

“merit”:

weight:

Ullman, Vidal-Naquet, Sali (2002)


Intermediate complexity is best: (trivial result, really)
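A hedged sketch of how such a result can arise. The formulas on the Image Patches slide did not survive, so the definitions below are assumptions: "merit" is taken to be the mutual information between the class and the binary event "patch detected", and the weight is taken to be the log likelihood ratio. The detection rates are invented for illustration.

```python
import math

def likelihood_ratio(p_f_c, p_f_nc):
    """How much more often the patch fires on class than on non-class images."""
    return p_f_c / p_f_nc

def mutual_information(p_c, p_f_c, p_f_nc):
    """I(C; F) in bits between class C and the binary event F = 'patch detected'."""
    p_f = p_c * p_f_c + (1 - p_c) * p_f_nc
    info = 0.0
    for prior, p_detect in ((p_c, p_f_c), (1 - p_c, p_f_nc)):
        for cond, marginal in ((p_detect, p_f), (1 - p_detect, 1 - p_f)):
            if cond > 0:
                info += prior * cond * math.log2(cond / marginal)
    return info

# Hypothetical rates (P(detected | class), P(detected | non-class)):
patches = {
    "small": (0.9, 0.8),          # fires almost everywhere, unspecific
    "intermediate": (0.6, 0.05),
    "large": (0.05, 0.0005),      # very specific, but rarely found
}
merit = {name: mutual_information(0.5, *p) for name, p in patches.items()}
weight = {name: math.log(likelihood_ratio(*p)) for name, p in patches.items()}
best = max(merit, key=merit.get)
```

With these numbers the large patch has the highest likelihood ratio, but the intermediate patch carries the most information about the class.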


Recognition examples:


Gabor Wavelets

[Figure: Gabor wavelet in image space and in frequency space]

• in frequency space Gabor wavelet is a Gaussian
• “wavelet”: different wavelets are scaled/rotated versions of a mother wavelet
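A small numerical check of both points, simplified to 1-D and using a plain discrete Fourier transform (the sizes and parameter values are arbitrary choices for illustration):

```python
import cmath
import math

def gabor_1d(k, sigma, n=64):
    """Complex 1-D Gabor: Gaussian envelope times a carrier of frequency k."""
    c = n // 2
    return [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) * cmath.exp(1j * k * (x - c))
            for x in range(n)]

def dft_magnitude(signal):
    """Magnitude of the discrete Fourier transform, one value per frequency bin."""
    n = len(signal)
    return [abs(sum(s * cmath.exp(-2j * math.pi * f * x / n)
                    for x, s in enumerate(signal)))
            for f in range(n)]

k0 = 2 * math.pi * 8 / 64
mother = gabor_1d(k0, sigma=6.0)
scaled = gabor_1d(2 * k0, sigma=3.0)   # half the size, twice the frequency

spec_mother = dft_magnitude(mother)
spec_scaled = dft_magnitude(scaled)
peak_mother = spec_mother.index(max(spec_mother))
peak_scaled = spec_scaled.index(max(spec_scaled))
```

Each spectrum is a single bump centered on the carrier frequency (bins 8 and 16 here), symmetric around its peak like a Gaussian.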


Gabor Wavelets as filters

Gabor filters: sin() and cos() part

compute correlation of image with filter at every location x0:


Tiling of frequency space: Jets

measured frequency tuning of biological neurons (left) and dense coverage

applying different Gabor filters (with different k) to same image location gives vector of filter responses: Jet
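A minimal sketch of a jet, assuming a fixed envelope width, one spatial frequency and four orientations (real systems use several scales; the kernel size, `sigma` and the test grating are illustrative assumptions):

```python
import math

def gabor_pair(kx, ky, sigma, half=6):
    """cos and sin Gabor kernels as dicts {(dx, dy): weight}."""
    cos_k, sin_k = {}, {}
    for dx in range(-half, half + 1):
        for dy in range(-half, half + 1):
            env = math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2))
            phase = kx * dx + ky * dy
            cos_k[(dx, dy)] = env * math.cos(phase)
            sin_k[(dx, dy)] = env * math.sin(phase)
    return cos_k, sin_k

def response(image, x0, y0, kernel):
    """Correlation of the image with one kernel, evaluated at (x0, y0)."""
    return sum(w * image[y0 + dy][x0 + dx] for (dx, dy), w in kernel.items())

def jet(image, x0, y0, wave_vectors, sigma=3.0):
    """Vector of Gabor magnitudes at one location, one entry per wave vector k."""
    out = []
    for kx, ky in wave_vectors:
        cos_k, sin_k = gabor_pair(kx, ky, sigma)
        out.append(math.hypot(response(image, x0, y0, cos_k),
                              response(image, x0, y0, sin_k)))
    return out

# Hypothetical test image: a grating whose intensity varies along x only.
k = math.pi / 4
image = [[math.cos(k * x) for x in range(32)] for _ in range(32)]
wave_vectors = [(k * math.cos(a), k * math.sin(a))
                for a in (0, math.pi / 4, math.pi / 2, 3 * math.pi / 4)]
j = jet(image, 16, 16, wave_vectors)
j_shifted = jet(image, 17, 16, wave_vectors)
```

The entry whose wave vector matches the grating dominates the jet, and combining the sin and cos parts into a magnitude keeps it nearly unchanged under a one-pixel shift.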


SIFT Features

• step 1: find scale space extrema


• step 2: apply contrast and curvature requirements


• step 3: local image descriptor extracted at key points is a 128-dim vector
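The 128 dimensions in step 3 come from a 4x4 grid of cells with an 8-bin gradient-orientation histogram each (4 x 4 x 8 = 128). Below is a simplified sketch of that layout only; it omits the Gaussian weighting, rotation to the keypoint orientation, interpolation between bins and clipping used in Lowe's method, and the function name is made up:

```python
import math

def descriptor_128(patch):
    """4x4 cells x 8 orientation bins over a 16x16 patch (given with a 1-pixel border)."""
    if len(patch) != 18 or any(len(row) != 18 for row in patch):
        raise ValueError("expected an 18x18 patch (16x16 plus border)")
    hist = [0.0] * 128
    for y in range(1, 17):
        for x in range(1, 17):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            o = int(angle / (2 * math.pi) * 8) % 8        # orientation bin
            cell = ((y - 1) // 4) * 4 + (x - 1) // 4      # which of the 16 cells
            hist[cell * 8 + o] += mag
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0
    return [h / norm for h in hist]

# Hypothetical patch: intensity ramp along x, and a uniformly brighter copy.
ramp = [[float(x) for x in range(18)] for _ in range(18)]
brighter = [[v + 10.0 for v in row] for row in ramp]
d = descriptor_128(ramp)
d_bright = descriptor_128(brighter)
```

Because the descriptor is built from gradients and then normalized, the brighter copy yields the same vector.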


Learning and Recognition

• top-down model matching
  • Elastic graph matching

• bottom-up indexing
  • with or without shared features


Elastic Graph Matching (EGM)

“view based”: need different graphs for different views

Representation: graph nodes labelled with Jets (Gabor filter responses of different scales/orientations)

Matching: use stochastic optimization techniques to minimize a cost function that punishes dissimilarities of Gabor responses and distortions of the graph
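A rough sketch of such a cost function and a very simple random local search standing in for the stochastic optimization. The weighting `lam`, the move set (shift the whole graph or a single node by one pixel) and the synthetic `image_jet` function are assumptions made for illustration, not the published algorithm:

```python
import math
import random

def similarity(j1, j2):
    """Normalized dot product of two jets."""
    dot = sum(a * b for a, b in zip(j1, j2))
    n = math.sqrt(sum(a * a for a in j1) * sum(b * b for b in j2))
    return dot / n if n else 0.0

def cost(model_jets, model_pos, edges, pos, image_jet, lam=0.1):
    """Jet dissimilarity plus lam times squared distortion of the graph edges."""
    feat = sum(1.0 - similarity(mj, image_jet(p)) for mj, p in zip(model_jets, pos))
    dist = sum(((pos[a][d] - pos[b][d]) - (model_pos[a][d] - model_pos[b][d])) ** 2
               for a, b in edges for d in (0, 1))
    return feat + lam * dist

def match(model_jets, model_pos, edges, image_jet, size, steps=2000, seed=0):
    """Random search: keep a proposed move if it does not raise the cost."""
    rng = random.Random(seed)
    pos = list(model_pos)
    best = cost(model_jets, model_pos, edges, pos, image_jet)
    for _ in range(steps):
        dx, dy = rng.choice((-1, 0, 1)), rng.choice((-1, 0, 1))
        moved = range(len(pos)) if rng.random() < 0.5 else [rng.randrange(len(pos))]
        trial = list(pos)
        for i in moved:
            trial[i] = (min(size - 1, max(0, pos[i][0] + dx)),
                        min(size - 1, max(0, pos[i][1] + dy)))
        c = cost(model_jets, model_pos, edges, trial, image_jet)
        if c <= best:
            pos, best = trial, c
    return pos, best

# Hypothetical image: the jet at p has one entry per landmark, decaying with distance.
landmarks = [(12, 10), (18, 10), (15, 16)]
def image_jet(p):
    return [math.exp(-((p[0] - lx) ** 2 + (p[1] - ly) ** 2) / 50.0)
            for lx, ly in landmarks]

model_pos = [(10, 9), (16, 9), (13, 15)]          # same shape, displaced start
model_jets = [image_jet(p) for p in landmarks]    # jets stored from a training view
edges = [(0, 1), (1, 2), (0, 2)]

start = cost(model_jets, model_pos, edges, model_pos, image_jet)
found, final = match(model_jets, model_pos, edges, image_jet, size=32)
perfect = cost(model_jets, model_pos, edges, landmarks, image_jet)
```

The cost is zero when the graph sits undistorted on the landmarks, and the search can only lower it from the displaced starting position.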


Bunch Graphs

Idea: add invariance by labelling graph nodes with collection or bunch of different feature exemplars (Wiskott et al., 1995, 1997)

Advantage: can decouple finding the facial features from the identification

Matching uses a MAX rule.
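A minimal sketch of the MAX rule at a single node, assuming short made-up jets and a normalized dot product as the similarity (the exemplar names and values are hypothetical):

```python
import math

def similarity(j1, j2):
    """Normalized dot product of two jets."""
    dot = sum(a * b for a, b in zip(j1, j2))
    n = math.sqrt(sum(a * a for a in j1) * sum(b * b for b in j2))
    return dot / n if n else 0.0

def bunch_similarity(bunch, image_jet):
    """MAX rule: the node's score is that of its best-fitting exemplar."""
    name = max(bunch, key=lambda k: similarity(bunch[k], image_jet))
    return name, similarity(bunch[name], image_jet)

# Hypothetical "eye" node holding exemplar jets from different faces/states.
eye_bunch = {
    "open": [0.9, 0.1, 0.3],
    "closed": [0.2, 0.8, 0.1],
    "glasses": [0.5, 0.5, 0.7],
}
best_name, best_score = bunch_similarity(eye_bunch, [0.25, 0.75, 0.1])
```

The node is located with a high score whichever exemplar fits, so finding the feature does not depend on who the person is.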


Indexing Methods

• when you want to recognize very many objects, it’s inefficient to individually check for each model by searching for all of its features in a top-down fashion

• better: indexing methods
• also: share features among object models
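One way to picture indexing with shared features is an inverted index from features to the models that contain them. This is a sketch with invented models and feature names, not a specific published method:

```python
from collections import Counter, defaultdict

def build_index(models):
    """Map each feature to the set of object models that contain it."""
    index = defaultdict(set)
    for name, feats in models.items():
        for f in feats:
            index[f].add(name)
    return index

def recognize(index, observed):
    """Each observed feature votes for every model it indexes (bottom-up)."""
    votes = Counter()
    for f in observed:
        for name in index.get(f, ()):
            votes[name] += 1
    return votes

# Hypothetical models; "handle" and "curve" are shared between two of them.
models = {
    "cup": {"handle", "rim", "curve"},
    "pot": {"handle", "lid", "curve"},
    "book": {"corner", "text", "edge"},
}
index = build_index(models)
votes = recognize(index, ["rim", "handle", "curve", "shadow"])
```

The work per image depends on the number of observed features, and models with no matching feature (here "book") are never touched.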


Recognition with SIFT features

• recognition: extract SIFT features; match to nearest neighbor in database of stored features; use Hough transform to pool votes
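A simplified sketch of this pipeline, assuming 2-D toy descriptors instead of 128-D ones and Hough bins over translation only (Lowe's method also bins scale and orientation; all data below is made up):

```python
import math
from collections import Counter

def nearest(desc, database):
    """Index of the stored feature with the smallest Euclidean distance."""
    return min(range(len(database)), key=lambda i: math.dist(desc, database[i][0]))

def hough_votes(image_features, database, bin_size=10):
    """Each match votes for (object, quantized translation of the object)."""
    votes = Counter()
    for desc, (x, y) in image_features:
        _, obj, (mx, my) = database[nearest(desc, database)]
        votes[(obj, round((x - mx) / bin_size), round((y - my) / bin_size))] += 1
    return votes

# Hypothetical database entries: (descriptor, object, position in the model view).
database = [
    ([1.0, 0.0], "toy", (0, 0)),
    ([0.0, 1.0], "toy", (20, 0)),
    ([1.0, 1.0], "toy", (0, 20)),
    ([0.5, 0.0], "box", (5, 5)),
]
# Features found in a new image: (descriptor, position in the image).
image_features = [
    ([0.95, 0.05], (100, 50)),
    ([0.05, 0.90], (121, 49)),
    ([1.00, 0.90], (99, 71)),
    ([0.45, 0.10], (31, 201)),   # clutter that happens to match "box"
]
votes = hough_votes(image_features, database)
winner, count = votes.most_common(1)[0]
```

The three "toy" matches agree on one translation bin and pool their votes, while the spurious "box" match stays isolated.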


Recognition with Gabor Jets and Color Features


Scaling Behavior when Sharing Features between models

• Recognition speed is limited more by the number of features than by the number of object models; a modest number of features is o.k.
• can incorporate many feature types
• can incorporate stereo (reasoning about occlusions)


Hierarchies of Features

Long history of using hierarchies: Fukushima’s Neocognitron (1983), Nelson & Selinger (1998, 1999)

Advantages of using a hierarchy:
• faster learning and processing
• better grip on correlated deformations
• easier to find proper specificity vs. invariance tradeoff?


Feature Learning

• Unsupervised clustering: not necessarily optimal for discrimination

• Use big bag of features, fish out the useful ones (e.g. via boosting: Viola, 1997): takes very long to train, since you have to consider every feature from that big bag

• Note: usefulness of one feature depends on which other ones you’re using already.

• Learn higher level features as (nonlinear) combinations of lower level features (Perona et al., 2000): also takes very long to train, only up to 5 features. But could use locality constraint
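A small sketch of feature selection by boosting, using textbook discrete AdaBoost on binary features as a generic stand-in (not the cited system; the data is invented). It shows both points from the list: every feature is scored in every round, and a feature's usefulness depends on what was already picked.

```python
import math

def adaboost_select(features, labels, rounds):
    """Pick one feature per round by lowest weighted error (discrete AdaBoost)."""
    n = len(labels)
    w = [1.0 / n] * n
    chosen = []
    for _ in range(rounds):
        # Every feature in the bag has to be scored in every round.
        errors = {name: sum(wi for wi, f, y in zip(w, vals, labels) if f != y)
                  for name, vals in features.items()}
        name = min(errors, key=errors.get)
        err = min(max(errors[name], 1e-12), 1 - 1e-12)
        alpha = 0.5 * math.log((1 - err) / err)
        chosen.append((name, alpha))
        # Re-weight: examples the chosen feature gets wrong gain weight.
        w = [wi * math.exp(-alpha * f * y)
             for wi, f, y in zip(w, features[name], labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return chosen

labels = [+1, +1, +1, +1, -1, -1, -1, -1]
features = {
    "f_a":      [+1, +1, +1, -1, -1, -1, -1, +1],   # 2 errors
    "f_a_copy": [+1, +1, +1, -1, -1, -1, -1, +1],   # redundant duplicate
    "f_b":      [-1, +1, +1, +1, +1, +1, -1, -1],   # 3 errors, but different ones
}
selected = [name for name, _ in adaboost_select(features, labels, rounds=2)]
```

Taken alone, the duplicate is better than `f_b`, yet once `f_a` is in use it adds nothing and `f_b` is selected second.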


Feedback

Question: Why all the feedback connections in the brain? Important for on-line processing?

Neuroscience: Object recognition in 150 ms (Thorpe et al., 1996), but interesting temporal response properties of IT neurons (Oram & Richmond, 1999); some V1 neurons “restore” line behind an occluder

Idea: Feed-forward architecture: can’t correct errors made at early stages later on. Feedback architecture can!

“High level hypotheses try to reinforce their lower level evidence while hypotheses compete at all levels.”


Recognition & Segmentation

• Basic Idea: integrate recognition with segmentation in a feedback architecture:

  • object hypotheses reinforce their supporting evidence and inhibit competing evidence, suppressing features that do not belong to them (idea goes back to at least the PDP books)

  • at the same time: restore missing features due to partial occlusion (associative memory property)
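A deliberately crude sketch of one such top-down pass, assuming object models are plain feature sets and the competition is a hard winner-take-all (real proposals use graded, iterative dynamics; all names and values are hypothetical):

```python
def feedback_step(evidence, models):
    """One top-down pass: the best-supported hypothesis edits the feature layer."""
    support = {name: sum(evidence.get(f, 0.0) for f in feats) / len(feats)
               for name, feats in models.items()}
    winner = max(support, key=support.get)
    updated = {}
    for f in set(evidence) | models[winner]:
        if f in models[winner]:
            updated[f] = 1.0   # reinforce; also restores a missing (occluded) feature
        else:
            updated[f] = 0.0   # suppress evidence that does not belong to the winner
    return winner, updated

models = {
    "face": {"eye_l", "eye_r", "nose", "mouth"},
    "car": {"wheel", "window", "door"},
}
# Bottom-up evidence: the mouth is occluded, a stray "wheel" response is present.
evidence = {"eye_l": 0.9, "eye_r": 0.8, "nose": 0.7, "wheel": 0.4}
winner, updated = feedback_step(evidence, models)
```

After the pass, the feature layer doubles as a segmentation: features of the winning object are on (including the restored mouth), the rest are off.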


Current work in this area

• mostly demonstrating how recognition can aid segmentation

• what is missing is a clear and elegant demonstration of a truly integrated system that shows how the two kinds of processing help each other

• Maybe don’t treat as two kinds of processing but one inference problem

• how best to do this? “million dollar question”