Object Recognition Vision Class 2006-7. Object Classes

Individual Recognition

Brief History: Recognition

Mental Rotation

Three-point alignment

Huttenlocher, D. & Ullman, S. Recognizing solid objects by alignment with an image. Int. J. Computer Vision 5(3), 195–212, 1990.

Object Alignment

Given three model points P1, P2, P3, and three image points p1, p2, p3, there is a unique transformation (rotation R, translation d, scale s) that aligns the model with the image:

s R Pi + d = pi (in the image coordinates), i = 1, 2, 3

Alignment -- comments

• The projection is orthographic (combined with scaling).

• The 3 points are required to be non-collinear.

• The transformation is determined up to a reflection of the points about the image plane and translation in depth.
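The last comment can be checked numerically. The following is a minimal sketch, not the alignment algorithm itself: it applies a scaled orthographic projection to three rotated model points and shows that reflecting them about the image plane, or shifting them in depth, leaves the image unchanged. The helper names (`rotate_x`, `project`) and all numeric values are illustrative assumptions.

```python
import math

def rotate_x(point, angle):
    """Rotate a 3D point about the x-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)

def project(point, scale, shift):
    """Scaled orthographic projection: drop depth z, scale, translate in the image."""
    x, y, _ = point
    return (scale * x + shift[0], scale * y + shift[1])

# Three non-collinear model points (arbitrary example values).
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 1.0, 2.0)]
rotated = [rotate_x(p, 0.3) for p in model]
image = [project(p, 2.0, (5.0, -1.0)) for p in rotated]

# Reflect about the image plane (z -> -z) and translate in depth (z -> z + 10).
mirrored = [(x, y, -z + 10.0) for x, y, z in rotated]
image_mirrored = [project(p, 2.0, (5.0, -1.0)) for p in mirrored]

print(image == image_mirrored)
```

Because the projection discards depth, the two configurations produce identical image points, which is why three points fix the transformation only up to this ambiguity.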

Car Recognition

Car Models

Alignment: Cars

Alignment: Mismatch

Brief History: Classification

RBC (Recognition by Components)

Structural Description

[Figure: structural description graph; parts G1, G2, G3, G4 linked by the relations Above, Right-of, Left-of, and Touch]

Classification: Current Approaches

Visual Class: Similar Arrangement of Shared Components

Optimal Class Components?

• Large features are too rare

• Small features are found everywhere

Find features that carry the highest amount of information

Entropy

Entropy: H = -Σp(xi) log2 p(xi)

p(x = 0)   p(x = 1)   H
0.5        0.5        ?
0.1        0.9        0.47
0.01       0.99       0.08
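The entropy values in the table can be reproduced with a few lines of Python. This is a minimal sketch; the function name `entropy` is an illustrative choice.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

for probs in [(0.5, 0.5), (0.1, 0.9), (0.01, 0.99)]:
    print(probs, round(entropy(probs), 2))
```

The more skewed the distribution, the lower the entropy: an outcome that is almost certain carries little information.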

Mutual information

I(C;F) = H(C) – H(C|F)

[Figure: H(C) over all examples, compared with H(C) when F=1 and H(C) when F=0]

$$H(x) = -\sum_{x} P(x)\,\log P(x)$$

Mutual Information I

X alone: p(x) = 0.5, 0.5 H = 1.0

X given Y (Y = 0 and Y = 1 equally likely):

Y = 0: p(x) = 0.8, 0.2 H = 0.72
Y = 1: p(x) = 0.1, 0.9 H = 0.47

H(X|Y) = 0.5*0.72 + 0.5*0.47 = 0.595

H(X) – H(X|Y) = 1 – 0.595 = 0.405

I(X,Y) = 0.405

Mutual Information II

$$I(X,Y) = \sum_{x,y} p(x,y)\,\log \frac{p(x,y)}{p(x)\,p(y)}$$

Computing MI from Examples

• Mutual information can be measured from examples:

            100 Faces   100 Non-faces
Feature:    44 times    6 times

H(C) = 1, H(C|F) = 0.8475

Mutual information: 0.1525
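The face example can be recomputed from its counts. The sketch below evaluates the joint-probability form of mutual information on a 2×2 table of co-occurrence counts; the function name and the table layout are illustrative assumptions. With unrounded intermediate values the result is roughly 0.15, close to the figure quoted above.

```python
import math

def mutual_information(joint_counts):
    """I(C;F) in bits from a table joint_counts[c][f] of co-occurrence counts."""
    total = sum(sum(row) for row in joint_counts)
    p_c = [sum(row) / total for row in joint_counts]
    p_f = [sum(col) / total for col in zip(*joint_counts)]
    mi = 0.0
    for c, row in enumerate(joint_counts):
        for f, count in enumerate(row):
            if count:  # empty cells contribute nothing
                p_cf = count / total
                mi += p_cf * math.log2(p_cf / (p_c[c] * p_f[f]))
    return mi

# Rows: class (face, non-face); columns: feature (absent, present).
counts = [[56, 44],
          [94, 6]]
mi = mutual_information(counts)
print(round(mi, 2))
```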

Mutual Info vs. Threshold

[Plot: mutual information (y-axis) vs. detection threshold (x-axis, 0 to 40) for the fragments forehead, hairline, mouth, eye, nose, nosebridge, long_hairline, chin, twoeyes]

Fragments Selection

• For a set of training images:
• Generate candidate fragments
– Measure p(F|C), p(F|NC)
• Compute mutual information
• Select optimal fragment
• After k fragments: Maximizing the minimal addition in mutual information with respect to each of the first k fragments
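The selection steps above can be sketched as a greedy max-min procedure. This is one plausible reading of the criterion, not necessarily the exact procedure used in the lecture: the first fragment is the one with the highest mutual information, and each later fragment maximizes its smallest gain I(C; Fi, Fj) − I(C; Fj) over the fragments Fj already chosen. The function names and the toy data are illustrative assumptions.

```python
import math
from collections import Counter

def mutual_info(xs, cs):
    """I(X;C) in bits for two equally long sequences of discrete values."""
    n = len(xs)
    joint, px, pc = Counter(zip(xs, cs)), Counter(xs), Counter(cs)
    return sum((k / n) * math.log2(k * n / (px[x] * pc[c]))
               for (x, c), k in joint.items())

def select_fragments(detections, labels, k):
    """detections[i] is the 0/1 detection sequence of candidate fragment i."""
    single = [mutual_info(d, labels) for d in detections]
    selected = [max(range(len(detections)), key=single.__getitem__)]
    while len(selected) < k:
        def min_gain(i):
            # Smallest information added by fragment i over any chosen fragment j.
            return min(mutual_info(list(zip(detections[i], detections[j])), labels)
                       - single[j] for j in selected)
        rest = [i for i in range(len(detections)) if i not in selected]
        selected.append(max(rest, key=min_gain))
    return selected

labels = [1, 1, 1, 1, 0, 0, 0, 0]      # 1 = class image, 0 = non-class image
detections = [
    [1, 1, 1, 0, 0, 0, 0, 0],          # informative fragment
    [1, 1, 1, 0, 0, 0, 0, 0],          # exact duplicate of the first
    [0, 0, 1, 1, 0, 0, 0, 0],          # weaker alone, but complementary
]
print(select_fragments(detections, labels, 2))
```

In the toy data the duplicate fragment adds no information once the first is chosen, so the complementary fragment is preferred even though it is less informative on its own.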

Highly Informative Face Fragments

Horse-class features

Car-class features

Fragment ‘Weight’

Likelihood ratio:

$$R(F) = \frac{P(F \mid C)}{P(F \mid \overline{C})}$$

where $\overline{C}$ is the non-class (NC).

Weight of F:

$$w(F) = \log R(F)$$

Decision:

∑wi Fi > θ
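A minimal sketch of the weight and the decision rule follows. The logarithm base is not stated on the slide, so the natural logarithm is an assumption here, as are the function names and the example weights and threshold.

```python
import math

def fragment_weight(p_f_given_c, p_f_given_nc):
    """w(F) = log R(F), with R(F) the likelihood ratio (natural log assumed)."""
    return math.log(p_f_given_c / p_f_given_nc)

def decide(detected, weights, theta):
    """Return True when the summed weights of detected fragments exceed theta."""
    return sum(w * f for w, f in zip(weights, detected)) > theta

# Detection rates from the face example: 44 of 100 faces, 6 of 100 non-faces.
w = fragment_weight(0.44, 0.06)
print(round(w, 2))

# Hypothetical weights for three fragments; F_i is 1 if fragment i was detected.
print(decide([1, 0, 1], [2.0, 1.0, 0.5], theta=2.0))
```

A fragment that appears much more often in class images than elsewhere gets a large weight; a fragment equally likely in both gets a weight of zero and does not influence the decision.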

Combining fragments

[Figure: detectors D1, D2, …, Dk feed fragment weights w1, w2, …, wk, which are combined as Σ wk Fk]

Feature detection: within a region, S(F,I) > Threshold
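The detection step can be sketched as a search over a region of the image. The similarity measure S is not specified here, so normalized cross-correlation is used as an assumption; the function names, the region convention, and the toy image are likewise illustrative.

```python
import math

def similarity(patch, fragment):
    """Normalized cross-correlation between two equally sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in fragment for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def detect(image, fragment, region, threshold):
    """Return True if the fragment matches somewhere inside `region`.

    region = (top, left, bottom, right) bounds the top-left corner of the
    candidate patch (bottom and right exclusive); the caller keeps every
    candidate patch inside the image.
    """
    h, w = len(fragment), len(fragment[0])
    top, left, bottom, right = region
    best = max(similarity([row[c:c + w] for row in image[r:r + h]], fragment)
               for r in range(top, bottom) for c in range(left, right))
    return best > threshold

image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0, 0],
    [0, 0, 2, 8, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
fragment = [[9, 1], [2, 8]]
print(detect(image, fragment, (1, 1, 4, 4), 0.9))  # region contains the match
print(detect(image, fragment, (0, 0, 1, 1), 0.9))  # region misses it
```

Restricting the search to a region keeps a fragment from firing on a similar-looking patch in an implausible location.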

Fragment-based Classification

Leibe, Schiele 2003

Fergus, Perona, Zisserman 2003

Agarwal, Roth 2002

Recognition: ROC Curves

Training & Test Images

• Frontal faces without distinctive features (K:496,W:385)
• Minimize background by cropping
• Training images for extraction: 32 for each class
• Training images for evaluation: 100 for each class
• Test images: 253 for Western and 364 for Korean

Training – Fragment Extraction

Western Fragment
Score   0.92 0.82 0.77 0.76 0.75 0.74 0.72 0.68 0.67 0.65
Weight  3.42 2.40 1.99 2.23 1.90 2.11 6.58 4.14 4.12 6.47

Korean Fragment
Score   0.92 0.82 0.77 0.76 0.75 0.74 0.72 0.68 0.67 0.65
Weight  3.42 2.40 1.99 2.23 1.90 2.11 6.58 4.14 4.12 6.47

Extracted Fragments

Classifying novel images

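[Figure: classification pipeline: Detect Fragments → Compare Summed Weights → Decision. The summed weights of the detected Korean fragments, ΣW(Fk), and of the detected Western fragments, ΣW(Fw), are compared to give one of three outcomes: Westerner, Korean, or Unknown]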
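The comparison step can be sketched as follows. The exact rule for the "Unknown" outcome is not recoverable from the slide, so the `margin` parameter below is an assumption: the image is left unclassified when the two sums differ by no more than that margin.

```python
def classify_face(korean_weights, western_weights, margin=0.0):
    """Compare summed weights of the detected fragments of each class.

    korean_weights / western_weights: weights of the fragments of each class
    that were detected in the novel image. `margin` is a hypothetical
    tolerance below which the image is reported as "Unknown".
    """
    k_total = sum(korean_weights)
    w_total = sum(western_weights)
    if k_total - w_total > margin:
        return "Korean"
    if w_total - k_total > margin:
        return "Westerner"
    return "Unknown"

print(classify_face([3.42, 2.40], [1.99]))
print(classify_face([1.90], [2.23, 2.11]))
print(classify_face([2.0], [2.0]))
```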

Effect of Number of Fragments

• 7 fragments: 95%, 80 fragments: 100%
• Inherent redundancy of the features
• Slight violation of independence assumption

[Plot: Correct - Error (%) (y-axis, 50% to 100%) vs. number of fragments (x-axis, 1 to 100) for the Eastern and Western test sets]

Harris Corner Detection

$$\begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$$

Harris Corner Operator

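$$H = \begin{pmatrix} \langle I_x^2 \rangle & \langle I_x I_y \rangle \\ \langle I_x I_y \rangle & \langle I_y^2 \rangle \end{pmatrix}$$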

Averages within a neighborhood.

Corner: The two eigenvalues λ1, λ2 are large

Indirectly:

‘Corner’ = det(H) – k trace²(H)
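The corner measure can be sketched in plain Python. The version below uses central-difference gradients and an unweighted 3×3 average; both choices, the value k = 0.04, and the function name are assumptions made for illustration (practical implementations typically use smoothed gradients and a Gaussian window).

```python
def harris_response(img, k=0.04):
    """Harris measure det(H) - k * trace(H)^2 at each interior pixel of a 2D list."""
    rows, cols = len(img), len(img[0])
    # Central-difference gradients (left at zero on the border).
    ix = [[0.0] * cols for _ in range(rows)]
    iy = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            ix[r][c] = (img[r][c + 1] - img[r][c - 1]) / 2.0
            iy[r][c] = (img[r + 1][c] - img[r - 1][c]) / 2.0
    response = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Average the gradient products over a 3x3 neighbourhood.
            window = [(ix[r + dr][c + dc], iy[r + dr][c + dc])
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            sxx = sum(gx * gx for gx, _ in window) / 9.0
            syy = sum(gy * gy for _, gy in window) / 9.0
            sxy = sum(gx * gy for gx, gy in window) / 9.0
            response[r][c] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return response

# 10x10 test image: a bright square occupying the lower-right quadrant.
img = [[1.0 if r >= 5 and c >= 5 else 0.0 for c in range(10)] for r in range(10)]
resp = harris_response(img)
print(resp[5][5] > 0)   # corner of the square: both eigenvalues large
print(resp[7][5] < 0)   # straight edge: one eigenvalue near zero
```

The sign of the response separates the cases without computing eigenvalues: positive at corners, negative along edges, and near zero in flat regions.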

Harris Corner Examples

SIFT descriptor

David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110

Example :

4*4 sub-regions

Histogram of 8 orientations in each

V = 128 values (16 sub-regions × 8 orientations):

g1,1, …, g1,8, …, g16,1, …, g16,8
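The layout of the 128 values can be sketched as follows. This is a simplified illustration of the 4*4 × 8 histogram structure only, not Lowe's full descriptor: it omits Gaussian weighting, interpolation between bins, and rotation to the keypoint orientation. The 16×16 patch size and the function name are assumptions.

```python
import math

def sift_like_descriptor(patch):
    """128-value descriptor from a 16x16 patch (list of 16 lists of 16 numbers)."""
    hist = [0.0] * 128
    for r in range(1, 15):
        for c in range(1, 15):
            gx = patch[r][c + 1] - patch[r][c - 1]
            gy = patch[r + 1][c] - patch[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            orientation = int(angle / (2 * math.pi) * 8) % 8  # one of 8 bins
            cell = (r // 4) * 4 + (c // 4)                    # one of 4*4 sub-regions
            hist[cell * 8 + orientation] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

# Horizontal intensity ramp: every gradient points along +x (orientation bin 0).
ramp = [[float(c) for c in range(16)] for _ in range(16)]
descriptor = sift_like_descriptor(ramp)
print(len(descriptor))
```

For the ramp patch, only the first orientation bin of each sub-region is non-zero, which makes the g(sub-region, orientation) indexing easy to verify.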

Constellation of Patches Using interest points

Fergus, Perona, Zisserman 2003

2004 Carnegie Mellon University, all rights reserved.

A CAPTCHA™ is a program that can generate and grade tests that most humans can pass, but current computer programs can't pass.

Classification: Class Examples
