Automatic Face Recognition for Film Character Retrieval in Feature-Length Films

Ognjen Arandjelović, Andrew Zisserman

The objective

Retrieve all shots in a video, e.g. a feature-length film, that contain a particular person

“Groundhog Day” [Ramis, 1993]

Visually defined search – on faces

Applications:

• intelligent fast forward on characters

• pull out all videos of “x” from 1000s of digital camera MPEGs

Image variations due to:

• pose/scale

• lighting

• expression

• partial occlusion

The difficulty of face recognition

There has been significant progress in face recognition in recent years:

1. Pose/illumination invariant recognition (e.g. The 3D Morphable Model – [Blanz et al., 2002])

2. Local feature-based approaches (e.g. Elastic Bunch Graph Matching – [Bolme, 2003], [Sivic et al., 2005])

3. Appearance manifold-based methods and online appearance model building (e.g. see previous talk)

4. Etc.

Previous work

Five key steps:

1. Feature localization

2. Affine warping

3. Face outline detection

4. Refine registration

5. Robust distance

System overview

[Figure: processing pipeline. Detected face → Features (SVM classifiers, trained on features training data) → Warp (normalized pose) → Background removal (probabilistic model of face outline; background clutter removed) → Filter (normalized illumination) → Face signature image]

Facial feature detection

Train support vector machines to detect the eyes and the mouth (similar to “Names and Faces in the News” [Berg et al., 2004])

Independent Gaussian priors on feature locations

Example training data:

Learn invariance to:

• pose

• expression
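One way the detector responses and the location priors could be combined is sketched below. It assumes the SVM output can be treated as a log-likelihood-like score and simply added to the log of an independent (axis-aligned) Gaussian prior; the slides do not state the actual combination rule, and the names `gaussian_log_prior`, `best_location` and `weight` are illustrative only.

```python
import math


def gaussian_log_prior(pos, mean, std):
    """Log-density of an axis-aligned 2D Gaussian evaluated at pos = (x, y)."""
    log_p = 0.0
    for p, m, s in zip(pos, mean, std):
        log_p += -0.5 * ((p - m) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
    return log_p


def best_location(candidates, mean, std, weight=1.0):
    """Pick the candidate position with the highest combined score.

    candidates: list of ((x, y), svm_score) pairs for one feature (e.g. an eye).
    weight: assumed trade-off between detector score and prior (hypothetical).
    """
    def combined(candidate):
        pos, svm_score = candidate
        return svm_score + weight * gaussian_log_prior(pos, mean, std)

    return max(candidates, key=combined)[0]


# A slightly stronger response far from the expected eye position loses
# to a weaker response close to it.
candidates = [((10, 10), 1.0), ((30, 12), 1.2)]
print(best_location(candidates, mean=(10, 10), std=(3, 3)))
```

Setting `weight=0` reduces this to picking the raw maximum of the detector response.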

Detected eyes and mouths

Successful detections in spite of large pose and expression variation

Warped faces using detected features

Original detected faces

Faces after affine warping
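Three point correspondences (two eyes and the mouth) determine a 2D affine transform exactly. The sketch below solves for it with Cramer's rule in plain Python; the canonical target positions in the example are made-up values, not those used by the authors.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def affine_from_three_points(src, dst):
    """Return the 2x3 affine matrix mapping three src points onto three dst points."""
    base = [[x, y, 1.0] for x, y in src]
    d = det3(base)
    if abs(d) < 1e-12:
        raise ValueError("source points are collinear")
    rows = []
    for k in range(2):  # k = 0 solves the x' row, k = 1 the y' row
        rhs = [p[k] for p in dst]
        row = []
        for col in range(3):
            m = [r[:] for r in base]
            for i in range(3):
                m[i][col] = rhs[i]  # Cramer's rule: swap in the right-hand side
            row.append(det3(m) / d)
        rows.append(row)
    return rows


def apply_affine(a, pt):
    """Apply a 2x3 affine matrix to a point (x, y)."""
    x, y = pt
    return (a[0][0] * x + a[0][1] * y + a[0][2],
            a[1][0] * x + a[1][1] * y + a[1][2])


# Detected left eye, right eye, mouth -> assumed canonical positions.
detected = [(30.0, 40.0), (70.0, 42.0), (50.0, 80.0)]
canonical = [(25.0, 30.0), (75.0, 30.0), (50.0, 75.0)]
A = affine_from_three_points(detected, canonical)
print([apply_affine(A, p) for p in detected])
```

In practice the matrix would be used to resample the whole face image, for instance with an image library's warp routine.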

Background removal

Key features and ideas:

• we do not use colour

• only gradient information is used

• faces are smooth with limited shape variability

• model boundary traversal as a Markov chain

Significant clutter in images of detected faces

Background removal

Radial mesh

Image intensity – threshold gradient to find interest points

Solved using dynamic programming
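A minimal sketch of how a boundary over a radial mesh might be traced with dynamic programming (Viterbi-style). It assumes one candidate radius is chosen per ray, that gradient strength rewards a choice, and that a linear penalty on radius change between neighbouring rays stands in for the Markov transition model. The actual probabilities learned by the authors are not given in the slides, and the sketch ignores the constraint that the outline should close on itself.

```python
def trace_outline(strength, smoothness=1.0):
    """Choose one radius index per ray.

    strength[k][r]: gradient magnitude at radius index r along ray k.
    Maximizes sum of strengths minus smoothness * |r_k - r_(k-1)|.
    """
    n_rays, n_radii = len(strength), len(strength[0])
    score = list(strength[0])  # best score of a path ending at each radius
    back = []                  # back-pointers, one list per ray after the first
    for k in range(1, n_rays):
        new_score, ptr = [], []
        for r in range(n_radii):
            prev = max(range(n_radii),
                       key=lambda q: score[q] - smoothness * abs(r - q))
            new_score.append(score[prev] - smoothness * abs(r - prev) + strength[k][r])
            ptr.append(prev)
        score = new_score
        back.append(ptr)
    # Backtrack from the best final state.
    r = max(range(n_radii), key=lambda q: score[q])
    path = [r]
    for ptr in reversed(back):
        r = ptr[r]
        path.append(r)
    return path[::-1]


# A strong but isolated edge on the middle ray is rejected when smoothness is high.
mesh = [[0, 5, 0, 0],
        [0, 4, 0, 9],
        [0, 5, 0, 0]]
print(trace_outline(mesh, smoothness=3.0))
print(trace_outline(mesh, smoothness=0.0))
```

The smoothness term encodes the observation that faces are smooth with limited shape variability: clutter edges that would force a jagged outline are penalized.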

Background removal – examples

Registered Segmented

Registration refinement

• faces already affine registered using 3 facial features

• feature localization errors amount to a significant registration error

• refinement using appearance – normalized cross-correlation of salient regions
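The appearance-based refinement can be illustrated with a small normalized cross-correlation (NCC) search. The sketch assumes a translation-only search over integer offsets for a single salient patch; the slides do not specify the search space or how per-region results are combined, so `best_shift` and `radius` are illustrative.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity lists, in [-1, 1]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a constant patch carries no correlation information
    return sum(x * y for x, y in zip(da, db)) / denom


def best_shift(patch, image, x0, y0, radius=2):
    """Search integer offsets around (x0, y0) for the window best matching patch."""
    h, w = len(patch), len(patch[0])
    flat_patch = [v for row in patch for v in row]
    best, best_score = (0, 0), -2.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or y + h > len(image) or x + w > len(image[0]):
                continue  # window falls outside the image
            window = [image[y + i][x + j] for i in range(h) for j in range(w)]
            s = ncc(flat_patch, window)
            if s > best_score:
                best, best_score = (dx, dy), s
    return best, best_score


# The patch actually sits one pixel to the right of the initial guess (2, 2).
image = [[0] * 6 for _ in range(6)]
image[2][3], image[2][4], image[3][3], image[3][4] = 9, 1, 2, 7
patch = [[9, 1], [2, 7]]
print(best_shift(patch, image, x0=2, y0=2))
```

Because NCC subtracts the mean and divides by the norm, the match is unaffected by a gain or offset change in intensity between the two faces.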

Salient regions

Face 1 | Face 1 registered to Face 2 | Face 2

Occlusion detection

Key points:

• occlusion detected when a pair of images is compared

• from a training corpus learn the intra/inter-personal variance of each location/pixel

• occlusion = pixels with low intra/inter-personal probability

• contribution of occlusions to distance limited by Blake-Zisserman function
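To illustrate how a robust function caps the influence of occluded pixels, the sketch below uses one commonly cited form of the Blake-Zisserman cost, ρ(d) = −log(exp(−d²) + ε). The parameterisation, the value of `eps`, and the per-pixel scaling `sigma` are assumptions; the slides do not give the exact form used.

```python
import math


def blake_zisserman(d, eps=1e-3):
    """Roughly d*d for small |d|, saturating at -log(eps) for large |d|."""
    return -math.log(math.exp(-d * d) + eps)


def robust_distance(face_a, face_b, sigma, eps=1e-3):
    """Sum of robust per-pixel costs; sigma[i] scales the difference at pixel i."""
    return sum(blake_zisserman((a - b) / s, eps)
               for a, b, s in zip(face_a, face_b, sigma))


# Small differences behave quadratically; a gross outlier (e.g. a hand over
# the face) contributes no more than the cap, whatever its size.
print(blake_zisserman(0.1), blake_zisserman(3.0), blake_zisserman(100.0))
print(-math.log(1e-3))  # the cap
```

With a plain sum of squared differences, a few occluded pixels could dominate the distance; the cap keeps each pixel's contribution bounded.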

Two faces being compared

High occlusion probability

Grimace Hand

Evaluation - querying

The protocol:

1. faces are automatically detected

2. query consists of one or more faces of the reference actor

and, optionally

3. images of non-reference actors

Evaluation - distances

Three matching methods:

• K-min distance

• Linear subspace (reference only)

• Nearest linear subspace (reference and other)

Query | Correct person | Other

Evaluation - performance

Performance measure:

• operates on sequences of recalled images

• rank-ordering score S

> S lies in the range [0,1]

> S = 1 indicates all N true positives are recalled first

> S = 0.5 indicates a random ordering
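A score with exactly these properties can be built from the sum of the ranks of the true positives, normalized by its smallest and largest possible values. The sketch below is one such construction and is an assumption; the slides do not spell out the definition of S.

```python
def rank_ordering_score(is_match):
    """is_match: list of booleans in retrieval order (most similar first).

    Returns 1.0 when all true positives come first, 0.0 when they all come
    last, and 0.5 on average for a random ordering.
    """
    total = len(is_match)
    ranks = [i + 1 for i, hit in enumerate(is_match) if hit]
    n = len(ranks)
    if n == 0 or n == total:
        raise ValueError("need at least one true positive and one false positive")
    min_sum = n * (n + 1) / 2      # rank sum when all N true positives are first
    max_excess = n * (total - n)   # largest possible excess over min_sum
    return 1.0 - (sum(ranks) - min_sum) / max_excess


print(rank_ordering_score([True, True, False, False]))
print(rank_ordering_score([True, False, True, False]))
print(rank_ordering_score([False, False, True, True]))
```

The expected rank of a true positive under a random ordering is (F + 1) / 2 for F retrieved faces, which gives an expected score of 0.5, matching the third property above.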

Google-like retrieval: faces are ordered by decreasing similarity

Method evaluated on several films:

• Groundhog Day

• Pretty Woman

• Run, Lola Run

• Fawlty Towers

Results - data

Typical input data

Results – rank ordering score

Rank ordering score for 35 retrievals of Basil and Sybil

Basil Sybil

Fawlty Towers

(John Cleese)

Results – example recalls

Pretty Woman

(Julia Roberts)

Results – example recalls

Groundhog Day

(Bill Murray)

Groundhog Day

(Andie MacDowell)

Conclusions

Future work:

• Use of sequence information for disambiguation in recognition (see “Person spotting: video shot retrieval for face sets” [Sivic et al., CIVR 2005])

• Use of photometric models for improved illumination normalization