Automatic Face Recognition for Film Character Retrieval in Feature-Length Films
Ognjen Arandjelović, Andrew Zisserman
The objective
Retrieve all shots in a video, e.g. a feature-length film, containing a particular person
“Groundhog Day” [Ramis, 1993]
Visually defined search – on faces
Applications:
• intelligent fast forward on characters
• pull out all videos of “x” from 1000s of digital camera mpegs
Image variations due to:
• pose/scale
• lighting
• expression
• partial occlusion
The difficulty of face recognition
There has been significant progress in face recognition in recent years:
1. Pose/illumination invariant recognition (e.g. The 3D Morphable Model – [Blanz et al., 2002])
2. Local feature-based approaches (e.g. Elastic Bunch Graph Matching – [Bolme, 2003]; [Sivic et al., 2005])
3. Appearance manifold-based methods and online appearance model building (e.g. see previous talk)
4. Etc.
Previous work
Five key steps:
1. Feature localization
2. Affine warping
3. Face outline detection
4. Refine registration
5. Robust distance
System overview
[Pipeline diagram: image → detected face → facial features (SVM classifiers trained on example data) → warp (normalized pose) → background removal (probabilistic model of face outline; background clutter removed) → filter (normalized illumination) → face signature]
Facial feature detection
Train support vector machines to detect the eyes and the mouth (similar to “Names and Faces in the News” [Berg et al., 2004])
Independent Gaussian priors on feature locations
Example training data:
Learn invariance to:
• pose
• expression
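The localization step above combines two terms: a discriminative classifier score and a Gaussian prior on where each feature tends to lie. A minimal sketch of that combination (the dense response map, the function names, and the toy numbers are illustrative assumptions, not the authors' code):

```python
import math

def gaussian_log_prior(x, y, mu_x, mu_y, sigma):
    """Independent Gaussian log-prior on a feature's (x, y) location."""
    return -((x - mu_x) ** 2 + (y - mu_y) ** 2) / (2.0 * sigma ** 2)

def localize_feature(score_map, mu, sigma):
    """Pick the location maximizing classifier score + location log-prior.

    score_map: dict mapping (x, y) -> score, a hypothetical stand-in for a
    dense SVM response map over the face region.
    """
    best_loc, best_val = None, -math.inf
    for (x, y), score in score_map.items():
        val = score + gaussian_log_prior(x, y, mu[0], mu[1], sigma)
        if val > best_val:
            best_loc, best_val = (x, y), val
    return best_loc

# Toy example: a spurious high response far from the prior mean is rejected.
scores = {(10, 10): 2.0, (50, 50): 2.5}
print(localize_feature(scores, mu=(12, 12), sigma=5.0))  # -> (10, 10)
```

The prior acts as a soft spatial gate: a slightly weaker detection near the expected location beats a stronger one far from it.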
Background removal
Key features and ideas:
• we do not use colour
• only gradient information is used
• faces are smooth with limited shape variability
• model boundary traversal as a Markov chain
Significant clutter in images of detected faces
Background removal
Radial mesh
Image intensity – threshold gradient to find interest points
Solved using dynamic programming
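Treating the outline as a Markov chain over the spokes of the radial mesh makes the globally optimal boundary computable by dynamic programming. A simplified sketch (open chain rather than the actual circular mesh; the cost structure and values are illustrative assumptions):

```python
def trace_outline(unary_cost, smooth_weight=1.0):
    """DP sketch of the face-outline search on a radial mesh.

    unary_cost[i][r]: cost of placing the outline at radial node r on
    angular spoke i (e.g. low where the thresholded image gradient is
    strong).  Adjacent spokes are linked by a first-order (Markov)
    smoothness penalty smooth_weight * |r - r_prev|.  Returns one radial
    index per spoke.  Simplified to an open chain for brevity.
    """
    n_spokes, n_radii = len(unary_cost), len(unary_cost[0])
    cost = list(unary_cost[0])
    back = []
    for i in range(1, n_spokes):
        prev, cost, ptr = cost, [], []
        for r in range(n_radii):
            best, arg = float("inf"), 0
            for rp in range(n_radii):
                c = prev[rp] + smooth_weight * abs(r - rp)
                if c < best:
                    best, arg = c, rp
            cost.append(best + unary_cost[i][r])
            ptr.append(arg)
        back.append(ptr)
    # Backtrack the minimum-cost path.
    r = min(range(n_radii), key=lambda k: cost[k])
    path = [r]
    for ptr in reversed(back):
        r = ptr[r]
        path.append(r)
    return path[::-1]

# Toy mesh: strong "edges" (cheap nodes) at radius 1 on every spoke.
costs = [[5, 0, 5], [5, 0, 5], [5, 1, 5], [5, 0, 5]]
print(trace_outline(costs))  # -> [1, 1, 1, 1]
```

The smoothness term encodes the slide's point that faces have limited shape variability: the boundary cannot jump arbitrarily between neighbouring spokes even where the gradient evidence is weak.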
Registration refinement
• faces already affine registered using 3 facial features
• feature localization errors amount to a significant registration error
• refinement using appearance – normalized cross-correlation of salient regions
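The refinement criterion above is normalized cross-correlation. A minimal sketch on 1-D patches (the real search is over a small 2-D neighbourhood of the feature-based registration; the helper names and toy signal are illustrative assumptions):

```python
import math

def ncc(a, b):
    """Normalized cross-correlation between two equal-length patches
    (flattened salient regions); 1.0 means a perfect linear match."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0

def refine_shift(patch, target, shifts):
    """Pick the small translation of the window over `target` that
    maximizes NCC with `patch`."""
    return max(shifts, key=lambda s: ncc(patch, target[s:s + len(patch)]))

signal = [0, 0, 1, 3, 1, 0, 0, 0]
template = [1, 3, 1]
print(refine_shift(template, signal, shifts=range(0, 6)))  # -> 2
```

Because NCC is invariant to affine intensity changes, the refinement is driven by appearance structure rather than absolute brightness.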
Salient regions
Face 1 Face 1 registered to 2Face 2
Occlusion detection
Key points:
• occlusion detected when a pair of images is compared
• from a training corpus learn the intra/inter-personal variance of each location/pixel
• occlusion = pixels with low intra/inter-personal probability
• contribution of occlusions to distance limited by the Blake-Zisserman function
[Figure: two faces being compared; pixels with high occlusion probability highlighted (examples: grimace, hand)]
Evaluation - querying
The protocol:
1. faces are automatically detected
2. query consists of one or more faces of the reference actor
and, optionally
3. images of non-reference actors
Evaluation - distances
Three matching methods:
• K-min distance
• Linear subspace (reference only)
• Nearest linear subspace (reference and other)
[Diagram: query faces, correct-person faces, and other-actor faces in signature space]
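The simplest of the three matching methods, the K-min distance, can be sketched as a set-to-set distance: average the k smallest pairwise signature distances between the query faces and a shot's faces. The pixel-level metric, the value of k, and the toy signatures below are illustrative assumptions, not the paper's exact choices:

```python
def k_min_distance(query_sigs, shot_sigs, k=3, dist=None):
    """Set-to-set distance for ranking shots: mean of the k smallest
    pairwise distances between query signatures and shot signatures."""
    if dist is None:
        # Assumed per-pair metric: squared Euclidean distance.
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    pairwise = sorted(dist(q, s) for q in query_sigs for s in shot_sigs)
    k = min(k, len(pairwise))
    return sum(pairwise[:k]) / k

query = [(0.0, 0.0), (1.0, 0.0)]
shot_same = [(0.1, 0.0), (0.9, 0.1)]      # faces of the queried actor
shot_other = [(5.0, 5.0)]                 # a different actor
print(k_min_distance(query, shot_same, k=2)
      < k_min_distance(query, shot_other, k=2))  # -> True
```

Taking only the k best pairs makes the distance tolerant of query faces that happen to match a shot poorly (e.g. extreme pose), while the subspace methods instead model each set globally.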
Evaluation - performance
Performance measure:
• operates on sequences of recalled images
• rank-ordering score S
> in the range [0,1]
> = 1 indicates all N true positives are recalled first
> = 0.5 indicates a random ordering
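A score with exactly these properties can be computed as the fraction of (true positive, other) pairs ranked in the correct order, i.e. the ROC area under the curve. This is a sketch consistent with the stated behaviour (S = 1 when all N true positives come first, S = 0.5 in expectation for a random ordering), not necessarily the paper's exact formula:

```python
def rank_ordering_score(labels):
    """Rank-ordering score S for a ranked retrieval list.

    labels: list of 1 (correct person) / 0 (other), in retrieval order.
    Counts, for each positive, how many negatives are ranked after it,
    normalized by the total number of positive/negative pairs.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 1.0
    correct_pairs, neg_seen = 0, 0
    for lab in labels:
        if lab == 0:
            neg_seen += 1
        else:
            correct_pairs += n_neg - neg_seen  # negatives ranked after it
    return correct_pairs / (n_pos * n_neg)

print(rank_ordering_score([1, 1, 1, 0, 0]))  # -> 1.0  (all positives first)
print(rank_ordering_score([0, 0, 1, 1, 1]))  # -> 0.0  (all positives last)
print(rank_ordering_score([1, 0, 0, 1]))     # -> 0.5  (mixed ordering)
```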
Google-like retrieval: faces are ordered by decreasing similarity to the query
Method evaluated on several films:
• Groundhog Day
• Pretty Woman
• Run Lola Run
• Fawlty Towers
Results - data
Typical input data