Adding Domain-Specific Knowledge
Amit Singhal & Jiebo Luo
Research Laboratories, Eastman Kodak Company
FUSION 2001, Montreal
August 7-10, 2001
Outline of Talk
Problem Statement
Background
Relevant Prior Art
Evidence Fusion Framework
Automatic Main Subject Detection System
Injecting Orientation Information
Feature Detectors
Conclusions and Future Work
Main Subject Detection
What is the main subject in a picture?
1st-party truth (the photographer): in general not available, because of the specific knowledge the photographer may have about the setting
3rd-party truth: in general there is good agreement among 3rd-party observers, provided the photographer successfully used the picture to communicate his interest in the main subject to the viewers
Related Prior Art
Main subject (region-of-interest) detection
Milanese (1993): uses biologically motivated models for identifying regions of interest in simple pictures containing highly contrasting foreground and background.
Marichal et al. (1996), Zhao et al. (1996): use a subjective fuzzy modeling approach to describe semantic interest in video sequences (primarily video-conferencing).
Syeda-Mahmood (1998): uses a color-based approach to isolate regions in an image likely to belong to the same object; the main application is reducing the search space for object recognition.
Evidence fusion
Pearl (1988): provides a theory and evidence propagation scheme for Bayesian networks.
Rimey & Brown (1994): use Bayesian networks for control of selective perception in a structured spatial scene.
Buxton et al. (1998): use a set of Bayesian networks to integrate sensor information to infer behaviors in a traffic monitoring application.
The Evidence Fusion Framework
Region-based representation scheme.
Virtual belief sensors map output of physical sensors and algorithmic feature detectors to probabilistic space.
Domain knowledge used to generate network structure.
Expert knowledge and ground truth-based training methodologies to generate the priors and the conditional probability matrices.
Bayesian network combines evidence generated by the sensors and feature detectors using a very fast message passing scheme.
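The slides do not specify how a virtual belief sensor maps raw detector output to probabilistic space; a minimal sketch, assuming a sigmoid squashing function (the function name, midpoint, and steepness parameters are illustrative, not from the talk):

```python
import math

def belief_sensor(raw_value, midpoint, steepness):
    """Hypothetical virtual belief sensor: squashes a raw
    feature-detector score onto (0, 1) so it can enter the
    Bayesian network as a probability-like belief."""
    return 1.0 / (1.0 + math.exp(-steepness * (raw_value - midpoint)))

# e.g. a raw skin-color detector score of 0.8 mapped to a belief
b = belief_sensor(0.8, midpoint=0.5, steepness=10.0)
```

Any monotone mapping calibrated against ground truth would serve the same role; the sigmoid is just one common choice.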
Bayesian Networks
A directed acyclic graph
Each node represents an entity (random variable) in the domain
Each link represents a causal relationship and connects two nodes in the network
The direction of the link represents the direction of causality
Each link encodes the conditional probabilities of the child node given its parent
Evaluating the Bayes network is equivalent to knowing the joint probability distribution
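To make the node/link/CPT vocabulary concrete, here is a two-node toy network, Location -> MainSubject, worked by hand in Python; all probability values are made up for illustration and are not the trained values from the system:

```python
# Prior over the parent node and the conditional probability
# table (CPT) carried by the link (illustrative numbers only).
p_location = {"central": 0.3, "border": 0.7}
p_main_given_loc = {"central": 0.8, "border": 0.2}

# Marginal belief that a region is the main subject:
# P(MainSubject) = sum_loc P(MainSubject | loc) * P(loc)
p_main = sum(p_main_given_loc[l] * p_location[l] for l in p_location)

# Posterior over location, given the region IS the main subject
# (Bayes' rule -- the direction of inference runs against the link):
posterior = {l: p_main_given_loc[l] * p_location[l] / p_main
             for l in p_location}
```

Pearl's message-passing scheme mentioned above generalizes exactly this computation to larger networks without ever materializing the full joint distribution.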
Automatic Main Subject Detection System
An Interesting Research Problem
Conventional wisdom (or how a human performs such a task): Object Segmentation -> Object Recognition -> Main Subject Determination
Object recognition is an unconstrained problem in consumer photographs
Inherent ambiguity: 3rd-party probabilistic ground truth
Large number of camera sensors and feature detectors
Speed and performance scalability concerns
Of extreme industrial interest to digital photofinishing
Allows for automatic image enhancements to produce better photographic prints
Other applications: image compression, storage, and transmission; automatic image recompositing; object-based image indexing and retrieval
Overview
Methodology
Produce a belief map of regions in the scene being part of the main subject
Utilize a region-based representation of the image derived from image segmentation and perceptual grouping
Utilize semantic features (human flesh and face, sky, grass) and general saliency features (color, texture, shape, and geometric features)
Utilize a Bayes net-based architecture for knowledge representation and evidence inference
Dealing with Intrinsic Ambiguity
Ground truth is “probabilistic”, not “deterministic”
Limitations in our understanding of the problem
Dealing with “Weak” Vision Features
Reality of the state of the art in computer vision
Limited accuracy of the current feature extraction algorithms
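The region-based belief map can be sketched as painting each segmented region's main-subject belief over its pixels; the region geometry and belief values below are invented for illustration:

```python
def region_belief_map(regions, beliefs, height, width):
    """Build a grayscale belief map: every pixel of a segmented
    region receives that region's main-subject belief."""
    bmap = [[0.0] * width for _ in range(height)]
    for region_id, pixels in regions.items():
        for (y, x) in pixels:
            bmap[y][x] = beliefs[region_id]
    return bmap

# Toy 2x2 image with two regions and their inferred beliefs
regions = {0: [(0, 0), (0, 1)], 1: [(1, 0), (1, 1)]}
beliefs = {0: 0.9, 1: 0.1}
bmap = region_belief_map(regions, beliefs, 2, 2)
```

Because the output is a soft map rather than a hard mask, downstream applications can threshold or weight it as their own requirements dictate, which fits the probabilistic ground truth.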
Injecting Metadata into the System
Sources of metadata
Camera: flash fired, subject distance, orientation, etc.
IU algorithms: indoor/outdoor, scene type, orientation, etc.
User annotation
The Bayesian network is very flexible and can be quickly adapted to take advantage of available metadata
Metadata-enabled knowledge can be injected into the system using:
Metadata-aware feature detectors
Metadata-enhanced Bayesian networks
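One simple way a metadata-enhanced network can adapt is by selecting link parameters conditioned on the metadata that happens to be present; a hedged sketch (the CPT values and the dictionary-based lookup are assumptions for illustration, not the system's actual tables):

```python
# Hypothetical orientation-conditioned CPTs for a single link,
# with a fallback used when no orientation metadata is available.
CPTS = {
    "landscape": {"central": 0.80, "border": 0.15},
    "portrait":  {"central": 0.75, "border": 0.20},
    None:        {"central": 0.70, "border": 0.30},  # orientation unknown
}

def select_cpt(metadata):
    """Pick the CPT matching the orientation metadata; fall back to
    the orientation-agnostic table when the key is absent."""
    return CPTS.get(metadata.get("orientation"), CPTS[None])

cpt = select_cpt({"orientation": "portrait"})
```

The same lookup pattern extends to other metadata sources (flash fired, indoor/outdoor, user annotation) without changing the network structure.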
Orientation
The main difference between orientation-aware and orientation-unaware systems is in the location features
Orientation Aware Bayesian Network
Use orientation-aware centrality and borderness features
Other feature detectors affected by orientation but not retrained: sky, grass
They are not retrained when the BN is used for main subject detection, because the location features account for the orientation information
Using orientation information to compute the sky and grass evidence would, however, lead to better performance for a dedicated sky or grass detection system
Retrain the links in the Bayesian network for each feature affected by orientation information:
BorderA-Borderness
BorderD-Borderness
Borderness-Location
Centrality-Location
Location-MainSubject
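An orientation-aware location feature can be obtained by rotating region coordinates into the upright frame before measuring them; a minimal centrality sketch, where the rotation convention (degrees the capture was rotated) and the max-norm distance measure are assumptions of this example, not the talk's definitions:

```python
def upright_coords(x, y, width, height, orientation):
    """Rotate a pixel coordinate into the upright frame given the
    camera orientation metadata (0/90/180/270 degrees)."""
    if orientation == 0:
        return x, y, width, height
    if orientation == 90:
        return y, width - 1 - x, height, width
    if orientation == 180:
        return width - 1 - x, height - 1 - y, width, height
    if orientation == 270:
        return height - 1 - y, x, height, width
    raise ValueError("unsupported orientation")

def centrality(x, y, width, height, orientation=0):
    """Centrality of a region centroid, computed in the upright
    frame and normalized to [0, 1] (1 = image center)."""
    x, y, w, h = upright_coords(x, y, width, height, orientation)
    dx = abs(x - (w - 1) / 2) / ((w - 1) / 2)
    dy = abs(y - (h - 1) / 2) / ((h - 1) / 2)
    return 1.0 - max(dx, dy)
```

The key point is that only the coordinate transform changes with orientation; the downstream links listed above are then retrained on the transformed features.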
Conclusions and Future Work
Bayesian networks offer the flexibility of easily incorporating domain-specific knowledge, such as orientation information, into the system
This knowledge can be added by:
modifying the feature detectors
using new feature detectors
changing the structure of the Bayesian network
retraining the conditional probability matrices associated with the Bayesian network
Directions for Future Work
Use of additional metadata such as indoor/outdoor, urban/rural, day/night
A single super BN versus a library of metadata-aware BNs?