Adding Domain-Specific Knowledge
Amit Singhal & Jiebo Luo
Research Laboratories, Eastman Kodak Company
FUSION 2001, Montreal
August 7-10, 2001
Outline of Talk
Problem Statement
Background
Relevant Prior Art
Evidence Fusion Framework
Automatic Main Subject Detection System
Injecting Orientation Information
Feature Detectors
Conclusions and Future Work
Main Subject Detection
What is the main subject in a picture?
1st-party truth (the photographer): in general not available, because of the specific knowledge the photographer may have about the setting
3rd-party truth: in general there is good agreement among 3rd-party observers, provided the photographer successfully used the picture to communicate his interest in the main subject to the viewers
Related Prior Art
Main subject (region-of-interest) detection
Milanese (1993): uses biologically motivated models for identifying regions of interest in simple pictures containing highly contrasting foreground and background.
Marichal et al. (1996), Zhao et al. (1996): use a subjective fuzzy modeling approach to describe semantic interest in video sequences (primarily video-conferencing).
Syeda-Mahmood (1998): uses a color-based approach to isolate regions in an image likely to belong to the same object; the main application is reducing the search space for object recognition.
Evidence fusion
Pearl (1988): provides a theory and evidence propagation scheme for Bayesian networks.
Rimey & Brown (1994): use Bayesian networks for control of selective perception in a structured spatial scene.
Buxton et al. (1998): use a set of Bayesian networks to integrate sensor information to infer behaviors in a traffic monitoring application.
The Evidence Fusion Framework
Region-based representation scheme.
Virtual belief sensors map output of physical sensors and algorithmic feature detectors to probabilistic space.
Domain knowledge used to generate network structure.
Expert knowledge and ground truth-based training methodologies to generate the priors and the conditional probability matrices.
Bayesian network combines evidence generated by the sensors and feature detectors using a very fast message passing scheme.
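The slides do not specify how a virtual belief sensor maps raw detector output to probabilistic space; a minimal sketch, assuming a sigmoid squashing function (the function name, midpoint, and steepness parameters are illustrative, not from the talk):

```python
import math

def belief_sensor(raw_value, midpoint, steepness):
    """Hypothetical virtual belief sensor: squashes a raw
    feature-detector score onto (0, 1) so it can enter the
    Bayesian network as a probability-like belief."""
    return 1.0 / (1.0 + math.exp(-steepness * (raw_value - midpoint)))

# e.g. a raw skin-color detector score of 0.8 mapped to a belief
b = belief_sensor(0.8, midpoint=0.5, steepness=10.0)
```

Any monotone mapping calibrated against ground truth would serve the same role; the sigmoid is just one common choice.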
Bayesian Networks
A directed acyclic graph
Each node represents an entity (random variable) in the domain
Each link represents a causal relationship and connects two nodes in the network
The direction of the link represents the direction of causality
Each link encodes the conditional probabilities of the child node given its parent
Evaluating the Bayes network is equivalent to knowing the joint probability distribution
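To make the node/link/CPT vocabulary concrete, here is a two-node toy network, Location -> MainSubject, worked by hand in Python; all probability values are made up for illustration and are not the trained values from the system:

```python
# Prior over the parent node and the conditional probability
# table (CPT) carried by the link (illustrative numbers only).
p_location = {"central": 0.3, "border": 0.7}
p_main_given_loc = {"central": 0.8, "border": 0.2}

# Marginal belief that a region is the main subject:
# P(MainSubject) = sum_loc P(MainSubject | loc) * P(loc)
p_main = sum(p_main_given_loc[l] * p_location[l] for l in p_location)

# Posterior over location, given the region IS the main subject
# (Bayes' rule -- the direction of inference runs against the link):
posterior = {l: p_main_given_loc[l] * p_location[l] / p_main
             for l in p_location}
```

Pearl's message-passing scheme mentioned above generalizes exactly this computation to larger networks without ever materializing the full joint distribution.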
Automatic Main Subject Detection System
An Interesting Research Problem
Conventional wisdom (or how a human performs such a task): Object Segmentation -> Object Recognition -> Main Subject Determination
Object recognition is an unconstrained problem in consumer photographs
Inherent ambiguity: 3rd-party probabilistic ground truth
Large number of camera sensors and feature detectors
Speed and performance scalability concerns
Of extreme industrial interest to digital photofinishing
Allows for automatic image enhancements to produce better photographic prints
Other applications: image compression, storage, and transmission; automatic image recompositing; object-based image indexing and retrieval
Overview
Methodology
Produce a belief map of regions in the scene being part of the main subject
Utilize a region-based representation of the image derived from image segmentation and perceptual grouping
Utilize semantic features (human flesh and face, sky, grass) and general saliency features (color, texture, shape, and geometric features)
Utilize a Bayes net-based architecture for knowledge representation and evidence inference
Dealing with Intrinsic Ambiguity
Ground truth is “probabilistic”, not “deterministic”
Limitations in our understanding of the problem
Dealing with “Weak” Vision Features
Reality of the state of the art in computer vision
Limited accuracy of the current feature extraction algorithms
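The region-based belief map can be sketched as painting each segmented region's main-subject belief over its pixels; the region geometry and belief values below are invented for illustration:

```python
def region_belief_map(regions, beliefs, height, width):
    """Build a grayscale belief map: every pixel of a segmented
    region receives that region's main-subject belief."""
    bmap = [[0.0] * width for _ in range(height)]
    for region_id, pixels in regions.items():
        for (y, x) in pixels:
            bmap[y][x] = beliefs[region_id]
    return bmap

# Toy 2x2 image with two regions and their inferred beliefs
regions = {0: [(0, 0), (0, 1)], 1: [(1, 0), (1, 1)]}
beliefs = {0: 0.9, 1: 0.1}
bmap = region_belief_map(regions, beliefs, 2, 2)
```

Because the output is a soft map rather than a hard mask, downstream applications can threshold or weight it as their own requirements dictate, which fits the probabilistic ground truth.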
Injecting Metadata into the System
Sources of metadata
Camera: flash fired, subject distance, orientation, etc.
IU algorithms: indoor/outdoor, scene type, orientation, etc.
User annotation
The Bayesian network is very flexible and can be quickly adapted to take advantage of available metadata
Metadata-enabled knowledge can be injected into the system using:
Metadata-aware feature detectors
Metadata-enhanced Bayesian networks
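One simple way a metadata-enhanced network can adapt is by selecting link parameters conditioned on the metadata that happens to be present; a hedged sketch (the CPT values and the dictionary-based lookup are assumptions for illustration, not the system's actual tables):

```python
# Hypothetical orientation-conditioned CPTs for a single link,
# with a fallback used when no orientation metadata is available.
CPTS = {
    "landscape": {"central": 0.80, "border": 0.15},
    "portrait":  {"central": 0.75, "border": 0.20},
    None:        {"central": 0.70, "border": 0.30},  # orientation unknown
}

def select_cpt(metadata):
    """Pick the CPT matching the orientation metadata; fall back to
    the orientation-agnostic table when the key is absent."""
    return CPTS.get(metadata.get("orientation"), CPTS[None])

cpt = select_cpt({"orientation": "portrait"})
```

The same lookup pattern extends to other metadata sources (flash fired, indoor/outdoor, user annotation) without changing the network structure.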
Orientation
The main difference between orientation-aware and orientation-unaware systems is in the location features
Orientation Aware Bayesian Network
Use orientation-aware centrality and borderness features
Other feature detectors affected by orientation but not retrained: sky, grass
They are not retrained when the BN is used for main subject detection, because the location features account for the orientation information
Using orientation information to compute the sky and grass evidence would, however, lead to better performance for a dedicated sky or grass detection system
Retrain the links in the Bayesian network for each feature affected by orientation information:
BorderA-Borderness
BorderD-Borderness
Borderness-Location
Centrality-Location
Location-MainSubject
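An orientation-aware location feature can be obtained by rotating region coordinates into the upright frame before measuring them; a minimal centrality sketch, where the rotation convention (degrees the capture was rotated) and the max-norm distance measure are assumptions of this example, not the talk's definitions:

```python
def upright_coords(x, y, width, height, orientation):
    """Rotate a pixel coordinate into the upright frame given the
    camera orientation metadata (0/90/180/270 degrees)."""
    if orientation == 0:
        return x, y, width, height
    if orientation == 90:
        return y, width - 1 - x, height, width
    if orientation == 180:
        return width - 1 - x, height - 1 - y, width, height
    if orientation == 270:
        return height - 1 - y, x, height, width
    raise ValueError("unsupported orientation")

def centrality(x, y, width, height, orientation=0):
    """Centrality of a region centroid, computed in the upright
    frame and normalized to [0, 1] (1 = image center)."""
    x, y, w, h = upright_coords(x, y, width, height, orientation)
    dx = abs(x - (w - 1) / 2) / ((w - 1) / 2)
    dy = abs(y - (h - 1) / 2) / ((h - 1) / 2)
    return 1.0 - max(dx, dy)
```

The key point is that only the coordinate transform changes with orientation; the downstream links listed above are then retrained on the transformed features.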
Conclusions and Future Work
Bayesian networks offer the flexibility of easily incorporating domain-specific knowledge, such as orientation information, into the system
This knowledge can be added by:
modifying the feature detectors
using new feature detectors
changing the structure of the Bayesian network
retraining the conditional probability matrices associated with the Bayesian network
Directions for Future Work
Use of additional metadata such as indoor/outdoor, urban/rural, day/night
A single super BN versus a library of metadata-aware BNs?