A Distributed Multimedia Data Management over the Grid
Kasturi Chatterjee
Distributed Multimedia Information System Laboratory
School of Computing and Information Sciences
Florida International University, Miami, FL 33199, USA
Global Cyberbridges 2008 Proposal
Outline
• Motivation
  – Why multimedia data?
  – Why is handling and representing multimedia data challenging?
  – Why a distributed environment?
  – Why content-based image/video retrieval?
• Multimedia data management
  – Representation
  – Storage and indexing
  – Popular retrieval strategies
• Proposed Work Outline
  – Issues to be addressed
  – Components and related work
• Conclusion
Motivation
Why multimedia data?
– Attractive
– Informative
– Compact
– Cheap memory makes storage easy
Why is handling and representing multimedia data challenging?
– Huge size (a typical 10-second MPEG video is about 4 MB)
– Temporal and spatial information
– High-level meaning and the semantic gap
– Multidimensional representation
– Traditional databases cannot accommodate the characteristics above
Motivation
Why a distributed environment?
– Shared storage
– Shared computing power
– No single point of failure
Why content-based image/video retrieval?
– Unlike traditional data, multimedia data must be queried with its temporal, spatial, and semantic content in mind.
– Can queries on image/video databases be issued textually? Perhaps not. Textual queries rely on:
  • Metadata
  • Keywords (for example, searching Google Images for "sunset")
– Query by example, similarity measurement, content interpretation, user feedback, etc. need to be considered.
Multimedia data management
Representation
– Multidimensional: Unlike traditional data, which is one-dimensional, multimedia data in the form of images or video is multidimensional.
– Semantic interpretation: The same multimedia data can have varied semantic interpretations.
– Feature selection: Identifying the feature space that represents the multimedia data is a crucial step in a multimedia database management system (MDBMS). Features can be color, texture, temporal information, etc.
The atypical nature of multimedia data calls for a special representation in the form of multidimensional feature vectors.
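As a rough illustration of what a multidimensional feature vector can look like, the sketch below turns an image into a normalized color histogram. The function name `color_histogram`, the choice of 4 bins per channel, and the toy 2x2 image are assumptions made for this example; the slides do not specify which features the proposed system extracts.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: List[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized color histogram as a flat feature vector.

    Hypothetical helper: one possible low-level color feature, not the
    feature set used by the authors.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


# A toy 2x2 "image": two reddish pixels, one green, one blue.
image = [(255, 0, 0), (250, 5, 5), (0, 255, 0), (0, 0, 255)]
vector = color_histogram(image)
print(len(vector))  # dimensionality of the feature vector
```

Even this coarse histogram already has 64 dimensions, which hints at why indexing high-dimensional feature spaces becomes a concern.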
Multimedia data management
Storage and Indexing
– Indexing is an integral part of database system design; it reduces computation overhead and speeds up retrieval.
Multimedia data indexing requirements
• Multimedia data is stored as multidimensional feature vectors.
• The index must cover a high-dimensional feature space.
• The index structure should map the low-level representation to high-level semantic relationships.
• The index structure should handle popular multimedia retrieval strategies such as content-based image retrieval (CBIR), relevance feedback (RF), and video event retrieval.
Existing multidimensional indexing strategies fail to fulfill the above requirements efficiently!
Multimedia data management
• Popular Retrieval Strategies (Content-Based Image/Video Retrieval)
[Figure: content-based retrieval pipeline. Diagram labels: Image Database, Feature Descriptor Extraction, Image Descriptor Space, Similarity Measurement, Retrieval Results]
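The retrieval strategy named on this slide can be sketched as query by example: the query image's descriptor is compared against every stored descriptor and the closest matches are returned. The descriptors, image names, and the use of Euclidean distance below are assumptions for illustration only; the slides do not state which similarity measure is used.

```python
import math
from typing import Dict, List, Sequence, Tuple


def query_by_example(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Rank stored descriptors by Euclidean distance to the query descriptor."""
    scored = [(name, math.dist(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda item: item[1])  # smaller distance = more similar
    return scored[:k]


# Hypothetical 3-dimensional descriptors (for example, coarse color proportions).
database = {
    "sunset_beach": [0.8, 0.1, 0.1],
    "sunset_city": [0.7, 0.2, 0.1],
    "forest": [0.1, 0.8, 0.1],
    "ocean": [0.1, 0.2, 0.7],
}

results = query_by_example([0.78, 0.12, 0.10], database, k=2)
for name, distance in results:
    print(name, round(distance, 3))
```

This linear scan computes one distance per stored image, which is the cost an index structure is meant to reduce.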
Proposed Work Outline
A typical Grid Architecture
Source: http://gridcafe.web.cern.ch/gridcafe/gridatwork/architecture.html
Proposed Work Outline
Research Issues
– Develop a technique that enables a uniform representation of multimedia data.
– Develop an efficient index structure that can handle multimedia data, support applications such as content-based image/video retrieval (CBIR/CBVR), and span multiple storage sites over a Grid/distributed environment.
– Devise a mechanism by which users' notions of similarity across multiple network domains are taken into account when producing query results.
In short, we envision a distributed multimedia storage and management system capable of supporting popular retrieval applications such as CBIR/CBVR.
Proposed Work Outline
The design and development of multimedia data management over the Grid has two critical components:
– Proper storage, which requires a distributed multidimensional index structure and distributed retrieval algorithms (distributed k-NN or range queries) supported by that index structure.
– Efficient retrieval, which requires techniques that map low-level features to high-level semantic concepts over a distributed environment, so that query results are relevant.
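One common way to think about distributed k-NN is a scatter-gather scheme: every storage node answers the query with its own k nearest items, and a coordinator merges those partial answers into the global top k. The sketch below simulates this in a single process. The node contents, function names, and the Euclidean metric are assumptions; the proposal does not describe its algorithm at this level, and a real Grid deployment would add networking and an index on each node.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Node = Dict[str, Sequence[float]]  # one storage node: item name -> feature vector


def local_knn(node: Node, query: Sequence[float], k: int) -> List[Tuple[float, str]]:
    """k nearest items held by a single node, as (distance, name) pairs."""
    scored = ((math.dist(query, vec), name) for name, vec in node.items())
    return heapq.nsmallest(k, scored)


def distributed_knn(nodes: List[Node], query: Sequence[float], k: int) -> List[Tuple[float, str]]:
    """Merge per-node results; the global top k is always among the local top k lists."""
    candidates: List[Tuple[float, str]] = []
    for node in nodes:  # in a real system these calls would run in parallel
        candidates.extend(local_knn(node, query, k))
    return heapq.nsmallest(k, candidates)


# Three hypothetical storage nodes holding 2-dimensional feature vectors.
nodes = [
    {"a": [0.0, 0.0], "b": [5.0, 5.0]},
    {"c": [1.0, 0.0], "d": [0.0, 2.0]},
    {"e": [9.0, 9.0]},
]

nearest = distributed_knn(nodes, query=[0.0, 0.0], k=3)
print([name for _, name in nearest])
```

Each node sends back at most k candidates, so the coordinator handles at most k times the number of nodes, regardless of how much data each node stores.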
Proposed Work Outline
Concepts to be utilized and Related Work
– We have developed an index structure, the Affinity Hybrid Tree [1], for single-node (stand-alone) applications. It can index multidimensional images/videos and supports CBIR/CBVR.
  • We plan to extend it as the basic indexing and storage framework, since it proved very efficient in stand-alone environments.
– To capture high-level similarity concepts among users in a distributed environment, we will develop a novel architecture called the Distributed Affinity Capture Model (DACM), based on the Hierarchical Markov Model Mediator [2].
Proposed Work Outline
Components
• Affinity Hybrid Tree
  – The feature-based index mechanism filters the feature space and reduces the number of distance computations to be performed.
  – The distance-based index mechanism incorporates the high-level image relationship as it is, without translating it into a low-level equivalent.
  – Reduces computational overhead.
  – Increases the relevance of retrieved images by capturing the user concept as it is.
Proposed Work Outline
Components
• Building AH-Tree
[Figure: AH-Tree construction in two stages, feature space filtering and semantic relationship introduction. Diagram labels: Feature Vectors, feed, root, Space Index, Indexed subspace, Distance based indexing, Indexed data]
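The two-stage idea in the figure, filtering the feature space first and running distance-based search only inside the selected subspace, can be illustrated with a generic filter-and-refine sketch. This is not the AH-Tree: the uniform grid used for filtering, the class name `FilterRefineIndex`, and the sample data are all assumptions, and the sketch ignores the affinity (semantic) information that the AH-Tree's distance-based stage incorporates. Searching a single cell is also approximate, since close neighbors in adjacent cells are missed.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class FilterRefineIndex:
    """Hypothetical two-stage index: grid cell lookup, then distance ranking."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, ...], List[Tuple[str, List[float]]]] = defaultdict(list)

    def _cell(self, vec: Sequence[float]) -> Tuple[int, ...]:
        # Stage 1 key: quantize every coordinate to a grid cell.
        return tuple(math.floor(x / self.cell_size) for x in vec)

    def insert(self, name: str, vec: Sequence[float]) -> None:
        self.cells[self._cell(vec)].append((name, list(vec)))

    def search(self, query: Sequence[float], k: int) -> Tuple[List[Tuple[float, str]], int]:
        # Stage 1: keep only the items stored in the query's cell.
        candidates = self.cells.get(self._cell(query), [])
        # Stage 2: distance computations on the filtered candidates only.
        scored = sorted((math.dist(query, vec), name) for name, vec in candidates)
        return scored[:k], len(candidates)


index = FilterRefineIndex(cell_size=0.5)
for name, vec in [("img1", [0.1, 0.2]), ("img2", [0.3, 0.4]),
                  ("img3", [0.9, 0.9]), ("img4", [0.6, 0.1])]:
    index.insert(name, vec)

hits, examined = index.search([0.2, 0.2], k=1)
print([name for _, name in hits], "distance computations:", examined)
```

In this toy run only the items sharing the query's cell are compared, which mirrors the claim that filtering cuts the number of distance computations.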
Proposed Work Outline
Components
Sample Results
• Computation cost: Feature-space filtering reduces the number of image objects to be examined, and hence reduces the number of distance computations manifold.
• Accuracy: AH-Tree 80%; M-Tree 10-20%
Proposed Work Outline
Components
Hierarchical Markov Model Mediator (HMMM) [2]
– An HMMM is represented by an 8-tuple (d, S, F, A, B, Π, O, L), where:
  d: number of levels in the HMMM
  S: multimedia objects in different levels
  F: distinctive features or semantic concepts (depending upon the level)
  A: affinity relationship between multimedia objects
  B: features/concepts at each level
  Π: initial state probability distribution
  O: weights of importance for the lower-level features and higher-level concepts
  L: link condition between higher-level and lower-level states
– The model has been used successfully for several applications, such as CBIR and web document clustering.
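To make the eight components easier to keep apart, the sketch below stores them in a small container and checks the one property the slide states outright: the initial state distribution is a probability distribution. The class name `HMMMSketch`, the field types, the per-level layout of `S` and `Pi`, and the sample values are assumptions for illustration; they are not taken from the HMMM definition in [2].

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class HMMMSketch:
    """Hypothetical container mirroring the 8-tuple (d, S, F, A, B, Pi, O, L)."""

    d: int                 # number of levels
    S: List[List[str]]     # multimedia objects, one list per level (assumed layout)
    F: List[List[str]]     # features or semantic concepts, one list per level
    A: List[Any]           # affinity relationships between objects
    B: List[Any]           # features/concepts at each level
    Pi: List[List[float]]  # initial state probabilities, one list per level (assumed)
    O: List[Any]           # importance weights for features and concepts
    L: List[Any]           # link conditions between adjacent levels

    def validate(self) -> None:
        """Check level counts and that each Pi entry sums to 1."""
        if len(self.S) != self.d or len(self.Pi) != self.d:
            raise ValueError("S and Pi need one entry per level")
        for level, (states, pi) in enumerate(zip(self.S, self.Pi)):
            if len(pi) != len(states):
                raise ValueError(f"level {level}: one probability per object expected")
            if abs(sum(pi) - 1.0) > 1e-9:
                raise ValueError(f"level {level}: probabilities must sum to 1")


# Toy two-level model: video shots at the lower level, a video at the upper level.
model = HMMMSketch(
    d=2,
    S=[["shot1", "shot2"], ["video1"]],
    F=[["color", "texture"], ["event"]],
    A=[], B=[],
    Pi=[[0.5, 0.5], [1.0]],
    O=[], L=[],
)
model.validate()
print("levels:", model.d)
```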
Tentative Road Map
• Detailed literature review of the following topics:
  – available data management tools and techniques in Grid computing
  – peer-to-peer file sharing systems
• Development of the following algorithms and models:
  – a distributed k-NN search that supports CBIR/CBVR from within an index structure
  – the Distributed Affinity Capture Model (DACM), to capture users' concept of high-level similarity
• Implementation of the entire system
Conclusion
We propose to:
– Develop an efficient multimedia data management framework over a distributed environment such as the Grid.
– Develop distributed content-based retrieval algorithms that span the grid and provide
  • semantically close query results
  • quickly and efficiently.
– Devise a way to capture users' concept of similarity across the grid (bridging the gap between low-level features and high-level semantics is a challenge), using
  • an architecture called the Distributed Affinity Capture Model (DACM).
Questions
Selected References
[1] Kasturi Chatterjee and Shu-Ching Chen, "A Novel Indexing and Access Mechanism using Affinity Hybrid Tree for Content-Based Image Retrieval in Multimedia Databases," International Journal of Semantic Computing (IJSC), Vol. 1, Issue 2, pp. 147-170, June 2007.
[2] Mei-Ling Shyu, Shu-Ching Chen, Min Chen, Chengcui Zhang, and Chi-Min Shu, "MMM: A Stochastic Mechanism for Image Database Queries," Proceedings of the IEEE Fifth International Symposium on Multimedia Software Engineering (MSE2003), pp. 188-195, December 10-12, 2003, Taichung, Taiwan, ROC.
[3] M.-L. Shyu, S.-C. Chen, C. Haruechaiyasak, C.-M. Shu, and S.-T. Li, "Disjoint Web Document Clustering and Management in Electronic Commerce," the Seventh International Conference on Distributed Multimedia Systems (DMS'2001), pp. 494-497, 2001.
[4] Mei-Ling Shyu, Shu-Ching Chen, Min Chen, Chengcui Zhang, and Kanoksri Sarinnapakorn, "Image Database Retrieval Utilizing Affinity Relationships," accepted for publication, the First ACM International Workshop on Multimedia Databases (ACM MMDB'03), November 7, 2003, New Orleans, Louisiana, USA.