www.ischool.drexel.edu
iBioSearch: The Integrated Biological Database Search
Ritu Khare and Yuan An
METHODOLOGY
1. Web Interface (WI) Collection: Collect WIs to biological databases.
2. Information Extraction: For each WI, extract attributes corresponding to
the WI metamodel. Broadly, a WI can be represented as a collection of
search entities and their respective labels (search criteria).
3. Mapping WI to Metamodel: Map each WI to the WI metamodel to generate
the instances of the metamodel. This yields a list of search entities and
their respective criteria (labels). For a given search entity Si, there will be
a label set (li1, li2, li3, …, lim).
4. Clustering: Find non-overlapping classes of search entities representing
synonyms, and for each class, find a list of non-redundant labels.
5. Generation of GBWS: Finally, we generate another conceptual model
that we call the "Global Biological WI Schema" (GBWS). It would represent
all possible input WIs in a non-redundant manner, and capture matchings
between individual instances of the WI metamodel.
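Steps 4 and 5 can be illustrated with a minimal Python sketch. The poster does not say which clustering technique is used, so this sketch substitutes a plain dictionary lookup for it: the `SYNONYMS` table, the `normalize` helper, the `build_gbws` function, and the sample entities and labels are all hypothetical names and data introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical synonym table (an assumption, not from the poster):
# maps a normalized entity name to the name of its synonym class.
SYNONYMS = {
    "gene": "gene",
    "genes": "gene",
    "protein": "protein",
    "polypeptide": "protein",
}


def normalize(text):
    """Lowercase and unify separators so near-identical strings compare equal."""
    return " ".join(text.lower().replace("_", " ").replace("-", " ").split())


def build_gbws(instances):
    """Group search entities into synonym classes and merge their labels.

    instances: iterable of (search_entity, [labels]) pairs gathered from all WIs.
    Returns a dict mapping each entity class to a sorted list of unique labels.
    """
    schema = defaultdict(set)
    for entity, labels in instances:
        key = normalize(entity)
        # Entities missing from the synonym table form their own class.
        entity_class = SYNONYMS.get(key, key)
        schema[entity_class].update(normalize(label) for label in labels)
    return {cls: sorted(labels) for cls, labels in schema.items()}


# Illustrative metamodel instances, as they might come out of step 3.
instances = [
    ("Gene", ["Gene Name", "Organism"]),
    ("Genes", ["gene_name", "Chromosome"]),
    ("Polypeptide", ["Sequence", "Organism"]),
]
gbws = build_gbws(instances)
print(gbws)
```

Because each entity is assigned to exactly one class, the classes are non-overlapping, and storing labels in a set removes duplicates such as "Gene Name" and "gene_name". A real system would likely need a learned or curated notion of synonymy rather than a fixed table.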
CURRENT AND PREDICTED RESULTS
The GBWS, or ontology, could be presented as a meta-search
interface for biologists, in which they can search for most
biological entities using the search criteria available across
different databases.
Eventually, we aim to find the answers to other research
questions such as:
1. Differences between commercial and biological databases.
2. Automatic identification of biological search interfaces.
3. Reverse engineering of a WI into an ER diagram.
4. Integration of multiple ER diagrams.
5. Extracting relationships between biological search entities.
FUTURE WORK
In the future, we intend to dynamically update the biological database
repository, maintain semantic mappings when base
databases evolve, translate user queries, and consolidate,
reconcile, and rank the query results using data cleansing and
relevance computing algorithms. We also plan to
perform usability testing of the iBioSearch system with
the help of biologists.
Fig. 1: Problem - biologist searching for an entity
Fig. 2: WI Metamodel
Fig. 3: Methodology
PROBLEM
The very large number of biological Web databases and
their interfaces makes it difficult for biologists to search for any
biological entity (see Fig. 1). Currently, the only option biologists
have is to search each of these numerous interfaces individually.
OUR SOLUTION
We aim to provide a unified search interface capable of
searching multiple (1000+) biological databases. This interface
would be a representation of the biological search interface
ontology. To find the global search ontology, we take a novel
approach: reverse engineering each individual search interface into a
conceptual model, and then finding an integrated model that is
consistent with all the interfaces up to a level of significance.
HYPOTHESIS & ASSUMPTIONS
WI Metamodel: We observe that all input Web Interfaces (WIs) have an
underlying global model. We created this global model manually and termed
it the "WI Metamodel" (see Fig. 2).
WI: Every Web Interface (WI) can be represented as an instance of the
metamodel.
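As a rough illustration of what "an instance of the metamodel" might look like in code, the sketch below models a WI as a collection of search entities, each carrying its labels (search criteria), as described in the methodology. The actual metamodel is the one in Fig. 2, which may contain more elements; the class names `SearchEntity` and `WebInterface`, the `label_set` method, and the example URL are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SearchEntity:
    """A search entity Si together with its label set (li1, ..., lim)."""
    name: str
    labels: list = field(default_factory=list)


@dataclass
class WebInterface:
    """One WI, represented as a collection of search entities."""
    url: str
    entities: list = field(default_factory=list)

    def label_set(self, entity_name):
        """Return the labels for the named entity, or an empty list if absent."""
        for entity in self.entities:
            if entity.name == entity_name:
                return entity.labels
        return []


# Hypothetical instance for a single search form.
wi = WebInterface(
    url="http://example.org/search",
    entities=[SearchEntity("Gene", ["Gene Name", "Organism"])],
)
print(wi.label_set("Gene"))
```

Keeping every WI in the same shape is what makes the later steps possible: entities and labels from different interfaces can be compared directly once they share one structure.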
[Fig. 1 labels: "Which interface to search?", "Which database to access?", "What all search criteria do I have?", "How many sources to consider?"; databases labeled OLDB]
[Fig. 3 labels: Reverse Engineering; Information Extraction; Mapping WI with Metamodel; WI MetaModel; Clustering Search Entities and Labels; Generation of Global Biological WI Schema; Information Retrieval; Meta-Search Interface]