MDBS Schema Integration: The Relational Integration Model
Candidacy Exam Presentation for: Ramon Lawrence
University of Manitoba
[email protected]@cs.umanitoba.ca
Outline
• Introduction
• The MDBS architecture and the integration problem
• A schema integration taxonomy
• Previous work
• The RIM architecture and the RIM model
• Future work and conclusions
Database Terminology
database system - a database and a system to manage the data
transaction - an atomic sequence of operations applied to the database
global transaction - a transaction spanning more than one database
schema integration - the process of combining local schemas into a global, integrated schema
multidatabase system (MDBS) - a collection of autonomous, local databases participating in a global database system to share data
MDBS Architecture
Global Transaction Manager (GTM)
• processes global transactions
• ensures information in all LDBSs is consistent
• submits subtransactions to the GTSs for each LDBS
Global Transaction Servers (GTSs)
• one for each LDBS
• converts subtransactions from the GTM into a form usable by the LDBS and vice versa
Local Database Systems (LDBSs)
• databases combined into the MDBS
• unchanged, as they still process local transactions
[Figure: MDBS architecture. Global transactions enter the GTM, which passes subtransactions to one GTS per LDBS; local transactions go to each LDBS directly.]
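The layering above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names mirror the slide's components, but the dictionary-based subtransaction format and the `execute`, `submit`, and `run` methods are hypothetical, and the GTM's consistency enforcement is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class LDBS:
    """A local database system, modeled here as a plain key/value store."""
    name: str
    data: dict = field(default_factory=dict)

    def execute(self, op, key, value=None):
        # Local transactions and translated subtransactions both end up here.
        if op == "read":
            return self.data.get(key)
        if op == "write":
            self.data[key] = value
            return value
        raise ValueError(f"unknown operation: {op}")


@dataclass
class GTS:
    """Global transaction server: one per LDBS, translates subtransactions."""
    ldbs: LDBS

    def submit(self, sub):
        # Convert the GTM's (assumed) dict format into a local call.
        return self.ldbs.execute(sub["op"], sub["key"], sub.get("value"))


@dataclass
class GTM:
    """Global transaction manager: routes each subtransaction to its GTS."""
    servers: dict  # LDBS name -> GTS

    def run(self, global_transaction):
        return [self.servers[sub["ldbs"]].submit(sub) for sub in global_transaction]


a, b = LDBS("A"), LDBS("B")
gtm = GTM({"A": GTS(a), "B": GTS(b)})
a.execute("write", "x", 1)  # a local transaction bypasses the GTM
results = gtm.run([
    {"ldbs": "A", "op": "read", "key": "x"},
    {"ldbs": "B", "op": "write", "key": "y", "value": 2},
])
```

The point of the sketch is the separation of concerns: each LDBS keeps accepting local calls unchanged, while only the GTS knows how to translate what the GTM sends.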
The Integration Problem
Integrating diverse data sources is an important issue as organizations interconnect their operations and demand more from their database systems
Integration is a hard problem because structural and semantic conflicts exist
Two levels of integration:
• schema integration
• data integration
Schema Integration
Schema integration is the process of combining database schemas into a coherent global view
Integration problems include:
• different data models
• incompatible concept representations
• different user or view perspectives
• structural conflicts within a model
• naming conflicts (homonym, synonym)
A Schema Integration Taxonomy
[Figure: three-axis taxonomy, with NONE at the origin]
Automation Level: manual, semi-automatic, automatic (static), automatic (dynamic)
Conflicts Resolved: naming, structural, interschema, semantic, all
Transparency: structural, behavioral, both
Previous Work
semantic models: Batini (86), canonical models, SDM, DAPLEX
schema re-engineering: model mapping tools, schema transformations
metadata systems: rule-based systems
object-oriented methods: use as a canonical model, schema transformations
application-level integration: language systems, MSQL, IDL, higher order views
Previous Work (cont.)
Interdatabase dependencies: Sheth - relaxed consistency, integration rules
AI techniques: Pegasus (spheres of knowledge), knowledge packets, Carnot project (Cyc knowledge base)
Lexical semantics: Summary Schemas Model (Bright et al.) - user interface for imprecise queries
Industrial systems: Interbase
RIM: Objective
The objective of the RIM model is to provide a system for automatically integrating diverse relational schemas into a multidatabase
Desirable properties:
• individual mappings - information sources integrated one-at-a-time and independently
• global view constructed for query transparency
• handles schema conflicts - including semantic, structural, and naming conflicts
• automated global integration - global view constructed efficiently and automatically
RIM: The Idea
The idea behind the RIM model is that most (and probably all) schema conflicts can be resolved if we:
• eliminate all naming conflicts
• define a language capable of determining schema equivalence and performing transformations
With these two properties, schema conflicts can be resolved automatically at the global level
RIM: The Plan
The first task is eliminating naming conflicts:
• use a global thesaurus/dictionary like SSM
• map local schema names into global counterparts
• identical concepts can be identified by global name
The integration language must be defined:
• RIM specifications - records capturing semantics of each LDBS in a machine-processable form
• global names captured in RIM specs. to identify concepts stored in LDBS
RIM: The Plan (cont.)
Integrate RIM specifications:
• To query the MDBS, the client downloads and integrates only RIM specs. of LDBSs accessed
• Global view is constructed from RIM specs. by automatically combining them at client site using global names and semantic metadata they contain
Use of global names allows system to determine identical concepts even though structural representations may be different
Semantic information captured using metadata
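A minimal sketch of the client-side combination step, assuming each RIM spec. can be reduced to a mapping from local attribute names to global names. The spec layout, the local names, and the global name `[Claim] Id` are hypothetical; the real RIM specs. carry more metadata than this.

```python
from collections import defaultdict

# Hypothetical, heavily simplified RIM specs: local attribute -> global name.
spec_a = {
    "ldbs": "A",
    "attributes": {
        "claims.net_amount": "[Claim] Net Amount",
        "claims.id": "[Claim] Id",
    },
}
spec_b = {
    "ldbs": "B",
    "attributes": {"claim_totals.amt": "[Claim] Net Amount"},
}


def build_global_view(specs):
    """Group local attributes from every spec under their shared global name."""
    view = defaultdict(list)
    for spec in specs:
        for local_name, global_name in spec["attributes"].items():
            view[global_name].append((spec["ldbs"], local_name))
    return dict(view)


global_view = build_global_view([spec_a, spec_b])
```

Because matching is done on the global name alone, `claims.net_amount` and `claim_totals.amt` are recognized as the same concept even though their tables and column names differ.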
RIM: The Plan (cont.)
Querying the MDBS:
• queries are posed to the MDBS through the global view at each client
• translation from the GV back to the original RIM spec. for each LDBS is performed
• the translated queries are sent to each LDBS, which transforms the query (specified using RIM) into a query for the LDBS
• results are returned to the client, which integrates them based on its GV
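The query path above might look roughly like this. It is a sketch under assumptions: two in-memory SQLite databases stand in for the LDBSs, the table and column names are invented, and the RIM-level request is just a small dictionary rather than the actual RIM language, which the slides do not specify.

```python
import sqlite3

# Two in-memory databases stand in for autonomous LDBSs (assumed schemas).
db_a = sqlite3.connect(":memory:")
db_a.execute("CREATE TABLE claims (net_amount REAL)")
db_a.executemany("INSERT INTO claims VALUES (?)", [(100.0,), (250.0,)])

db_b = sqlite3.connect(":memory:")
db_b.execute("CREATE TABLE claim_totals (amt REAL)")
db_b.execute("INSERT INTO claim_totals VALUES (75.0)")

LDBS = {"A": db_a, "B": db_b}

# Global view held at the client: global name -> (LDBS, table, column).
GLOBAL_VIEW = {
    "[Claim] Net Amount": [("A", "claims", "net_amount"),
                           ("B", "claim_totals", "amt")],
}


def client_translate(global_name):
    """Client side: global view -> one RIM-level request per LDBS."""
    return [{"ldbs": l, "table": t, "column": c}
            for l, t, c in GLOBAL_VIEW[global_name]]


def ldbs_execute(request):
    """LDBS side: turn the RIM-level request into local SQL and run it."""
    # Identifiers come from the trusted global view, not from user input.
    sql = f'SELECT {request["column"]} FROM {request["table"]}'
    return [value for (value,) in LDBS[request["ldbs"]].execute(sql)]


def query_global(global_name):
    """Client side: send requests, then integrate the returned results."""
    results = []
    for request in client_translate(global_name):
        results.extend((request["ldbs"], v) for v in ldbs_execute(request))
    return results


result = query_global("[Claim] Net Amount")
```

Here "integration" of results is only concatenation tagged by source; resolving duplicates or conflicting values would need the semantic metadata the slides mention.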
RIM: Architecture
[Figure: RIM architecture. Each RDBS exposes a RIM spec.; RIM Integration at each client combines the RIM specs. into that client's Global View.]
RIM Specifications:
• constructed at each RDBS
• local concepts mapped to global names
• schema can be automatically extracted
RIM Integration:
• uses needed RIM specs.
• constructs global view
• resolves conflicts by:
  • identifying concepts using global names
  • transforming concepts into a form consistent with the global view
RIM: Using Global Names
Global names attempt to capture semantics of data and its structure
Research has found that a single dictionary term is insufficient to capture all semantics of a given data item
Current proposed global name term:
  [context term] [concept name] ([adjective phrases])
  [adjective phrase] = [adjective] [preposition] ([context term] or [concept name])
RIM: Using Global Names (cont.)
Here are a few examples of using global names. The database stores damage claim information.
Example 1: attribute of claim is called net_amount in system
• GN: [Claim] Net Amount
Example 2: attribute of claim is called claim_date in system
• GN1: [Claim] Claim date (received by system)
• GN2: [Claim] Claim date (received by company)
• GN3: [Claim] Claim date (submitted by claimant)
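One way to make the proposed global name form machine-processable is a small parser. The sketch below assumes the surface syntax seen in the examples (context term in square brackets, optional adjective phrases in parentheses) and, as a further assumption not stated in the slides, that multiple adjective phrases would be comma-separated. The names `GlobalName`, `parse_global_name`, and `same_concept` are hypothetical.

```python
import re
from dataclasses import dataclass

# "[context] concept name (adjective phrases)"; the parenthesized part is optional.
_GLOBAL_NAME = re.compile(
    r"^\[(?P<context>[^\]]+)\]\s+(?P<concept>[^()]+?)\s*(?:\((?P<phrases>[^)]*)\))?$"
)


@dataclass(frozen=True)
class GlobalName:
    context: str
    concept: str
    adjective_phrases: tuple = ()


def parse_global_name(text):
    match = _GLOBAL_NAME.match(text.strip())
    if match is None:
        raise ValueError(f"not a global name: {text!r}")
    raw = match.group("phrases") or ""
    # Assumption: several adjective phrases are separated by commas.
    phrases = tuple(p.strip() for p in raw.split(",") if p.strip())
    return GlobalName(match.group("context").strip(),
                      match.group("concept").strip(),
                      phrases)


def same_concept(a, b):
    """True when two names share context and concept, ignoring adjective phrases."""
    return (a.context, a.concept) == (b.context, b.concept)


gn = parse_global_name("[Claim] Net Amount")
gn1 = parse_global_name("[Claim] Claim date (received by system)")
gn2 = parse_global_name("[Claim] Claim date (received by company)")
```

With this representation, `gn1` and `gn2` compare as different global names, while `same_concept(gn1, gn2)` still reports that both refine the same underlying concept, which is the distinction Example 2 is drawing.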
RIM: The Global Dictionary
To match concepts across systems, a global dictionary is required. Global names are taken from this dictionary.
Currently developing a simplified on-line dictionary:
• stores hierarchy (IS-A) relationships and component (Part-of) relationships
• global terms for RIM are taken from the dictionary
• dictionary will allow user-defined words
Future work involves determining how to add locally defined terms into the dictionary if required
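Since the dictionary format is still undecided, the following is only one possible shape for it: a toy structure holding IS-A and Part-of relationships. The class, its methods, and the sample terms ("Document", "Damage claim") are all hypothetical.

```python
from collections import defaultdict


class GlobalDictionary:
    """Toy global dictionary with IS-A and Part-of relationships."""

    def __init__(self):
        self.is_a = {}                   # term -> more general term (or None)
        self.part_of = defaultdict(set)  # whole -> set of component terms

    def add_term(self, term, parent=None):
        self.is_a[term] = parent

    def add_part(self, whole, part):
        self.part_of[whole].add(part)

    def ancestors(self, term):
        """Follow IS-A links upward; `seen` guards against accidental cycles."""
        chain, seen = [], {term}
        parent = self.is_a.get(term)
        while parent is not None and parent not in seen:
            chain.append(parent)
            seen.add(parent)
            parent = self.is_a.get(parent)
        return chain

    def is_kind_of(self, term, ancestor):
        return ancestor in self.ancestors(term)


dictionary = GlobalDictionary()
dictionary.add_term("Document")
dictionary.add_term("Claim", parent="Document")
dictionary.add_term("Damage claim", parent="Claim")  # e.g. a user-defined word
dictionary.add_part("Claim", "Net Amount")
```

A lookup such as `dictionary.is_kind_of("Damage claim", "Claim")` is the kind of question integration would ask when two systems use terms at different levels of the hierarchy.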
RIM: Basic Concepts
There are 3 basic modeling constructs in RIM:
• entity - a concept whose existence does not depend on any other entities
• relationship - a combination of two or more entities which does not exist without them
• attribute - a characteristic of an entity or a relationship
All entities and attributes should be identifiable by a global name from the dictionary.
RIM: RIM Specifications
A RIM specification consists of two parts:
• table headers - table-level information for each relation in database
• table schemas - information at the attribute level of a database relation
Most of the information can be automatically extracted; however, the DBA must assign global names to local concepts manually
RIM: The Table Header
The table header provides table-level information for each relation and has fields:
• name - unique table name (local)
• record size and count
• foreign key list and foreign key access list
• record insert/delete/update mechanisms
• record name - semantic name for a table record
• record type - entity, relationship instance, ...
• record grouping - why are records in the table?
• record distinction/duplicates - primary key
• table comment
RIM: The Table Schema
The table schema contains attribute-level information. Some fields include:
• field name - database system name
• semantic name - global name
• field use: attribute, key, categorization, summation, date/time, foreign key, logical, numeric, reference
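The two parts of a RIM specification could be captured as plain records. The sketch below covers a subset of the listed fields; the Python names, types, and defaults are assumptions, as are the sample table and the global name `[Claim] Id`.

```python
from dataclasses import dataclass, field
from enum import Enum


class FieldUse(Enum):
    """The field-use values listed for the table schema."""
    ATTRIBUTE = "attribute"
    KEY = "key"
    CATEGORIZATION = "categorization"
    SUMMATION = "summation"
    DATE_TIME = "date/time"
    FOREIGN_KEY = "foreign key"
    LOGICAL = "logical"
    NUMERIC = "numeric"
    REFERENCE = "reference"


@dataclass
class FieldSchema:
    field_name: str      # database system name
    semantic_name: str   # global name assigned by the DBA
    field_use: FieldUse


@dataclass
class TableHeader:
    name: str                     # unique local table name
    record_name: str              # semantic name for a table record
    record_type: str              # e.g. "entity" or "relationship instance"
    record_size: int = 0
    record_count: int = 0
    foreign_keys: list = field(default_factory=list)
    record_grouping: str = ""
    record_distinction: str = ""  # primary key
    comment: str = ""


@dataclass
class TableSpec:
    header: TableHeader
    fields: list

    def global_names(self):
        """Map each global (semantic) name to its local field name."""
        return {f.semantic_name: f.field_name for f in self.fields}


spec = TableSpec(
    header=TableHeader(name="claims", record_name="Claim",
                       record_type="entity", record_distinction="claim_id"),
    fields=[
        FieldSchema("claim_id", "[Claim] Id", FieldUse.KEY),
        FieldSchema("net_amount", "[Claim] Net Amount", FieldUse.NUMERIC),
    ],
)
```

Everything except `semantic_name` and `record_name` is the kind of information a catalog query could plausibly fill in automatically, which matches the split between extracted and DBA-assigned content.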
RIM: Semantic Conflicts
There are 6 basic semantic conflicts in RIM:
• attribute-entity conflict
• attribute-relationship conflict
• entity-relationship conflict
• entity-entity conflict (not studied)
• attribute-attribute conflict (not studied)
• relationship-relationship conflict (not studied)
There are some basic ideas on how to automatically resolve the first 3 conflicts.
Conflict resolution is an area of future work.
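Detection, as opposed to resolution, can at least be illustrated for the first three conflict types. The sketch assumes a conflict arises when the same global name is modeled with different constructs in two LDBSs; this reading, the input format, and the sample data are assumptions rather than the RIM procedure. Same-construct conflicts (entity-entity and so on) are not covered.

```python
from itertools import combinations


def find_conflicts(specs):
    """specs: LDBS name -> {global name: "entity" | "relationship" | "attribute"}."""
    conflicts = []
    for (name_a, spec_a), (name_b, spec_b) in combinations(sorted(specs.items()), 2):
        # Only concepts present in both LDBSs can conflict.
        for global_name in sorted(spec_a.keys() & spec_b.keys()):
            kind_a, kind_b = spec_a[global_name], spec_b[global_name]
            if kind_a != kind_b:
                label = "-".join(sorted((kind_a, kind_b))) + " conflict"
                conflicts.append((global_name, name_a, name_b, label))
    return conflicts


# Hypothetical data: "Claimant" is modeled three different ways.
specs = {
    "A": {"Claim": "entity", "Claimant": "attribute"},
    "B": {"Claim": "entity", "Claimant": "entity"},
    "C": {"Claimant": "relationship"},
}
conflicts = find_conflicts(specs)
```

Because global names identify the concept independently of its representation, the comparison reduces to checking the construct kinds recorded for each shared name.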
Conclusions
Current integration methodologies are insufficient because they rely on manual intervention and do not resolve all types of conflicts
The RIM model may be able to integrate diverse relational schemas using a global dictionary, a systematic method for capturing data semantics, and automated procedures for performing client run-time integration
Future Work
Determining how the RIM specifications can be constructed and what information can be automatically extracted
Deciding the format for the global dictionary
Studying conflict resolution procedures and testing methodology on simple integration problems