
MDBS Schema Integration: The Relational Integration Model


Page 1

MDBS Schema Integration: The Relational Integration Model

Candidacy Exam Presentation for: Ramon Lawrence

University of Manitoba, [email protected]

Page 2

Outline

• Introduction
• The MDBS architecture and the Integration problem
• A schema integration taxonomy
• Previous Work
• The RIM Architecture and the RIM Model
• Future work and conclusions

Page 3

Database Terminology

database system - a database and a system to manage the data

transaction - an atomic sequence of operations applied to the database

global transaction - a transaction spanning more than one database

schema integration - the process of combining local schemas into a global, integrated schema

multidatabase system (MDBS) - a collection of autonomous, local databases participating in a global database system to share data

Page 4

MDBS Architecture

Global Transaction Manager (GTM)
• processes global transactions
• ensures information in all LDBSs is consistent
• submits subtransactions to the GTSs for each LDBS

Global Transaction Servers (GTSs)
• one for each LDBS
• converts subtransactions from the GTM into a form usable by the LDBS and vice versa

Local Database Systems (LDBSs)
• databases combined into MDBS
• unchanged, as they still process local transactions

Figure: global transactions enter the GTM, which sends subtransactions to one GTS per LDBS; each LDBS also receives local transactions directly.
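
The flow between the three layers can be illustrated with a small Python sketch. This is only a toy model under stated assumptions: the class names, the `(op, key, value)` subtransaction format, and the key-value LDBS are all hypothetical, and the consistency protocol the GTM would need is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class LDBS:
    """A local database system; here just a named key-value store (assumption)."""
    name: str
    data: dict = field(default_factory=dict)

    def execute(self, op, key, value=None):
        # Local transactions would use this same entry point, so the LDBS is unchanged.
        if op == "write":
            self.data[key] = value
            return value
        return self.data.get(key)


@dataclass
class GTS:
    """Global transaction server: one per LDBS, translates subtransactions."""
    ldbs: LDBS

    def submit(self, subtransaction):
        op, key, value = subtransaction
        # Convert the GTM's form into the LDBS's form (trivial in this sketch).
        return self.ldbs.execute(op, key, value)


@dataclass
class GTM:
    """Global transaction manager: routes each subtransaction to the right GTS."""
    servers: dict  # LDBS name -> GTS

    def run(self, global_transaction):
        # Each step is (ldbs_name, op, key, value).
        return [self.servers[name].submit((op, key, value))
                for name, op, key, value in global_transaction]


def demo():
    a, b = LDBS("A"), LDBS("B")
    gtm = GTM({"A": GTS(a), "B": GTS(b)})
    return gtm.run([("A", "write", "x", 1),
                    ("B", "write", "y", 2),
                    ("A", "read", "x", None)])
```

Calling `demo()` runs one global transaction that touches two LDBSs through their respective GTSs.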

Page 5

The Integration Problem

Integrating diverse data sources is an important issue as organizations interconnect their operations and demand more from their database systems

Integration is a hard problem because structural and semantic conflicts exist

Two levels of integration:
• schema integration
• data integration

Page 6

Schema Integration

Schema integration is the process of combining database schemas into a coherent global view

Integration problems include:
• different data models
• incompatible concept representations
• different user or view perspectives
• structural conflicts within a model
• naming conflicts (homonym, synonym)

Page 7

A Schema Integration Taxonomy

Figure: a taxonomy with three axes, each starting from NONE.

Automation Level: manual, semi-automatic, automatic (static), automatic (dynamic)

Conflicts Resolved: naming, structural, interschema, semantic, all

Transparency: structural, behavioral, both

Page 8

Previous Work

semantic models: Batini (86), canonical models, SDM, DAPLEX

schema re-engineering: model mapping tools, schema transformations

metadata systems: rule-based systems

object-oriented methods: use as a canonical model, schema transformations

application-level integration: language systems, MSQL, IDL, higher order views

Page 9

Previous Work (cont.)

Interdatabase dependencies: Sheth - relaxed consistency, integration rules

AI techniques: Pegasus (spheres of knowledge), knowledge packets, Carnot project (Cyc knowledge base)

Lexical semantics: Summary Schemas Model (Bright et al.) - user interface for imprecise queries

Industrial systems: Interbase

Page 10

RIM: Objective

The objective of the RIM model is to provide a system for automatically integrating diverse relational schemas into a multidatabase

Desirable properties:
• individual mappings - information sources integrated one-at-a-time and independently
• global view constructed for query transparency
• handles schema conflicts - including semantic, structural, and naming conflicts
• automated global integration - global view constructed efficiently and automatically

Page 11

RIM: The Idea

The idea behind the RIM model is that most (and probably all) schema conflicts can be resolved if we:

• eliminate all naming conflicts
• define a language capable of determining schema equivalence and performing transformations

With these two properties, schema conflicts can be resolved automatically at the global level

Page 12

RIM: The Plan

The first task is eliminating naming conflicts:
• use a global thesaurus/dictionary like the Summary Schemas Model (SSM)
• map local schema names into global counterparts
• identical concepts can be identified by global name

The integration language must be defined:
• RIM specifications - records capturing semantics of each LDBS in a machine-processable form
• global names captured in RIM specs. to identify concepts stored in LDBS
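
The name-mapping step can be sketched in Python as a lookup from local schema names to global names. The database, table, and column names in the thesaurus are hypothetical, and a plain dictionary stands in for whatever thesaurus structure RIM would actually use; the global names follow the bracketed form used elsewhere in this deck.

```python
# Hypothetical thesaurus: (database, table, column) -> global name.
GLOBAL_NAMES = {
    ("claims_db", "claim", "net_amount"): "[Claim] Net Amount",
    ("legacy_db", "clm", "net_amt"): "[Claim] Net Amount",
    ("legacy_db", "clm", "clm_dt"): "[Claim] Claim date (received by system)",
}


def same_concept(a, b, thesaurus=GLOBAL_NAMES):
    """Two local names denote the same concept if they map to one global name."""
    ga, gb = thesaurus.get(a), thesaurus.get(b)
    # Unmapped local names never match anything.
    return ga is not None and ga == gb
```

With this mapping, `net_amount` in one database and `net_amt` in another are recognized as the same concept even though their local names differ.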

Page 13

RIM: The Plan (cont.)

Integrate RIM specifications:
• To query the MDBS, the client downloads and integrates only RIM specs. of LDBSs accessed
• Global view is constructed from RIM specs. by automatically combining them at client site using global names and semantic metadata they contain

Use of global names allows system to determine identical concepts even though structural representations may be different

Semantic information captured using metadata
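
A minimal Python sketch of the combining step follows. It assumes, purely for illustration, that each RIM spec. reduces to a mapping from local field names to global names; the function name and data shape are not from the RIM proposal, and the semantic metadata is ignored here.

```python
from collections import defaultdict


def build_global_view(specs):
    """Combine RIM specs into a global view keyed by global name.

    specs:   {ldbs_name: {local_field: global_name}}  (assumed shape)
    returns: {global_name: [(ldbs_name, local_field), ...]}
    """
    view = defaultdict(list)
    for ldbs, fields in specs.items():
        for local_field, global_name in fields.items():
            # Fields sharing a global name land in the same global-view entry.
            view[global_name].append((ldbs, local_field))
    return dict(view)
```

Because only the specs. passed in are combined, a client that accesses two LDBSs builds a view over just those two.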

Page 14

RIM: The Plan (cont.)

Querying the MDBS:
• queries are posed to the MDBS through the global view (GV) at each client
• translation from the GV back to the original RIM spec. for each LDBS is performed
• the translated queries are sent to each LDBS, which transforms the query (specified using RIM) into a query for the LDBS
• results are returned to the client, which integrates them based on its GV
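
The query round trip can be sketched in Python under some simplifying assumptions: a query is just a list of requested global names, the global view maps each global name to `(ldbs, local_field)` pairs, and each LDBS returns rows as dictionaries keyed by local field name. All function names and data shapes here are hypothetical.

```python
def translate_query(global_fields, global_view):
    """Map requested global names to a per-LDBS list of local fields."""
    per_ldbs = {}
    for global_name in global_fields:
        for ldbs, local_field in global_view.get(global_name, []):
            per_ldbs.setdefault(ldbs, []).append(local_field)
    return per_ldbs


def integrate_results(results, specs):
    """Relabel each LDBS's rows with global names and merge them.

    results: {ldbs_name: [row_dict, ...]} keyed by local field names
    specs:   {ldbs_name: {local_field: global_name}}
    """
    merged = []
    for ldbs, rows in results.items():
        to_global = specs[ldbs]
        for row in rows:
            merged.append({to_global[k]: v for k, v in row.items()})
    return merged
```

The client only ever sees global names; local names appear solely in the translated subqueries and the raw results.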

Page 15

RIM: Architecture

Figure: four RDBSs, each with its own RIM spec.; two clients, each running RIM Integration over the specs. it needs to build its own Global View.

RIM Specifications:
• constructed at each RDBS
• local concepts mapped to global names
• schema can be automatically extracted

RIM Integration:
• uses needed RIM specs.
• constructs global view
• resolves conflicts by:
  • identifying concepts using global names
  • transforming concepts into a form consistent with the global view

Page 16

RIM: Using Global Names

Global names attempt to capture semantics of data and its structure

Research has found that a single dictionary term is insufficient to capture all semantics of a given data item

Current proposed global name term:
  [context term] [concept name] ([adjective phrases])
  [adjective phrase] = [adjective] [preposition] ([context term] or [concept name])
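
As a rough illustration of this format, the Python sketch below parses a global name written as `[Context] Concept name (adjective phrase)`, for example `[Claim] Claim date (received by system)`. The written form with literal brackets and parentheses, the regular expression, and the `GlobalName` type are assumptions of this sketch; the adjective phrase is kept as one string rather than split into adjective and preposition.

```python
import re
from typing import NamedTuple, Optional


class GlobalName(NamedTuple):
    context: str
    concept: str
    adjective_phrase: Optional[str]


# [context] concept name (optional adjective phrase)
_PATTERN = re.compile(
    r"^\[(?P<context>[^\]]+)\]\s+(?P<concept>[^()]+?)(?:\s*\((?P<adj>[^()]*)\))?$"
)


def parse_global_name(text: str) -> GlobalName:
    """Split a written global name into its context, concept, and adjective phrase."""
    m = _PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"not a global name: {text!r}")
    return GlobalName(m["context"], m["concept"].strip(), m["adj"])
```

A name without a parenthesized part parses with `adjective_phrase` set to `None`.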

Page 17

RIM: Using Global Names (cont.)

Here are a few examples of using global names: the database stores damage claim information

Example 1: attribute of claim is called net_amount in system
  GN: [Claim] Net Amount

Example 2: attribute of claim is called claim_date in system
  GN1: [Claim] Claim date (received by system)
  GN2: [Claim] Claim date (received by company)
  GN3: [Claim] Claim date (submitted by claimant)

Page 18

RIM: The Global Dictionary

To match concepts across systems, a global dictionary is required. Global names are taken from this dictionary.

Currently developing a simplified on-line dictionary:
• stores hierarchy (IS-A) relationships and component (Part-of) relationships
• global terms for RIM are taken from the dictionary
• dictionary will allow user-defined words

Future work involves determining how to add locally defined terms into the dictionary if required
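
One possible shape for such a dictionary is sketched below in Python. Since the dictionary format is still undecided, everything here is an assumption: the class name, the single-parent IS-A link per term, and the sample terms are invented for illustration only.

```python
class GlobalDictionary:
    """Toy dictionary holding IS-A and Part-of links between terms."""

    def __init__(self):
        self.terms = set()
        self.is_a = {}      # term -> more general term
        self.part_of = {}   # term -> term it is a component of

    def add_term(self, term, is_a=None, part_of=None):
        """Register a term, optionally with one IS-A and one Part-of link."""
        self.terms.add(term)
        if is_a is not None:
            self.is_a[term] = is_a
        if part_of is not None:
            self.part_of[term] = part_of

    def generalizations(self, term):
        """Follow IS-A links upward, most specific first; stops on a cycle."""
        chain, seen = [], {term}
        while term in self.is_a and self.is_a[term] not in seen:
            term = self.is_a[term]
            seen.add(term)
            chain.append(term)
        return chain


# Hypothetical sample terms, including a user-defined one ("damage claim").
d = GlobalDictionary()
d.add_term("document")
d.add_term("claim", is_a="document")
d.add_term("damage claim", is_a="claim")
d.add_term("claim date", part_of="claim")
```

Walking the IS-A chain is one way two systems using terms at different levels of generality could still be matched.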

Page 19

RIM: Basic Concepts

There are 3 basic modeling constructs in RIM:
• entity - a concept whose existence does not depend on any other entities
• relationship - a combination of two or more entities which does not exist without them
• attribute - a characteristic of an entity or a relationship

All entities and attributes should be identifiable by a global name from the dictionary.

Page 20

RIM: RIM Specifications

A RIM specification consists of two parts:
• table headers - table-level information for each relation in database
• table schemas - information at the attribute level of a database relation

Most of the information can be automatically extracted; however, the DBA must assign global names to local concepts manually

Page 21

RIM: The Table Header

The table header provides table-level information for each relation and has fields:

• name - unique table name (local)
• record size and count
• foreign key list and foreign key access list
• record insert/delete/update mechanisms
• record name - semantic name for a table record
• record type - entity, relationship instance, ...
• record grouping - why are records in the table?
• record distinction/duplicates - primary key
• table comment

Page 22

RIM: The Table Schema

The table schema contains attribute-level information. Some fields include:

• field name - database system name
• semantic name - global name
• field use: attribute, key, categorization, summation, date/time, foreign key, logical, numeric, reference
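
To make the two parts of a RIM specification concrete, here is a hedged Python sketch of a table header plus its table schema as plain records. Only a subset of the header fields is modeled, the class names are invented, and the sample `claim` table is hypothetical; the `FieldUse` values are the field uses listed above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class FieldUse(Enum):
    ATTRIBUTE = "attribute"
    KEY = "key"
    CATEGORIZATION = "categorization"
    SUMMATION = "summation"
    DATE_TIME = "date/time"
    FOREIGN_KEY = "foreign key"
    LOGICAL = "logical"
    NUMERIC = "numeric"
    REFERENCE = "reference"


@dataclass
class SchemaField:
    field_name: str      # database system name
    semantic_name: str   # global name
    field_use: FieldUse


@dataclass
class TableHeader:
    name: str            # unique local table name
    record_name: str     # semantic name for a table record
    record_type: str     # e.g. "entity" or "relationship instance"
    comment: str = ""


@dataclass
class RimTable:
    header: TableHeader
    columns: List[SchemaField] = field(default_factory=list)

    def global_names(self) -> List[str]:
        """Global names this table contributes to a global view."""
        return [c.semantic_name for c in self.columns]


# Hypothetical table for the damage-claim example.
claim = RimTable(
    TableHeader(name="claim", record_name="Claim", record_type="entity"),
    [SchemaField("net_amount", "[Claim] Net Amount", FieldUse.NUMERIC)],
)
```

In this shape, the field names and field uses are the parts that could plausibly be extracted automatically, while `semantic_name` is what the DBA would assign by hand.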

Page 23

RIM: Semantic Conflicts

There are 6 basic semantic conflicts in RIM:
• attribute-entity conflict
• attribute-relationship conflict
• entity-relationship conflict
• entity-entity conflict (not studied)
• attribute-attribute conflict (not studied)
• relationship-relationship conflict (not studied)

There are some basic ideas on how to automatically resolve the first 3 conflicts.

Conflict resolution is an area of future work.
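
Detecting the first three conflict types, though not resolving them, can be sketched in Python. The sketch assumes each local construct is described by a hypothetical `(ldbs, global_name, kind)` triple, where `kind` is one of `"entity"`, `"relationship"`, or `"attribute"`; a conflict is reported when two LDBSs model the same global name with different kinds. Same-kind conflicts are deliberately not reported.

```python
from itertools import combinations


def find_conflicts(constructs):
    """Report cross-kind conflicts between LDBSs that share a global name.

    constructs: list of (ldbs, global_name, kind) triples (assumed shape)
    returns:    list of (global_name, conflict_label, ldbs_1, ldbs_2)
    """
    by_name = {}
    for ldbs, global_name, kind in constructs:
        by_name.setdefault(global_name, []).append((ldbs, kind))

    conflicts = []
    for global_name, uses in by_name.items():
        for (l1, k1), (l2, k2) in combinations(uses, 2):
            if l1 != l2 and k1 != k2:
                # Sorting yields the labels used above, e.g. "attribute-entity".
                label = "-".join(sorted((k1, k2))) + " conflict"
                conflicts.append((global_name, label, l1, l2))
    return conflicts
```

For example, if one system stores an address as an attribute and another stores it as its own entity, the function reports an attribute-entity conflict for that global name.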

Page 24

Conclusions

Current integration methodologies are insufficient because they rely on manual intervention and do not resolve all types of conflicts

The RIM model may be able to integrate diverse relational schemas using a global dictionary, a systematic method for capturing data semantics, and automated procedures for performing client run-time integration

Page 25

Future Work

• Determining how the RIM specifications can be constructed and what information can be automatically extracted
• Deciding the format for the global dictionary
• Studying conflict resolution procedures and testing methodology on simple integration problems