16
Automatic Automatic Data Data Ramon Lawrence University of Manitoba [email protected]

Automatic Data

  • Upload
    onslow

  • View
    29

  • Download
    0

Embed Size (px)

DESCRIPTION

Automatic Data. Ramon Lawrence University of Manitoba [email protected]. base Integration. Ken Barker University of Calgary [email protected]. Summary. Database integration allows systems within a company or on the WWW to interoperate. - PowerPoint PPT Presentation

Citation preview

Page 1: Automatic Data

Automatic DataAutomatic Data

Ramon LawrenceUniversity of [email protected]

Page 2: Automatic Data

base Integrationbase Integration

Ken BarkerUniversity of Calgary

[email protected]

Page 3: Automatic Data

Summary Database integration allows systems within a

company or on the WWW to interoperate. Current integration solutions require manual

coding which is costly and time-consuming. Our integration architecture uses XML and a

standard dictionary to automate integration. Database systems are integrated using 3

processes: capture, integration, and query. The architecture is implemented in a software

package called Unity. Unity is capable of automatic database

integration and querying using ODBC. Applications include data warehouse design,

web site integration, and systems integration.

Page 4: Automatic Data

Integration Architecture

Architecture Components: 1) Integrated Context View

• user’s view of integration 2) X-Spec Editor

• stores schema & metadata• uses XML

3) Standard Dictionary• terms to express semantics

4) Integration Algorithm• combines X-Specs into integrated context view

5) Query Processor• accepts query on view• determines data source mappings and joins• executes queries and formats results

Local Transactions

X-Spec

X-Spec Editor

Standard Dictionary

Integration Algorithm

Integrated Context View

Query Processor and ODBC Manager

Database

Client

Subtransactions

Client

Multidatabase Layer

Database

X-Spec

Page 5: Automatic Data

The integration architecture consists of three separate processes:

Capture process: independently extracts database schema information and metadata into a XML document called a X-Spec.

Integration process: combines X-Specs into a structurally-neutral hierarchy of database concepts called an integrated context view.

Query process: allows the user to formulate queries on the integrated view that are mapped by the query processor to structural queries (SQL) and the results are integrated and formatted.

Integration Processes

Page 6: Automatic Data

Architecture Components The architecture consists of four components:

A standard dictionary (SD) to capture data semantics SD terms are used to build semantic names describing

semantics of schema elements.

X-Specs for storing data semantics. relational database info. stored and transmitted using XML stores semantic names to describe schema elements

Integration Algorithm Identical concepts in different databases are identified by

similar semantic names. Produces an integrated view of all database concepts.

Query Processor allows the user to formulate queries on the view translates from semantic names in integrated view to SQL

queries and integrates and formats results involves determining correct field and table mappings and discovery of join conditions and join paths

Page 7: Automatic Data

Architecture Discussion The architecture automatically integrates

relational schemas into a multidatabase Desirable properties:

individual mappings - information sources integrated one-at-a-time and independently

integrated view constructed for query transparency - user queries system by semantics instead of structure

handles schema conflicts - including semantic, structural, and naming conflicts

automated integration - integrated view constructed efficiently and automatically

no wrapper or mediator software is required transparent querying - users issue semantic queries

which are translated to SQL by the query processor no changes are performed on underlying databases

Page 8: Automatic Data

Conclusions Automatic database integration is possible by

using a standard term dictionary and defining semantic names for schema elements.

Integration of data sources has applications to the WWW and construction of data warehouses.

Users are able to transparently query integrated systems by concept instead of structure.

Future Work: Test the system in large industrial projects. Improve the Unity implementation and develop an

integration component for a web browser. Increase the efficiency of the query processor. Allow distributed updates and global updates on all

databases.

Page 9: Automatic Data

Background &Background &

MotivationMotivation

Page 10: Automatic Data

Why is Integration Required? There are many integration environments:

operational systems within an organization system integration during company merger data warehouses, Intranets, and the WWW

Users require information from many data sources which often do not work together.

Companies require a global view of their entire operations which may be present in numerous operational databases for different departments and distributed geographically.

Business-to-Business (B-to-B) and Business-to-Consumer (B-to-C) E-commerce demand integration of web databases with production systems for efficiency and productivity.

Page 11: Automatic Data

What is Integration? Database integration requires conflict resolution

during schema integration. A schema is a description of the data

organization and format in a database. Schema integration combines database schemas

into a unified view of the data. Constructing an integrated view is difficult because

databases will store different types of data, in varying formats, with different meanings, and reference it using different names. Subsequently, the construction of the integrated view must handle the different mechanisms for storing data (structural conflicts), for referencing data (naming conflicts), and for attributing meaning to the data (semantic conflicts). Data is integrated at both the schema level (the description of the data) and the data level (individual data instances).

Page 12: Automatic Data

What is the Current Solution? SQL and multidatabase query languages:

querying is difficult for numerous databases

Research systems: do not use standardization so cannot be automated

Industrial standards: XML, BizTalk, E-commerce portals apply to limited domains/industries require standard structures and database changes

Current techniques for database integration have some of these problems:

require integrator to understand all databases integration process is manual do not hide system complexity from the user force changes on the existing database systems

Page 13: Automatic Data

UnityUnity

SoftwareSoftware

Page 14: Automatic Data

Unity Overview Unity is a software package that implements the

integration architecture with a GUI. Developed using Microsoft Visual C++ 6 and

Microsoft Foundation Classes (MFC). Unity allows the user to:

construct and modify standard dictionaries build X-Specs to describe data sources integrate X-Specs into an integrated view transparently query integrated systems using ODBC

and automatically generate SQL transactions

A screen-shot of Unity editing a standard dictionary is below.

Unity is available for demonstration and distribution.

Page 15: Automatic Data
Page 16: Automatic Data

References Publications:

Unity - A Database Integration Tool, R. Lawrence and K. Barker, TRLabs Emerging Technology Bulletin, January 2000.

Multidatabase Querying by Context, R. Lawrence and K. Barker, DataSem2000, pages 127-136, Oct. 2000.

Integrating Relational Database Schemas using a Standardized Dictionary, To appear in SAC’2001 - ACM Symposium on Applied Computing, March, 2001.

Sponsors: NSERC, TRLabs

Further Information: http://www.cs.umanitoba.ca/~umlawren/