TU/e eindhoven university of technology
/faculty of mathematics and informatics
Exporting Databases in XML DTD
A Conceptual and Generic Approach
Philippe Thiran
Computer Science Department
Technische Universiteit Eindhoven
The Netherlands
Exporting Databases in XML
• Current Situation
  – XML as the standard for publishing and exchanging data over the Web
  – Data recorded and maintained in existing databases
    • Heterogeneous databases: different data models
    • Limitations of database models
      – Database schema incompleteness (implicit/hidden structures)
      – Explicit and implicit interconnections among entities (no primary and foreign keys)
[Figure: Oracle V5 model of the example database, shown first as declared and then with its identifiers and references made explicit]
Product (Reference, Label[0-1], UnitPrice, Supplier[1-5]); id: Reference
Order (OrderID, Customer, Date, Total[0-1]); id: OrderID
Detail (OrderID, Reference, Quantity, Amount); id: Reference, OrderID; ref: Reference; ref: OrderID
Exporting Databases in XML
• Migrating existing databases to XML
  – Principle
    • XML description in DTD
    • Bottom-up approach
    • Exploiting as much as possible the meaning of source data
  – Method and Tool
    • Method
      – Not limited to any specific database model
      – Capturing the explicit and implicit structures and interconnections of the database schema
    • Tool for supporting the method
Exporting Databases in XML
Schema Representation: database models and DTD
Schema Manipulation: database schemas and DTD
Exporting Databases in XML
• Schema Representation
  – Expressing database schemas and XML in terms of GER
    • Extended object-entity relationship data model
    • One rich and expressive model able to express data schemas whatever their operational data models
      – Operational database models like IMS, Relational, OO
      – XML-family models: XML DTD or XML Schema
Exporting Databases in XML
• Schema Representation
  – Expressing XML in terms of GER
    • DTD expressed in terms of GER
      – DTD concepts
      – Hierarchical organization
      – Sequence organization

DTD Concepts                                               | GER Interpretation
Element types                                              | Entity types
Hierarchy of element types                                 | (Root) entity types, relationship types, father roles
Content type ELEMENT                                       | Relationship types
Sequence organization (order of elements in the sequence)  | Seq groups
Occurrence operators on sub-elements (?, *, +)             | Role cardinalities
IDREF, GID attributes                                      | IDREF, GID groups
Attribute modifiers                                        | Attribute cardinalities
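
The correspondence between occurrence operators and role cardinalities can be illustrated with a small sketch. It assumes the usual reading (no operator gives 1-1, ? gives 0-1, * gives 0-N, + gives 1-N) and only handles flat, comma-separated content models. The function name `father_role_cardinalities` and the regular expression are illustrative assumptions, not part of the method or of any tool.

```python
import re

# Assumed reading of the occurrence operators as father-role cardinalities.
OPERATOR_TO_CARDINALITY = {"": "1-1", "?": "0-1", "*": "0-N", "+": "1-N"}

# Matches declarations of the form <!ELEMENT Name (child, child?, ...)>.
# Declarations without parentheses (e.g. ANY) are deliberately skipped.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)\s*>")


def father_role_cardinalities(dtd_text):
    """Return {element: [(sub_element, cardinality), ...]} for flat sequences."""
    result = {}
    for name, content in ELEMENT_DECL.findall(dtd_text):
        children = []
        for part in content.split(","):
            part = part.strip()
            if not part or part.startswith("#"):
                continue  # #PCDATA is text content, not a sub-element
            operator = part[-1] if part[-1] in "?*+" else ""
            child = part.rstrip("?*+")
            children.append((child, OPERATOR_TO_CARDINALITY[operator]))
        result[name] = children
    return result
```

For the declaration `<!ELEMENT Order (Customer, Date, Total?, Detail+)>`, this sketch yields 1-1 for Customer and Date, 0-1 for Total and 1-N for Detail. Choice groups (`|`) and nested groups are outside its scope.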
Exporting Databases in XML
• Schema Representation
  – Expressing XML in terms of GER

[Figure: GER representation of the example DTD; each sub-element is attached to its father through a father role (f)]
Catalog; seq: Order[*], Product[*]
Order (OrderID); gid: OrderID; seq: Customer, Date, Total, Detail[*]
Customer #any; Date #pcdata; Total #pcdata
Detail (Product); idref: Product; seq: Quantity, Amount
Quantity #pcdata; Amount #pcdata
Product (Reference, Label[0-1], UnitPrice); gid: Reference; seq: Supplier[*]
Supplier #any

<!ELEMENT Catalog (Order*, Product*)>
<!ELEMENT Order (Customer, Date, Total?, Detail+)>
<!ATTLIST Order OrderID ID #REQUIRED>
<!ELEMENT Customer ANY>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Total (#PCDATA)>
<!ELEMENT Detail (Quantity, Amount)>
<!ATTLIST Detail Product IDREF #REQUIRED>
<!ELEMENT Quantity (#PCDATA)>
<!ELEMENT Amount (#PCDATA)>
<!ELEMENT Product (Supplier+)>
<!ATTLIST Product
  Reference ID #REQUIRED
  Label CDATA #IMPLIED
  UnitPrice CDATA #REQUIRED>
<!ELEMENT Supplier ANY>
Exporting Databases in XML
• Schema Manipulation
  – Transforming XML DTD within GER
    • Schema transformations defined on GER
      – Reverse transformations, semantics-preserving transformations
  – Transformation operators
    • Standard transformations
      – For manipulating schemas expressed in operational database models
      – Example: transforming an entity type into an attribute
    • DTD-specific transformations
Exporting Databases in XML
• Schema Manipulation
  – Transforming XML DTD within GER
    • Standard transformations
      – For manipulating schemas expressed in classical structured models
      – Example of a semantics-preserving transformation: transforming a relationship type into an entity type

RT-ET: Transforming a relationship type into an entity type. Inverse: ET-RT
[Figure: RT-ET. Before: relationship type R between entity types A (A1) and B (B1), both roles 0-N. After: entity type R with id: rB.B, rA.A, linked to A through rA and to B through rB; R plays a 1-1 role in rA and rB, while A and B keep their 0-N roles]
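
A minimal sketch of RT-ET on a toy schema representation may help to read the figure. The `Schema` class, the dictionary layout and the `r<Entity>` naming of the new relationship types are assumptions made for illustration; they are not DB-MAIN data structures. Roles are keyed by entity type name, so recursive relationship types are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    # name -> {"attributes": [...], "id": [...]}
    entity_types: dict = field(default_factory=dict)
    # name -> {"roles": {entity type name: cardinality}, "attributes": [...]}
    rel_types: dict = field(default_factory=dict)


def rt_to_et(schema, rel_name):
    """Replace relationship type rel_name by an entity type of the same name."""
    rel = schema.rel_types.pop(rel_name)
    identifier = []
    for entity, cardinality in rel["roles"].items():
        link = "r" + entity  # hypothetical naming: rA, rB, ...
        # The new entity type plays a 1-1 role; the original participant
        # keeps the cardinality it had in the relationship type.
        schema.rel_types[link] = {
            "roles": {rel_name: "1-1", entity: cardinality},
            "attributes": [],
        }
        identifier.append(f"{link}.{entity}")
    # Attributes of the relationship type move to the new entity type,
    # which is identified by all the roles it plays.
    schema.entity_types[rel_name] = {
        "attributes": list(rel["attributes"]),
        "id": identifier,
    }
    return schema
```

Applied to R between A (0-N) and B (0-N), the sketch produces entity type R identified by rA.A and rB.B, matching the right-hand side of the figure.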
Exporting Databases in XML
• Schema Manipulation
  – Transforming XML DTD within GER
    • DTD-specific transformations (example)
      – Suited to derive a DTD from a structured data schema

DTD-RT-to-HIER: Transforms a one-to-many (or one-to-one) binary relationship type into a hierarchical relation. The 1-1 role becomes the child role. Inverse: DTD-HIER-to-RT
Create-SEQ-GROUP: Adds a seq group to an entity type. That group contains the child roles played by its children (in an arbitrary order). Inverse: Del-SEQ-GROUP
[Figure: DTD-RT-to-HIER turns the relationship type R between A and B (roles 1-1 and 0-N) into a hierarchical relation whose 0-N role is marked as the father role (f). Create-SEQ-GROUP adds the group seq: R1.A[*], R2.B to entity type R, the father of A through R1 and of B through R2]
Exporting Databases in XML
Converting (legacy) databases into DTD
Exploiting as much as possible the meaning of source data
Capturing the explicit and implicit structures and interconnections
Exporting Databases in XML
• Exporting Databases
  – Bottom-up approach (from the source to the target)
  – Semi-automated 4-step method
    • Extraction of the database schema (automated)
      – Extraction of the explicit structures and constraints
    • Semantics recovery (semi-automated)
      – Recovery of the implicit structures and constraints
    • Model translation (semi-automated)
      – Translation of a schema expressed in the GER into a schema expressed in the GER DTD
      – Use of the relations among entities
    • DTD exportation (automated)
      – Generation of the DTD document
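
The last step, DTD exportation, is mechanical once the schema is hierarchical and ordered. The following sketch shows one possible shape of such a generator. The input layout (a list of dictionaries with `name`, `seq`, `content` and `attributes` keys) and the function names are assumptions chosen for this example, not the format used by the tool.

```python
# Father-role cardinalities written back as DTD occurrence operators.
CARDINALITY_TO_OPERATOR = {"1-1": "", "0-1": "?", "0-N": "*", "1-N": "+"}


def content_model(element):
    """Build the content model of one element declaration."""
    content = element.get("content")
    if content == "ANY":
        return "ANY"
    if content == "#PCDATA":
        return "(#PCDATA)"
    seq = element.get("seq", [])
    if not seq:
        return "EMPTY"
    parts = [child + CARDINALITY_TO_OPERATOR[card] for child, card in seq]
    return "(" + ", ".join(parts) + ")"


def generate_dtd(elements):
    """Emit one ELEMENT declaration per element, plus ATTLIST when needed."""
    lines = []
    for element in elements:
        lines.append(f"<!ELEMENT {element['name']} {content_model(element)}>")
        attributes = element.get("attributes", [])
        if attributes:
            # Each attribute is a (name, type, default) triple.
            declarations = " ".join(f"{n} {t} {d}" for n, t, d in attributes)
            lines.append(f"<!ATTLIST {element['name']} {declarations}>")
    return "\n".join(lines)
```

The order of the `seq` list plays the part of the seq group: it fixes the order of sub-elements in the generated content model.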
Exporting Databases in XML
• Exporting XML
  – Reverse Engineering
    • Recovery of the conceptual schema of an existing database
      – Augmentation of the knowledge about the data semantics
      – Database reverse engineering process (DB-MAIN)
      – Elicitation of hidden structures and constraints

[Figure: schema transformations lead from the Database Schema to the Conceptual Schema]
Database Schema (file Catalog containing Product, Order, Detail):
Product (Reference, Label[0-1], UnitPrice, Supplier); acc: Reference
Order (OrderID, Customer, Date, Total[0-1]); acc: OrderID
Detail (OrderID, Reference, Quantity, Amount); acc: Reference, OrderID
Conceptual Schema:
Product (Reference, Label[0-1], UnitPrice, Supplier[1-5]); id: Reference
Order (OrderID, Customer, Date, Total[0-1]); id: OrderID
Detail (Quantity, Amount): relationship type between Order and Product, with roles 0-N and 1-N
Exporting Databases in XML
• Exporting XML
  – Model Translation
    • DTD-specific transformation
    • Non-deterministic process
      – It requires some design choices
      – The user inputs might have consequences on the properties and the semantics of the resulting schema
    • 5-step transformation process
      – Schema preparation
      – Hierarchy structure creation
      – Constraint relaxation
      – Attribute representation
      – Ordering definition
Exporting Databases in XML
• Exporting XML
  – Model Translation
    • Schema preparation
      – Removing invalid constructs
        » Multivalued/compound attributes
        » Complex relationship types

[Figure: schema preparation applied to the Conceptual Schema]
Before:
Product (Reference, Label[0-1], UnitPrice, Supplier[1-5]); id: Reference
Order (OrderID, Customer, Date, Total[0-1]); id: OrderID
Detail (Quantity, Amount): relationship type between Order and Product, with roles 0-N and 1-N
After:
Product (Reference, Label[0-1], UnitPrice); id: Reference
Supplier (Supplier); id: supplied.Product, Supplier
Order (OrderID, Customer, Date, Total[0-1]); id: OrderID
Detail (Quantity, Amount); id: of.Product, consists.Order
supplied: Supplier 1-1, Product 1-5
of: Detail 1-1, Product 0-N
consists: Detail 1-1, Order 1-N

Steps: 1. Schema preparation; 2. Hierarchy structure creation; 3. Constraint relaxation; 4. Attribute representation; 5. Ordering definition
Exporting Databases in XML
• Exporting XML
  – Model Translation
    • Hierarchical structure creation

[Figure: hierarchy structure creation]
Before: Product, Supplier, Order and Detail linked by the relationship types supplied (Supplier 1-1, Product 1-5), of (Detail 1-1, Product 0-N) and consists (Detail 1-1, Order 1-N)
After: new root Catalog, father of Order and Product (father roles 0-N); Order is the father of Detail (father role 1-N); Product is the father of Supplier (father role 1-5); Detail receives the attribute Reference with ref: Reference in place of its link to Product

Entity types and relationship types are transformed into a tree
• by electing natural roots (significant concepts)
• by resolving father conflicts
• by breaking cycles
• by adding a unique root, if necessary
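
The four operations above can be sketched on the example as follows. This is only an illustration under simplifying assumptions: relationship types are given as (father, child) pairs, the first candidate father of a child wins, and a relationship type that would cause a father conflict or a cycle is turned into a reference. In the method these are design choices left to the user; the function names and the default root name `Catalog` are taken from the example or invented for the sketch.

```python
def _is_ancestor(father_of, candidate, node):
    """True if candidate is node itself or one of its ancestors."""
    while node is not None:
        if node == candidate:
            return True
        node = father_of.get(node)
    return False


def build_hierarchy(entity_types, one_to_many, root_name="Catalog"):
    """Turn (father, child) relationship types into a tree.

    Returns (father_of, references): the father of each entity type, and the
    (child, target) pairs that had to be kept as references.
    """
    father_of = {}
    references = []
    for father, child in one_to_many:
        father_conflict = child in father_of
        cycle = _is_ancestor(father_of, child, father)
        if father_conflict or cycle:
            references.append((child, father))  # keep the link as a reference
        else:
            father_of[child] = father
    # Entity types without a father are the natural roots; several roots
    # are gathered under one added root.
    roots = [e for e in entity_types if e not in father_of]
    if len(roots) > 1:
        for root in roots:
            father_of[root] = root_name
    return father_of, references
```

With the entity types Order, Product, Detail and Supplier and the pairs (Order, Detail), (Product, Detail), (Product, Supplier), Detail stays under Order, its link to Product becomes a reference, and Order and Product are placed under Catalog. Listing (Product, Detail) first would give a different, equally valid tree.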
Exporting Databases in XML
• Exporting XML
  – Model Translation
    • Constraint relaxation
      – Role cardinalities extension
      – Gid and idref groups creation

[Figure: constraint relaxation. The father role of Product towards Supplier is extended from 1-5 to 1-N; the id groups of Supplier, Product and Order become gid groups; the group ref: Reference of Detail becomes idref: Reference]
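
The role cardinalities extension can be read as rounding each cardinality to one a DTD can express. A minimal sketch, assuming cardinalities are written as "min-max" strings with N for an unbounded maximum (the function name is hypothetical):

```python
def relax_cardinality(cardinality):
    """Extend a role cardinality to the nearest one expressible in a DTD.

    A DTD only distinguishes 1-1, 0-1 (?), 0-N (*) and 1-N (+), so a minimum
    above 1 is lowered to 1 and a maximum above 1 is raised to N.
    """
    low, high = cardinality.split("-")
    relaxed_low = "0" if int(low) == 0 else "1"
    relaxed_high = "1" if high == "1" else "N"
    return f"{relaxed_low}-{relaxed_high}"
```

In the example, the 1-5 father role towards Supplier becomes 1-N. The relaxed schema accepts more instances than the source schema, so the original bound is no longer enforced by the DTD.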
Exporting Databases in XML
• Exporting XML
  – Model Translation
    • Attribute representation
    • Ordering definition

[Figure: resulting schema. Customer, Date, Total, Quantity, Amount and Supplier are represented as sub-elements (#pcdata or #any); Detail refers to Product through idref: Product; seq groups fix the order of sub-elements]
Catalog; seq: Order[*], Product[*]
Order (OrderID); gid: OrderID; seq: Customer, Date, Total, Detail[*]
Detail (Product); idref: Product; seq: Quantity, Amount
Product (Reference, Label[0-1], UnitPrice); gid: Reference; seq: Supplier[*]
Exporting Databases in XML
CASE Support – DB-MAIN
Model Expression: database models and DTD
Model Translation: DTD-specific transformation
Exporting Databases in XML
• CASE Support – DB-MAIN
  – Basic Features
    • Dedicated to database application engineering
    • Based on the GER
    • Includes transformation operators, reverse engineering processors and schema analysis tools
    • Extraction facilities (SQL, Codasyl, RPG, IMS, etc.)
Exporting Databases in XML
• CASE Support
  – *-to-DTD Transformation
    • DTD-specific transformations
    • Assistant
Exporting Databases in XML
• Conclusions
  – Rich and expressive data model
    • Translating semantics of both database and XML models
  – Non-deterministic aspect of the model translation
    • The same database schema can lead to a large set of equivalent XML structures
  – CASE Support (application)
    • Automatic production of XML documents
      – that comply with the DTD that has been computed
      – based on the schema transformations used to convert the database schema into XML DTD