IBM Research

Schema Advisor for Hybrid Relational/XML DBMS

Mirella Moro, Universidade Federal do Rio Grande do Sul
Lipyeow Lim, IBM T J Watson
Yuan-chi Chang, IBM T J Watson



Hybrid Relational/XML DB Design - Mirella M Moro

MOTIVATION - DB2 pureXML

• DB2 stores XML in a parsed hierarchical format

CREATE TABLE dept (deptID char(8), …, deptdoc xml);

• Relational columns are stored in relational format (tables)
• XML is stored natively as type-annotated trees

[Figure: DB2 storage. A dept row holds deptID = "PR27" in a relational column and, in the deptdoc column, an XML tree <dept> … <emp>…</emp> … </dept>.]


Financial Company M

• Trades financial derivatives
• Derivatives contracts are represented in FpML (Financial products Markup Language) for inter-company messaging
• For persistence, the FpML is augmented with a proprietary schema
• FpML schema changes every few weeks

Sample message:

<Message xmlns="http://www.x.com/swaps/gcd/aurora"
         xmlns:fpml="http://www.fpml.org/2004/FpML-4-1"
         xmlns:aur="http://www.x.com/swaps/gcd/aurora"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         version="0-1"
         xsi:schemaLocation="http://www.fpml.org/2004...">
  <header>
    <dtcc>
      <activity>New</activity>
      <status>Submit</status>
      <transactionType>Trade</transactionType>
    </dtcc>
  </header>
  <fpmlExtensions id="trade_100089-1">
    <trade>
      <tradeId tradeIdScheme="http://www.swapswire.com/spec/2001/trade-id-1-0">100089-1</tradeId>
      ....
    </trade>
  </fpmlExtensions>
  <FpML version="4-1" xsi:type="RequestTradeConfirmation" xmlns="http://www.fpml.org/2004/FpML-4-1">
    <header>
      <messageId messageIdScheme="http://www.x.com/gcd/fpml/messageId">MLGCD148718</messageId>
      ...
    </header>
    <trade>
      ....
    </trade>
    <party id="party_cpty_7837"> ... </party>
    <party id="party_book_18589"> ... </party>
  </FpML>
</Message>

Callouts on the message:

• Inter-company FpML message
  • Versioned, no updates
  • Certain fields heavily queried by GUI apps
• Internal proprietary extension
  • Replicates part of the FpML message
  • Stores operational metadata, e.g. who updated the trade
  • Has updates
• Internal proprietary extension
  • Stores coarse-grain state info
  • Has updates


The Question Is …

• How do we design a database schema for company M that leverages both the relational and XML capabilities of a modern DBMS?

[Figure labels: Logical Data Model; Relational-XML Schema Advisor; R-X Hybrid Database Schema; XML data (?); Relational data (?); Relational & XML data (?)]


What affects the schema design?

• Granularity of access/reuse
  • Business artifacts: groupings of data elements that are accessed as a single unit, e.g. POs, contracts, forms
  • XML messages: groupings of data elements into messages that are sent/received
• Schema variability
  • A lot of flexibility in the structure of the data
  • E.g. sparse data, optional attributes, composite fields
• Schema evolution
  • Structure of the data changes over time
  • E.g. FpML format changes every 6 months
• Data versioning
  • Content changes over time, but changes need to be tracked
  • If there is no explicit DBMS support, versioning IDs need to be included in the schema
• Performance criteria
  • Depends on the performance characteristics of the DBMS
  • Relational columns perform better than XML stored as CLOBs, BLOBs, or native XML
• Usability
  • How easy is it to write queries on the resulting schema?
• Storage redundancy
  • Normalization


Schema Variability

• E-catalog: product (e.g. department stores sell everything from T-shirts to TVs)
• Table + columns designs:

Flat model (simple; NULLs)
  PROD (id, price, size, color, fabric, weight, screensize, stereo, …)

Categories (complex; no NULLs; joins)
  PROD (id, price)
  TSHIRT (size, color, fabric, FK to PROD)
  TV (weight, screensize, stereo, FK to PROD)

Vertical (simple; joins)
  PROD (id, attribName, attribValue)

XML (simple; flexible; no NULLs; no joins required; sparse, optional)
  PROD (id, price, XMLdescription)
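
The XML design can be tried out with a small sketch. It assumes SQLite as a stand-in DBMS, with a TEXT column in place of a native XML column (SQLite has no XML type); the product values and the helper names build_catalog and describe are hypothetical, not part of the talk.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_catalog():
    """Create an in-memory PROD table and load two heterogeneous products."""
    conn = sqlite3.connect(":memory:")
    # Assumption: TEXT stands in for a native XML column type.
    conn.execute(
        "CREATE TABLE prod (id INTEGER PRIMARY KEY, price REAL, xmldescription TEXT)"
    )
    rows = [
        (1, 9.99, "<tshirt><size>M</size><color>blue</color><fabric>cotton</fabric></tshirt>"),
        (2, 499.0, "<tv><weight>12kg</weight><screensize>42</screensize><stereo>yes</stereo></tv>"),
    ]
    conn.executemany("INSERT INTO prod VALUES (?, ?, ?)", rows)
    return conn


def describe(conn, prod_id):
    """Return (price, {attribute: value}) for one product: one row, no joins."""
    price, xml_text = conn.execute(
        "SELECT price, xmldescription FROM prod WHERE id = ?", (prod_id,)
    ).fetchone()
    root = ET.fromstring(xml_text)
    return price, {child.tag: child.text for child in root}


conn = build_catalog()
tshirt = describe(conn, 1)
tv = describe(conn, 2)
```

Each product carries only the attributes it actually has, so the table needs neither NULL-padded columns nor per-category tables.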


Schema Evolution

• Consider a financial company persisting FpML messages.
• Current solutions shred the XML data into relational tables.
• When FpML.org releases a new schema version:
  • A new set of relational tables is needed for the shredded XML
  • The existing FpML in the DBMS must be re-shredded

[Figure: the schema evolves from FpML v4.1 to FpML v4.1 + v4.2.]

Evolving relational schemas is expensive!


ReXSA: Relational-XML Schema Advisor

• Input: a logical data model (LDM) annotated with information on granularity, schema variability and evolution, versioning, and performance criteria
• Output: candidate relational-XML schema designs
• Overview: 2 phases
  • Phase 1 analyses the annotated LDM to partition entities into relational or XML types
  • Phase 2 transforms the partitioned LDM into table definitions and/or XML schemas

[Figure: annotated LDM → analysis → R-X partitioning → transform → DDL, XSD]


Logical Data Model

• Use the extended entity-relationship model (UML will also work)
• Entities, relationships, attributes, hierarchies

[Figure: example extended ER diagram. Legend: entity, weak entity, relationship, identifying relationship, recursive relationship, multirelationship, hierarchy, KEY, required attribute, optional attribute, composite attribute, multivalued attribute, business object or document. Diagram labels: entities Person, Dept, Classes, Applicant, Dependents; relationships R1, R2, R3, R4, R5; hierarchy levels Faculty, Student; Lecturer, Professor, Undergrad, Graduate; Master, PhD; document RefLetter (From, Q1, Q2, Q3, Q4).]


Relational-XML Partitioning
Which is relational? Which is XML?

• Entities that benefit from XML:
  • Optional attributes
  • Multi-valued attributes, e.g. phone number (h/c/o)
  • Composite attributes, e.g. name: first, mi, last
  • Weak entities
  • Business artifacts, e.g. reference letters
  • Frequently evolving schema
• Entities that benefit from relational:
  • Rigid and stable schema
  • Performance-critical elements

FOR EACH ENTITY
  1. Compute a score for the entity based on flexibility
  2. Label the entity as Relational or XML based on a user-specified threshold

[Figure: LDM → analysis → R-X partitioning]


Score example

FOR EACH ENTITY: Person
  Required: SSN, DOB, ID
  Optional: marital status
  Multi-valued: phone
  Composed: name (first, mi, last)
  Document(s): resume

Score(Person) = (1 op + 1 mlt + 3 cmp + 1 doc) / 9 total = 55%  (initial suggestion)

[Figure: ER diagram with entities Person, Dept (location), Classes and Dependents; relationships R1-R5 (labels: offers, requisite); Person attributes SSN, DOB, ID, marital status, phone, Name (first, mi, last); document Resume (Name, Contact, Education: PhD, Masters, Bachelor; Publications; Professional Actvs; …). The diagram is shown a second time without the Person entity's attributes.]
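
The two-step partitioning procedure can be sketched as follows. The slides do not give the exact formula or any weights, so the unweighted fraction of flexible attributes used here is an assumption, and its value for Person (6 of 9) need not match the percentage quoted on the slide. The names EntityProfile, flexibility_score and label are hypothetical, as is treating a score equal to the threshold as XML.

```python
from dataclasses import dataclass


@dataclass
class EntityProfile:
    """Attribute counts for one LDM entity, by annotation kind."""
    name: str
    required: int = 0
    optional: int = 0
    multivalued: int = 0
    composite: int = 0
    documents: int = 0


def flexibility_score(entity):
    """Assumed score: share of attributes that favour XML (unweighted)."""
    flexible = (entity.optional + entity.multivalued
                + entity.composite + entity.documents)
    total = flexible + entity.required
    return flexible / total if total else 0.0


def label(entity, threshold):
    """Label the entity XML when its score reaches the user-specified threshold."""
    return "XML" if flexibility_score(entity) >= threshold else "Relational"


# Counts taken from the Person example: 3 required, 1 optional,
# 1 multi-valued, 3 composite parts, 1 document.
person = EntityProfile("Person", required=3, optional=1,
                       multivalued=1, composite=3, documents=1)
person_score = flexibility_score(person)
```

Raising or lowering the threshold moves entities between the two partitions, which is why the advisor treats it as a user-specified parameter.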


Transforming Partitioned LDM to Schema

• Entities in the LDM have been labeled as relational or XML
• Transform the LDM to a database schema
  • Examine entities, relationships
  • Hierarchies can be tricky
  • Read the paper for the transformation rules
  • A few examples are presented next

foreach entity
  1. transform entity to table definition and/or XML schema
foreach relationship
  1. transform relationship to table definition or modify table for entity
  2. add key constraints
foreach hierarchy
  1. transform entity to table definition and/or XML schema
  2. add key constraints

[Figure: R-X partitioning → transform → DDL, XSD]


Transforming Entities

[Figure: entity Person with attributes SSN, DOB, ID, marital status, phone, Name (first, mi, last) and document Resume (Name, Contact, Education: PhD, Masters, Bachelor; Publications; Professional Actvs; …).]

Pure Relational: PERSON
  ID int PK NOT NULL
  SSN varchar NOT NULL
  DOB date NOT NULL
  marSt char
  firstN varchar
  mi char
  lastN varchar
  phone FK to phoneTable
  resume FK to resumeTable

Hybrid 1: PERSON
  ID int PK NOT NULL
  SSN varchar NOT NULL
  DOB date NOT NULL
  marSt char
  name XML TYPE
  phone XML TYPE
  resume XML TYPE

Hybrid 2: PERSON
  ID int PK NOT NULL
  SSN varchar NOT NULL
  DOB date NOT NULL
  Info XML TYPE

Hybrid 3: PERSON
  ID int PK NOT NULL
  Info XML TYPE
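
The three hybrid designs differ only in which attribute kinds stay relational and whether the remaining attributes share one XML column. A minimal sketch of that idea follows; the function hybrid_columns and its parameters are hypothetical, not ReXSA's actual transformation rules, and the column type is written as XML rather than the slide's "XML TYPE".

```python
# (column name, SQL type or None where the slide gives none, attribute kind)
PERSON_ATTRS = [
    ("ID", "int", "key"),
    ("SSN", "varchar", "required"),
    ("DOB", "date", "required"),
    ("marSt", "char", "optional"),
    ("name", None, "composite"),
    ("phone", None, "multivalued"),
    ("resume", None, "document"),
]


def hybrid_columns(attrs, relational_kinds, merge_xml):
    """Keep attributes whose kind is in relational_kinds as typed columns and
    move the rest into XML columns (one shared Info column when merge_xml).
    The pure relational design is out of scope: it needs extra tables."""
    columns, xml_names = [], []
    for name, sql_type, kind in attrs:
        if kind in relational_kinds:
            constraint = {"key": " PK NOT NULL", "required": " NOT NULL"}.get(kind, "")
            columns.append(f"{name} {sql_type}{constraint}")
        else:
            xml_names.append(name)
    if merge_xml and xml_names:
        columns.append("Info XML")
    elif not merge_xml:
        columns.extend(f"{name} XML" for name in xml_names)
    return columns


hybrid1 = hybrid_columns(PERSON_ATTRS, {"key", "required", "optional"}, merge_xml=False)
hybrid2 = hybrid_columns(PERSON_ATTRS, {"key", "required"}, merge_xml=True)
hybrid3 = hybrid_columns(PERSON_ATTRS, {"key"}, merge_xml=True)
```

Moving from Hybrid 1 to Hybrid 3 shrinks the set of relational kinds, trading typed, indexable columns for a more flexible XML document.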


Transforming Relationships


Transforming Relationships: Example

Case 1: eXML (0..1) --R1-- (1) eREL
  eREL (atrib0 datatype0, atrib1 datatype1, …, eXML XMLtype)

Case 2: eXML (N) --R1-- (1) eREL

  Query workload    Table specification
  Read only         eREL (atrib0 datatype0, atrib1 datatype1, …, eXML XMLtype) -- concatenate
  Updates           eREL (atrib0 datatype0, …, atribn datatypen)
                    eXML (atrib0 datatype0, atrib1 datatype1, xmldata XMLtype)


Transforming Hierarchies and Inheritance

[Figure: ER diagram. Entities Person, Dept (R1), Dependents (R2) and Thesis (R3). Person attributes: SSN, DOB, ID, marital status, phone, Name (first, mi, last); document Resume (Name, Contact, Education: PhD, Masters, Bachelor; Publications; Professional Actvs; …). Other attributes shown: GradDate, 1st Quarter, Defense Date; document Text (Title, Area, Keywords, Abstract, Chapters, …). Hierarchy, shown twice: Person; Faculty, Student; Lecturer, Professor, Undergrad, Graduate; Master, PhD.]


Hierarchies: Relational Schemas

[Figure: hierarchy with Person at the top and Faculty, Student below.]

(A)
  PERSON (id, SSN, DOB, marStatus, firstN, mi, lastN, phone FK)
  FACULTY (id, personID FK, resume FK)
  STUDENT (id, personID FK, gradDate, firstQuarter)

(B) Total/disjoint inheritance: each person must be either faculty or student
  FACULTY (id, SSN, DOB, marStatus, firstN, mi, lastN, phone FK, resume FK)
  STUDENT (id, SSN, DOB, marStatus, firstN, mi, lastN, phone FK, gradDate, firstQuarter)

(C) Disjoint inheritance: each person is either faculty or student, and there are not many specialized attributes
  PERSON (id, SSN, DOB, marStatus, firstN, mi, lastN, phone FK, type, resume FK, gradDate, firstQuarter)

(D) Overlapping inheritance: each person may be faculty, student, or both
  PERSON (id, SSN, DOB, marStatus, firstN, mi, lastN, phone FK, Fflag, resume FK, Sflag, gradDate, firstQuarter)
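
The four relational mappings can be derived mechanically from the attributes common to Person and those specific to each subclass. The sketch below reproduces the column lists of options (A) to (D); the function names are hypothetical labels for illustration, not terminology from the talk.

```python
COMMON = ["id", "SSN", "DOB", "marStatus", "firstN", "mi", "lastN", "phone FK"]
SPECIALIZED = {
    "FACULTY": ["resume FK"],
    "STUDENT": ["gradDate", "firstQuarter"],
}


def table_per_class():
    """(A) One table for the superclass, one per subclass referencing it."""
    tables = {"PERSON": list(COMMON)}
    for sub, attrs in SPECIALIZED.items():
        tables[sub] = ["id", "personID FK"] + attrs
    return tables


def table_per_subclass():
    """(B) One table per subclass, each repeating the inherited attributes."""
    return {sub: COMMON + attrs for sub, attrs in SPECIALIZED.items()}


def single_table_with_type():
    """(C) One table with a type discriminator and all specialized attributes."""
    cols = COMMON + ["type"]
    for attrs in SPECIALIZED.values():
        cols += attrs
    return {"PERSON": cols}


def single_table_with_flags():
    """(D) One table with a membership flag in front of each subclass's attributes."""
    cols = list(COMMON)
    for sub, attrs in SPECIALIZED.items():
        cols += [f"{sub[0]}flag"] + attrs
    return {"PERSON": cols}


options = {
    "A": table_per_class(),
    "B": table_per_subclass(),
    "C": single_table_with_type(),
    "D": single_table_with_flags(),
}
```

Options (C) and (D) avoid joins at the cost of NULLs in the specialized columns, which is why (C) is suggested only when there are few specialized attributes.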


Hierarchies: Example 1

[Figure: ER diagram. Person (SSN, DOB, ID, marital status, phone, Name: first, mi, last) related to Dept by R1; hierarchy Person, Student, Graduate, PhD; attributes GradDate, 1st Quarter; Thesis (R3) with Defense Date and document Text (Title, Area, Keywords, Abstract, Chapters, …).]

<Person id="3d01" dept="dept001">
  <SSN>…</SSN>
  <DOB>…</DOB>
  <maritalSt>…</maritalSt>
  <name>
    <first>…</first>
    <last>…</last>
  </name>
  <phones>
    <phone>…</phone>
    <phone>…</phone>
    <phone>…</phone>
  </phones>
  <Student>
    <firstQuarter>…</firstQuarter>
    <gradDate>…</gradDate>
    <Graduate>
      <Thesis>
        <DefenseDate>…</DefenseDate>
        <Text>
          <Title>…</Title>
          …
        </Text>
      </Thesis>
      <PHD>
        …
      </PHD>
    </Graduate>
  </Student>
</Person>
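
When the hierarchy is nested inside one document like this, subclass membership and subclass attributes become path lookups. A minimal sketch using Python's standard ElementTree follows; the element values are made-up placeholders (the slide elides them) and the helper summarize is hypothetical.

```python
import xml.etree.ElementTree as ET

# Same nesting as the slide's document; all values are placeholders.
PERSON_XML = """
<Person id="3d01" dept="dept001">
  <name><first>Ana</first><last>Silva</last></name>
  <phones><phone>555-0100</phone><phone>555-0101</phone></phones>
  <Student>
    <firstQuarter>2005Q1</firstQuarter>
    <Graduate>
      <Thesis><Text><Title>Sample title</Title></Text></Thesis>
      <PHD/>
    </Graduate>
  </Student>
</Person>
"""


def summarize(xml_text):
    """Read superclass fields and walk down the nested subclass elements."""
    person = ET.fromstring(xml_text.strip())
    return {
        "id": person.get("id"),
        "phones": [p.text for p in person.findall("phones/phone")],
        # A subclass is "present" when its element exists under the parent class.
        "is_phd": person.find("Student/Graduate/PHD") is not None,
        "thesis_title": person.findtext("Student/Graduate/Thesis/Text/Title"),
    }


summary = summarize(PERSON_XML)
```

The path Student/Graduate/Thesis mirrors the inheritance chain, so no join is needed to reach a graduate student's thesis from the person record.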


Hierarchies: Example 2

[Figure: same ER diagram as in Example 1 (Person, Dept, R1, hierarchy Person, Student, Graduate, PhD, Thesis via R3).]

Table specifications shown:

  PERSON (id, info XML)
  THESIS (tid, …, personID FK)

  PERSON (id, info XML, dept FK)
  DEPT (did, …)

  PERSON (id, info XML)
  THESIS (tid, …, personID FK, path)

Superclass relationship
  <Person id="3d01" dept="dept001">

Subclass relationship
  <Person> … <Graduate> <Thesis> …


Artificial Use Case

• Artificial academic LDM
• Set R-X partitioning threshold to 70%
• DDLs look very reasonable


HL7 Reference Information Model

• Health Level 7 (HL7) is an XML messaging format for the healthcare industry.
• HL7 provides a conceptual model called the Reference Information Model (RIM).
• ReXSA suggests one table with an XML column.
• Reasonable, because all entities in the RIM inherit from a single super-entity.


Conclusion

• DBMS users don’t really know what to put in XML columns and what to put in relational columns.
• Designing hybrid relational-XML schemas is a problem that has not been addressed before.
• We presented a schema design advisor that takes an annotated logical data model as input and outputs candidate schema design(s).
• Future work
  • Design evaluation: is the resulting design a good design?
  • Incorporate data samples in the analysis
  • Integration with performance (index, MQT) advisors


QUESTIONS?
