IBM Research
®
Schema Advisor for Hybrid Relational/XML DBMS
Mirella Moro, Universidade Federal do Rio Grande do Sul
Lipyeow Lim, IBM T J Watson
Yuan-chi Chang, IBM T J Watson
2 Hybrid Relational/XML DB Design - Mirella M Moro
MOTIVATION - DB2 Pure XML
• DB2 stores XML in parsed hierarchical format
CREATE TABLE dept (deptID char(8),…, deptdoc xml);
• Relational columns are stored in relational format (tables)
• XML is stored natively as type-annotated trees
[Figure: DB2 storage. A dept row holds the relational column deptID (e.g. "PR27") and the XML column deptdoc, whose value is a <dept> … <emp>…</emp> … </dept> tree.]
Financial Company M
• Trades financial derivatives
• Derivatives contract represented in FpML (Financial products Markup Language) for inter-company messaging
• For persistence, the FpML is augmented with proprietary schema
• FpML schema changes every few weeks

<Message xmlns="http://www.x.com/swaps/gcd/aurora"
         xmlns:fpml="http://www.fpml.org/2004/FpML-4-1"
         xmlns:aur="http://www.x.com/swaps/gcd/aurora"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         version="0-1"
         xsi:schemaLocation="http://www.fpml.org/2004...">
  <header>
    <dtcc>
      <activity>New</activity>
      <status>Submit</status>
      <transactionType>Trade</transactionType>
    </dtcc>
  </header>
  <fpmlExtensions id="trade_100089-1">
    <trade>
      <tradeId tradeIdScheme="http://www.swapswire.com/spec/2001/trade-id-1-0">100089-1</tradeId>
      ....
    </trade>
  </fpmlExtensions>
  <FpML version="4-1" xsi:type="RequestTradeConfirmation" xmlns="http://www.fpml.org/2004/FpML-4-1">
    <header>
      <messageId messageIdScheme="http://www.x.com/gcd/fpml/messageId">MLGCD148718</messageId>
      ...
    </header>
    <trade>
      ....
    </trade>
    <party id="party_cpty_7837"> ... </party>
    <party id="party_book_18589"> ... </party>
  </FpML>
</Message>

Callouts on the parts of the message:
• Inter-company FpML message
  - Versioned, no updates
  - Certain fields heavily queried by GUI apps
• Internal proprietary extension
  - Replicates part of FpML message
  - Stores operational metadata, e.g. who updated the trade
  - Has updates
• Internal proprietary extension
  - Stores coarse-grain state info
  - Has updates
The Question Is …
• How do we design a database schema for company M that leverages both the relational and XML capabilities of a modern DBMS?
[Figure: XML data, relational data, and relational & XML data, each marked with a question mark; a Logical Data Model feeding a Relational-XML Schema Advisor; and the resulting R-X Hybrid Database Schema.]
What affects the schema design?
• Granularity of access/reuse
  - Business artifacts: groupings of data elements that are accessed as a single unit, e.g. POs, contracts, forms
  - XML messages: groupings of data elements into messages that are sent/received
• Schema variability
  - A lot of flexibility in the structure of the data
  - E.g. sparse data, optional attributes, composite fields
• Schema evolution
  - Structure of data changes over time
  - E.g. FpML format changes every 6 months
• Data versioning
  - Content changes over time, but changes need to be tracked
  - If there is no explicit DBMS support, versioning IDs need to be included in the schema
• Performance criteria
  - Depends on the performance characteristics of the DBMS
  - Relational columns perform better than XML stored as CLOBs, BLOBs, or natively
• Usability
  - How easy is it to write queries on the resultant schema?
• Storage redundancy
  - Normalization
Schema Variability
• E-catalog: product (e.g. department stores sell everything from T-shirts to TVs)
• Table + columns designs
  - Flat model (simple, NULLs):
      PROD (id, price, size, color, fabric, weight, screensize, stereo …)
  - Categories (complex, no NULLs, join):
      PROD (id, price)
      TSHIRT (size, color, fabric, FK to PROD)
      TV (weight, screensize, stereo, FK to PROD)
  - Vertical (simple, joins):
      PROD (id, attribName, attribValue)
  - XML (simple, flexible, no NULLs, no joins):
      PROD (id, price, XMLdescription)
      Relational columns hold the required attributes; the XML column holds the sparse, optional ones
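
To make the "Vertical: simple, joins" trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than DB2. The PROD table and its columns follow the slide; the sample rows and the helper name `tshirt_rows` are made up for illustration.

```python
import sqlite3


def tshirt_rows(conn):
    """Rebuild (id, size, color) rows from the vertical PROD table.

    Every attribute lives in its own row, so putting two attributes side by
    side takes one self-join per extra attribute.
    """
    return conn.execute(
        """
        SELECT s.id, s.attribValue AS size, c.attribValue AS color
        FROM PROD AS s
        JOIN PROD AS c ON c.id = s.id AND c.attribName = 'color'
        WHERE s.attribName = 'size'
        ORDER BY s.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PROD (id INTEGER, attribName TEXT, attribValue TEXT)")
conn.executemany(
    "INSERT INTO PROD VALUES (?, ?, ?)",
    [
        (1, "size", "M"), (1, "color", "blue"), (1, "fabric", "cotton"),  # a T-shirt
        (2, "weight", "12kg"), (2, "screensize", "42in"),                 # a TV
    ],
)
print(tshirt_rows(conn))
```

Adding a new kind of product needs no schema change in this design, but each additional attribute in a query result costs another join, which is the cost the XML design is meant to avoid.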
Schema Evolution
• Consider a financial company persisting FpML messages.
• Current solutions shred the XML data into relational tables
• When FpML.org releases a new schema version
  - A new set of relational tables is needed for the shredded XML
  - The existing FpML in the DBMS must be re-shredded
[Figure: schema evolves from FpML v4.1 to FpML v4.1+v4.2]
Evolving relational schemas is expensive!
ReXSA: Relational-XML Schema Advisor
• Input: a logical data model (LDM) annotated with information on granularity, schema variability and evolution, versioning, and performance criteria.
• Outputs: candidate relational-XML schema designs
• Overview: 2 phases
  - Phase 1 analyses the annotated LDM to partition entities into relational or XML types
  - Phase 2 transforms the partitioned LDM into table definitions and/or XML schemas.
[Figure: annotated LDM → analysis → R-X partitioning → transform → DDL, XSD]
Logical Data Model
• Use extended entity-relationship model (UML will also work)
• Entities, relationships, attributes, hierarchies.
[Figure: example extended ER diagram. Legend: entity, relationship, recursive relationship, multirelationship, weak entity with identifying relationship, hierarchy, key, required attribute, optional attribute, composite attribute, multivalued attribute, business object or document. Diagram elements: entities Person, Dept, Classes, Applicant, Dependents; hierarchy Faculty (Lecturer, Professor) and Student (Undergrad, Graduate (Master, PhD)); relationships R1 to R5; document RefLetter (From, Q1, Q2, Q3, Q4).]
Relational-XML Partitioning
Which is relational? Which is XML?
• Entities that benefit from XML:
  - Optional attributes
  - Multi-valued attributes, e.g. phone number (h/c/o)
  - Composite attributes, e.g. name: first, mi, last
  - Weak entities
  - Business artifacts, e.g. reference letters
  - Frequently evolving schema
• Entities that benefit from relational:
  - Rigid & stable schema
  - Performance critical elements
FOR EACH ENTITY
  1. Compute a score for the entity based on flexibility
  2. Label the entity as relational or XML based on a user-specified threshold.
[Figure: LDM → analysis → R-X partitioning]
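
The two per-entity steps above can be sketched as follows. This is a minimal sketch, not ReXSA's actual scoring: the slides do not give the formula or any weights, so the plain ratio of flexible attributes to all attributes, the `Entity` layout, and the `>=` comparison are all assumptions. The sample entity uses the Person annotations from the score example; because the weighting is assumed, the value computed here need not match the percentage shown on that slide.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An LDM entity with attributes grouped by annotation (hypothetical layout)."""
    name: str
    required: list = field(default_factory=list)
    optional: list = field(default_factory=list)
    multivalued: list = field(default_factory=list)
    composite: dict = field(default_factory=dict)  # attribute -> component names
    documents: list = field(default_factory=list)


def flexibility_score(e: Entity) -> float:
    """Fraction of attributes that are 'flexible' (assumed: unweighted count)."""
    composite_parts = sum(len(parts) for parts in e.composite.values())
    flexible = (len(e.optional) + len(e.multivalued)
                + composite_parts + len(e.documents))
    total = len(e.required) + flexible
    return flexible / total if total else 0.0


def label(e: Entity, threshold: float) -> str:
    """Label the entity 'XML' when its score reaches the user-specified threshold."""
    return "XML" if flexibility_score(e) >= threshold else "Relational"


person = Entity(
    name="Person",
    required=["SSN", "DOB", "ID"],
    optional=["marital status"],
    multivalued=["phone"],
    composite={"name": ["first", "mi", "last"]},
    documents=["resume"],
)
print(person.name, round(flexibility_score(person), 2), label(person, threshold=0.70))
```

Keeping the threshold as a parameter mirrors the slide: the score is only a suggestion, and the user decides how flexible an entity must be before it is stored as XML.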
Score example
FOR EACH ENTITY: Person
• Required: SSN, DOB, ID
• Optional: marital status
• Multi-valued: phone
• Composed: name (first, mi, last)
• Document(s): resume
Score (Person): (1 op + 1 mlt + 3 cmp + 1 doc) / 9 total = 55% (initial suggestion)
[Figure: ER diagram in which Person (SSN, DOB, ID, marital status, Name (first, mi, last), phone, Resume (Name, Contact, Education (PhD, Masters, Bachelor), Publications, Professional Actvs, …)) is connected to Dept (location), Classes (requisite, offers), and Dependents through relationships R1 to R5. A second copy of the diagram shows Dept, Classes, Dependents, and R1 to R5 without the Person details.]
Transforming Partitioned LDM to Schema
• Entities in LDM have been labeled as relational or XML
• Transform LDM to database schema
  - Examine entities, relationships
  - Hierarchies can be tricky
  - Read the paper for transformation rules
  - A few examples presented next
foreach entity
  1. transform entity to table definition and/or XML schema
foreach relationship
  1. transform relationship to table definition or modify table for entity
  2. add key constraints
foreach hierarchy
  1. transform entity to table definition and/or XML schema
  2. add key constraints
[Figure: R-X partitioning → transform → DDL, XSD]
Transforming Entities
[Figure: Person entity with attributes SSN, DOB, ID, marital status, Name (first, mi, last), phone, and document Resume (Name, Contact, Education (PhD, Masters, Bachelor), Publications, Professional Actvs, …)]
Pure Relational: PERSON
  ID int PK NOT NULL
  SSN varchar NOT NULL
  DOB date NOT NULL
  marSt char
  firstN varchar
  mi char
  lastN varchar
  phone FK to phoneTable
  resume FK to resumeTable
Hybrid 1: PERSON
  ID int PK NOT NULL
  SSN varchar NOT NULL
  DOB date NOT NULL
  marSt char
  name XML TYPE
  phone XML TYPE
  resume XML TYPE
Hybrid 2: PERSON
  ID int PK NOT NULL
  SSN varchar NOT NULL
  DOB date NOT NULL
  Info XML TYPE
Hybrid 3: PERSON
  ID int PK NOT NULL
  Info XML TYPE
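
As an illustration of how a labeled entity might be turned into DDL, here is a minimal sketch that renders the "Hybrid 2" design as a CREATE TABLE statement. The helper `hybrid_ddl` is hypothetical and not part of ReXSA, and the VARCHAR length is an assumption, since the slide gives only `varchar`.

```python
def hybrid_ddl(table, relational_cols, xml_cols):
    """Render a CREATE TABLE statement mixing relational and XML columns.

    relational_cols: list of (name, sql_type) pairs kept as ordinary columns.
    xml_cols: names of columns declared with the XML type.
    """
    cols = [f"{name} {sql_type}" for name, sql_type in relational_cols]
    cols += [f"{name} XML" for name in xml_cols]
    return f"CREATE TABLE {table} (\n  " + ",\n  ".join(cols) + "\n);"


hybrid2 = hybrid_ddl(
    "PERSON",
    [
        ("ID", "INT NOT NULL PRIMARY KEY"),
        ("SSN", "VARCHAR(11) NOT NULL"),  # length is an assumption
        ("DOB", "DATE NOT NULL"),
    ],
    ["Info"],  # optional, multi-valued, composite, and document data go here
)
print(hybrid2)
```

Moving a column name from `relational_cols` to `xml_cols`, or the reverse, gives the other hybrid variants on the slide.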
Transforming Relationships
Transforming Relationships: Example
eXML (0..1) R1 (1) eREL
  Table specification:
    eREL (atrib0 datatype0, atrib1 datatype1, …, eXML XMLtype)
eXML (N) R1 (1) eREL
  Query workload: read only
  Table specification:
    eREL (atrib0 datatype0, atrib1 datatype1, …, eXML XMLtype) -- concatenate
  Query workload: updates
  Table specification:
    eREL (atrib0 datatype0, …, atribn datatypen)
    eXML (atrib0 datatype0, atrib1 datatype1, xmldata XMLtype)
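
One way to read the example above is as a small decision rule on cardinality and query workload. The sketch below encodes that reading. The function name, its parameters, and the pairing of "read only" with the concatenated design and "updates" with separate tables are assumptions taken from the slide layout, not confirmed transformation rules.

```python
def relationship_tables(xml_side_cardinality, workload="read-only"):
    """Pick table specifications for a relationship between eXML and eREL.

    xml_side_cardinality: "0..1" or "N" (the eREL side is 1 in both cases).
    workload: "read-only" or "updates"; only consulted for the N:1 case.
    """
    embedded = ["eREL (atrib0 datatype0, atrib1 datatype1, ..., eXML XMLtype)"]
    separate = [
        "eREL (atrib0 datatype0, ..., atribn datatypen)",
        "eXML (atrib0 datatype0, atrib1 datatype1, xmldata XMLtype)",
    ]
    if xml_side_cardinality == "0..1":
        return embedded  # at most one XML entity per row: store it inline
    if xml_side_cardinality == "N":
        # Read-only data can be concatenated into one XML column; data that is
        # updated is kept in its own table (assumed reading of the slide).
        return embedded if workload == "read-only" else separate
    raise ValueError(f"unsupported cardinality: {xml_side_cardinality!r}")


print(relationship_tables("N", "updates"))
```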
Transforming Hierarchies and Inheritance
[Figure: ER diagram with the hierarchy Person → Faculty (Lecturer, Professor) and Student (Undergrad, Graduate (Master, PhD)). Attributes and documents shown: SSN, DOB, ID, marital status, Name (first, mi, last), phone, Resume (Name, Contact, Education (PhD, Masters, Bachelor), Publications, Professional Actvs, …), GradDate, 1st Quarter. Relationships shown: R1 with Dept, R2 with Dependents, R3 with Thesis (Defense Date, Text (Title, Area, Keywords, Abstract, Chapters, …)). The hierarchy is repeated on its own beside the diagram.]
Hierarchies: Relational Schemas
[Figure: Person with subclasses Faculty and Student]
(A)
  PERSON (id, SSN, DOB, marStatus, firstN, mi, lastN, phone FK)
  FACULTY (id, personID FK, resume FK)
  STUDENT (id, personID FK, gradDate, firstQuarter)
(B)
  FACULTY (id, SSN, DOB, marStatus, firstN, mi, lastN, phone FK, resume FK)
  STUDENT (id, SSN, DOB, marStatus, firstN, mi, lastN, phone FK, gradDate, firstQuarter)
  (total/disjoint inheritance: each person must be either faculty or student)
(C)
  PERSON (id, SSN, DOB, marStatus, firstN, mi, lastN, phone FK, type, resume FK, gradDate, firstQuarter)
  (disjoint inheritance: each person is either faculty or student + not many specialized attributes)
(D)
  PERSON (id, SSN, DOB, marStatus, firstN, mi, lastN, phone FK, Fflag, resume FK, Sflag, gradDate, firstQuarter)
  (overlapping inheritance: each person may be faculty, student, or both)
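
The four options can be generated mechanically from the superclass and subclass attribute lists. The sketch below reproduces the column lists of (A) to (D); the function `map_hierarchy` and the way flag and foreign-key names are derived are assumptions made for illustration.

```python
PERSON = ["id", "SSN", "DOB", "marStatus", "firstN", "mi", "lastN", "phone FK"]
SUBCLASSES = {"FACULTY": ["resume FK"], "STUDENT": ["gradDate", "firstQuarter"]}


def map_hierarchy(option, parent_name, parent_cols, subclasses):
    """Return {table name: column list} for hierarchy mapping option 'A'..'D'."""
    if option == "A":  # one table per class, subclasses reference the parent
        tables = {parent_name: list(parent_cols)}
        for sub, cols in subclasses.items():
            tables[sub] = ["id", f"{parent_name.lower()}ID FK"] + cols
        return tables
    if option == "B":  # subclass tables only, each repeating the parent columns
        return {sub: list(parent_cols) + cols for sub, cols in subclasses.items()}
    if option == "C":  # single table with one discriminating type column
        merged = list(parent_cols) + ["type"]
        for cols in subclasses.values():
            merged += cols
        return {parent_name: merged}
    if option == "D":  # single table with one boolean flag per subclass
        merged = list(parent_cols)
        for sub, cols in subclasses.items():
            merged += [f"{sub[0]}flag"] + cols
        return {parent_name: merged}
    raise ValueError(f"unknown option: {option!r}")


for opt in "ABCD":
    print(opt, map_hierarchy(opt, "PERSON", PERSON, SUBCLASSES))
```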
Hierarchies: Example 1
[Figure: the path Person → Student → Graduate → PhD from the hierarchy. Person has SSN, DOB, ID, marital status, Name (first, mi, last), phone, and relationship R1 with Dept. Also shown: GradDate, 1st Quarter, and relationship R3 with Thesis (Defense Date, Text (Title, Area, Keywords, Abstract, Chapters, …)).]
<Person id="3d01" dept="dept001">
  <SSN>…</SSN>
  <DOB>…</DOB>
  <maritalSt>…</maritalSt>
  <name>
    <first>…</first>
    <last>…</last>
  </name>
  <phones>
    <phone>…</phone>
    <phone>…</phone>
    <phone>…</phone>
  </phones>
  <Student>
    <firstQuarter>…</firstQuarter>
    <gradDate>…</gradDate>
    <Graduate>
      <Thesis>
        <DefenseDate>…</DefenseDate>
        <Text>
          <Title>…</Title> …
        </Text>
      </Thesis>
      <PHD>
        …
      </PHD>
    </Graduate>
  </Student>
</Person>
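
In this encoding, a person's position in the hierarchy is given by how deeply the subclass elements are nested. A minimal sketch of reading that back with Python's standard xml.etree.ElementTree follows. The element names come from the example above, while the sample values and the helper `most_specific_class` are made up for illustration.

```python
import xml.etree.ElementTree as ET

PERSON_XML = """
<Person id="3d01" dept="dept001">
  <name><first>Ada</first><last>Lee</last></name>
  <Student>
    <gradDate>2008-06-01</gradDate>
    <Graduate>
      <Thesis><Text><Title>Hybrid schemas</Title></Text></Thesis>
      <PHD/>
    </Graduate>
  </Student>
</Person>
""".strip()


def most_specific_class(person, hierarchy=("Student", "Graduate", "PHD")):
    """Follow nested subclass elements and return the class path that exists."""
    path, node = [person.tag], person
    for cls in hierarchy:
        child = node.find(cls)
        if child is None:
            break  # the person is not a member of this (or any deeper) subclass
        path.append(cls)
        node = child
    return "/".join(path)


root = ET.fromstring(PERSON_XML)
print(most_specific_class(root))
print(root.findtext("Student/Graduate/Thesis/Text/Title"))
```

Subclass attributes such as the thesis are reached by a path through the nesting, so no join is needed to go from a person to that person's thesis.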
Hierarchies: Example 2
[Figure: same diagram as Example 1: the path Person → Student → Graduate → PhD, with R1 between Person and Dept and R3 with Thesis.]
Superclass relationship
  <Person id="3d01" dept="dept001">
  PERSON (id, info XML, dept FK)
  DEPT (did, …)
Subclass relationship
  <Person> … <Graduate> <Thesis> …
  PERSON (id, info XML)
  THESIS (tid, …, personID FK)
  or
  PERSON (id, info XML)
  THESIS (tid, …, personID FK, path)
Artificial Use Case
• Artificial academic LDM
• Set R-X partitioning threshold to 70%
• DDLs look very reasonable
HL7 Reference Information Model
• Health Level 7 (HL7) is an XML messaging format for the healthcare industry.
• HL7 provides a conceptual model called the Reference Information Model (RIM)
• ReXSA suggests one table with an XML column
• Reasonable because all entities in RIM inherit from a single super-entity.
Conclusion
• DBMS users don’t really know what to put in XML columns and what to put in relational columns.
• Designing hybrid relational-XML schemas is a problem that has not been addressed before.
• We presented a schema design advisor that takes an annotated logical data model as input and outputs candidate schema design(s)
• Future work
  - Design evaluation. Is the resultant design a good design?
  - Incorporate data samples in the analysis
  - Integration with performance (index, MQT) advisors
QUESTIONS?
• Contacts
  - [email protected]
  - [email protected]