26
IBM Software Group ® Native XML Support in DB2 Universal Database Matthias Nicola, Bert van der Linden IBM Silicon Valley Lab [email protected] Sept 1, 2005

Sept 1, 2005

  • Upload
    nodin

  • View
    24

  • Download
    0

Embed Size (px)

DESCRIPTION

Native XML Support in DB2 Universal Database Matthias Nicola, Bert van der Linden IBM Silicon Valley Lab [email protected]. Sept 1, 2005. Agenda. Why “native XML” and what does it mean? Native XML in the forthcoming version of DB2 Native XML Storage XML Schema Support XML Indexes - PowerPoint PPT Presentation

Citation preview

Page 1: Sept 1, 2005

IBM Software Group

®

Native XML Support in DB2 Universal Database

Matthias Nicola, Bert van der LindenIBM Silicon Valley [email protected]

Sept 1, 2005

Page 2: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

2

Agenda

Why “native XML” and what does it mean?

Native XML in the forthcoming version of DB2Native XML StorageXML Schema SupportXML IndexesXQuery, and the Integration with SQL

Summary

Page 3: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

3

XML Databases

XML-enabled DatabasesThe core data model is not XML (but e.g. relational)Mapping between XML data model and DB’s data

model is required, or XML is stored as textE.g.: DB2 XML Extender (V7, V8)

Native XML DatabasesUse the hierarchical XML data model to store and

process XML internallyNo mapping, no storage as textStorage format = processing formatE.g.: Forthcoming version of DB2

Page 4: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

4

Problems of XML-enabled Databases

CLOB storage:Query evaluation & sub-document level access requires

costly XML Parsing – too slow !

Shredding:Mapping from XML to relational often too complexOften requires dozens or hundreds of tablesComplex multi-way joins to reconstruct documentsXML schema changes break the mapping

• no schema flexibility !

• For example: Change element from single- to multi-occurrence requires normalization of relational schema & data

Page 5: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

5

Integration of XML & Relational Capabilities in DB2

SERVER

CLIENT SQL/X

XQueryXMLInterface

RelationalInterface Relational

XML

DB2 Storage:

DB2 Client /Customer Client Application

Native XML data type • (not Varchar, not CLOB, not object-relational)

XML Capabilities in all DB2 componentsApplications combine XML & relational data

HybridDB2 Engine

Page 6: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

6

Native XML Storage DB2 stores XML in parsed hierarchical format

(similar to the DOM representation of the XML infoset)

create table dept (deptID char(8),…, deptdoc xml);

Relational columnsare stored in relationalformat (tables)

XML is stored nativelyas type-annotated trees(representing theXQuery Data Model).

deptID … deptdoc

“PR27” … <dept> … <emp>…</emp></dept>

… … …

DB2 Storage

Page 7: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

7

dept

name

employee

phoneid=901

John Doe

office

408-555-1212 344

name

employee

phoneid=902

Peter Pan

office

408-555-9918 216

0

1

4

25=901

John Doe

3

408-555-1212 344

1

4

24=902

Peter Pan

3

408-555-9918 216

<dept> <employee id=901> <name>John Doe</name>

<phone>408 555 1212</phone><office>344</office>

</employee><employee id=902>

<name>Peter Pan</name><phone>408 555 9918</phone><office>216</office>

</employee></dept>

Efficient Document Tree Storage

0 dept4 employee1 name5 id2 phone3 office

String tableSYSIBM.SYSXMLSTRINGS

1 String table per database Database wide dictionary for all

tags in all XML columns

Reduces storage Fast comparisons &

navigation

Page 8: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

8

Information for Every Node

Tag name, encoded as unique StringID

A nodeID

Node kind (e.g. element, attribute, etc.)

Namespace / Namespace prefix

Type annotation

Pointer to parent

Array of child pointers Hints to the kind & name of child nodes

(for early-out navigation)

For text/attribute nodes: the data itself

Page 9: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

9

XML Node Storage Layout

Node hierarchy of an XML document stored on DB2 pages Documents that don’t fit on 1 page: split into regions/pages Docs < 1 page: 1 region, multiple docs/regions per page

Example:

Document split into 3 regions, stored on 3 pages

Split can be at anylevel of the document

Page 10: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

10

XML Storage: “Regions Index”

maps nodeIDs to regions & pages

allows to fetch required regions instead of full documents

allows intelligent prefetching

page page page

Regions index

not user defined, default component of XML storage layer

Page 11: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

11

XML Schema Support & Validation in DB2

Use of XML Schemas is optional, on a per-document basis

No need for a fixed schema per XML column

Validation per document (i.e. per row)

Zero, one, or many schemas per XML column For example: different versions of the same schema,

or schemas with conflicting definitions

Mix validated & non-validated documents in 1 XML column

Schemas are registered & stored in the DB2 Schema Repository (XSR) for fast and stable access.

Page 12: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

12

Validation using Schemas

Validate XML from a parameter marker using xsi:schemaLocation:

insert into dept(deptdoc) values (xmlvalidate(?))

Identify schema for a given document: select deptid, xmlxsrobjectid(deptdoc) from dept where deptid = “PR27”

Override schema location by referencing a schema ID or URI:

insert into dept(deptdoc) values ( xmlvalidate(? according to xmlschema id departments.deptschema)

insert into dept(deptdoc) values ( xmlvalidate(? according to xmlschema uri 'http://my.dept.com’)

Page 13: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

13

XML Indexes for High Query Performance

Define 0, 1 or multiple XML Value Indexes per XML column

XML index maps: (pathID, value) (nodeID, rowID)

Index any elements or attributes, incl. mixed content

Index definition uses an XML pattern to specify which elements/attributes to index (and which not to)

Can index all elements/attributes, but not forced to do so

Can index repeating elements 0 , 1 or multiple index entries per document

New XML-specific join and query evaluation methods, evaluate multiple predicates concurrently with minimal index I/O

xmlpattern = XPath without predicates, only child axis (/) and descendent-or-self axis (//)

Page 14: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

14

XML Indexing: Examplescreate table dept(deptID char(8) primary key, deptdoc xml);

create index idx1 on dept(deptdoc) generate keyusing xmlpattern '/dept/@bldg' as sql double;

create unique index idx2 on dept(deptdoc) generate keyusing xmlpattern '/dept/employee/@id' as sql double;

create index idx3 on dept(deptdoc) generate keyusing xmlpattern '/dept/employee/name' as sql varchar(35);

<dept bldg=101> <employee id=901> <name>John Doe</name>

<phone>408 555 1212</phone><office>344</office>

</employee><employee id=902>

<name>Peter Pan</name><phone>408 555 9918</phone><office>216</office>

</employee></dept>

…xmlpattern '//name' as sql varchar(35); (Index on ALL “name” elements)…xmlpattern '//@*' as sql double; (Index on ALL numeric attributes)…xmlpattern '//text()‘ as sql varchar(hashed); (Index on ALL text nodes, hash code)…xmlpattern ‘/dept//name' as sql varchar(35);

…xmlpattern '/dept/employee//text()' as sql varchar(128); (All text nodes under employee)

…xmlpattern 'declare namespace m="http://www.myself.com/"; /m:dept/m:employee/m:name‘ as

sql varchar(45);

Page 15: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

15

Querying XML Data in DB2

The following options are supported:

XQuery/XPath as a stand-alone language

SQL embedded in XQuery

XQuery/XPath embedded in SQL/XML

Plain SQL for full-document retrieval

Compiler/Optimizer Details: Beyer et al. “System RX”, SIGMOD 2005 [2]

Page 16: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

16

Example: XQuery as a stand-alone Language in DB2

create table dept(deptID char(8) primary key, deptdoc xml);

for $d in db2-fn:xmlcolumn(‘dept.deptdoc’)/deptlet $emp := $d//employee/namewhere $d/@bldg = > 95 order by $d/@bldgreturn <EmpList> {$d/@bldg, $emp} </EmpList>

db2-fn:xmlcolumn returns the sequence ofall documents in the specified XML column

<dept bldg=101> <employee id=901> <name>John Doe</name>

<phone>408 555 1212</phone><office>344</office>

</employee><employee id=902>

<name>Peter Pan</name><phone>408 555 9918</phone><office>216</office>

</employee></dept>

Page 17: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

17

Examples: SQL embedded in XQuery create table dept(deptID char(8) primary key, deptdoc xml);

Identify XML data by a SELECT statementLeverage predicates/indexes on relational columns

for $d in db2-fn:sqlquery('select deptdoc from dept (single document) where deptID = “PR27” ')…

for $d in db2-fn:sqlquery('select deptdoc from dept (some documents) where deptID LIKE “PR%” ')…

for $d in db2-fn:sqlquery('select dept.deptdoc from dept, unit (some documents) where dept.deptID=unit.ID and unit.headcount > 200’)…..

for $d in db2-fn:xmlcolumn(‘dept.deptdoc’)/dept , $e in db2-fn:sqlquery('select xmlforest(name, desc)

from unit u’)…

(constructed documents)(join & combine XML and relational data)

Page 18: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

18

Example: XQuery embedded in SQL/XML

SQL/XML Standard Functions: xmlexists, xmlquery, xmltable

create table dept(deptID char(8) primary key, deptdoc xml);

select deptID, xmlquery(‘for $i in $d/dept

let $j := $i//name return $j’ passing deptdoc as “d”) from deptwhere deptID LIKE “PR%”

and xmlexists(‘$d/dept[@bdlg = 101]’ passing deptdoc as “d“)

Page 19: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

19

Other Features in DB2 native XML

XML Text Search SupportXML Import/ExportXML Type in Stored ProceduresAPI Extensions (JDBC, CLI, .NET, etc.)XML Schema RepositoryFull SQL/XML supportVisual XQuery BuilderAnnotated schema shredding…and more

Page 20: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

20

Summary

CLOB and shredded XML storage restrict performance and flexibility

New native XML support in DB2:

Better Performance through Hierarchical & parsed XML representation at all layers Path-specific XML Indexing New XML join and query methods

Higher Flexibility through: Integration of SQL and XQuery Schemas are optional, per document, not per column Zero, one, or many XML schemas per XML column

Page 21: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

21

Questions?

db2 =>

[email protected]

Page 22: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

22

BACKUP CHARTS

Page 23: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

23

Shredding: A simple case

<DEPARTMENT deptid="15" deptname="Sales"> <EMPLOYEE> <EMPNO>10</EMPNO> <FIRSTNAME>CHRISTINE</FIRSTNAME> <LASTNAME>SMITH</LASTNAME> <PHONE>408-463-4963</PHONE> <SALARY>52750.00</SALARY> </EMPLOYEE> <EMPLOYEE> <EMPNO>27</EMPNO> <FIRSTNAME>MICHAEL</FIRSTNAME> <LASTNAME>THOMPSON</LASTNAME> <PHONE>406-463-1234</PHONE> <SALARY>41250.00</SALARY> </EMPLOYEE></DEPARTMENT> Department

DEPTID DEPTNAME15 Sales

EmployeeDEPTID EMPNO FIRSTNAME LASTNAME PHONE SALARY

15 27 MICHAEL THOMPSON 406-463-1234 4125015 10 CHRISTINE SMITH 408-463-4963 52750

Page 24: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

24

Shredding: A schema change…“Employees are now allowed to have multiple phone numbers…”

<DEPARTMENT deptid="15" deptname="Sales"> <EMPLOYEE> <EMPNO>10</EMPNO> <FIRSTNAME>CHRISTINE</FIRSTNAME> <LASTNAME>SMITH</LASTNAME> <PHONE>408-463-4963</PHONE> <PHONE>415-010-1234</PHONE> <SALARY>52750.00</SALARY> </EMPLOYEE> <EMPLOYEE> <EMPNO>27</EMPNO> <FIRSTNAME>MICHAEL</FIRSTNAME> <LASTNAME>THOMPSON</LASTNAME> <PHONE>406-463-1234</PHONE> <SALARY>41250.00</SALARY> </EMPLOYEE></DEPARTMENT>

PhoneEMPNO PHONE

27 406-463-123410 415-010-1234

10 408-463-4963

Requires:• Normalization of existing data !• Modification of the mapping• Change of applications

Costly!

DepartmentDEPTID DEPTNAME

15 Sales

EmployeeDEPTID EMPNO FIRSTNAME LASTNAME PHONE SALARY

15 27 MICHAEL THOMPSON 406-463-1234 4125015 10 CHRISTINE SMITH 408-463-4963 52750

Page 25: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

25

Native XML in DB2: Key Themes

Standards compliant + driving the standards XML, XQuery, SQL/XML, XML Schema…

Flexibility, because that is what XML is all about.. Zero, one, or many XML schemas per XML column

Native (hierarchical) Storage & Sophisticated XML Indexes New “pivot join” to evaluate many predicates concurrently

Integrated in DB2 Leveraging scalability, reliability, performance, availability…

Integrated with application APIs: JDBC, ODBC, .NET, embedded SQL, CLI,…

Integrated with SQL Access relational data and XML data in same statement

Page 26: Sept 1, 2005

IBM Software Group | DB2 Information Management Software

26

XML Index DDL

create index idx1 on T(xmlcol) generate key using xmlpattern '/a/b/@c' as sql date;

Declaration & use of namespace prefix supported (not shown above)

AS SQL VARCHAR (integer)

CREATE index-name

ON table-name (xml-column-name) GENERATE KEY USING xmlpattern

UNIQUE

text()

@attribute-tag

@*

///

element-tag

*///

INDEX

DOUBLE

DATETIMESTAMP

VARCHAR (HASHED)

xmlpattern:

xmlpattern = XPath without predicates, only child axis (/) and descendent-or-self axis (//)