
Advanced Database Topics

Copyright © Ellis Cohen 2002-2005

Introduction to XML

These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.

For more information on how you may use them, please see http://www.openlineconsult.com/db


Overview of XML / Database Topics

• Modeling XML Data (DTDs & XML Schema)
• Querying XML Data (XPath & XQuery)
• XML Persistent Storage
• Client Access to Local & Persistent XML Data
• Modifying Persistent XML Data
• Using XML Models with RDBs and OODBs
• Integrated Access to Relational, OO and XML Data Sources
• Integrating XML into RDBs


Topics for This Lecture

• Introduction to XML
• DTDs: Document Type Definitions
• XML Schema


Introduction to XML


XML Data Representation

RDBs are about representing and storing data as relations

OODBs are about representing and storing data as networks of objects

XML is about hierarchical data representations – data represented as trees

Such data can be stored in an XML DB, and it also has a standard textual format: XML


XML as a Textual Representation

eXtensible Markup Language
• Mechanism for tagging text to describe the meaning of the information
• Human readable (though not necessarily easily…)
• Looks similar to HTML tags: This is <b>very</b> important

But
• Tags define the meaning of the information, not its presentation
• Arbitrary rather than fixed set of tags
• Documents may optionally be typed; a type defines the set of tags allowed in a document and how and where they can be used
• Everything in XML is case-sensitive (content, tags, and attributes)


HTML Example

<h1>Books for CS779</h1>
<p><b>Database Design, Implementation & Management, 5th Edition</b><br>
Rob & Coronel<br>
<i>Course Technology</i>
<p><b>Professional XML Databases</b><br>
Williams<br>
<i>Wrox Press</i>


XML Example
<?xml version="1.0"?>
<!DOCTYPE CourseBooks SYSTEM "http://…/cbooks.dtd">
<CourseBooks>
  <Course>CS779</Course>
  <Book>
    <Title>Database Design, Implementation &amp; Management, 5th Edition</Title>
    <Author>Rob &amp; Coronel</Author>
    <Publisher>Course Technology</Publisher>
  </Book>
  <Book>
    <Title>Professional XML Databases</Title>
    <Author>Williams</Author>
    <Publisher>Wrox Press</Publisher>
  </Book>
</CourseBooks>
<!-- That's all folks -->

Prolog: the XML declaration and the DOCTYPE declaration. Body: everything after the prolog.


XML Example Tree

root
  CourseBooks
    Course
      "CS779"
    Book
      Title
        "…"
      Author
        "Rob & Coronel"
      Publisher
        "…"
    Book
      …

Element nodes are shown by tag name; text nodes are shown in quotes.
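The sketch below illustrates this tree with Python's standard xml.etree.ElementTree module. It parses the body of the CourseBooks example with the prolog omitted, so no DTD is involved. It then prints one indented line per element node, with the element's text node quoted underneath. The `outline` helper is hypothetical. ElementTree has no separate root node object: `fromstring` returns the CourseBooks element itself, and text is stored on the element rather than as a child node.

```python
import xml.etree.ElementTree as ET

# Body of the CourseBooks example; '&' inside text must be written as '&amp;'
COURSE_BOOKS = """<CourseBooks>
  <Course>CS779</Course>
  <Book>
    <Title>Database Design, Implementation &amp; Management, 5th Edition</Title>
    <Author>Rob &amp; Coronel</Author>
    <Publisher>Course Technology</Publisher>
  </Book>
  <Book>
    <Title>Professional XML Databases</Title>
    <Author>Williams</Author>
    <Publisher>Wrox Press</Publisher>
  </Book>
</CourseBooks>"""


def outline(elem, depth=0):
    """Return indented lines: one per element node, plus its text (quoted) if non-blank."""
    lines = ["  " * depth + elem.tag]
    text = (elem.text or "").strip()
    if text:
        lines.append("  " * (depth + 1) + repr(text))
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(COURSE_BOOKS)
print("\n".join(outline(root)))
```

The parser has already turned `&amp;` back into `&`, so the Author text node holds `Rob & Coronel`.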


Tags Provide Meaning

Tags
– describe the meaning of the content
– make the document self-defining

Compare to untagged, delimited text:

Database Design, Implementation & Management, 5th Edition|Rob & Coronel|Course Technology

Professional XML Databases|Williams|Wrox Press
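This Python sketch (standard library only) turns the delimited lines into tagged XML, which shows what the tags add. The program has to supply the field names itself, because the delimited text does not carry them. After tagging, each value is labeled in the document itself. The wrapper element `Books` and the helper `rows_to_xml` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

ROWS = """Database Design, Implementation & Management, 5th Edition|Rob & Coronel|Course Technology
Professional XML Databases|Williams|Wrox Press"""

# The delimited format carries no field names, so the program must know them.
FIELDS = ("Title", "Author", "Publisher")


def rows_to_xml(text):
    """Wrap each '|'-delimited line in a Book element with one tagged child per field."""
    root = ET.Element("Books")  # hypothetical wrapper element
    for line in text.splitlines():
        book = ET.SubElement(root, "Book")
        for name, value in zip(FIELDS, line.split("|")):
            ET.SubElement(book, name).text = value
    return root


xml_text = ET.tostring(rows_to_xml(ROWS), encoding="unicode")
print(xml_text)  # tostring() escapes '&' as '&amp;' automatically
```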


Unusual Character Data

What do you do if your character data contains special characters, e.g. angle brackets?

CDATA section:
<Title><![CDATA[Why X < Y]]></Title>

Entity references (&lt; for <, and likewise &amp; for &):
<Title>Why X &lt; Y</Title>
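This quick check uses Python's standard library. It confirms that both spellings yield the same character data once parsed. It also shows that `xml.sax.saxutils.escape` produces entity references when generating XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

with_cdata = "<Title><![CDATA[Why X < Y]]></Title>"
with_entity = "<Title>Why X &lt; Y</Title>"

# Both spellings parse to exactly the same character data.
cdata_text = ET.fromstring(with_cdata).text
entity_text = ET.fromstring(with_entity).text
print(cdata_text == entity_text, cdata_text)

# When generating XML, escape() replaces &, < and > with entity references.
print(escape("Why X < Y & Z"))
```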


Attributes

<Book>
  <Title>Professional XML Databases</Title>
  <Author>Williams</Author>
  <Publisher>Wrox Press</Publisher>
</Book>

vs

<Book title="Professional XML Databases" author="Williams" publisher="Wrox Press"/>
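Both forms carry the same information. This Python sketch (standard library only) reads each version into a dictionary. The helper names are hypothetical. The element tags are lower-cased only because the two versions capitalize the names differently, and XML names are case-sensitive.

```python
import xml.etree.ElementTree as ET

AS_ELEMENTS = """<Book><Title>Professional XML Databases</Title>
<Author>Williams</Author><Publisher>Wrox Press</Publisher></Book>"""

AS_ATTRIBUTES = """<Book title="Professional XML Databases" author="Williams"
      publisher="Wrox Press"/>"""


def from_elements(book):
    """Each child element contributes one field: lower-cased tag -> text."""
    return {child.tag.lower(): child.text for child in book}


def from_attributes(book):
    """Attributes already form a name -> value mapping."""
    return dict(book.attrib)


fields_e = from_elements(ET.fromstring(AS_ELEMENTS))
fields_a = from_attributes(ET.fromstring(AS_ATTRIBUTES))
print(fields_e == fields_a)
```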


Maximal Use of Attributes
<?xml version="1.0"?>
<!DOCTYPE CourseBooks SYSTEM "http://…/cbooks.dtd">
<CourseBooks course="CS779">
  <Book title="Database Design, Implementation &amp; Management, 5th Edition" author="Rob &amp; Coronel" publisher="Course Technology"/>
  <Book title="Professional XML Databases" author="Williams" publisher="Wrox Press"/>
</CourseBooks>

Prolog: the first two lines. Body: the CourseBooks element.

An attribute can generally be used to represent only a single value.

If you want to represent structured data or a list of values or data items, you should use an element.


XML with Attributes

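root
  CourseBooks
    course (attribute)
    Book
      title (attribute)
      author (attribute)
      publisher (attribute)
    Book
      …

Element nodes are shown by tag name; attribute nodes are marked (attribute).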


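XML Can Be Unnormalized
root
  CourseBooks
    Book
      title (attribute)
      Author
        name (attribute)
        address (attribute)
        dob (attribute)
      publisher (attribute)
    …

Suppose someone authored multiple books: the author's name, address, and dob would be repeated inside every Book they wrote.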


XML Can Be Normalized
root
  BookDB
    Booklist
      Book
        title (attribute)
        publisher (attribute)
        AuthorRef
      …
    Authlist
      Author
        authid (attribute)
        name (attribute)
        address (attribute)
        dob (attribute)
      …

Note: AuthorRef is an element, not an attribute!


Normalized XML
<?xml version="1.0"?>
<!DOCTYPE BookDB SYSTEM "http://…/bookdb.dtd">
<BookDB>
  <Booklist>
    <Book title="Furniture Design, Implementation &amp; Management, 5th Edition" publisher="Course Technology">
      <AuthorRef>Wil4421</AuthorRef>
    </Book>
    <Book title="Cool Ideas" publisher="Mumbo Jumbo Books">
      <AuthorRef>Wil4421</AuthorRef>
    </Book>
    <Book title="Professional Peach Databases" publisher="Wrox Press">
      <AuthorRef>Wil4421</AuthorRef>
      <AuthorRef>Bor601</AuthorRef>
    </Book>
    …
  </Booklist>
  <Authlist>
    <Author authid="Wil4421" name="Juan Williams" address="…" dob="11-04-42"/>
    <Author authid="Bor601" name="Gonzo Borscht" address="…" dob="3-17-88"/>
    …
  </Authlist>
</BookDB>

No redundancy; this is often similar to the way the information would be stored in an RDB.

A book may have multiple authors.
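Reading the normalized form back means following each AuthorRef to its Author, much like a join. Here is a minimal Python sketch. It assumes an abbreviated copy of the document above (address attributes left out) and a hypothetical `author_names` helper.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the normalized BookDB document (address attributes omitted)
BOOK_DB = """<BookDB>
  <Booklist>
    <Book title="Cool Ideas" publisher="Mumbo Jumbo Books">
      <AuthorRef>Wil4421</AuthorRef>
    </Book>
    <Book title="Professional Peach Databases" publisher="Wrox Press">
      <AuthorRef>Wil4421</AuthorRef>
      <AuthorRef>Bor601</AuthorRef>
    </Book>
  </Booklist>
  <Authlist>
    <Author authid="Wil4421" name="Juan Williams" dob="11-04-42"/>
    <Author authid="Bor601" name="Gonzo Borscht" dob="3-17-88"/>
  </Authlist>
</BookDB>"""

root = ET.fromstring(BOOK_DB)

# Index Author elements by authid, much like a primary-key index
authors = {a.get("authid"): a for a in root.iter("Author")}


def author_names(book):
    """Follow each AuthorRef to the Author it names (an RDB-style join)."""
    return [authors[ref.text].get("name") for ref in book.findall("AuthorRef")]


for book in root.find("Booklist"):
    print(book.get("title"), "->", author_names(book))
```

Each author's details are stored once, so changing an address means editing a single Author element.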


Storing Information in an RDB

BookList
title | authrefs | publisher
Furniture Design, Implementation & Management, 5th Edition | Wil4421 | Course Technology
Cool Ideas | Wil4421 | Mumbo Jumbo Books
Professional Peach Databases | Wil4421 Bor601 | Wrox Press
… | … | …

AuthList
authid | name | address | dob
Wil4421 | Juan Williams | … | 11-04-42
Bor601 | Gonzo Borscht | … | 3-17-88
… | … | … | …

Not 1NF: the authrefs value for Professional Peach Databases is a list of two ids, not a single atomic value.
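One way to bring BookList into first normal form is to move the multi-valued authrefs column into its own table, with one row per (book, author) pair. The sketch uses Python's built-in sqlite3 module with an in-memory database. The BookAuthor table name and the column types are assumptions, and addresses are omitted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE AuthList (authid TEXT PRIMARY KEY, name TEXT, dob TEXT);
CREATE TABLE BookList (title TEXT PRIMARY KEY, publisher TEXT);
-- Hypothetical link table: one row per (book, author) replaces the authrefs list
CREATE TABLE BookAuthor (
    title  TEXT REFERENCES BookList(title),
    authid TEXT REFERENCES AuthList(authid),
    PRIMARY KEY (title, authid)
);
INSERT INTO AuthList VALUES ('Wil4421', 'Juan Williams', '11-04-42'),
                            ('Bor601',  'Gonzo Borscht', '3-17-88');
INSERT INTO BookList VALUES ('Cool Ideas', 'Mumbo Jumbo Books'),
                            ('Professional Peach Databases', 'Wrox Press');
INSERT INTO BookAuthor VALUES ('Cool Ideas', 'Wil4421'),
                              ('Professional Peach Databases', 'Wil4421'),
                              ('Professional Peach Databases', 'Bor601');
""")

# Recover the author list of one book with a join
names = [row[0] for row in con.execute("""
    SELECT a.name
    FROM BookAuthor ba JOIN AuthList a ON a.authid = ba.authid
    WHERE ba.title = ?
    ORDER BY a.name""", ("Professional Peach Databases",))]
print(names)
```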


Structure of XML

Tags: Book, Title, Author, Publisher

<Author>Rob &amp; Coronel</Author>
Element: the start tag <Author>, the content Rob &amp; Coronel, and the end tag </Author>

<Book title="Let's have fun"/>
Empty element: start and end tag are combined (no content, only attributes); the closing /> is the empty-element indicator


XML for Structured Text

XML represents structured text if the content of an element is either
– Character data (i.e. untagged text)
<Author>Williams</Author>
– A sequence of one or more elements
<Book>
  <Title>Database Design, Implementation &amp; Management, 5th Edition</Title>
  <Author>Rob &amp; Coronel</Author>
  <Publisher>Course Technology</Publisher>
</Book>


XML for Semi-Structured Text

In semi-structured text, the content of an element may contain a sequence of untagged text and elements, which are intermixed.

<Description><Author>Williams</Author> is the author of <Title>Professional XML Databases</Title>, which is published by <Publisher>Wrox Press</Publisher>

</Description>

More like HTML markup, but with tags used to identify useful information.


Description Example Tree

Description
  Author
    "Williams"
  " is the author of "
  Title
    "Professional XML Databases"
  ", which is published by "
  Publisher
    "Wrox Press"

Element nodes are shown by tag name; text nodes are shown in quotes.
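ElementTree, in Python's standard library, represents this mixed content slightly differently from the tree above. It has no separate text nodes, and text that follows a child element is stored in that child's `tail`. A short sketch:

```python
import xml.etree.ElementTree as ET

DESC = ("<Description><Author>Williams</Author> is the author of "
        "<Title>Professional XML Databases</Title>, which is published by "
        "<Publisher>Wrox Press</Publisher></Description>")

desc = ET.fromstring(DESC)

# ElementTree stores untagged text that follows a child element in that child's .tail
for child in desc:
    print(child.tag, repr(child.text), "tail:", repr(child.tail))

# itertext() yields every piece of text in document order
sentence = "".join(desc.itertext())
print(sentence)
```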


Vocabularies
XML tags to use in specific domains, e.g. Business, Legal, Music, Robotics, Math, Chemical, Genetic

Some business vocabularies:
• Commerce XML (cXML), developed by Ariba and Microsoft
• BizTalk
• RosettaNet
• OFX: Open Financial Exchange

Go to http://www.xml.org/xml/registry.jsp for a detailed list of XML specs


Namespaces

Allow libraries of tags to be combined without collisions

<CourseBooks xmlns:isbn="http://www.isbn.org" course="CS779">
  <Book>
    <Title>Professional XML Databases</Title>
    <Author>Williams</Author>
    <Publisher>Wrox Press</Publisher>
    <isbn:Number>1861003587</isbn:Number>
  </Book>
</CourseBooks>

• xmlns:isbn="http://www.isbn.org" declares isbn as a local namespace prefix and identifies the namespace
• isbn:Number uses the namespace prefix to (potentially) disambiguate the tag
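Python's xml.etree.ElementTree expands a prefixed name to `{namespace-URI}localname`. The namespace URI, not the prefix, is what identifies the tag. A minimal sketch:

```python
import xml.etree.ElementTree as ET

DOC = """<CourseBooks xmlns:isbn="http://www.isbn.org" course="CS779">
  <Book>
    <Title>Professional XML Databases</Title>
    <isbn:Number>1861003587</isbn:Number>
  </Book>
</CourseBooks>"""

root = ET.fromstring(DOC)

# The parser expands isbn:Number to {http://www.isbn.org}Number
number = root.find("Book/{http://www.isbn.org}Number")
print(number.tag)

# A prefix -> URI mapping can be passed at query time; "i" need not match the document's prefix
ns = {"i": "http://www.isbn.org"}
print(root.find("Book/i:Number", ns).text)
```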


Uses of XML

• Document Storage and Retrieval: XML databases, with mechanisms for storage, retrieval (querying) and updates of both structured & semi-structured text
• Communication (e.g. SOAP): messages, remote procedure calls
• Specifying Configurations & Metadata: resource files, initialization files, description files, etc. (e.g. WSDL)
• Procedural and Descriptive Languages: JSP, XML Schema, XQueryX, XSL

When is it advantageous to persistently store information in an XML tree representation rather than in an RDB?


Advantages of XML over RDB

• When you want to store and query significant amounts of semi-structured text

• When there are many tags and attributes which are optional and are used relatively infrequently


JSP

<c:forEach var="customer" items="${customers}">
  <c:if test="${book.price <= customer.limit}">
    <c:out value="${book.title}"/> fits <c:out value="${customer.name}"/>'s budget!<br>
  </c:if>
</c:forEach>


Some XML-based Languages

XML Schema
A language for defining the type of other XML documents (including specifying the allowable tags and how they can be used)

XQueryX
An XML representation of the XQuery language for locating and retrieving parts of XML documents

XSL
A language for specifying the style of another XML document.
Defines how the XML document should be converted to HTML or some other presentation format.
More generally used for transforming an XML document to a different format (which may or may not be XML-based)


XML Type Definitions

We want to define the content of elements and attributes in XML just as we define the types of fields in a relational DB

There are 2 ways to define the type of XML documents

• DTD (Document Type Definition)
– Not XML-based
– Originally used for defining SGML document types (SGML predates the web); XML is a simplification of SGML
• XML Schema
– Schema definition is XML-based
– Focus on reusability
– Finer-grained constraints


XML and Database Evolution

(Diagram: Hierarchical DB, Network DB, Relational DB, OO/OR DB, and XML DB shown as stages in the evolution of database models.)

An XML DB is used for storage and retrieval of hierarchically organized information


RDB, ORDB, OODB, XMLDB

Operation | RDB | ORDB | OODB | XMLDB
Define | SQL DDL | | |
Query | SQL DML | | |
Modify | SQL DML | | |

Fill in the rest …


RDB, ORDB, OODB, XMLDB

Operation | RDB | ORDB | OODB | XMLDB
Define | SQL DDL | SQL DDL | ODL | DTD, XML Schema
Query | SQL DML | SQL DML | OQL (or OPath or other variants) | XPath, XQuery, XSLT
Modify | SQL DML | SQL DML | OOPLs | DOM APIs, XUpdate, SQL (if XRDB)


DTDs: Document Type Definitions


Document Type Definition

Formal grammar to specify structure and permissible values
• Valid XML is well-formed syntactically
• Valid XML conforms to the rules of the vocabulary

A DTD provides a means of validating XML documents

A DTD also provides documentation of the vocabulary

A DOCTYPE declaration specifies the DTD:
<!DOCTYPE Books SYSTEM "http://…/books.dtd">


Example DTD

<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
<!ELEMENT Course (#PCDATA)>
– Title, Author, Publisher & Course elements have strictly textual content

<!ELEMENT Book (Title, Author, Publisher?)>
– The content of a Book consists of a Title element, followed by an Author element, optionally (?) followed by a Publisher element. Order matters!

<!ELEMENT CourseBooks (Course, Book+)>
– The content of a CourseBooks element consists of a Course element, followed by one or more (+) Book elements
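Python's standard library does not offer DTD validation, though third-party libraries such as lxml do. This sketch instead translates the content models above by hand into Python regular expressions over the sequence of child tag names, writing each tag followed by a space. It checks element structure only. It ignores character data and attributes, and it treats undeclared elements as invalid. `CONTENT_MODELS`, `child_sequence` and `conforms` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models. Each child tag is written followed by one space,
# so a DTD sequence becomes a simple concatenation of "Tag " tokens.
CONTENT_MODELS = {
    "Title": "", "Author": "", "Publisher": "", "Course": "",  # (#PCDATA): no child elements
    "Book": r"Title Author (?:Publisher )?",                     # (Title, Author, Publisher?)
    "CourseBooks": r"Course (?:Book )+",                         # (Course, Book+)
}


def child_sequence(elem):
    """Encode an element's child element tags as 'Tag Tag ... ' (text is ignored)."""
    return "".join(child.tag + " " for child in elem)


def conforms(elem):
    """True if elem and all its descendants match their content models."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:  # undeclared element
        return False
    if re.fullmatch(model, child_sequence(elem)) is None:
        return False
    return all(conforms(child) for child in elem)


GOOD = """<CourseBooks><Course>CS779</Course>
  <Book><Title>Professional XML Databases</Title><Author>Williams</Author></Book>
</CourseBooks>"""
BAD = """<CourseBooks><Course>CS779</Course>
  <Book><Author>Williams</Author><Title>Professional XML Databases</Title></Book>
</CourseBooks>"""
print(conforms(ET.fromstring(GOOD)), conforms(ET.fromstring(BAD)))
```

The second document fails because Author comes before Title, which the sequence (Title, Author, Publisher?) does not allow.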


Example CourseBooks XML
<?xml version="1.0"?>
<!DOCTYPE CourseBooks SYSTEM "http://…/cbooks.dtd">
<CourseBooks>
  <Course>CS779</Course>
  <Book>
    <Title>Database Design, Implementation &amp; Management, 5th Edition</Title>
    <Author>Rob &amp; Coronel</Author>
    <Publisher>Course Technology</Publisher>
  </Book>
  <Book>
    <Title>Professional XML Databases</Title>
    <Author>Williams</Author>
    <!-- No publisher, it's optional! -->
  </Book>
</CourseBooks>
<!-- That's all folks -->

Prolog: the XML declaration and the DOCTYPE declaration. Body: everything after the prolog.


Content Model

<!ELEMENT name content>

Content can be

– ANY: anything
<!ELEMENT Description ANY>
– EMPTY: must be an empty element
<!ELEMENT Details EMPTY>
– regexp: a parenthesized regular expression
<!ELEMENT CourseBooks (Course, Book+)>


Regular Expressions
• Sequence: ( exp1, …, expn )
where each exp can be an element name, #PCDATA, a sequence, an alternative, or a repetition.
Matches the first regexp, then the second, etc.
e.g. (#PCDATA)
e.g. (Course, Book+)
e.g. (Title, Author, Publisher)
• Alternative: ( exp1 | … | expn )
where each exp can be an element name, #PCDATA, a sequence, an alternative, or a repetition.
Matches any one of the expressions listed
e.g. ( Publisher | isbn:Number )
• Repetition
– exp+: 1 or more
– exp*: 0 or more
– exp?: optional
where each exp can be an element name, a sequence, or an alternative

#PCDATA stands for character data. (Strictly, the XML 1.0 standard allows #PCDATA only on its own, as (#PCDATA), or first in a mixed-content alternative of the form (#PCDATA | Name1 | … | Namen)*.)


Element-ary Problem

<!ELEMENT Note (#PCDATA)>
<!ELEMENT Relevance (#PCDATA)>
<!ELEMENT Info (#PCDATA | (Note, Relevance?))*>

What are some examples of XML bodies (no prolog) whose root element is Info that would be valid for this DTD?


Some Element-ary Solutions

<Info></Info>

<Info>Hello</Info>

<Info><Note>ok</Note></Info>

<Info> <Note>this is a note</Note> hello <Note>ok</Note><Relevance>13</Relevance> <Note>another note</Note></Info>


Alternative Nesting Styles

There are three alternative DTDs for BillingData below. Give an example for each one that involves 3 payments & 3 charges.

Under which circumstances would you recommend each of the styles?

1) <!ELEMENT BillingData (Payment | Charge)*>

2) <!ELEMENT BillingData (Payment*, Charge*)>

3) <!ELEMENT BillingData (Payments, Charges)>

<!ELEMENT Payments (Payment*)>

<!ELEMENT Charges (Charge*)>


Interleaved vs Grouped
<BillingData> <Charge> … </Charge> <Payment> … </Payment> <Charge> … </Charge> <Charge> … </Charge> <Payment> … </Payment></BillingData>

<BillingData> <Payment> … </Payment> <Payment> … </Payment> <Charge> … </Charge> <Charge> … </Charge> <Charge> … </Charge></BillingData>

1) Intermixed charges & payments

2) All payments followed by all charges


Separated vs SubGrouped
<BillingData> <Payment> … </Payment> <Payment> … </Payment> <Charge> … </Charge> <Charge> … </Charge> <Charge> … </Charge></BillingData>

<BillingData> <Payments> <Payment> … </Payment> <Payment> … </Payment> </Payments> <Charges> <Charge> … </Charge> <Charge> … </Charge> <Charge> … </Charge> </Charges></BillingData>

2) All payments followed by all charges

3) All payments & all charges respectively placed within distinct parent elements


Ordering Problem

Suppose a book consists of a title and an author, but they could appear in either order. What's the corresponding DTD definition for Book?


Ordering Solution

<!ELEMENT Book ((Title, Author) | (Author, Title) )>

Suppose a book consists of a single title, a single author, and a single publisher, but they could appear in any order. What's the corresponding DTD definition for Book?


Triple Ordering Solution

<!ELEMENT Book ( (Author, Title, Publisher) | (Author, Publisher, Title) | (Title, Author, Publisher) | (Title, Publisher, Author) | (Publisher, Author, Title) | (Publisher, Title, Author) )>
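A flat list of permutations grows as n!: 6 alternatives for three elements, 24 for four. For SGML compatibility, the XML 1.0 specification also asks for deterministic content models. The flat form is not deterministic, because (Author, Title, Publisher) and (Author, Publisher, Title) both begin with Author, and some validating parsers report this as an error. The minimal Python sketch below builds an equivalent model by factoring out the first element at each level. It uses the standard library only, and `any_order_model` is a hypothetical helper.

```python
def any_order_model(names):
    """Build a DTD content model allowing each name exactly once, in any order.

    The first element is factored out of each alternative, so no two
    alternatives at the same level start with the same element name.
    """
    if len(names) == 1:
        return names[0]
    alternatives = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        alternatives.append(f"({first}, {any_order_model(rest)})")
    return "(" + " | ".join(alternatives) + ")"


print(any_order_model(["Title", "Author"]))
print(any_order_model(["Author", "Title", "Publisher"]))
```

For two names this reproduces ((Title, Author) | (Author, Title)). For three names, each top-level alternative starts with a different element.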

Suppose a book can contain a title and at least one author in any order. What's the DTD?

How about a book which contains at least one author and any number of co-authors in any order?


Mixed Order with Repetition

<!ELEMENT Book ((Author+, Title, Author*) | (Author*, Title, Author+) )>

<!ELEMENT Book (Coauthor*, Author, (Coauthor*, Author*)*)>
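One way to test these models is to translate them by hand into Python regular expressions over the sequence of child tag names, each tag followed by a space. Read as (Coauthor*, Author, (Coauthor*, Author*)*), the coauthor model accepts exactly the same sequences as (Coauthor*, Author, (Author | Coauthor)*), and the sketch uses that simpler form. The pattern names and `allowed` are hypothetical.

```python
import re

# ((Author+, Title, Author*) | (Author*, Title, Author+))
TITLE_AND_AUTHORS = r"(?:(?:Author )+Title (?:Author )*|(?:Author )*Title (?:Author )+)"
# (Coauthor*, Author, (Author | Coauthor)*)
AUTHOR_AND_COAUTHORS = r"(?:Coauthor )*Author (?:Author |Coauthor )*"


def allowed(model, *tags):
    """True if the sequence of child tags matches the whole model."""
    return re.fullmatch(model, "".join(t + " " for t in tags)) is not None


print(allowed(TITLE_AND_AUTHORS, "Author", "Title", "Author"))   # True: authors on both sides
print(allowed(TITLE_AND_AUTHORS, "Title"))                       # False: no author at all
print(allowed(AUTHOR_AND_COAUTHORS, "Coauthor", "Author", "Coauthor", "Author"))  # True
```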


Limitations of Regular Expressions

• Impose unwanted constraints on order
<!ELEMENT Book (Title, Author, Publisher)>
• Can be too vague
<!ELEMENT Book (Title | Author | Publisher)*>

Note: this could be helped by adding the interleaved combination operator & (not supported in the DTD standard), e.g.
(Title & Author & Publisher)
(Title & Author+)
(Author+ & Coauthor*)
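A DTD cannot express &, but the check it describes is easy to sketch in Python. The sketch assumes the interleaving reading, in which only how many times each child occurs matters, not their order. `interleave_ok` and its parameters are hypothetical.

```python
from collections import Counter


def interleave_ok(children, once=(), one_or_more=(), zero_or_more=()):
    """Sketch of an '&'-style check: how often each child occurs matters, order does not."""
    counts = Counter(children)
    declared = set(once) | set(one_or_more) | set(zero_or_more)
    if any(tag not in declared for tag in counts):
        return False  # an undeclared child element
    return (all(counts[t] == 1 for t in once)
            and all(counts[t] >= 1 for t in one_or_more))


# (Title & Author & Publisher): each exactly once, in any order
print(interleave_ok(["Publisher", "Title", "Author"], once=("Title", "Author", "Publisher")))
# (Author+ & Coauthor*)
print(interleave_ok(["Coauthor", "Author", "Coauthor", "Author"],
                    one_or_more=("Author",), zero_or_more=("Coauthor",)))
```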


Attributes

<!ATTLIST Book
  title CDATA #REQUIRED
  author CDATA #REQUIRED
  publisher CDATA #IMPLIED
  isbn:number CDATA #IMPLIED>

A Book is required to have a title and an author attribute, and optionally may have a publisher and isbn:number attribute.

The attributes may appear in any order!
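The effect of #REQUIRED and #IMPLIED can be sketched in Python, with a dictionary standing in for the ATTLIST (a hypothetical encoding). isbn:number is left out because ElementTree reports namespaced attributes under an expanded {URI}number name. Since `elem.attrib` is a mapping, attribute order plays no role.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary form of
# <!ATTLIST Book title CDATA #REQUIRED author CDATA #REQUIRED publisher CDATA #IMPLIED>
BOOK_ATTLIST = {"title": "#REQUIRED", "author": "#REQUIRED", "publisher": "#IMPLIED"}


def attribute_problems(elem, attlist):
    """List missing #REQUIRED attributes and attributes the ATTLIST does not declare."""
    problems = [f"missing {name}" for name, kind in attlist.items()
                if kind == "#REQUIRED" and name not in elem.attrib]
    problems += [f"undeclared {name}" for name in elem.attrib if name not in attlist]
    return problems


# Attribute order in the document is irrelevant
book = ET.fromstring('<Book publisher="Wrox Press" author="Williams" title="Professional XML Databases"/>')
print(attribute_problems(book, BOOK_ATTLIST))
```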


Elements & Attributes

ELEMENT descriptions: define an element's sub-elements (and the order in which they must appear!)

ATTLIST descriptions: define an element's attributes (which can appear in any order)

The designer determines which content should be represented by elements & which by attributes.

Changing this requires changing the DTD


Revised CourseBooks DTD
<!ELEMENT Book EMPTY>
<!ATTLIST Book
  title CDATA #REQUIRED
  author CDATA #REQUIRED
  publisher CDATA #IMPLIED
  isbn:number CDATA #IMPLIED>
– A Book has no sub-elements (EMPTY), but must have a title and an author attribute, and optionally has a publisher and an isbn:number attribute. These can appear in any order.

<!ELEMENT CourseBooks (Book+)>
<!ATTLIST CourseBooks
  course ID #REQUIRED>

– A CourseBooks element has one or more Books, and also has a course attribute, which must be unique


XML Example for Revised DTD
<?xml version="1.0"?>
<!DOCTYPE CourseBooks SYSTEM "http://…/cbooks.dtd">
<CourseBooks xmlns:isbn="http://www.isbn.org" course="CS779">
  <Book title="Database Design, Implementation &amp; Management, 5th Edition" author="Rob &amp; Coronel"/>
  <Book author="Williams" title="Professional XML Databases" publisher="Wrox Press" isbn:number="304-22-15678"/>
</CourseBooks>

Prolog: the first two lines. Body: the CourseBooks element.

An attribute can generally be used to represent only a single value.

If you want to represent structured data or a list of values or data items, you should use an element.


Attribute Kinds in DTDs
#REQUIRED: required
#IMPLIED: optional
"value": default value
#FIXED "value": the only allowed value


Attribute Types in DTDs
CDATA: character data
ID: key value definition
IDREF: reference to an element's ID
IDREFS: list of ID references (blank-separated)
NMTOKEN: a name token (like an XML name, except that it may also start with a digit, '.' or '-')
NMTOKENS: list of name tokens
ENTITY: non-text content (e.g. gif)
Enumerations, e.g. (Monday | Wednesday | Friday)

Note: attribute values could look like XML, but they would just be strings (with < written as &lt;) and no substructure


ID and IDREF
<!ATTLIST Book
  title CDATA #REQUIRED
  id ID #REQUIRED
  pubref IDREF #IMPLIED
  isbn:number CDATA #IMPLIED
  sources IDREFS #IMPLIED>

<CourseBooks course="CS779">
  <Book id="ddim5" title="Database …" …/>
  <Book id="pxd" title="Professional XML Databases" pubref="Wrox"/>
  <Book id="gdb" title="Great Database Book" sources="ddim5 pxd" …/>
</CourseBooks>

IDREFS are allowed, but not widely used.

IDREF & IDREFS support referential integrity: each id reference in an IDREF or IDREFS attribute must match the value of some ID attribute.
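Here is a minimal Python sketch of the referential-integrity check a validating parser performs for ID, IDREF and IDREFS. Which attributes have which type would normally come from the ATTLIST; here it is hard-coded. The Publisher element with id="Wrox" is a hypothetical addition, so that pubref has something to point to.

```python
import xml.etree.ElementTree as ET

# Which attribute names are declared ID, IDREF or IDREFS (hard-coded for the sketch)
ID_ATTRS = ("id",)
IDREF_ATTRS = ("pubref",)
IDREFS_ATTRS = ("sources",)


def reference_problems(root):
    """Report duplicate IDs and IDREF/IDREFS tokens that match no ID."""
    ids, problems = set(), []
    for elem in root.iter():
        for name in ID_ATTRS:
            value = elem.get(name)
            if value is None:
                continue
            if value in ids:
                problems.append(f"duplicate ID {value}")
            ids.add(value)
    for elem in root.iter():
        refs = [elem.get(name) for name in IDREF_ATTRS if name in elem.attrib]
        for name in IDREFS_ATTRS:
            refs.extend(elem.get(name, "").split())  # IDREFS: whitespace-separated
        problems += [f"dangling reference {ref}" for ref in refs if ref not in ids]
    return problems


DOC = """<CourseBooks course="CS779">
  <Publisher id="Wrox" name="Wrox Press"/>
  <Book id="ddim5" title="Database …"/>
  <Book id="pxd" title="Professional XML Databases" pubref="Wrox"/>
  <Book id="gdb" title="Great Database Book" sources="ddim5 pxd"/>
</CourseBooks>"""
print(reference_problems(ET.fromstring(DOC)))
```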


Stylistic Consistency

The designer of a DTD has many stylistic choices:
– The order of elements

– When to use elements and when to use attributes

– Whether lists of ids or names should be represented as a single whitespace-separated attribute or as repeated elements

– Whether repeated elements should be nested inside a collection element

These are aesthetic choices that should be made consistently.


Alternative Representations for Repeated IDs

<Book title="Professional Peach Databases"> <AuthorRef>Wil4421</AuthorRef> <AuthorRef>Bor601</AuthorRef></Book>

<Book title="Professional Peach Databases"> <AuthorRef authref="Wil4421"/> <AuthorRef authref="Bor601"/></Book>

<Book title="Professional Peach Databases" authrefs="Wil4421 Bor601"/>

For each case, what are the corresponding DTDs?


Alternative DTDs
<!ELEMENT Book (AuthorRef*)>
<!ATTLIST Book
  title CDATA #REQUIRED>
<!ELEMENT AuthorRef (#PCDATA)>

<!ELEMENT Book (AuthorRef*)>
<!ATTLIST Book
  title CDATA #REQUIRED>
<!ELEMENT AuthorRef EMPTY>
<!ATTLIST AuthorRef
  authref IDREF #REQUIRED>

<!ELEMENT Book EMPTY>
<!ATTLIST Book
  title CDATA #REQUIRED
  authrefs IDREFS #REQUIRED>
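A program reading the data has to know which of the three representations it is dealing with. Here is a Python sketch with a hypothetical `author_refs` helper that accepts all three.

```python
import xml.etree.ElementTree as ET

STYLE_1 = ('<Book title="Professional Peach Databases">'
           '<AuthorRef>Wil4421</AuthorRef><AuthorRef>Bor601</AuthorRef></Book>')
STYLE_2 = ('<Book title="Professional Peach Databases">'
           '<AuthorRef authref="Wil4421"/><AuthorRef authref="Bor601"/></Book>')
STYLE_3 = '<Book title="Professional Peach Databases" authrefs="Wil4421 Bor601"/>'


def author_refs(book):
    """Return the author ids of a Book written in any of the three styles."""
    if "authrefs" in book.attrib:                 # style 3: one IDREFS attribute
        return book.get("authrefs").split()
    return [ref.get("authref") or (ref.text or "").strip()  # style 2, else style 1
            for ref in book.findall("AuthorRef")]


for doc in (STYLE_1, STYLE_2, STYLE_3):
    print(author_refs(ET.fromstring(doc)))
```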


Limitations of DTDs

• Limited support for reusability and namespaces
• No interleaved combination operator for regular expressions
• Type specifications (e.g. ID, IDREF) are only allowed for attributes, not for text content
– Whitespace-separated lists are only supported for attributes, and only for NMTOKENS, IDREFS, and ENTITIES
• No way to constrain values, beyond enumerations and #FIXED
• Very primitive referential integrity


XML Schema


XML Schema Example

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="Title" type="xs:string"/>
  <xs:element name="Author" type="xs:string"/>
  <xs:element name="Publisher" type="xs:string"/>
  <xs:element name="Course" type="xs:NMTOKEN"/>

  <xs:element name="Book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Title"/>
        <xs:element ref="Author"/>
        <xs:element ref="Publisher" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="CourseBooks">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Course"/>
        <xs:element ref="Book" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
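For comparison, the same structure can be written as a DTD. This sketch is the closest equivalent, not an exact one: the DTD cannot require Course to be an NMTOKEN, because DTD types apply only to attributes.

```xml
<!ELEMENT CourseBooks (Course, Book+)>
<!ELEMENT Course (#PCDATA)>
<!ELEMENT Book (Title, Author, Publisher?)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
```

maxOccurs="unbounded" combined with the default minOccurs="1" becomes Book+. minOccurs="0" becomes Publisher?.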


Regular Expressions in XML Schema (with equivalent DTD)

<xs: …X… minOccurs="0" maxOccurs="unbounded"/>     X*

<xs: …X… minOccurs="1" maxOccurs="unbounded"/>     X+

<xs: …X… minOccurs="0" maxOccurs="1"/>             X?

<xs:sequence> A B C </xs:sequence>                 (A, B, C)

<xs:choice> A B C </...>                           (A | B | C)

<xs:all> A B C </...>                              (A & B & C) in SGML notation; XML DTDs have no & operator

• xs:all can only appear as the single child of a complexType
• Its children can only be elements, each with maxOccurs of at most 1
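Occurrence counts other than 0, 1 and unbounded have no direct DTD form, and neither does xs:all. The following sketch reuses the Title, Author and Publisher declarations from the earlier schema. The type names BoundedBookType and AnyOrderBookType are hypothetical.

```xml
<!-- 1 to 3 authors; a deterministic DTD would need (Author, (Author, Author?)?) -->
<xs:complexType name="BoundedBookType">
  <xs:sequence>
    <xs:element ref="Title"/>
    <xs:element ref="Author" minOccurs="1" maxOccurs="3"/>
    <xs:element ref="Publisher" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<!-- Title, Author and an optional Publisher, in any order -->
<xs:complexType name="AnyOrderBookType">
  <xs:all>
    <xs:element ref="Title"/>
    <xs:element ref="Author"/>
    <xs:element ref="Publisher" minOccurs="0"/>
  </xs:all>
</xs:complexType>
```

Expressing the xs:all type in a DTD would mean listing every permitted ordering as a separate choice.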


Attributes in XML Schema

<xs:element name="CourseBooks">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Book" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="course" type="xs:NMTOKEN" use="required"/>
  </xs:complexType>
</xs:element>

Any complexType can declare attributes, placed after its content model
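The course attribute declared above corresponds to a DTD attribute-list declaration, and a conforming instance simply carries the attribute. This is a sketch; CS101 is a hypothetical course code.

```xml
<!-- DTD equivalent of the course attribute -->
<!ATTLIST CourseBooks course NMTOKEN #REQUIRED>

<!-- instance: the course moves from a child element to an attribute -->
<CourseBooks course="CS101">
  <!-- one or more Book elements go here -->
</CourseBooks>
```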


Mixed Content & Any Type

<xs:complexType mixed="true">

Pro: Allows separation of type constraints from the decision to allow mixed text

Con: Not possible to constrain more exactly where mixed text is allowed (though there is rarely a need to constrain it)

<xs:element name="anything" type="xs:anyType"/>

Means anything is permitted there
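A mixed-content element is one whose character data may be interleaved with child elements. The sketch below assumes a hypothetical Review element that may mention book titles inside running text. It also shows the DTD mixed-content declaration it corresponds to; the review text itself is made up.

```xml
<xs:element name="Review">
  <xs:complexType mixed="true">
    <xs:sequence>
      <xs:element ref="Title" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<!-- DTD equivalent -->
<!ELEMENT Review (#PCDATA | Title)*>

<!-- a conforming instance -->
<Review>Read <Title>Professional Peach Databases</Title> before the first lab.</Review>
```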


Simple Types

string, normalizedString, token, NMTOKEN, NMTOKENS, ID, IDREF, IDREFS, ENTITY, ENTITIES, …

boolean

decimal, integer
  long, int, short, byte
  unsignedLong, unsignedInt, unsignedShort, unsignedByte
  nonPositiveInteger, negativeInteger, nonNegativeInteger, positiveInteger

float

duration, dateTime, time, date, …

hexBinary, …

anyURI, …
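Built-in simple types can be used directly as the type of an element or attribute. This sketch assumes a hypothetical BookInfo element, and the values in the instance are made up.

```xml
<xs:element name="BookInfo">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="pages" type="xs:positiveInteger"/>
      <xs:element name="price" type="xs:decimal"/>
      <xs:element name="published" type="xs:date"/>
    </xs:sequence>
    <xs:attribute name="inPrint" type="xs:boolean"/>
  </xs:complexType>
</xs:element>

<!-- a conforming instance; xs:date values use the YYYY-MM-DD form -->
<BookInfo inPrint="true">
  <pages>412</pages>
  <price>39.99</price>
  <published>2004-03-15</published>
</BookInfo>
```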


Facets of Simple Types

length, minLength, maxLength, pattern, enumeration, whiteSpace

maxInclusive, maxExclusive, minInclusive, minExclusive, totalDigits, fractionDigits

• Facets are additional properties that restrict a simple type

• XML Schema 1.0 defines 12 constraining facets, all listed above
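Several facets can be combined in one restriction. The type names below are hypothetical. priceType accepts 39.99 but rejects -1 (below minInclusive) and 12.345 (three fraction digits). shortTitleType collapses runs of whitespace before its length is checked.

```xml
<xs:simpleType name="priceType">
  <xs:restriction base="xs:decimal">
    <xs:minInclusive value="0"/>
    <xs:totalDigits value="6"/>      <!-- at most 6 digits in all -->
    <xs:fractionDigits value="2"/>   <!-- at most 2 after the decimal point -->
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="shortTitleType">
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="collapse"/>
    <xs:minLength value="1"/>
    <xs:maxLength value="80"/>
  </xs:restriction>
</xs:simpleType>
```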


Simple Type Definitions

<xs:simpleType name="integer">
  <xs:restriction base="xs:decimal">
    <xs:fractionDigits value="0" fixed="true"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="nonPositiveInteger">
  <xs:restriction base="xs:integer">
    <xs:maxInclusive value="0"/>
  </xs:restriction>
</xs:simpleType>

An integer is a decimal with no fraction digits

A nonPositiveInteger is an integer whose largest value is 0
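User schemas derive new types in the same way. This sketch assumes a hypothetical pageCountType that narrows xs:positiveInteger further. A type derived from xs:integer cannot loosen fractionDigits, because the integer definition marks that facet fixed="true".

```xml
<xs:simpleType name="pageCountType">
  <xs:restriction base="xs:positiveInteger">
    <xs:maxInclusive value="10000"/>
  </xs:restriction>
</xs:simpleType>

<!-- Not allowed: fractionDigits is fixed at 0 for xs:integer and its derivations
<xs:simpleType name="badType">
  <xs:restriction base="xs:integer">
    <xs:fractionDigits value="2"/>
  </xs:restriction>
</xs:simpleType>
-->
```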


Primitive List Types

<xs:simpleType name="NMTOKENS">
  <xs:restriction>
    <xs:simpleType>
      <xs:list itemType="xs:NMTOKEN"/>
    </xs:simpleType>
    <xs:minLength value="1"/>
  </xs:restriction>
</xs:simpleType>

NMTOKENS is a whitespace-separated list of at least one NMTOKEN
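Unlike a DTD, XML Schema allows list types for element content as well as for attributes. This sketch assumes a hypothetical chapterStarts element that holds the starting page of each chapter. On a list type, the length facets count items, not characters.

```xml
<xs:simpleType name="pageListType">
  <xs:restriction>
    <xs:simpleType>
      <xs:list itemType="xs:positiveInteger"/>
    </xs:simpleType>
    <xs:maxLength value="50"/>  <!-- at most 50 items -->
  </xs:restriction>
</xs:simpleType>

<xs:element name="chapterStarts" type="pageListType"/>

<!-- a conforming instance with three list items -->
<chapterStarts>1 23 57</chapterStarts>
```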


Patterns, Enumerations & Unions

<xs:simpleType name="isbnType">
  <xs:union>
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{10}"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:simpleType>
      <xs:restriction base="xs:NMTOKEN">
        <xs:enumeration value="TBD"/>
        <xs:enumeration value="NA"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>

An isbnType is either a string consisting of 10 digits or is one of the strings TBD or NA
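The same union can also be built from named member types through the memberTypes attribute. This sketch assumes the schema has no targetNamespace; otherwise the member names would need a namespace prefix. The names isbnDigitsType, isbnPlaceholderType and isbnType2 are hypothetical. Patterns are implicitly anchored, so [0-9]{10} must match the whole value.

```xml
<xs:simpleType name="isbnDigitsType">
  <xs:restriction base="xs:string">
    <xs:pattern value="[0-9]{10}"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="isbnPlaceholderType">
  <xs:restriction base="xs:NMTOKEN">
    <xs:enumeration value="TBD"/>
    <xs:enumeration value="NA"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="isbnType2">
  <xs:union memberTypes="isbnDigitsType isbnPlaceholderType"/>
</xs:simpleType>

<xs:element name="isbn" type="isbnType2"/>

<!-- valid:   <isbn>0123456789</isbn>   <isbn>TBD</isbn>
     invalid: <isbn>12345</isbn>        (too few digits)
              <isbn>ISBN 0123456789</isbn>  (the pattern must match the whole value) -->
```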


Extending ComplexTypes

<xs:complexType name="BookType">
  <xs:sequence>
    <xs:element name="title" type="xs:string"/>
    <xs:element name="author" type="xs:string"/>
    <xs:element name="publisher" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="IsbnBookType">
  <xs:complexContent>
    <xs:extension base="BookType">
      <xs:sequence>
        <xs:element name="isbn" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

IsbnBookType extends BookType with an optional isbn element
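In an instance, an extended type's content is the base type's sequence followed by the extension's sequence. When an element is declared with the base type, the instance can select the derived type with xsi:type. This sketch makes three assumptions: a hypothetical declaration <xs:element name="Book" type="BookType"/>, no targetNamespace, and that the added element is named isbn (a name declared with name= cannot contain a colon). The author and isbn values are made up.

```xml
<Book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:type="IsbnBookType">
  <title>Professional Peach Databases</title>
  <author>A. Author</author>
  <!-- publisher omitted: it is optional in BookType -->
  <isbn>0123456789</isbn>
</Book>
```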


Groups

<xs:group name="reviewElements">
  <xs:sequence>
    <xs:element name="review" type="xs:string"/>
    <xs:element name="reviewDate" type="xs:date"/>
  </xs:sequence>
</xs:group>

<xs:complexType name="ReviewedBookType">
  <xs:complexContent>
    <xs:extension base="BookType">
      <xs:sequence>
        <xs:group ref="reviewElements"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

Groups of attributes can be defined as well

ReviewedBookType extends BookType with review and reviewDate
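An attribute group is declared and referenced in the same way. This sketch assumes hypothetical catalogAttrs and CatalogBookType names. An extension may add only attributes, with no new elements.

```xml
<xs:attributeGroup name="catalogAttrs">
  <xs:attribute name="catalogId" type="xs:ID" use="required"/>
  <xs:attribute name="lang" type="xs:language"/>
</xs:attributeGroup>

<xs:complexType name="CatalogBookType">
  <xs:complexContent>
    <xs:extension base="BookType">
      <xs:attributeGroup ref="catalogAttrs"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```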


Inclusion & Namespaces

Use xs:include to include definitions from another schema definition file with the same target namespace (or none)

Use xs:redefine to include definitions and selectively redefine some of them

Use the targetNamespace attribute within xs:schema to define the namespace of the declarations

Use xs:import to import definitions from the schema definition file associated with another namespace
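The following is a minimal sketch of a schema header that combines these mechanisms. Every namespace URI, prefix, file name and type name in it (including ppl:PersonType and the edition element) is hypothetical. xs:include, xs:import and xs:redefine must appear before the schema's own definitions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:bk="http://example.org/books"
           xmlns:ppl="http://example.org/people"
           targetNamespace="http://example.org/books"
           elementFormDefault="qualified">

  <!-- same target namespace (or none): definitions join this schema -->
  <xs:include schemaLocation="bookCommon.xsd"/>

  <!-- same namespace, but BookType is redefined as an extension of itself -->
  <xs:redefine schemaLocation="bookTypes.xsd">
    <xs:complexType name="BookType">
      <xs:complexContent>
        <xs:extension base="bk:BookType">
          <xs:sequence>
            <xs:element name="edition" type="xs:positiveInteger" minOccurs="0"/>
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>

  <!-- a different namespace can only be brought in with xs:import -->
  <xs:import namespace="http://example.org/people" schemaLocation="people.xsd"/>

  <!-- references use the prefix of the namespace that defines each type -->
  <xs:element name="Book" type="bk:BookType"/>
  <xs:element name="Reviewer" type="ppl:PersonType"/>
</xs:schema>
```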


Uniqueness, Keys & References

xs:unique can be used to require that each specified combination of fields (i.e. named attributes and/or contents of named elements) within a specified set of elements must be unique

xs:key is like xs:unique but also requires that every field is present and not nil

xs:keyref provides referential integrity by requiring that each specified combination of fields within a specified set of elements must correspond to some existing combination associated with an xs:key or xs:unique definition
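The following is a sketch of xs:unique for the BookDB structure in the next example. It assumes Booklist and Authlist are declared globally elsewhere and that the schema has no targetNamespace. The constraint sits inside the declaration of the element that bounds its scope, after that element's type. Unlike xs:key, a Book with no title is simply skipped rather than rejected.

```xml
<xs:element name="BookDB">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Booklist"/>
      <xs:element ref="Authlist"/>
    </xs:sequence>
  </xs:complexType>
  <!-- no two books in the Booklist may share a title -->
  <xs:unique name="uniqueTitles">
    <xs:selector xpath="Booklist/Book"/>
    <xs:field xpath="title"/>
  </xs:unique>
</xs:element>
```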


XML Key Reference Example

[Figure: document tree. The root BookDB contains a Booklist of Book elements (title, Authref, publisher, …) and an Authlist of Author elements (name, address, dob, and an authid attribute).]


Key and Keyref Example

<xs:key name="authkeys">
  <xs:selector xpath=".//Author"/>
  <xs:field xpath="@authid"/>
</xs:key>

Every author's authid is unique and non-nil

Each book's Authref refers to a legal authid

<xs:keyref name="authrefs" refer="authkeys">
  <xs:selector xpath=".//Book"/>
  <xs:field xpath="Authref"/>
</xs:keyref>

The contents of a book's Authref element must correspond to some author's authid attribute
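The field of an identity constraint must select at most one value per selected element. A Book with two Authref children would therefore be an error under a keyref whose selector picks Book and whose field is Authref. To allow several authors per book, this sketch selects each Authref itself and uses "." as the field. It assumes Booklist, Authlist and their contents are declared elsewhere and that the schema has no targetNamespace.

```xml
<xs:element name="BookDB">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Booklist"/>
      <xs:element ref="Authlist"/>
    </xs:sequence>
  </xs:complexType>
  <xs:key name="authkeys">
    <xs:selector xpath="Authlist/Author"/>
    <xs:field xpath="@authid"/>
  </xs:key>
  <!-- one selected node per reference, so a book may list several authors -->
  <xs:keyref name="authrefs" refer="authkeys">
    <xs:selector xpath="Booklist/Book/Authref"/>
    <xs:field xpath="."/>
  </xs:keyref>
</xs:element>

<!-- an instance that satisfies the key and keyref (Author children elided) -->
<BookDB>
  <Booklist>
    <Book>
      <title>Professional Peach Databases</title>
      <Authref>Wil4421</Authref>
      <Authref>Bor601</Authref>
    </Book>
  </Booklist>
  <Authlist>
    <Author authid="Wil4421"><!-- name, address, dob --></Author>
    <Author authid="Bor601"><!-- name, address, dob --></Author>
  </Authlist>
</BookDB>
```

Unlike a DTD IDREF, an Authref here can only match an Author's authid. It cannot match an ID that happens to belong to some other element.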