XSDL & Relax : 2 new schema languages for XML Rajasekar Krishnamurthy



Outline

• DTDs and their drawbacks

• XML Schema Requirements

• XSDL

• RELAX

• Other Schema specifications


Sample XML document

<?xml version="1.0"?>
<book>
  <title>Intro to XML</title>
  <price>72.50</price>
  <author>
    <name> Albert Einstein </name>
    <email>[email protected]</email>
    <phone>608-236-4112</phone>
  </author>
</book>


Equivalent DTD

<!ELEMENT book (title, price, author*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT author (name, email, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
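
One way to read the content model (title, price, author*) is as a regular expression over child element names. The Python sketch below checks the children of a book element that way. The function name matches_book_model and the space-terminated encoding of tag names are assumptions made for illustration; this is not how any particular DTD validator works.

```python
import re
import xml.etree.ElementTree as ET

# Content model (title, price, author*) written as a regex over
# space-terminated child tag names (an illustrative encoding only).
BOOK_MODEL = re.compile(r"title price (author )*")

def matches_book_model(xml_text):
    """Return True if the root's children follow (title, price, author*)."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return BOOK_MODEL.fullmatch(child_names) is not None
```

The sketch only looks at the order and names of the children; it does not check their content.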


Drawbacks of DTD

<book>
  <title>Intro to XML</title>
  <price>72.50</price>
  <author>
    <title> Dr. </title>
    <firstname> Albert </firstname>
    <lastname> Einstein </lastname>
    <email>[email protected]</email>
    <phone>608-236-4112</phone>
  </author>
</book>



What is a schema ?

• Model for describing a class of documents
• Common vocabulary for applications exchanging documents
• Formally express syntactic, structural and value constraints applicable to instance documents


XML Schema requirements

• mechanisms for constraining document structure
• inheritance
• embedded documentation
• application-specific constraints
• primitive data typing
• allow creation of user-defined datatypes
• addressing the evolution of schema


Application Scenarios

• Electronic Commerce transaction processing
• Traditional document authoring/editing
• Query formulation and optimization
• Open and uniform transfer of data between applications, including databases
• Metadata interchange



XML Schema Definition Language

• Enhanced datatypes

• written in XML

• separates element tags from types
  – local namespaces

• Inheritance : derive new type definitions

• Identity constraints

• support for namespaces


Sample XML schema

<schema>
  <element name="book" type="booktype"/>
  <complexType name="booktype">
    <sequence>
      <element name="title" type="string"/>
      <element name="price" type="float"/>
      <element name="author" type="authortype" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</schema>


Sample schema (contd.)

<schema>
  <complexType name="authortype">
    <sequence>
      <element name="name" type="name"/>
      <element name="email" type="email"/>
      <element name="phone" type="phonenumber"/>
      <element name="address" type="address" minOccurs="0"/>
    </sequence>
  </complexType>
</schema>
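
The minOccurs and maxOccurs attributes in the sample bound how many times an element may appear. A minimal Python sketch of that check follows; the function name occurs_ok and the use of None for maxOccurs="unbounded" are assumptions of the sketch. Both bounds default to 1, as they do in XML Schema.

```python
def occurs_ok(count, min_occurs=1, max_occurs=1):
    """Check an element count against minOccurs/maxOccurs.

    max_occurs=None stands in for maxOccurs="unbounded" (an assumption
    of this sketch). Both bounds default to 1.
    """
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs
```

Under this reading, author (minOccurs="0", maxOccurs="unbounded") accepts any count, while title accepts exactly one occurrence.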


Schema in graphical form

book
  title
  price
  author*
    name
    email
    phone
    address?


Schema Components

• Building blocks that comprise the abstract data model of the schema

• Primary Components
  – simple type definitions
  – complex type definitions
  – attribute declarations
  – element declarations


Schema Components

• Secondary components
  – attribute group definitions
  – identity constraint definitions
  – model group definitions
  – notation declarations

• Helper components
  – annotations
  – model groups
  – particles
  – wildcards


Type Definitions

• Separates tag name from type of elements

• types can be
  – simpleTypes
    • represent leaf nodes in the graph
    • replace PCDATA in DTDs
  – complexTypes
    • can have elements and attributes in their content


Sample complexType declaration

<complexType name="address">
  <sequence>
    <element name="name" type="string" minOccurs="0"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
  <attribute name="country" type="string" use="default" value="US"/>
</complexType>


Simpletype : Pattern

<simpleType name="phonenumber">
  <restriction base="string">
    <pattern value="\d{3}-\d{3}-\d{4}"/>
  </restriction>
</simpleType>

• Other facets: Enumerate, Range
• Other simpleTypes: Lists, Union
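
The pattern facet can be tried out with an ordinary regular-expression engine. The Python sketch below applies the same expression as the phonenumber type; the names PHONE and is_phone_number are illustrative. A pattern facet has to match the whole value, which is why the sketch uses fullmatch rather than search.

```python
import re

# Same expression as the pattern facet of the phonenumber simpleType.
# fullmatch is used because a pattern facet must match the whole value.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

def is_phone_number(value):
    """Return True if value has the form ddd-ddd-dddd."""
    return PHONE.fullmatch(value) is not None
```

A value with a missing digit or with extra leading or trailing characters is rejected.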


Elements

• Global elements
  – can occur as the root of the document
  – can be included/imported/referenced
• Local elements
  – can occur only in the specific context
  – sibling elements with the same name need to have the same content model
    • <!ELEMENT book (author*, title, author*)>


Sample schema

<schema>
  <element name="book" type="booktype"/>
  <complexType name="booktype">
    <sequence>
      <element name="title" type="string"/>
      <element name="price" type="float"/>
      <element name="author" type="authortype" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</schema>


Element Content

• complexTypes from simple types
  – <price currency="USDollar">23</price>
• Mixed content
  – <price>amount in US-dollars is <amount>23</amount> only</price>
• Empty content
  – <price currency="USDollars" amount="23"/>
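
Mixed content interleaves character data with child elements. As a small illustration (not part of the original slides), the Python sketch below parses the mixed-content price element with the standard library's ElementTree and shows where each piece of text ends up; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

price = ET.fromstring(
    "<price>amount in US-dollars is <amount>23</amount> only</price>"
)
amount = price.find("amount")

leading_text = price.text    # character data before the child element
amount_text = amount.text    # the child element's own content
trailing_text = amount.tail  # character data after the child element
```

ElementTree stores the text that follows a child on that child's tail attribute, so all three fragments are needed to recover the full content of price.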


Building content models

<!ELEMENT author ((name | (title, firstname, lastname)), email, phone)>

<author>
  <lastname> Einstein </lastname>
  <title> Dr. </title>
  <firstname> Albert </firstname>
  <email>[email protected]</email>
  <phone>608-236-4112</phone>
</author>

<author>
  <name> Albert Einstein </name>
  <email>[email protected]</email>
  <phone>608-236-4112</phone>
</author>


Building content models

<complexType name="authortype">
  <sequence>
    <choice>
      <element name="name" type="name"/>
      <all>
        <element name="title" type="titletype"/>
        <element name="firstname" type="string"/>
        <element name="lastname" type="string"/>
      </all>
    </choice>
    <element name="email" type="email"/> ...
  </sequence>
</complexType>


Content models

• Can represent any content model expressible with XML 1.0 DTD and more!
• Does not allow non-determinism
  – ((email, name) | (email, expandedname)) is illegal
  – should be (email, (name | expandedname))
• Does not allow ambiguity
  – (author*, contactauthor*, author*) not allowed
    • author* can be derived in multiple ways
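
To make the ambiguity concrete, the Python sketch below counts the ways a list of child names can be split across the three parts of (author*, contactauthor*, author*). It is a brute-force illustration only; the function name count_derivations is an assumption, and real validators do not work this way.

```python
def count_derivations(tokens):
    """Count the ways tokens can be split into author*, contactauthor*, author*."""
    n = len(tokens)
    ways = 0
    for i in range(n + 1):          # end of the first author* run
        for j in range(i, n + 1):   # end of the contactauthor* run
            if (all(t == "author" for t in tokens[:i])
                    and all(t == "contactauthor" for t in tokens[i:j])
                    and all(t == "author" for t in tokens[j:])):
                ways += 1
    return ways
```

A single author element can be assigned to either the first or the last author* (two derivations), whereas the sequence author, contactauthor, author has exactly one.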


Deriving new types

• Two ways of deriving new types from existing types

• By extension
  – similar to inheritance in programming languages

• By restriction
  – declarations more limited than base type


Deriving by Extension

<complexType name="USAddress">
  <sequence>
    <element name="name" type="string"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
    <element name="state" type="USState"/>
    <element name="zip" type="positiveInteger"/>
  </sequence>
</complexType>


Declare Base Type

<complexType name="address">
  <sequence>
    <element name="name" type="string"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>


Derive By Extension

<complexType name="USAddress">
  <complexContent>
    <extension base="address">
      <sequence>
        <element name="state" type="USState"/>
        <element name="zip" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
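
Derivation by extension appends the new element declarations after the base type's content model. The Python sketch below models a content model as a plain list of element names to show the effect; this list representation and the function name derive_by_extension are simplifying assumptions, not how a schema processor stores types.

```python
def derive_by_extension(base_model, extra_elements):
    """Return a new content model: the base sequence followed by the additions."""
    return list(base_model) + list(extra_elements)

ADDRESS = ["name", "street", "city"]
US_ADDRESS = derive_by_extension(ADDRESS, ["state", "zip"])
```

The base model is left unchanged, and the derived model matches the element order of the USAddress type written out in full.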


Using Derived Types

<address type="USAddress">
  <street>1210, W. Dayton Street</street>
  <city>Madison</city>
  <state>WI</state>
  <zip>53706</zip>
</address>

<address>
  <street>1210, W. Dayton Street</street>
  <city>Madison</city>
</address>


Deriving By Restriction

<complexType name="modifiedAddress">
  <complexContent>
    <restriction base="address">
      <sequence>
        <element name="name" type="string" minOccurs="0" maxOccurs="0"/>
        <element name="street" type="string"/>
        <element name="city" type="string"/>
      </sequence>
    </restriction>
  </complexContent>
</complexType>


Identity Constraints

• Can specify integrity constraints
  – uniqueness, key, keyref

• constraints can be locally scoped

• can be applied on attributes, elements or their contents
  – an XML ID, by contrast, is always an attribute

• can create keys/keyrefs from a combination of element and attribute content


Sample constraint

<element name="book" type="booktype">
  <unique name="uniqueauthor">
    <selector xpath="author"/>
    <field xpath="title"/>
    <field xpath="firstname"/>
    <field xpath="lastname"/>
  </unique>
</element>
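
The effect of this constraint can be approximated in a few lines of Python: select the author children of book, build a tuple from the three fields, and report any tuple seen twice. This is a sketch under simplifying assumptions (plain child-element paths, and authors missing a field are skipped); the function name duplicate_keys is illustrative.

```python
import xml.etree.ElementTree as ET

def duplicate_keys(book_xml, selector="author",
                   fields=("title", "firstname", "lastname")):
    """Return the field tuples that occur more than once among selected nodes."""
    book = ET.fromstring(book_xml)
    seen, duplicates = set(), []
    for node in book.findall(selector):
        values = [node.findtext(field) for field in fields]
        if any(value is None for value in values):
            continue  # nodes missing a field are not compared in this sketch
        key = tuple(value.strip() for value in values)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

An empty result means the selected authors are pairwise distinct on the combination of title, firstname and lastname.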


Other features

• Importing schema components
  – Type libraries

• Redefining Types & Groups

• Namespaces
  – Target namespaces
    • allow undeclared value: support for namespace-unaware documents


Other features

• Any element
  – allows well-formed XML to appear
  – can be restricted to a set of namespaces

• Any attribute

• anyType
  – base type for all complexTypes
  – does not constrain content in any way
  – default type when none is specified


Main drawback of XSDL

An element declaration (call it D) together with a blocking constraint (a subset of {substitution, extension, restriction}, the value of a {disallowed substitutions}) is validly substitutable for another element declaration (call it C) if

1.1 the blocking constraint does not contain substitution;

1.2 There is a chain of {substitution group affiliation}s from D to C, that is, either D's {substitution group affiliation} is C, or D's {substitution group affiliation}'s {substitution group affiliation} is C, or . . .;

1.3 The set of all {derivation method}s involved in the derivation of D's {type definition} from C's {type definition} does not intersect with the union of the blocking constraint, C's {prohibited substitutions} and the {prohibited substitutions} of any intermediate {type definition}s in the derivation of D's {type definition} from C's {type definition}.


Main drawback of XSDL

• for a sequence, maximum is unbounded if the {max occurs} of any wildcard or element declaration particle in the group's {particles} or the maximum part of the effective total range of any of the group particles in the group's {particles} is unbounded, or if any of those is non-zero and the {max occurs} of the particle itself is unbounded, otherwise the product of the particle's {max occurs} and the sum of the {max occurs} of every wildcard or element declaration particle in the group's {particles} and the maximum part of the effective total range of each of the group particles in the group's {particles} (or 0 if there are no {particles})
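
Read as an algorithm, the rule quoted above is considerably shorter than its prose. The Python sketch below is one reading of it, not text from the specification. It assumes None stands for "unbounded", that child_maxes already holds the maximum of each child particle, and that an unbounded outer particle whose children are all zero yields 0; the function name sequence_max is illustrative.

```python
def sequence_max(particle_max, child_maxes):
    """Maximum of the effective total range of a sequence group.

    particle_max: {max occurs} of the group particle itself (None = unbounded).
    child_maxes:  maximum of each particle in the group (None = unbounded).
    """
    if any(m is None for m in child_maxes):
        return None  # some child is unbounded
    if particle_max is None:
        # Unbounded repetition of the group: unbounded as soon as any child
        # can occur at all; treated as 0 otherwise (assumption of this sketch).
        return None if any(m != 0 for m in child_maxes) else 0
    return particle_max * sum(child_maxes)  # sum of an empty list is 0
```

For example, a sequence that may repeat twice and holds children with maxima 1 and 3 has a maximum of 8.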



RELAX

• Developed by Makoto Murata & others in Japan

• based on the hedge automaton theory

• borrows rich datatypes from XML Schema Part 2

• Submitted to ISO fast-track

• ease of translation from/to DTDs


Main features of RELAX

• Separates element tagname and type
  – context-sensitive content models

• allows content models similar to XML schema

• allows definition of element and attribute groups

• annotations

• include mechanism for large schemas


Features absent in RELAX

• Support for namespaces – coming shortly??

• Identity constraints

• Inheritance

• New datatypes


XSDL vs. RELAX

• RELAX allows sibling elements with the same name to have different types
  – allows the content model (author, title, author), where the two author elements can have different content models
  – introduces ambiguity
    • For content model (title, author*, author*)
    • <title>"XYZ"</title><author/> is ambiguous


XSDL vs. RELAX

• In RELAX, a single type can have multiple definitions
  – the actual definition that matches an instance element is found by exhaustive search
  – at least one match needs to be found

• nametype can be defined as name or expandedname
  – it is a choice of the two definitions
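
The exhaustive search over definitions can be sketched as follows. The two child lists used for nametype here are illustrative assumptions (the slides do not spell out either definition), and a content model is reduced to a fixed list of child names.

```python
# Two candidate definitions sharing one type label. The child lists are
# assumptions made for illustration only.
NAMETYPE_DEFINITIONS = [
    ["name"],
    ["title", "firstname", "lastname"],
]

def matching_definitions(child_names, definitions=NAMETYPE_DEFINITIONS):
    """Try every definition and return those the children satisfy."""
    return [d for d in definitions if list(child_names) == d]

def is_valid(child_names):
    """An instance is accepted when at least one definition matches."""
    return len(matching_definitions(child_names)) >= 1
```

Trying every definition is what makes the type behave like a choice between them.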


Extending existing types

• XSDL uses inheritance
  – can change (title, author*) to (title, author*, contactauthor)

• In RELAX, add the new type definition completely
  – can also change (title, author*) to (title, contactauthor, author*)


Using attribute values

• <price type="int">10</price>

• <price type="string">ten</price>

• content model of the price element is switched based on the value of its type attribute
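
The idea of switching the content model on an attribute value can be sketched in Python as a dispatch on the type attribute. The function name price_is_valid and the two checks are assumptions for illustration; they stand in for whatever content models a schema would attach to each attribute value.

```python
import re
import xml.etree.ElementTree as ET

def price_is_valid(xml_text):
    """Validate a price element, choosing the check from its type attribute."""
    price = ET.fromstring(xml_text)
    declared = price.get("type")
    text = (price.text or "").strip()
    if declared == "int":
        return re.fullmatch(r"[+-]?[0-9]+", text) is not None
    if declared == "string":
        return True  # any character data is accepted
    return False     # unknown or missing type attribute
```

With this dispatch, the same content "ten" is accepted under type="string" and rejected under type="int".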


XSDL vs. RELAX

• RELAX
  – membership checking in linear time in SAX model

• XSDL
  – type assignment in linear time in SAX/DOM models
    • ignoring integrity constraints


Other Schema proposals

• XDR (XML-Data Reduced)
  – Microsoft’s BizTalk framework

• SOX (Schema for Object-oriented XML)
  – Commerce One

• DSD
  – AT&T and BRICS

• Schematron

