© 2008 MindTree Consulting XML Schema Neeraj Singh October 2009

Session04 XML Validation Schema

8/14/2019 Session04 XML Validation Schema

http://slidepdf.com/reader/full/session04-xml-validation-schema 1/42

© 2008 MindTree Consulting

XML Schema

Neeraj Singh

October 2009

Slide 2

Agenda

XML Validation

Introduction to XML Schema

Examples / Demo

XML Validation

Slide 4

An Introduction to XML Validation

One of the important innovations of XML is the ability to place preconditions on the data that programs read, and to do this in a simple declarative way.

XML allows you to say:

  that every Order element must contain exactly one Customer element,
  that each Customer element must have an id attribute that contains an XML name token,
  that every ShipTo element must contain one or more Streets, one City, one State, and one Zip, and so forth.

Checking an XML document against this list of conditions is called validation.

Validation is an optional step but an important one.
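
As a rough sketch, the conditions listed above could be written declaratively in W3C XML Schema as follows. Several details are assumptions made only for illustration: that ShipTo is a child of Order, that Customer has no content of its own, that the children appear in this order, and that the address parts are plain strings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <!-- exactly one Customer: minOccurs and maxOccurs both default to 1 -->
        <xs:element name="Customer">
          <xs:complexType>
            <!-- the id attribute must hold an XML name token -->
            <xs:attribute name="id" type="xs:NMTOKEN" use="required"/>
          </xs:complexType>
        </xs:element>
        <!-- assumption: ShipTo is placed directly inside Order -->
        <xs:element name="ShipTo">
          <xs:complexType>
            <xs:sequence>
              <!-- one or more Street elements -->
              <xs:element name="Street" type="xs:string" maxOccurs="unbounded"/>
              <xs:element name="City" type="xs:string"/>
              <xs:element name="State" type="xs:string"/>
              <xs:element name="Zip" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```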

Slide 5

Validation

There are many reasons and opportunities to validate an XML document:

  When we receive one, before importing its data into a legacy system
  When we have produced or hand-edited one
  To test the output of an application, etc.

Validation as “firewall”:

  To serve as an actual firewall when we receive documents from the external world (as is commonly the case with Web Services and other XML communications)
  To provide checkpoints when we design processes as pipelines of transformations

Validation can take place at several levels:

  Structural validation
  Data validation

Slide 6

Schema Languages

There is more than one language in which you can express such validation conditions. Generically, these are called schema languages, and the documents that list the constraints are called schemas.

Different schema languages have different strengths and weaknesses.

The document type definition (DTD) is the only schema language built into most XML parsers and endorsed as a standard part of XML.

The W3C XML Schema Language (schemas for short, though it’s hardly the only schema language) addresses several limitations of DTDs.

Many other schema languages have been invented that can easily be integrated with your systems.

XML Schema

Slide 9

Schema definition

A schema is defined in a separate file, generally stored with the .xsd extension.

Every schema definition has a schema root element that belongs to the http://www.w3.org/2001/XMLSchema namespace. The schema element can also carry optional attributes.

For example, the following declaration indicates that the elements used in the schema come from the http://www.w3.org/2001/XMLSchema namespace:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Other definitions will come here. -->
</xs:schema>

Slide 10

Schema Linking when document root element is from null namespace

Let's start with our first document. It must contain only a "root" element, and this element can contain only text. The element is from the null namespace. A valid document:

<root xmlns="">aaa</root>

To validate this document with XML Schema, you have to associate a schema document with it. If the root element is from the null namespace, use the "noNamespaceSchemaLocation" attribute:

<root xsi:noNamespaceSchemaLocation="correct_0.xsd"
      xmlns=""
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">test</root>
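
The slide does not show the schema file itself. A minimal sketch of what a correct_0.xsd matching this description might contain is given below; the file content is an assumption, not the presenter's actual file. Because the schema has no targetNamespace, the "root" element it declares belongs to the null namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed content for correct_0.xsd: no targetNamespace, so "root" is in the null namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- a simple element: text only, no attributes, no child elements -->
  <xs:element name="root" type="xs:string"/>
</xs:schema>
```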

Slide 11

Schema Linking when document root element is from some particular namespace

Now let's take the same document as in the previous example, but require the "root" element to be from some concrete namespace, let's say "http://foo". A valid document:

<root xmlns="http://foo">aaa</root>

If the root element is from some particular namespace, you associate the schema using the "schemaLocation" attribute. The first part of this attribute's value is the target namespace; the second part is the URL of the schema file.

<f:root xsi:schemaLocation="http://foo correct_0.xsd"
        xmlns:f="http://foo"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">test</f:root>
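
For the namespaced case, the schema has to name the same namespace as its target. The following is a minimal sketch of such a schema file, again an assumption rather than the file used in the demo: the targetNamespace attribute places the globally declared "root" element in "http://foo".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed schema content: targetNamespace must equal the namespace of the instance's root element -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://foo">
  <!-- global declarations belong to the target namespace -->
  <xs:element name="root" type="xs:string"/>
</xs:schema>
```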

Slide 12

Examples / Demo

01_FirstXMLSchema.xsd

Writing your first XML Schema and a valid XML file based on it. This also demonstrates how to link an XML file with an XML schema.

02_FirstNameSpace.xsd

This example demonstrates the use of namespaces: if you have an XML document that belongs to a certain namespace, how to connect it to an XML Schema.

Slide 13

Schema elements

A schema file contains definitions for elements and attributes, as well as data types for elements and attributes. It is also used to define the structure, or content model, of an XML document.

Elements in a schema file can be classified as either simple or complex.

Schema elements: Simple type

A simple type element is an element that cannot contain any attributes or child elements; it can only contain text of the data type specified in its declaration. The syntax for defining a simple element is:

<xs:element name="ELEMENT_NAME" type="DATA_TYPE" default/fixed="VALUE" />

where DATA_TYPE is one of the built-in schema data types (or a simple type derived from them).

Slide 14

Schema elements: Simple type Contd…

You can also specify a default or fixed value for an element. You do this with either the default or the fixed attribute, giving the value as the attribute's value. Note: specifying a default or fixed attribute is optional, and the two cannot be used together on the same declaration.

An example of a simple type element is:

<xs:element name="Author" type="xs:string" default="Whizlabs"/>

All attributes have simple types, so they are declared in the same way as simple elements. For example:

<xs:attribute name="title" type="xs:string" />
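
To see how default, fixed, and an attribute declaration fit together, here is a small sketch. The Book wrapper and the Currency element are hypothetical names introduced only for this illustration; Author and title come from the slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Book and Currency are hypothetical; they are not part of the original slides -->
  <xs:element name="Book">
    <xs:complexType>
      <xs:sequence>
        <!-- default: an empty <Author/> is treated as if it contained "Whizlabs" -->
        <xs:element name="Author" type="xs:string" default="Whizlabs"/>
        <!-- fixed: <Currency> must be empty or contain exactly "USD" -->
        <xs:element name="Currency" type="xs:string" fixed="USD"/>
      </xs:sequence>
      <!-- attribute declarations follow the content model -->
      <xs:attribute name="title" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Note that an element default only takes effect when the element is present but empty; it does not make a missing element appear.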

Slide 15

Schema data types

All data types in schema inherit from anyType. This includes both simple and complex data types.

You can further classify simple types into built-in primitive types and built-in derived types.

Built-in datatype hierarchy

[Figure: the complete built-in datatype hierarchy diagram from the XML Schema Datatypes Recommendation. Legend: ur types, built-in primitive types, built-in derived types, complex types; derived by restriction, derived by list, derived by extension or restriction.]

Slide 16

Schema elements: Complex types

Complex types are elements that either:

  Contain other elements
  Contain attributes
  Are empty (empty elements)
  Contain text mixed with child elements or attributes

To define a complex type in a schema, use a complexType element. You can specify the order in which elements occur and the number of times an element can occur (cardinality) by using the order and occurrence indicators, respectively.

For example:

<xs:element name="Book">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Author" type="xs:string" maxOccurs="4"/>
      <xs:element name="ID" type="xs:string"/>
      <xs:element name="Price" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

In this example, the order indicator is xs:sequence, and the occurrence indicator is the maxOccurs attribute on the Author element.
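
A document that should satisfy this Book declaration might look like the sketch below. All text values are made up for illustration. The children follow the declared order, and Author appears between one and four times.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- illustrative values only -->
<Book>
  <Name>XML Schema Basics</Name>
  <Author>First Author</Author>
  <Author>Second Author</Author>
  <ID>B-001</ID>
  <Price>29.95</Price>
</Book>
```

Moving ID ahead of Name, or listing five Author elements, would make the document invalid against that declaration.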

Slide 17

Schema elements: Complex types (Mixed content)

W3C XML Schema supports mixed content through the mixed attribute of the xs:complexType element. Consider:

<xs:element name="book">
  <xs:complexType mixed="true">
    <xs:all>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string"/>
    </xs:all>
    <xs:attribute name="isbn" type="xs:string"/>
  </xs:complexType>
</xs:element>

It will validate an XML element such as:

<book isbn="0836217462">
  Funny book by <author>Charles M. Schulz</author>.
  Its title (<title>Being a Dog Is a Full-Time Job</title>) says it all!
</book>

Slide 18

Examples / Demo

07_ComplexType01.xsd

Your first complex type: an element that contains a mixture of elements. We want the element "root" to contain the elements "aaa", "bbb", and "ccc" in any order, so we use the "all" element.

11_EmptyElementUsingAnyType.xsd

Empty element. We want the root element to be named "AAA", to be from the null namespace, and to be empty. The empty element is defined as a "complexType" with a "complexContent" that is a restriction of "anyType", but without any elements.
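
The empty-element technique described for 11_EmptyElementUsingAnyType.xsd can be sketched as follows. This is a reconstruction from the description, not the contents of the demo file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="AAA">
    <xs:complexType>
      <xs:complexContent>
        <!-- restrict anyType to nothing: no child elements, no text, no attributes -->
        <xs:restriction base="xs:anyType"/>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this declaration, <AAA/> is valid, while an AAA element containing text or a child element is not.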

Slide 19

Occurrence indicators

Occurrence indicators specify the number of times an element can occur in an XML document. You specify them with the minOccurs and maxOccurs attributes of the element in the element definition.

As the names suggest, minOccurs specifies the minimum number of times an element can occur in an XML document, while maxOccurs specifies the maximum number of times the element can occur.

It is possible to specify that an element may occur any number of times in an XML document by setting the maxOccurs value to unbounded.

The default value of both minOccurs and maxOccurs is 1, which means that by default an element must appear exactly once. These indicators do not apply to attributes, whose presence is controlled by the use attribute instead.
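
The common combinations of minOccurs and maxOccurs can be sketched as below. The Library element and its children are hypothetical names chosen only to illustrate the indicators.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- all element names here are hypothetical -->
  <xs:element name="Library">
    <xs:complexType>
      <xs:sequence>
        <!-- no indicators: minOccurs="1" and maxOccurs="1", so exactly once -->
        <xs:element name="Name" type="xs:string"/>
        <!-- optional: zero or one occurrence -->
        <xs:element name="Motto" type="xs:string" minOccurs="0"/>
        <!-- one to four occurrences -->
        <xs:element name="Librarian" type="xs:string" maxOccurs="4"/>
        <!-- any number of occurrences, including none -->
        <xs:element name="Book" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```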

Slide 20

Order indicators

Order indicators define the order or sequence in which elements can occur in an XML document. The three types of order indicators are:

  All: If All is the order indicator, then the defined elements can appear in any order, and each can occur at most once. In XML Schema 1.0, maxOccurs must be 1 for All and for the elements inside it, while minOccurs can be 0 or 1.
  Sequence: If Sequence is the order indicator, then the elements must appear in the order specified.
  Choice: If Choice is the order indicator, then by default exactly one of the elements specified must appear in the XML document.

Slide 21

Example: Occurrence and order indicators

<xs:element name="Book">
  <xs:complexType>
    <xs:all>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="ID" type="xs:string"/>
      <xs:element name="Authors" type="authorType"/>
      <xs:element name="Price" type="priceType"/>
    </xs:all>
  </xs:complexType>
</xs:element>

<xs:complexType name="authorType">
  <xs:sequence>
    <xs:element name="Author" type="xs:string" maxOccurs="4"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="priceType">
  <xs:choice>
    <xs:element name="dollars" type="xs:double"/>
    <xs:element name="pounds" type="xs:double"/>
  </xs:choice>
</xs:complexType>

The xs:all indicator specifies that the Book element, if present, must contain exactly one instance of each of the following four elements, in any order: Name, ID, Authors, Price.

The xs:sequence indicator in the authorType declaration specifies that elements of this type (the Authors element) contain at least one Author element and can contain up to four Author elements.

The xs:choice indicator in the priceType declaration specifies that elements of this type (the Price element) can contain either a dollars element or a pounds element, but not both.
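
As an illustration, a document like the following should be accepted by these declarations. The values are invented. Because Book uses xs:all, its children are deliberately listed in a different order than in the schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- illustrative values only; child order differs from the schema on purpose -->
<Book>
  <ID>B-001</ID>
  <Name>XML Schema Basics</Name>
  <Price>
    <!-- xs:choice: dollars or pounds, never both -->
    <dollars>29.95</dollars>
  </Price>
  <Authors>
    <!-- xs:sequence with maxOccurs="4": one to four Author elements -->
    <Author>First Author</Author>
    <Author>Second Author</Author>
  </Authors>
</Book>
```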

Slide 22

Restriction

A main advantage of schema is the ability to control the values of XML attributes and elements.

A restriction, which can be applied to any simple type in a schema, allows you to define your own data type according to your requirements by modifying the facets available for that simple type.

To achieve this, use the restriction element defined in the schema namespace.

W3C XML Schema defines 12 facets for simple data types: enumeration, maxExclusive, minExclusive, maxInclusive, minInclusive, maxLength, minLength, pattern, length, whiteSpace, fractionDigits, totalDigits.

Slide 23

Example - To restrict the length of the text node

An example that shows how to restrict the length of the text node:

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:restriction base="tokenWithLangAndNote">
        <xs:maxLength value="255"/>
        <xs:attribute name="lang" type="xs:language"/>
        <xs:attribute name="note" type="xs:token"/>
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
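
The base type tokenWithLangAndNote is referenced but not defined on the slide. One plausible definition, offered only as an assumption inferred from its name and from the attributes repeated in the restriction, is a complex type with simple content that extends xs:token with two attributes:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- assumed definition of the base type used by the restriction -->
  <xs:complexType name="tokenWithLangAndNote">
    <xs:simpleContent>
      <!-- text content is an xs:token; extension adds the attributes -->
      <xs:extension base="xs:token">
        <xs:attribute name="lang" type="xs:language"/>
        <xs:attribute name="note" type="xs:token"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>
```

Restricting such a type with maxLength then limits the text content of title to 255 characters while keeping the attributes.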

Slide 24

Example – Remove an attribute from the element

To remove the note attribute from the element title, we declare note to be prohibited in the list of attributes in the restriction:

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:restriction base="tokenWithLangAndNote">
        <xs:maxLength value="255"/>
        <xs:attribute name="lang" type="xs:language"/>
        <xs:attribute name="note" use="prohibited"/>
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

Slide 26

Facets Contd…

maxInclusive - Numeric value of the data type is less than or equal to the value specified.

minInclusive - Numeric value of the data type is greater than or equal to the value specified.

<xs:simpleType name="id">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="100"/>
  </xs:restriction>
</xs:simpleType>

maxLength - Specifies the maximum number of characters or list items allowed in the value.

minLength - Specifies the minimum number of characters or list items allowed in the value.

pattern - Value of the data type is constrained to a specific sequence of characters that are expressed using regular expressions.

<xs:simpleType name="nameFormat">
  <xs:restriction base="xs:string">
    <xs:minLength value="3"/>
    <xs:maxLength value="10"/>
    <xs:pattern value="[a-z][A-Z]*"/>
  </xs:restriction>
</xs:simpleType>
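
To make the combined effect of the three facets on nameFormat concrete, the sketch below lists a few candidate values. The names and name elements are hypothetical and assume a declaration such as <xs:element name="name" type="nameFormat"/>. An XML Schema pattern must match the whole value, so [a-z][A-Z]* means one lowercase letter followed only by uppercase letters.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- hypothetical wrapper; each <name> is assumed to be of type nameFormat -->
<names>
  <!-- valid: lowercase letter, then uppercase letters, 4 characters -->
  <name>aBCD</name>
  <!-- invalid: second and third characters are not uppercase -->
  <name>abc</name>
  <!-- invalid: matches the pattern but is shorter than minLength 3 -->
  <name>aB</name>
  <!-- invalid: matches the pattern but is longer than maxLength 10 -->
  <name>aBCDEFGHIJK</name>
</names>
```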

Slide 27

Facets Contd…

length - Specifies the exact number of characters or list items allowed in the value.

<xs:simpleType name="secretCode">
  <xs:restriction base="xs:string">
    <xs:length value="5"/>
  </xs:restriction>
</xs:simpleType>

whiteSpace - Specifies the method for handling white space. Allowed values for the value attribute are preserve, replace, and collapse.

<xs:simpleType name="FirstName">
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="preserve"/>
  </xs:restriction>
</xs:simpleType>

fractionDigits - Constrains the maximum number of decimal places allowed in the value.

totalDigits - Constrains the maximum total number of digits allowed in the value.

Both facets apply only to xs:decimal and the types derived from it, so the example restricts xs:decimal rather than xs:float:

<xs:simpleType name="reducedPrice">
  <xs:restriction base="xs:decimal">
    <xs:totalDigits value="4"/>
    <xs:fractionDigits value="2"/>
  </xs:restriction>
</xs:simpleType>

Slide 28

Multiple Restriction using ‘Union’

The union is applied to the two embedded simple types to allow values from both: the new data type accepts either a string of exactly ten digits or one of the two enumerated values (TBD and NA).

<xs:simpleType name="isbnType">
  <xs:union>
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{10}"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:simpleType>
      <xs:restriction base="xs:NMTOKEN">
        <xs:enumeration value="TBD"/>
        <xs:enumeration value="NA"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>

Slide 29

Examples / Demo

03_RestrictSimpleType01.xsd

This example restricts a simple type. Here we require the value of the element "root" to be an integer less than 25.

04_RestrictUsingUnion01.xsd

We want the value of the element "root" to be in the range 0-100 or 300-400 (including the border values). We make a union of the two intervals.

06_RestrictUnionEnum02.xsd

An element can contain a string from an enumerated set. Here we want the element "root" to have the value "N/A" or "#REF!".

14_RestrictionOfSequence.xsd

The schema declares the type "AAA", which can contain up to two sequences of "x" and "y" elements. Then we declare the type "BBB", which is a restriction of the type "AAA" and contains only one x-y sequence.
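
The union of two intervals described for 04_RestrictUsingUnion01.xsd might be written along these lines. This is a sketch reconstructed from the description; the choice of xs:integer as the base type is an assumption, since the slide does not name it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:simpleType>
      <xs:union>
        <!-- first interval: 0 to 100, borders included -->
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="0"/>
            <xs:maxInclusive value="100"/>
          </xs:restriction>
        </xs:simpleType>
        <!-- second interval: 300 to 400, borders included -->
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="300"/>
            <xs:maxInclusive value="400"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:union>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

Under this schema, <root>100</root> and <root>350</root> are valid, while <root>200</root> is not.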

Slide 30

Extension

The extension element defines complex types that derive from other complex or simple types.

  If the base type is a simple type, then the complex type can only add attributes.
  If the base type is a complex type, then it is possible to add attributes and elements.

To derive from a complex type, you have to use the complexContent element in conjunction with the base attribute of the extension element.

Extensions are particularly useful when you need to reuse complex element definitions in other complex element definitions. For example, it is possible to define a Name element that contains two child elements (First and Last) and then reuse it in other complex element definitions.

Slide 31

An example of extensions

<!-- Base type definition -->
<xs:complexType name="Name">
  <xs:sequence>
    <xs:element name="First"/>
    <xs:element name="Last"/>
  </xs:sequence>
</xs:complexType>

<!-- Customer type that reuses it -->
<xs:complexType name="Customer">
  <xs:complexContent>
    <xs:extension base="Name">
      <xs:sequence>
        <xs:element name="phone" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<!-- Student type that reuses it -->
<xs:complexType name="Student">
  <xs:complexContent>
    <xs:extension base="Name">
      <xs:sequence>
        <xs:element name="school" type="xs:string"/>
        <xs:element name="year" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
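
To show what the extended Customer type accepts, here is a sketch of an instance. It assumes the schema also contains a declaration such as <xs:element name="customer" type="Customer"/>, which is not on the slide, and the values are invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumes a hypothetical declaration: <xs:element name="customer" type="Customer"/> -->
<customer>
  <!-- inherited from the base type Name, and required to come first -->
  <First>Asha</First>
  <Last>Rao</Last>
  <!-- added by the extension, appended after the inherited content -->
  <phone>555-0100</phone>
</customer>
```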

Slide 32

Examples / Demo

12_ExtensionOfSequence.xsd

Extension of a sequence. When we extend a complexType that contains a sequence A with a sequence B, the sequence B is appended to sequence A.

Slide 33

Groups

W3C XML Schema also allows the definition of groups of elements and attributes.

These groups are not datatypes but containers holding a set of elements or attributes that can be used to describe complex types.

<!-- definition of an element group -->
<xs:group name="mainBookElements">
  <xs:sequence>
    <xs:element name="title" type="nameType"/>
    <xs:element name="author" type="nameType"/>
  </xs:sequence>
</xs:group>

<!-- definition of an attribute group -->
<xs:attributeGroup name="bookAttributes">
  <xs:attribute name="isbn" type="isbnType" use="required"/>
  <xs:attribute name="available" type="xs:string"/>
</xs:attributeGroup>

The groups are then referenced by name when describing a complex type:

<xs:complexType name="bookType">
  <xs:sequence>
    <xs:group ref="mainBookElements"/>
    <xs:element name="character" type="characterType"
                minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attributeGroup ref="bookAttributes"/>
</xs:complexType>

Slide 34

Examples / Demo

08_AttributeGroup01.xsd

Defining a group of attributes. Let's say we want to define a group of common attributes that will be reused. The root element is named "root"; it must contain the "aaa" and "bbb" elements, and these elements must have the attributes "x" and "y".

12_SequenceChoiceGroup.xsd

An element that contains two "patterns" (sequences), in any order. We want the root element to be named "AAA", to be from the null namespace, and to contain two patterns in any order. The first pattern is a sequence of "BBB" and "CCC" elements; the second is a sequence of "XXX" and "YYY" elements. The element "choice" allows one of two cases: either the sequence "myFirstSequence"-"mySecondSequence" or "mySecondSequence"-"myFirstSequence".
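
The two-patterns-in-any-order technique described for 12_SequenceChoiceGroup.xsd can be sketched as below. This is a reconstruction from the description rather than the demo file; the xs:string element types are an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="AAA">
    <xs:complexType>
      <!-- choice between the two possible orders of the groups -->
      <xs:choice>
        <xs:sequence>
          <xs:group ref="myFirstSequence"/>
          <xs:group ref="mySecondSequence"/>
        </xs:sequence>
        <xs:sequence>
          <xs:group ref="mySecondSequence"/>
          <xs:group ref="myFirstSequence"/>
        </xs:sequence>
      </xs:choice>
    </xs:complexType>
  </xs:element>

  <!-- first pattern: BBB followed by CCC -->
  <xs:group name="myFirstSequence">
    <xs:sequence>
      <xs:element name="BBB" type="xs:string"/>
      <xs:element name="CCC" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <!-- second pattern: XXX followed by YYY -->
  <xs:group name="mySecondSequence">
    <xs:sequence>
      <xs:element name="XXX" type="xs:string"/>
      <xs:element name="YYY" type="xs:string"/>
    </xs:sequence>
  </xs:group>
</xs:schema>
```

Because the two branches of the choice start with different elements (BBB versus XXX), a validator can tell which branch applies from the first child it sees.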

Slide 35

List Datatypes

List datatypes are special cases in which a structure is defined within the content of a single attribute or element.

IDREFS, ENTITIES, and NMTOKENS are predefined list datatypes.

As with these three datatypes, every list datatype that can be defined must be whitespace-separated. No other separator is accepted.

A list datatype is defined by reference to an existing type through the itemType attribute:

<xs:simpleType name="integerList">
  <xs:list itemType="xs:integer"/>
</xs:simpleType>

A list datatype can also be defined by embedding an xs:simpleType element:

<xs:simpleType name="myIntegerList">
  <xs:list>
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:maxInclusive value="100"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:list>
</xs:simpleType>

This datatype can be used to define attributes or elements that accept a whitespace-separated list of integers smaller than or equal to 100, such as "1 -25000 100".

Slide 36

Examples / Demo

09_ListDataType01.xsd

An attribute contains a list of values. We want the "root" element to have an attribute "xyz" that contains a list of three integers. We define a general list (element "list") of integers and then restrict it (element "restriction") to have an exact length (element "length") of three items.

10_ListDataType02.xsd

An element contains a list of values. We want the "root" element to contain a list of three integers. We define a general list (element "list") of integers and then restrict it (element "restriction") to have an exact length (element "length") of three items.
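
The list-then-restrict approach described for these two demos can be sketched for the element case as follows. The type names integerList and threeIntegers are hypothetical, and the schema is a reconstruction from the description, not the demo file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- general whitespace-separated list of integers -->
  <xs:simpleType name="integerList">
    <xs:list itemType="xs:integer"/>
  </xs:simpleType>

  <!-- for a list type, the length facet counts list items, not characters -->
  <xs:simpleType name="threeIntegers">
    <xs:restriction base="integerList">
      <xs:length value="3"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- accepts <root>1 2 3</root>, rejects <root>1 2</root> -->
  <xs:element name="root" type="threeIntegers"/>
</xs:schema>
```

For the attribute case, the same threeIntegers type would be used as the type of an xs:attribute named "xyz" inside the complex type of "root".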

More Examples

Examples / Demo

Slide 38

Examples / Demo

15_CustomSimpleType.xsd

Definition of a custom simpleType: a temperature must be greater than -273.15. The element "T" must contain a number greater than -273.15. We define our custom type for temperature, named "Temperature", and require the element "T" to be of that type.

16_PatternElement.xsd

A string must contain an e-mail address. The element "A" must contain an email address. We define our custom type, which will at least approximately check the validity of the address. We use the "pattern" element to restrict the string using regular expressions.
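
Both custom types described here can be sketched in one schema. The base type xs:decimal, the type name Email, and the regular expression are assumptions made for illustration; the pattern is only a rough approximation of an address (something, an @ sign, something, a dot, something) and is not the expression used in the demo file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- temperature strictly greater than -273.15 -->
  <xs:simpleType name="Temperature">
    <xs:restriction base="xs:decimal">
      <xs:minExclusive value="-273.15"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="T" type="Temperature"/>

  <!-- approximate e-mail check: no whitespace or extra @, at least one dot after the @ -->
  <xs:simpleType name="Email">
    <xs:restriction base="xs:string">
      <xs:pattern value="[^@\s]+@[^@\s]+\.[^@\s]+"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="A" type="Email"/>
</xs:schema>
```

Under this sketch, <T>-273.15</T> is rejected because the bound is exclusive, and <A>someone@example.com</A> is accepted.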

Slide 39

Summary

W3C XML Schema has become the de facto standard for defining the structure of an XML document and for checking the validity of XML documents. Using schema, it is possible to define:

  Elements (simple and complex)
  Attributes
  Facets for XML elements
  The structure of a document (order indicators)
  The allowable number of elements (occurrence indicators) in an XML document

Slide 40

References

IBM XML certification success, Part 1: (ibm.com/developerWorks)

W3schools.com

www.Xml.com

XML Schema (O'Reilly)

http://www.zvon.org/xxl/XMLSchemaTutorial

Examples used in the presentation are attached here

XML-Schema-Project.zip

Slide 41

Questions

Thank you

XML Technology, Semester 4

SICSR Executive MBA(IT) @ MindTree, Bangalore, India

By Neeraj Singh (toneeraj(AT)gmail(DOT)com)