Upload
steven-maidens
View
219
Download
2
Tags:
Embed Size (px)
Citation preview
SE 5145 – eXtensible Markup Language (XML )
XML Schema
2011-12/Spring, Bahçeşehir University, Istanbul
3rd Assignment: Validating XML with DTD & XML Schema (page 1/2)
The goal of this exercise is to understand the basic concepts of XML Schema and how it extends the capabilities of DTDs. You will use your XML Resume (CV) that you provided in Assignment 2.
Task 1. XML Schema: Write an XML schema definition for your XML Resume satisfying the following requirements:
For any date in your Resume XML, make sure that your XML Schema checks for a valid date value. Try to
avoid xs:string as much as possible, or if you think that something really is a string, use your own string type which for example could take care of checking for a maximum length and some character set (a xs:pattern could be used to achieve the latter).
Make sure that at least one of your types is used by more than one element (because reuse is good). In real-life applications, you would start to design a type library, and then start using it when constructing your schema from the ground up.
Use minOccurs and maxOccurs to restrict the cardinality of some elements.
See next slide..
2
3rd Assignment: Validating XML with DTD & XML Schema (page 2/2)
The following are recommended (but optional) for this assignment: Depending on how similar or different your employer and education entries are, try to think of a way how you
could find some structural similarity between these entries and then represent this similarity using complex type derivation.
Try to add a targetNamespace to your schema, so that your Resume schema now is a full-grown schema with its own namespace. Don't forget that you have to change the instance (by using the namespace there) to match the schema when you do that.
Identity constraints could be used to check various aspects of the Resume , depending on what you think should be unique, a key, or a reference to an existing key. A typical example would be to have a key for institutions (educational or companies), and then have each of your skills reference this key so that you can represent where you have acquired each skill.
Task 3 – Validate XML: Use a tool to validate your XML Resume (*.xml) using your XML schema (*.xsd). A suitable online tool is http://www.xmlvalidation.com/. On the first page, provide the XML document and
select ‘Validate against external XML schema’. Click ‘Validate’ and provide the XML schema on the second page.
Another tool is Altova XML Spy (You can use download & use a trial version) Alternatively, you can remember and use the recommended tools described by Melike (validator.w3.org) and
Erokan (iexmltls.exe, msval.vbs) from the last presentations.
3
XML Schemas
“Schemas” is a general term--DTDs are a form of XML schemas According to the dictionary, a schema is “a structured
framework or plan” When we say “XML Schemas,” we usually mean the
W3C XML Schema Language This is also known as “XML Schema Definition” language, or
XSD It has been introduced to overcome some of the commonly
observed limitations of DTDs, most notably the lack of typing
DTDs, XML Schemas, RELAX NG and Schematron are all XML schema languages
4
What’s Wrong with DTDs? DTDs do not support application-level datatypes
XML for B2B is very data-centric and needs typing SGML was created for documents where typing was less important
DTDs do not support any relationships between markup constructs content models cannot be reused attribute lists cannot be reused structural relationships cannot be exploited in the DTD
DTDs provide a very weak specification language No restrictions on text content Very little control over mixed content (text plus elements) Little control over ordering of elements
DTDs are written in a strange (non-XML) format You need separate parsers for DTDs and XML
5
Why XML Schemas? XML Schema Definition language (XSD) solves these problems XSD allows you to constrain the content of XML documents like DTDs, but
they are much more powerful & sophisticated. XSD allows a much finer level of control over structure and content XSD is written using XMLsyntax instead of a custom syntax like DTDs use
XML Schema's simple data type provide some semantics a formerly undescribed attribute can now be described as being a xs:date it can be understood as being a date and inserted into a calendar but what kind of date is it? a birthday? an order date? a shipping date? a question of the context of where the xs:date appears
XML Schema better supports model-level information however, XML Schema also only captures part of the application semantics an XML Schema is usually better than a DTD, because it contains types types provide information about the basic datatypes being used additional semantics (e.g., different kinds of dates) must be documented elsewhere
6
Why not XML schemas?
DTDs have been around longer than XSD Therefore they are more widely used Also, more tools support them
Power of XSD comes with a price: XSD is a little harder and more verbose to write than DTDs,
even by XML standards More advanced XML Schema instructions can be non-
intuitive and confusing
Nevertheless, XSD is not likely to go away quickly
7
Validation and Typing
XML Schema does two things at the same time:
1. Validation checks for structural integrity (is the document schema-valid?)
checking elements and attributes for proper usage (as with DTDs) checking element contents and attribute values for proper values
2. Type annotations make the types available to applications instead of having to look at the schema, applications get the Post-Schema
Validation Infoset (PSVI) type-based applications (such as XSLT 2.0) can work on the typed
instance
8
Schema-Validation and Applications
9
Anatomy of a Schema
Schema uses the namespace defined by http://www.w3.org/2001/XMLSchema and usually uses xsd or xs prefix in the XML code
The file extension is .xsd The root element is <schema> XSD starts like this:
<?xml version="1.0"?><xsd:schema xmlns:xsd="http://www.w3.rg/2001/XMLSchema">
10
Referring to a schema
To refer to a DTD in an XML document, the reference goes before the root element:
<?xml version="1.0"?><!DOCTYPE rootElement SYSTEM "url"><rootElement> ... </rootElement>
To refer to an XML Schema in an XML document, the reference goes in the root element:
<?xml version="1.0"?><rootElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" (The XML Schema Instance reference is required) xsi:noNamespaceSchemaLocation="url.xsd"> (This is where your XML Schema definition can be found) ...</rootElement>
11
TYPES A type is a set of values
the values can be enumerated (home, mobile, office) the values can be described by extension (intervals, regular expressions)
DTDs have (almost) no types element content is always #PCDATA (any number of any characters) attributes most often are CDATA (any number of any characters) attributes may have enumerated types (but no extensional types) attributes may use ID/IDREF
12
TYPES
DTD XML Schema
Concepts some conceptual model (formal/informal)
Types ID/IDREF and (#P)CDATA Hierarchy of Simple and Complex Types
Markup Constructs
Element Type Declarations<!ELEMENT order ...
Element Definitions<xs:element name="order"> ...
Instances (Documents)
<order date=""> [ order content ] </order>
13
“Simple” and “complex” elements A “simple” element is one that contains text and nothing
else A simple element cannot have attributes A simple element cannot contain other elements A simple element cannot be empty However, the text can be of many different types, and may
have various restrictions applied to it If an element isn’t simple, it’s “complex”
A complex element may have attributes A complex element may be empty, or it may contain text,
other elements, or both text and other elements
14
“Simple” types Simple types describe values not structured by XML markup
they describe attribute values (date="2006-10-03") they describe element content (<phone>+1-510-6432253</phone>)
Simple types can be used for elements or attributes XML Schema treats contents in elements and attributes equally simple type libraries can be designed independent of their eventual use
Simple types are available in three flavors atomic types: one value of one type (one number in some range) union types: one value of a union of types (a number or the
string undefined) list types: a whitespace-separated list of values (phone type="home
office")
15
Named vs. Anonymous Types can be named or anonymous named types have a name and can be referenced (and thus be reused) anonymous types have no name and can only be used where they are defined
16
Type Definitions Simple types are sets of values
named simple types are sets of values with a name (and thus reusable) anonymous simple types are sets of values defined where they are needed
Simple types are defined to represent model-level information in most cases, they will have restrictions associated with them they may also simply be tags for semantics (fax and phone numbers share
the same value space) XML Schema has a library of built-in datatypes
ur-types are the conceptual grounding of all types primitive types are the types that are there by definition derived types are based on primitive types users can derive their own types using simple type restriction
17
Type Hierarchy
18
Built-In Types
19
Declaring Elements with Schema
Elements can be declared as having a simple or complex type Types can be either built-in or defined by your Schema
Elements can also have mixed, empty, or element content, just like in DTDs
Elements can be given a minimum and maximum number of times that they are allowed to occur
20
Declaring Elements with Schema
21
Defining a simple element
A simple element is defined asxs:element name="name" type="type" minoccurs/maxoccurs="number/unbounded" />
where: name is the name of the element the most common values for type are
xs:boolean xs:integer xs:date xs:string xs:decimal xs:time
minoccurs and maxoccurs are optional, default value= 1
Other attributes a simple element may have: default="default value" if no other value is
specified fixed="value" no other value may be specified
22
Custom Simple Types with Restrict
You can define your own custom simple types by deriving them from existing simple types with restriction. the base type must be a simple type the derived type will be a simple type all simple types form a tree, rooted as
the anySimpleType Restriction are based on facets
each restriction can use 0-n facets facets can be refined in further simple type restrictions XML Schema designers should try to restrict types as
much as possible – WHY ?23
Restrictions
The general form for putting a restriction on a text value is:
<xs:element name="name"> (or xs:attribute) <xs:restriction base="type"> ... the restrictions ... </xs:restriction></xs:element>
For example: <xs:element name="age">
<xs:simpleType> <xs:restriction base="xs:positiveInteger"> <xs:minInclusive value="0"/> <xs:maxInclusive value="140"/> </xs:restriction><xs:simpleType></xs:element> 24
Facets
Facets define a certain way of restricting a simple type Facets may be repeated in different levels of the type
hierarchy Not all facets are applicable to all types
the applicability depends on the primitive type being used
25
Restrictions on numbers
minInclusive -- number must be ≥ the given value
minExclusive -- number must be > the given value
maxInclusive -- number must be ≤ the given value
maxExclusive -- number must be < the given value
totalDigits -- number must have exactly value digits
fractionDigits -- number must have no more than value digits
after the decimal point
26
Restrictions on strings
length -- the string must contain exactly value characters
minLength -- the string must contain at least value characters
maxLength -- the string must contain no more than value characters
pattern -- the value is a regular expression that the string must match
whiteSpace -- not really a “restriction”--tells what to do with whitespace value="preserve" Keep all whitespace
value="replace" Change all whitespace characters to spaces value="collapse" Remove leading and trailing whitespace, and replace
all sequences of whitespace with a single space
27
Patterns
Patterns restrict the lexical space of simple types most other facets restrict the value space (e.g., intervals of numbers) in many cases, patterns are useful additions to value-oriented facets
Patterns are regular expressions they support many common regex constructs and Unicode the language pattern allows de, de-CH, and other tags the pattern checks for lexical correctness, not against a code list
([a-zA-Z]{2}|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]{1,8})(-[a-zA-Z]{1,8})*
28
Facet Limitations
Facets limit one dimension of a type's value space using pattern, the lexical space can also be restricted restrictions should be made as specific as possible no limitations are possible beyond the predefined facets
There is no connection to the context within the document facets cannot make references to other values (e.g., neighboring attributes)
Additional constraints should be documented documentation enables applications to implement constraint checking other schema languages (like Schematron) may be used to express these
constraints
29
Enumeration
An enumeration restricts content to allowable choices
Example: <xsd:element name="season">
<xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="Spring"/> <xsd:enumeration value="Summer"/> <xsd:enumeration value="Autumn"/> <xsd:enumeration value="Fall"/> <xsd:enumeration value="Winter"/> </xsd:restriction> </xsd:simpleType></xsd:element>
30
Simple Type Examples
31
What is a Complex Type ?
Complex types describe the allowed element content they describe what the element may contain (the element's content
model) they describe the attributes that an element may have (the
element's attribute list) Complex types do not define the element name
they define which content is allowed for the element the element definition uses the complex type to define the allowed
element content Complex types have similar properties to simple types
they can be named or anonymous Complex Type Derivation can be used to construct a type hierarchy
32
Declaring Complex Elements
To declare the elements with complex type: Use the xsd:anyType value for the type attribute Use the <xsd:complexType> tag in the definition
Structure: <xs:element name="name"> <xs:complexType> ... information about the complex type... </xs:complexType> </xs:element>
Remember that attributes are always simple types
33
Complex elements
Example: <xs:element name="person"> <xs:complexType> <xs:sequence> <xs:element name="firstName" type="xs:string" /> <xs:element name="lastName" type="xs:string" /> </xs:sequence> </xs:complexType> </xs:element>
<xs:sequence> says that elements must occur in this order
34
Complex Type Example
35
Complex Types & Content Types
Complex types can have different kinds of content simple content refers to simple type content using additional
attributes complex content is anything else (anything beyond simple type
content) Complex Type Derivation heavily depends on this
classification
36
DTD Content Models Defining Elements in DTDs uses a compact syntax
XML Schema supports the same facilities with a more verbose syntax XML Schemas adds features which DTDs do not support
DTDs allow elements to be mandatory, optional, repeatable, or optional and repeatable
XML Schema allows the cardinality to be specified
DTDs allow sequences (,) and alternatives (|) XML Schema introduces a (very limited) operator for all groups
Apart from the syntax, XML Schema content models are not very different
37
Empty Content
DTDs have a special keyword for empty elements instead of the content model, the keyword EMPTY is used empty elements may still have attribute lists associated with them
XML Schema empty types are defined implicitly there is no explicit keyword for defining an empty type if a type has no model group inside it, it is empty (it still may have
attributes)
Declaring empty elements <xs:element name="myEmptyElement">
<xs:complexType></xs:complexType>
</xs:element>
38
Mixed Content DTDs define mixed content by mixing #PCDATA into the content
model DTDs always require mixed content to use the form ( #PCDATA | a | b )* the occurrence of elements in mixed content cannot be controlled
XML Schema defines mixed content outside of the content model the content model is defined like an element-only content model the mixed attribute on the type marks the type as being mixed
Example: (only one subtitle is allowed, why ?)
39
Mixed Content XML Schema mixed content can use all model groups
it is possible to constrain element occurrences in the same way as in element-only content
in practice, this feature is rarely used (mixed content often is very loosely defined)
40
Defining an attribute
Attributes are always declared as simple types Any of the simple types that can be used for elements can also be
used for attributes.
An attribute is defined as <xs:attribute name="name" type="type" />where:
name and type are the same as for xs:element
41
Defining an attribute
Other attributes a simple element may have: default="default value" if no other value is specified fixed="value" no other value may be specified use="required" attribute must be present use="optional" attribute is not required
(default) use="prohibited" attribute can not be used
Example: <xsd:attribute name="city" type="xsd:string"
use="optional" default="istanbul"/>
42
Adding attributes to the elements
Adding attributes to an element that has an empty content model
43
Adding attributes to the elements
Adding attributes to an element that only has character data content
44
Adding attributes to the elements
Adding attributes to an element that have element or mixed content models
45
Global and local definitions
Elements declared at the “top level” of a <schema> are available for use throughout the schema
Elements declared within a xs:complexType are local to that type Thus, in
<xs:element name="person"> <xs:complexType> <xs:sequence> <xs:element name="firstName" type="xs:string" /> <xs:element name="lastName" type="xs:string" /> </xs:sequence> </xs:complexType> </xs:element>the elements firstName and lastName are only locally declared
The order of declarations at the “top level” of a <schema> do not specify the order in the XML data document 46
Declaration and use
So far we’ve been talking about how to declare types, not how to use them
To use a type we have declared, use it as the value of type="..." Examples:
<xs:element name="student" type="person"/> <xs:element name="professor" type="person"/>
Scope is important: you cannot use a type if is local to some other type
47
Declaring elements with element content
Sequence: child elements must appear in order All: child elements can occur in any order Choice: any one of the child elements from a list
48
sequence
child elements must appear in a specific order: <xs:element name="person">
<xs:complexType> <xs:sequence> <xs:element name="firstName" type="xs:string" /> <xs:element name="lastName" type="xs:string" /> </xs:sequence> </xs:complexType> </xs:element>
49
xs:all Child elements can appear in any order <xs:element name="person">
<xs:complexType> <xs:all> <xs:element name="firstName" type="xs:string" /> <xs:element name="lastName" type="xs:string" /> </xs:all> </xs:complexType> </xs:element>
Despite the name, the members of an xs:all group can occur once or not at all
You can use minOccurs="n" and maxOccurs="n" to specify how many times an element may occur (default value is 1)
In this context, n may only be 0 or 1
50
xs:choice Aany one of the child elements from a list
<xs:element name="vehicle"> <xs:complexType> <xs:choice> <xs:element name="car" type="xs:integer" />
<xs:element name="bus" type="xs:string" /> <xs:element name="van" type="xs:string" /> <xs:element name="motorcycle" type="xs:integer" />
</xs: choice > </xs:complexType> </xs:element>
51
Referencing
Once you have defined an element or attribute (with name="..."), you can refer to it with ref="..."
Example: <xs:element name="person">
<xs:complexType> <xs:all> <xs:element name="firstName" type="xs:string" /> <xs:element name="lastName" type="xs:string" /> </xs:all> </xs:complexType> </xs:element>
<xs:element name="student" ref="person"> Or just: <xs:element ref="person">
52
Text element with attributes
If a text element has attributes, it is no longer a simple type
<xs:element name="population"> <xs:complexType> <xs:simpleContent> <xs:extension base="xs:integer"> <xs:attribute name="year" type="xs:integer"> </xs:extension> </xs:simpleContent> </xs:complexType>
</xs:element>
53
Empty elements
Empty elements are (ridiculously) complex
<xs:complexType name="counter"> <xs:complexContent> <xs:extension base="xs:anyType"/> <xs:attribute name="count" type="xs:integer"/> </xs:complexContent></xs:complexType>
54
Mixed elements
Mixed elements may contain both text and elements We add mixed="true" to the xs:complexType
element The text itself is not mentioned in the element, and
may go anywhere (it is basically ignored)
<xs:complexType name="paragraph" mixed="true"> <xs:sequence> <xs:element name="someName" type="xs:anyType"/> </xs:sequence></xs:complexType>
55
Extensions
You can base a complex type on another complex type <xs:complexType name="newType">
<xs:complexContent> <xs:extension base="otherType"> ...new stuff... </xs:extension> </xs:complexContent></xs:complexType>
56
Predefined string types
Recall that a simple element is defined as: <xs:element name="name" type="type" />
Here are a few of the possible string types: xs:string -- a string xs:normalizedString -- a string that doesn’t contain
tabs, newlines, or carriage returns xs:token -- a string that doesn’t contain any
whitespace other than single spaces Allowable restrictions on strings:
enumeration, length, maxLength, minLength, pattern, whiteSpace
57
Predefined date and time types
xs:date -- A date in the format CCYY-MM-DD, for example, 2002-11-05
xs:time -- A date in the format hh:mm:ss (hours, minutes, seconds)
xs:dateTime -- Format is CCYY-MM-DDThh:mm:ss
Allowable restrictions on dates and times: enumeration, minInclusive, maxExclusive,
maxInclusive, maxExclusive, pattern, whiteSpace
58
Predefined numeric types
Here are some of the predefined numeric types:
Allowable restrictions on numeric types: enumeration, minInclusive, maxExclusive,
maxInclusive, maxExclusive, fractionDigits, totalDigits, pattern, whiteSpace
xs:decimal xs:positiveIntegerxs:byte xs:negativeIntegerxs:short xs:nonPositiveIntegerxs:int xs:nonNegativeIntegerxs:long
59
Practice..
60