ITR3 lecture 3: Namespaces, XML Schema & XSL
Thomas Krichel
2002-09-10


Gee….

• Birdseye view only, have a look at what these things do.

• If there is interest, I can teach some more in a separate course.

• Structure
– Some XML related standards
– Namespaces
– XML Schema
– XSL

Literature

• Castro, Elizabeth (2001). XML for the World Wide Web: Visual QuickStart Guide. Peachpit Press.

• Duckett, Jon et al. (2001). Professional XML Schemas. Wrox Press (recommended)

• Kay, Michael (2001). XSLT (2nd ed.). Wrox Press.

XHTML

• This is HTML redefined so that it becomes well-formed XML

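• Examples
– Element names are case-sensitive (lower case)
– Every element must be closed: a bare <p> needs a matching </p>, and an empty element such as <br> becomes <br/>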

• Verdict: pain without gain

Resource Description Framework (RDF)

• A standard issued by the W3C. A framework to encode meaning to make it computer processable.

• Uses the approach of a directed graph.

• Generalizes an object / property / value approach
– Value may be another object.
– Objects are identified by a URI.
– Properties may be identified with a URI

• A paper on RDF available at http://openlib.org/home/krichel/papers/anhalter.letter.pdf

• RDF XML syntax is defined but currently being reworked.

• Verdict: very costly to implement.

Cascading style sheets (CSS)

• a non-XML way of writing stylesheets that can be applied to both XML and HTML. Widely supported by browsers.

• Written as a sequence of rules. Example:

compositionyear, recordingyear {
  color: red;
  font-family: sans-serif;
}

• Verdict: not flexible

XPath and XPointer

• These are non-XML syntaxes for referring to specific parts of an XML document:
– ranges
– points
– sets of nodes

• They are used in other XML-related standards, in particular in XSL, and will be covered as part of XSL.

• Verdict: useful

XLinks

• XLink is an XML syntax for linking XML documents.

• They go way beyond the conventional linking capabilities of HTML, but there is no obvious way for the browser to represent them.

• Verdict: nonsense

Document Object Model (DOM)

• “a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The Document Object Model provides a standard set of objects for representing HTML and XML documents, a standard model of how these objects can be combined, and a standard interface for accessing and manipulating them.”

• Now at "Level 3".

• Works by building a tree out of a document.

• Verdict: extremely complicated
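As a rough illustration of the tree-building approach, here is a minimal sketch using Python's standard-library xml.dom.minidom, which implements only a subset of the DOM. The sample document and the helper name outline are made up for this example; they are not part of the lecture.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this sketch.
SAMPLE = (
    "<recording><compositionyear>1791</compositionyear>"
    "<recordingyear>1985</recordingyear></recording>"
)

def outline(node, depth=0):
    """Return one indented line per element or text node in the DOM tree."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + "element " + node.tagName)
    elif node.nodeType == node.TEXT_NODE:
        lines.append("  " * depth + "text " + node.data)
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines

doc = minidom.parseString(SAMPLE)          # the whole document is parsed into a tree first
tree_lines = outline(doc.documentElement)  # then the program walks that tree
print("\n".join(tree_lines))
```

The point of the sketch is only that nothing can be inspected until the complete tree exists in memory.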

Simple API for XML (SAX)

• SAX is an event-based parsing model. It reports parsing events (such as the start and end of elements) directly to the application through callbacks.

• Does not usually build an internal tree.

• A lot less resource-intensive,
– when the document is large
– when the task is simple.

• Verdict: thumbs up!
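The callback idea can be sketched with Python's standard-library xml.sax module. The sample document and the class name ElementCounter are assumptions made for this illustration only.

```python
import xml.sax

# Hypothetical sample document, invented for this sketch.
SAMPLE = (
    b"<recording><compositionyear>1791</compositionyear>"
    b"<recordingyear>1985</recordingyear></recording>"
)

class ElementCounter(xml.sax.ContentHandler):
    """Counts start-of-element events; no tree is kept in memory."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser each time an opening tag is read.
        self.counts[name] = self.counts.get(name, 0) + 1

handler = ElementCounter()
xml.sax.parseString(SAMPLE, handler)
print(handler.counts)
```

The handler only keeps the counters it needs, which is why this style suits large documents and simple tasks.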

XML Information Sets

• best understood through an example. Consider two XML snippets.

• Snippet 1 <person sex="female"> Margarete Krichel</person>

• Snippet 2 <person sex='female'>Margarete Krichel </person>

• Are they the same?
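One way to explore the question is to parse both snippets and compare what a parser reports. The sketch below uses Python's xml.etree.ElementTree as a stand-in; it is not an Information Set implementation, and the variable names are made up here.

```python
import xml.etree.ElementTree as ET

snippet1 = '<person sex="female"> Margarete Krichel</person>'
snippet2 = "<person sex='female'>Margarete Krichel </person>"

p1 = ET.fromstring(snippet1)
p2 = ET.fromstring(snippet2)

same_name = p1.tag == p2.tag
same_attributes = p1.attrib == p2.attrib  # the quote style is not part of the parsed data
same_text = p1.text == p2.text            # the whitespace is part of the character data
print(same_name, same_attributes, same_text)
```

In this sketch the choice of quotes disappears after parsing, while the position of the space in the content does not.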

XML Namespaces

• Namespaces make XML element names and attribute names globally unique by associating them with a particular URI, usually a URL.

• The name combined with its namespace URI is globally unique. Written with a prefix, it is called the qualified name, or QName for short.

• The name without the namespace URI is called the local name.

• This is done through a namespace declaration, which associates a short string, called a prefix, with the namespace URI.

• The qualified name can then be written as prefix:localname

Namespace syntax

• <element xmlns[:prefix]="URI"> … </element>
• element is the element name
• prefix is the prefix
• URI is a URI, often a URL, actually.
• [ ] indicate that the prefix is optional. If the prefix is missing, all elements that have no namespace prefix belong, by default, to the declared namespace.

• A namespace declaration applies to the element on which it appears and to that element's descendants.

Avoiding cerebral indigestion related to namespaces

• Expect nothing if you retrieve the namespace URI, when it is a URL.

• Prefixes can be any short string. Some prefixes are customary, like xsi for http://www.w3.org/2001/XMLSchema-instance

• Default namespaces apply only to elements, not to attributes. An attribute without a prefix is in no namespace of its own and is interpreted relative to its element; only an attribute with an explicit prefix is in a namespace.
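To see how a parser resolves prefixes and default namespaces, here is a small sketch with Python's xml.etree.ElementTree, which writes an expanded name as {namespace URI}local name. The document, its URIs and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: URIs and names are made up for this sketch.
DOC = """
<cat:catalog xmlns:cat="http://example.org/catalog" xmlns="http://example.org/default">
  <cat:item id="1">first</cat:item>
  <note lang="en">second</note>
</cat:catalog>
"""

root = ET.fromstring(DOC)
item, note = list(root)

print(root.tag)     # prefixed element: resolved through the cat prefix
print(item.tag)     # same namespace as the root
print(note.tag)     # unprefixed element: picks up the default namespace
print(item.attrib)  # unprefixed attribute: reported without any namespace
```

Note that the prefix itself never shows up in the parsed names; only the URI does, which is why any short string will do as a prefix.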

XML Schemas

http://www.w3.org/TR/xmlschema-0/ (Primer)
http://www.w3.org/TR/xmlschema-1/ (Structures)
http://www.w3.org/TR/xmlschema-2/ (Datatypes)

What is XML Schema?

• XML Schema is a vocabulary for expressing constraints on the validity of an XML document.

• A piece of XML is valid if it satisfies the constraints expressed in another XML file, the schema file.

• The idea is to check if the XML file is fit for a certain purpose.

Example

<location>
  <latitude>32.904237</latitude>
  <longitude>73.620290</longitude>
  <uncertainty units="meters">2</uncertainty>
</location>

To be valid, this XML snippet must meet all the following constraints:
1. The location must be comprised of a latitude, followed by a longitude, followed by an indication of the uncertainty of the lat/lon measurements.
2. The latitude must be a decimal with a value between -90 and +90.
3. The longitude must be a decimal with a value between -180 and +180.
4. For both latitude and longitude the number of digits to the right of the decimal point must be exactly six.
5. The value of uncertainty must be a non-negative integer.
6. The uncertainty units must be either meters or feet.
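To get a feeling for what a validator has to do, the six constraints can be checked by hand. The sketch below is plain Python, not XML Schema, and it simplifies (for instance, it accepts only unsigned digits for the uncertainty); the function name check_location and the error messages are made up for this example.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

LOCATION = """<location>
  <latitude>32.904237</latitude>
  <longitude>73.620290</longitude>
  <uncertainty units="meters">2</uncertainty>
</location>"""

SIX_DECIMALS = re.compile(r"[+-]?\d+\.\d{6}")

def check_location(xml_text):
    """Return a list of constraint violations; an empty list means the data is ok."""
    problems = []
    loc = ET.fromstring(xml_text)
    # Constraint 1: latitude, then longitude, then uncertainty.
    if [child.tag for child in loc] != ["latitude", "longitude", "uncertainty"]:
        return ["wrong child elements or wrong order"]
    lat, lon, unc = loc
    # Constraints 2-4: decimal range and exactly six fraction digits.
    for elem, limit in ((lat, 90), (lon, 180)):
        text = (elem.text or "").strip()
        if not SIX_DECIMALS.fullmatch(text):
            problems.append(elem.tag + ": need exactly six fraction digits")
        elif abs(Decimal(text)) > limit:
            problems.append(elem.tag + ": out of range")
    # Constraint 5: non-negative integer (simplified to plain digits).
    if not re.fullmatch(r"\d+", (unc.text or "").strip()):
        problems.append("uncertainty: not a non-negative integer")
    # Constraint 6: units must be meters or feet.
    if unc.get("units") not in ("meters", "feet"):
        problems.append("units: must be meters or feet")
    return problems

print(check_location(LOCATION))
```

Every new document type would need another function like this one; a schema language lets the same rules be written down once, declaratively, and checked by generic software.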

Validating your data

<location>
  <latitude>32.904237</latitude>
  <longitude>73.620290</longitude>
  <uncertainty units="meters">2</uncertainty>
</location>

- check that the latitude is between -90 and +90
- check that the longitude is between -180 and +180
- check that the fraction digits is 6
- … etc.

[Diagram: the XML instance and the XML Schema file are fed to an XML Schema validator (software), which reports "Data is ok!"]

History of Schema

• Once upon a time, there was SGML

• SGML has a “schema” language called a DTD.

• It is crap
– Different syntax than SGML
– Main focus on presence and absence of elements
– Very limited capabilities to check contents of elements (datatypes)

XML Schemas can constrain

• the structure of instance documents
– "this element contains these elements, which contain these other elements", etc.

• the datatype of each element/attribute
– "this element shall hold an integer with the range 0 to 12,000"

Highlights of XML Schemas

• 44 built-in datatypes
• Can create your own datatypes by extending or restricting existing datatypes
• Written in the same syntax as instance documents
• Can express sets, i.e., can define the child elements to occur in any order
• Can specify element content as being unique (keys on content) and uniqueness within a region
• Can define multiple elements with the same name but different content
• Can define elements with nil content
• Can define substitutable elements

important schema concepts

• simple types: types that cannot have child elements or attributes
– elements that only have text contents and no attributes
– attributes

• complex type: type of anything that can have child elements or attributes

important schema concepts

• global declarations are direct children of the root schema element. They are visible everywhere.

• all other declarations are local: they are limited in scope to the element that they appear within

important schema concepts

• Value space. The range of values that the type can take
• Lexical space. The range of literals that represent the values
• Set of facets. The defining properties of a type.
– Fundamental facets include equality, order, bounds, cardinality, numeric/non-numeric
– Constraining facets include ranges for numbers, string lengths, or a regular expression
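The difference between lexical space and value space can be illustrated with ordinary decimal numbers. This is a sketch in Python using decimal.Decimal as a stand-in for a schema decimal type; the literals are chosen arbitrarily for the example.

```python
from decimal import Decimal

# Three different literals (lexical space) ...
literals = ["1.0", "1.00", "+1.0"]

# ... that all denote the same number (value space).
values = {Decimal(lit) for lit in literals}
print(len(literals), len(values))
```

A constraining facet such as a range then applies to the value, while a pattern facet (a regular expression) applies to the literal.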

Namespaces

• An XML Schema file mixes vocabulary from the XML Schema language with the new vocabulary that is being defined.

• It has to keep both separate using namespaces.

• Namespaces associate a URI with names.

[Diagram: two namespaces.
http://www.w3.org/2001/XMLSchema contains element, complexType, schema, sequence, string, integer, boolean. This is the vocabulary that XML Schemas provide to define your new vocabulary.
http://www.books.org (targetNamespace) contains BookStore, Book, Title, Author, Date, ISBN, Publisher. This is the vocabulary for our book store XML description.]

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  <xsd:element name="Date" type="xsd:string"/>
  <xsd:element name="ISBN" type="xsd:string"/>
  <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>

BookStore.xsd (see example01)
xsd = XML Schema Definition

(explanations on succeeding pages)

All XML Schemas have "schema" as the root element.


The elements and datatypes that are used to construct schemas (schema, element, complexType, sequence, string) come from the http://…/XMLSchema namespace.


targetNamespace="http://www.books.org" says that the elements defined by this schema (BookStore, Book, Title, Author, Date, ISBN, Publisher) are to go in this namespace.


ref="Book" is referencing a Book element declaration. The Book in what namespace?

The default namespace is http://www.books.org, which is the targetNamespace!


elementFormDefault="qualified" is a directive to any instance documents which conform to this schema: any elements that are defined in this schema must be namespace-qualified when used in instance documents.

Referencing a schema in an XML instance document

<?xml version="1.0"?>
<BookStore xmlns="http://www.books.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.books.org BookStore.xsd">
  <Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>July, 1998</Date>
    <ISBN>94303-12021-43892</ISBN>
    <Publisher>McMillin Publishing</Publisher>
  </Book>
  ...
</BookStore>

1. First, using a default namespace declaration, tell the schema-validator that all of the elements used in this instance document come from the http://www.books.org namespace.

2. Second, with schemaLocation tell the schema-validator that the http://www.books.org namespace is defined by BookStore.xsd (i.e., schemaLocation contains a pair of values).

3. Third, tell the schema-validator that the schemaLocation attribute we are using is the one in the XML Schema-instance namespace.
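The three steps can be observed from the consuming side with a few lines of Python. This is only a sketch of how a program might read the hints; it does not validate anything. The instance document is shortened to a single Title, and the variable names are made up here.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Shortened version of the BookStore instance document.
INSTANCE = """<BookStore xmlns="http://www.books.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.books.org BookStore.xsd">
  <Book><Title>My Life and Times</Title></Book>
</BookStore>"""

root = ET.fromstring(INSTANCE)

# Step 1: the default namespace puts BookStore into http://www.books.org.
print(root.tag)

# Steps 2 and 3: schemaLocation lives in the XMLSchema-instance namespace and
# holds whitespace-separated pairs of namespace URI and schema document.
tokens = root.get("{%s}schemaLocation" % XSI).split()
locations = dict(zip(tokens[0::2], tokens[1::2]))
print(locations)
```

A real schema-validator would go on to load BookStore.xsd and check the instance against it.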

[Diagram: the XMLSchema-instance namespace, http://www.w3.org/2001/XMLSchema-instance, contains schemaLocation, type, noNamespaceSchemaLocation and nil.]

Referencing a schema in an XML instance document

BookStore.xsd: targetNamespace="http://www.books.org"
- defines elements in namespace http://www.books.org

BookStore.xml: schemaLocation="http://www.books.org BookStore.xsd"
- uses elements from namespace http://www.books.org

A schema defines a new vocabulary. Instance documents use that new vocabulary.

Note multiple levels of checking

BookStore.xml against BookStore.xsd: validate that the XML document conforms to the rules described in BookStore.xsd

BookStore.xsd against XMLSchema.xsd (schema-for-schemas): validate that BookStore.xsd is a valid schema document, i.e., it conforms to the rules described in the schema-for-schemas

Using XSLT and XPath

XSL transforms XML

• XSL may be used to generate either HTML, XML, or text

[Diagram: XML and XSL go into the XSL processor, which produces HTML (or XML or text).]

Doing it using Internet Explorer

• First, download the latest version of Internet Explorer (at this time it is 6.0)

• Write an XSL stylesheet stylish.xsl

• Write an XML file, and refer to the XSL stylesheet with a processing instruction
<?xml-stylesheet type="text/xsl" href="stylish.xsl"?>

Note: this does not work with other browsers!

XML tree

• XSL has a model of XML as a tree.

• The XSL tree model is similar to the DOM model.

• As the processor does its job it looks at elements of the input tree and transforms them to the output tree.

• The processor only writes the tree to the file at the end.

• End points in the tree are called “nodes”.

in the general section

• we examine how XSL looks at an XML document. In fact it builds a tree.

• and then we look at a very simple way to look at what the stylesheet does. After that we have Roger showing us the details.

Seven types of nodes

• root node: contains all the elements in the document. Not to be confused with the document element of XML.
• element node: contains an element
• text node: contains an as-large-as-possible area of text.
• attribute node: contains attribute name and value
• comment node: contains a comment
• processing instruction (p-i) node
• namespace node: each element node has one namespace node for every namespace declaration

properties of nodes: name

• This is empty for the root, text and comment nodes.

• for element and attribute nodes, it is the name as it appears in the XML file, expanded by namespace declarations.

• for p-i nodes, it is the target

• for a namespace node, it is the prefix

properties of nodes: string value

• for text nodes: the text
• for comment nodes: the text of the comment
• for p-i nodes: the data part of the p-i.
• for an attribute node: the value of the attribute
• for a root node: the concatenation of all the string values of all element and text children.

• for a namespace node: the URI of the namespace
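A small sketch of the string value idea, using Python's xml.etree.ElementTree rather than an XSL processor. Element nodes follow the same concatenation rule as the root node, which is what the example relies on. The Member fragment is a shortened version of the fitness center data used later in the lecture.

```python
import xml.etree.ElementTree as ET

member = ET.fromstring(
    "<Member level='platinum'><Name>Jeff</Name><Phone>555-1234</Phone></Member>"
)

# String value of an element: all descendant text, concatenated in document order.
string_value = "".join(member.itertext())

# String value of an attribute: the value of the attribute.
level = member.get("level")
print(string_value, level)
```

The attribute value is not part of the element's string value; it has to be asked for separately.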

properties of nodes: base URI

• for all nodes: the URI of the XML source document where the node has been found

• Only of interest for elements and p-i nodes

• for the root node: the URI of the document

• for attribute, text and comment nodes: the base URI of its parent node

properties of nodes: children

• for element nodes: all the element nodes, text nodes, p-i nodes and comment nodes between its start and end tags.

• for root nodes: all the element nodes, text nodes, p-i nodes and comment nodes that are not children of some other node.

parent node

• for all nodes except root nodes: the parent of the node.

• attribute nodes and namespace nodes have an element node as parent node, but are not considered to be its child.

property of nodes: attribute

• element: the attributes that the element has (there may be none)

• other nodes: empty

Now we look at what XSL does

Different formats…

• <xsl:output method="xml"/> is the default

• <xsl:output method="html"/>

• <xsl:output method="text"/> used for everything else. Final formatting may be up to formatting objects, anyway.

• Your stylesheet processor may have more formats, but they will be vendor-specific.

templates set rules

<xsl:template match="expression">

do some stuff

</xsl:template>

This is a rule that says: if you find a node that matches the expression "expression", then go ahead and do some stuff. It is called a template. The fact that a rule is written down does not imply that it is applied.

applying templates

• <xsl:apply-templates/>

says: for each child node of the current node, find the template rule that matches it and apply it.

Default, built-in rules for the nodes

• root: <xsl:apply-templates> on all children

• element: <xsl:apply-templates> on all children of the element

• attribute: copy the value as text to the output

• text: copy the text to the output

• comment, p-i, namespace: do nothing
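What the built-in rules amount to, when no other template is written, can be imitated in a few lines of Python. This is a sketch using xml.dom.minidom, not an XSL processor; the function name builtin_rules and the sample fragment are made up for the illustration. Attributes are never reached here because they are not children of their element.

```python
from xml.dom import minidom

def builtin_rules(node):
    """Mimic the built-in rules: recurse through root and elements, copy text."""
    if node.nodeType in (node.DOCUMENT_NODE, node.ELEMENT_NODE):
        # Root and element nodes: process every child in document order.
        return "".join(builtin_rules(child) for child in node.childNodes)
    if node.nodeType == node.TEXT_NODE:
        # Text nodes: copy the text to the output.
        return node.data
    # Comments and processing instructions: produce nothing.
    return ""

doc = minidom.parseString(
    "<Member level='platinum'><!-- note --><Name>Jeff</Name>"
    "<Phone type='home'>555-1234</Phone></Member>"
)
result = builtin_rules(doc)
print(result)
```

So an empty stylesheet applied to a document yields just the document's text, stripped of all markup.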

HTML Generation

• We will first use XSL to generate HTML documents

• When generating HTML, XSL should be viewed as a tool to enhance HTML documents.
– That is, the HTML documents may be enhanced by extracting data out of XML documents
– XSL provides elements (tags) for extracting the XML data, thus allowing us to enhance HTML documents with data from an XML document

Enhancing HTML Documents with XML Data

[Diagram: the XSL processor reads the XML document and the HTML document (with embedded XSL elements); each XSL element is replaced by XML data.]

Enhancing HTML Documents with the Following XML Data

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="FitnessCenter.xsl"?>
<FitnessCenter>
  <Member level="platinum">
    <Name>Jeff</Name>
    <Phone type="home">555-1234</Phone>
    <Phone type="work">555-4321</Phone>
    <FavoriteColor>lightgrey</FavoriteColor>
  </Member>
</FitnessCenter>

FitnessCenter.xml

Embed HTML Document in an XSL Template

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE>Welcome</TITLE>
      </HEAD>
      <BODY>
        Welcome!
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>

FitnessCenter.xsl (see html-example01)

Note

• The HTML is embedded within an XSL template, which is an XML document. The HTML must be well formed.

• We are able to add XSL elements to the HTML, allowing us to extract data out of XML documents.

• Let's customize the HTML welcome page by putting in the member's name. This is achieved by extracting the name from the XML document. We use an XSL element to do this.

Extracting the Member Name

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE>Welcome</TITLE>
      </HEAD>
      <BODY>
        Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>!
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>

(see html-example02)

Extracting a Value from & Navigating the XML Document

• Extracting values:
– use the <xsl:value-of select="…"/> XSL element

• Navigating:
– The slash ("/") indicates parent/child relationship
– A slash at the beginning of the path indicates that it is an absolute path, starting from the top of the XML document

/FitnessCenter/Member/Name

"Start from the top of the XML document, go to the FitnessCenter element, from there go to the Member element, and from there go to the Name element."

Document /
  PI <?xml version="1.0"?>
  Element FitnessCenter
    Element Member
      Element Name
        Text Jeff
      Element Phone
        Text 555-1234
      Element Phone
        Text 555-4321
      Element FavoriteColor
        Text lightgrey
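The same navigation can be tried out without an XSL processor. The sketch below uses Python's xml.etree.ElementTree, which supports only a limited subset of XPath; its paths are relative to the element they are called on, so the absolute path /FitnessCenter/Member/Name is written as Member/Name starting from the FitnessCenter element. The variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

FITNESS = """<FitnessCenter>
  <Member level="platinum">
    <Name>Jeff</Name>
    <Phone type="home">555-1234</Phone>
    <Phone type="work">555-4321</Phone>
    <FavoriteColor>lightgrey</FavoriteColor>
  </Member>
</FitnessCenter>"""

root = ET.fromstring(FITNESS)  # root is the FitnessCenter element

# From FitnessCenter go to Member, and from there to Name.
name = root.findtext("Member/Name")

# A path may match several nodes: both Phone elements are found.
phones = [phone.text for phone in root.findall("Member/Phone")]

print("Welcome " + name + "!")
```

Like xsl:value-of, findtext returns the text of the first matching node only, while findall returns every match.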

http://openlib.org/home/krichel

Thank you for your attention!