COSC 843: Application Development for Internet Based Services

Data Description and Transformation

1. XML

2. DTD

3. XSD

4. XPath

5. XSL/XSLT


1. XML

What is XML?

Why XML?

Brief History and Versions

Sample XML Documents

XML Namespaces


What is XML?

XML stands for EXtensible Markup Language

A meta-language for descriptive markup: you invent your own tags

XML uses a Document Type Definition (DTD) or an XML Schema to describe the data

XML with a DTD or XML Schema is designed to be self-descriptive

Built-in internationalization via Unicode: documents can contain characters from many languages

Built-in error handling: a forgotten tag or an attribute without quotes renders an XML document unusable

Broad support from the big IT companies


Why XML?

Much shareable data resides in computer systems and databases that use incompatible formats and conflicting hardware and/or software

One of the most time-consuming challenges for developers has been to exchange data between such systems over the Internet

Converting the data to XML can greatly reduce the complexity and create data that can be read by many different applications

XML data is stored in plain text format, so it is hardware and software independent

XML can be used to create new languages: it allows us to define our own markup languages


Brief XML History

SGML (Standard Generalized Markup Language):

ISO Standard, 1986, for data storage and exchange

Metalanguage for defining languages (through DTDs)

A famous SGML language: HTML

Separation of content and display

Used in U.S. government and contractors, large manufacturing companies, technical information publishers, ...

SGML reference is 600 pages long

XML:

W3C Recommendation in 1998

Simple subset (80/20 rule) of SGML: "ASCII of the Web", "Semantic Web"

XML specification is 26 pages long


… Brief XML History

1986: SGML becomes a standard

1989: Tim Berners-Lee creates the WWW

1994: W3C established

1998: XML 1.0 W3C Recommendation

Jan 2000: XHTML becomes a W3C Recommendation (a reformulation of HTML 4 in XML 1.0)

Feb 2004: W3C XML 1.0 (Third Edition) Recommendation, http://www.w3.org/TR/2004/REC-xml-20040204/

Feb 2004: XML 1.1 Recommendation, http://www.w3.org/TR/2004/REC-xml11-20040204/ (updates XML to use Unicode 3)


XML and HTML

XML is not a replacement for HTML

In future Web development, XML is likely to be used to describe data while HTML will be used to format and display the same data (one interpretation of XML)

XML and HTML were designed with different goals

XML was designed to describe data and to focus on what data is; XML describes only content, or "meaning"

HTML was designed to display data and to focus on how data looks; HTML describes both structure (e.g. <p>, <h2>, <em>) and appearance (e.g. <br>, <font>, <i>)

XML is for computers while HTML is for humans

XML is used to mark up data so it can be processed by computers

HTML is used to mark up text so it can be displayed to users


XML does not DO anything

XML was not designed to DO anything

A piece of software must be written to do something (send, receive or display the document)

The following example is book information stored as XML:

<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  …
</bookstore>
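As a rough illustration of the point that software must be written to do something with XML, the following Python sketch reads the bookstore document with the standard library's xml.etree.ElementTree module. Python is not used in the original slides, and the names BOOKSTORE_XML and summarize_books are assumptions made for this example; the elided "…" entries are simply left out.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the slide (only the one book shown there).
BOOKSTORE_XML = """<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
</bookstore>"""


def summarize_books(xml_text):
    """Return one dict per <book> element with its attributes and child text."""
    root = ET.fromstring(xml_text)  # root is the <bookstore> element
    books = []
    for book in root.findall("book"):
        author = book.find("author")
        books.append({
            "genre": book.get("genre"),  # attributes are read with get()
            "isbn": book.get("ISBN"),
            "title": book.findtext("title"),  # child element text
            "author": f'{author.findtext("first-name")} {author.findtext("last-name")}',
            "price": float(book.findtext("price")),
        })
    return books


for entry in summarize_books(BOOKSTORE_XML):
    print(entry["title"], "by", entry["author"], "costs", entry["price"])
```

The XML itself carries only the data; deciding to print a summary line is entirely the program's choice.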


XML is Free and Extensible

XML tags are not predefined

You must "invent" your own tags

The tags used to mark up HTML documents and the structure of HTML documents are predefined

The author of HTML documents can only use tags that are defined in the HTML standard

XML allows the author to define his own tags and his own document structure


XML Future

XML is going to be everywhere

A large number of software vendors adopted the XML standard very quickly

XML is a cross-platform, software- and hardware-independent tool for transmitting information

[Figure: Application X exchanging XML with Documents, Configuration, Database, and Repository]


Benefits of XML

Open W3C standard, non-proprietary

Representation of data across heterogeneous environments

Cross platform

Allows for a high degree of interoperability, e.g., the ability to exchange data between incompatible applications with incompatible data formats

Strict rules that make it relatively easy to write XML parsers: syntax, structure, case sensitivity

XML can make data more useful: the software, hardware, and application independence of XML makes data available to more users, not only HTML browsers


Components of an XML Document

XML declaration

Processing instructions

Encoding specification (Unicode by default)

Namespace declaration

Schema declaration

Elements

Each element has a beginning and ending tag: <TAG_NAME>...</TAG_NAME>

Elements can be empty (<TAG_NAME />)

Attributes

Describe an element, e.g. data type, data range, etc.

Can only appear on the beginning tag


Components of an XML Document

Processing Instructions

Elements

Elements with Attributes

<?xml version="1.0" ?>

<?xml-stylesheet type="text/xsl" href="template.xsl"?>

<ROOT>

<ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>

<ELEMENT2> </ELEMENT2>

<ELEMENT3 type='string'> </ELEMENT3>

<ELEMENT4 type='decimal' value='9.3'> </ELEMENT4>

</ROOT>


XML Declaration

The XML declaration looks like this: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

The XML declaration is optional in XML 1.0 and is not required by browsers, but including it is good practice (so include it!)

If present, the XML declaration must be first; not even whitespace should precede it

Note that the brackets are <? and ?>

The version attribute is required

encoding can be "UTF-8" or "UTF-16" (both are Unicode encodings; UTF-8 is a superset of ASCII), or something else, or it can be omitted

An XML document is standalone if it makes use of no external markup (DTD) declarations

The default value of the standalone attribute is no


Processing Instructions

A PI is a command to the program processing the XML document to handle it in a certain way

PIs (Processing Instructions) may occur anywhere in the XML document (but usually first)

XML documents are typically processed by more than one program

Programs that do not recognize a given PI should just ignore it

General format of a PI: <?target instructions?>

Example: <?xml-stylesheet type="text/css" href="mySheet.css"?>
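To make the target/instructions split concrete, here is a small Python sketch (Python is not part of the original slides) that uses the standard library's xml.dom.minidom to list the processing instructions at the top level of a document. The sample document and the function name list_processing_instructions are assumptions for illustration. Note that the XML declaration itself is not reported as a PI node.

```python
from xml.dom import Node, minidom

# Hypothetical sample: an XML declaration, one stylesheet PI, and a root element.
DOC = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="mySheet.css"?>'
    '<note>hello</note>'
)


def list_processing_instructions(xml_text):
    """Return (target, instructions) pairs for PIs that sit outside the root element."""
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    ]


for target, data in list_processing_instructions(DOC):
    print("target:", target)
    print("instructions:", data)
```

A program that does not care about xml-stylesheet can simply skip these nodes, which matches the rule that unrecognized PIs are ignored.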


XML Elements

An XML element is everything from the element's start tag to the element's end tag

XML elements are extensible and they have relationships: they are related as parents and children

XML elements have simple naming rules

Names can contain letters, numbers, and other characters

Names must not start with a number or punctuation character

Names must not start with the letters xml (or XML or Xml ..)

Names cannot contain spaces


XML Attributes

XML elements can have attributes

Data can be stored in child elements or in attributes

Should you avoid using attributes? Here are some of the problems with using attributes:

attributes cannot contain multiple values (child elements can)

attributes are not easily expandable (for future changes)

attributes cannot describe structures (child elements can)

attributes are more difficult to manipulate by program code

attribute values are not easy to test against a Document Type Definition (DTD), which is used to define the legal elements of an XML document

Experience shows that attributes are handy in HTML but child elements should be used in their place in XML

Use attributes only to provide information that is not relevant to the data


An XML Document

<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
</bookstore>


Another XML Document

<?xml version="1.0"?>
<weatherReport>
  <date>7/14/97</date>
  <city>North Place</city>, <state>NX</state> <country>USA</country>
  High Temp: <high scale="F">103</high>
  Low Temp: <low scale="F">70</low>
  Morning: <morning>Partly cloudy, Hazy</morning>
  Afternoon: <afternoon>Sunny &amp; hot</afternoon>
  Evening: <evening>Clear and Cooler</evening>
</weatherReport>


XML Validation

There is a difference between a well-formed XML document and a valid XML document

A well-formed XML document is one with correct XML syntax (see the next slide for well-formedness rules)

XML syntax is constrained by a grammar (DTD or Schema) that governs the permitted tag names, attachment of attributes to tags, and so on.

A well-formed XML document that also conforms to a given DTD or schema is said to be valid.

Every valid XML document is well-formed but the reverse is not necessarily the case


Rules For Well-Formed XML

There must be one, and only one, root element

All XML elements must have a closing tag

Sub-elements must be properly nested

Attributes are optional (defined by an optional schema)

Attribute values must be enclosed in double quotes ("") or single quotes ('')

Processing instructions are optional

XML is case-sensitive
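The rules above can be tried out with any non-validating parser. The following Python sketch (not part of the original slides) treats a document as well-formed when the standard library's xml.etree.ElementTree parses it without raising ParseError. The function name is_well_formed and the sample strings are assumptions chosen to exercise one rule each.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One hypothetical sample per rule; only the first one is well-formed.
SAMPLES = {
    "one root, properly nested": "<a><b>text</b></a>",
    "two root elements": "<a></a><b></b>",
    "missing closing tag": "<a><b></a>",
    "improper nesting": "<a><b><c></b></c></a>",
    "unquoted attribute value": "<a x=1></a>",
    "case mismatch in tags": "<a></A>",
}

for label, text in SAMPLES.items():
    print(f"{label}: {is_well_formed(text)}")
```

This only checks well-formedness; it says nothing about validity against a DTD or schema.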


XML DTD

A DTD defines the legal elements of an XML document; it defines the document structure with a list of legal elements

XML Schema

XML Schema is an XML-based alternative to DTD

Errors in XML documents will stop the XML program

The W3C XML specification states that a program should not continue to process an XML document if it finds a well-formedness error (a fatal error); validity errors are reported by validating parsers

Processing an XML document requires a software program called an XML Parser (or XML Processor): http://www.xml.com/xml/pub/Guide/xml_parsers

There are two flavors of parsers

Non-validating: checks a document's well-formedness (e.g., browsers)

Validating: also checks a document's validity


Browsers Support for XML

Netscape 6 supports XML

Internet Explorer 5.0 supports the XML 1.0 standard

Internet Explorer 5.0 has the following XML support:

Viewing of XML documents

Displaying XML with CSS

Transforming and displaying XML with XSL

XML embedded in HTML as Data Islands

Binding XML data to HTML elements

Access to the XML DOM

Full support for W3C DTD standards


Viewing XML Documents

Raw XML files can be viewed in IE 5.0 (and higher) and in Netscape 6

XML documents do not carry information about how to display the data

To make them display like a web page, you have to add some display information

Different solutions to the display problem: CSS, XSL, XML Data Islands, and JavaScript

Will you be writing your future homepages in XML?

Most Microsoft pages are XML based and the server converts them to HTML on-the-fly when requested


Displaying XML with CSS

With CSS (Cascading Style Sheets) you can add display information to an XML document

Formatting XML with CSS is NOT the future of the Web

Formatting with XSL will be the new standard


Example: the xml file

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/css" href="cd_catalog.css"?>
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  . . . .
</CATALOG>


Example: the css file

CATALOG { background-color: white; width: 100%; }

CD { display: block; margin-bottom: 30pt; margin-left: 0; }

TITLE { color: red; font-size: 20pt; }

ARTIST{ color: blue; font-size: 20pt; }

COUNTRY,PRICE,YEAR,COMPANY { display: block; color: black; margin-left: 20pt; }


Displaying XML with XSL

With XSL you can add display information to your XML document

XSL is the preferred style sheet language of XML

XSL (the eXtensible Stylesheet Language) is far more sophisticated than CSS

One way to use XSL is to transform XML into HTML before it is displayed by the browser


Example: the xml file

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="simple.xsl" ?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>light Belgian waffles covered with strawberries and whipped cream</description>
    <calories>900</calories>
  </food>
  …
</breakfast_menu>


Example: the xsl file

<?xml version="1.0" encoding="ISO-8859-1"?>
<html xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns="http://www.w3.org/TR/xhtml1/strict">
  <body style="font-family:Arial,helvetica,sans-serif;font-size:12pt; background-color:#EEEEEE">
    <xsl:for-each select="breakfast_menu/food">
      <div style="background-color:teal;color:white;padding:4px">
        <span style="font-weight:bold;color:white">
          <xsl:value-of select="name"/></span> - <xsl:value-of select="price"/>
      </div>
      <div style="margin-left:20px;margin-bottom:1em;font-size:10pt">
        <xsl:value-of select="description"/>
        <span style="font-style:italic"> (<xsl:value-of select="calories"/> calories per serving) </span>
      </div>
    </xsl:for-each>
  </body>
</html>
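For readers who find the stylesheet hard to follow, the sketch below mimics what it does, written in Python rather than XSLT: loop over every food element (the role of xsl:for-each) and copy the text of selected children into HTML (the role of xsl:value-of). It is not an XSLT processor, the styling attributes are left out, and the names MENU_XML and render_menu are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# One entry from the breakfast menu on the previous slide.
MENU_XML = """<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
</breakfast_menu>"""


def render_menu(xml_text):
    """Build an HTML string with one pair of <div> blocks per <food> element."""
    root = ET.fromstring(xml_text)  # root is <breakfast_menu>
    parts = ["<html><body>"]
    for food in root.findall("food"):  # like xsl:for-each select="breakfast_menu/food"
        # Each findtext call plays the role of one xsl:value-of.
        name = escape(food.findtext("name", default=""))
        price = escape(food.findtext("price", default=""))
        description = escape(food.findtext("description", default=""))
        calories = escape(food.findtext("calories", default=""))
        parts.append(f"<div><span>{name}</span> - {price}</div>")
        parts.append(f"<div>{description} <span>({calories} calories per serving)</span></div>")
    parts.append("</body></html>")
    return "\n".join(parts)


print(render_menu(MENU_XML))
```

The real stylesheet is declarative: the browser's XSLT engine performs this loop, so no program code has to be written.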


View the result in IE 6


XML Embedded in HTML

XML can be embedded within HTML pages in Data Islands

Manipulated via client-side script or data binding

The unofficial <xml> tag is used to embed XML data within HTML

The id attribute of the <xml> tag defines an ID for the data island, and the src attribute points to the XML file to embed:

<html>
  <body>
    <xml id="note" src="note.xml"></xml>
  </body>
</html>

The next step is to format and display the data in the data island by binding it to HTML elements.


Bind Data Island to HTML Elements

Data Islands can be bound to HTML elements (like HTML tables)

An XML data island with ID "cdcat" is loaded from an external XML file

An HTML table is bound to the data island with a datasrc attribute

The td elements are bound to the XML data with a datafld attribute inside a span

<html>
  <body>
    <xml id="cdcat" src="cd_catalog.xml"></xml>
    <table border="1" datasrc="#cdcat">
      <tr>
        <td> <span datafld="ARTIST"> </span> </td>
        <td> <span datafld="TITLE"> </span> </td>
      </tr>
    </table>
  </body>
</html>


The Microsoft XML Parser

To read and update an XML document, you need an XML parser

The Microsoft XML parser comes with Microsoft Internet Explorer 5.0

Once you have installed IE 5.0, the parser is available to scripts inside HTML documents

The parser features a language-neutral programming model that supports:

JavaScript, VBScript, Perl, VB, Java, C++ and more

W3C XML 1.0 and XML DOM

DTD and validation

You can create an XML document object with the following code:

var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")


Loading an XML file into the parser

XML files can be loaded into the parser using script code.

The following code loads an XML document (note.xml) into the XML parser:

<script type="text/javascript">
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;
xmlDoc.load("note.xml");
// ....... processing the document goes here
</script>

The second line in the code above creates an instance of the Microsoft XML parser

The third line turns off asynchronous loading, to make sure that the parser will not continue execution before the document is fully loaded

The fourth line tells the parser to load the XML document called note.xml

We will revisit these issues later


Namespaces

XML allows you to define a new document format by combining and reusing other formats

This can lead to name conflicts since the document formats being combined may have the same element names that are used for different purposes

Namespaces allow authors to differentiate between tags of the same name (using a prefix)

That is, name conflicts are solved using a prefix

This frees the author to focus on the data and decide how best to describe it

The W3C namespace specification states that a namespace should be identified by a URI (Uniform Resource Identifier)

A URI is a string of characters which identifies an Internet resource

A URL is the most common URI used to identify resources and their location on the Internet

Another less common type of URI is the URN (Uniform Resource Name)

When a URL is used in a namespace declaration, the URL does NOT have to represent a live server

The only purpose is to give the namespace a unique name. However, very often companies use the namespace as a pointer to a real Web page containing information about the namespace


Namespaces: Declaration

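A namespace declaration has the form xmlns:prefix="URI". In the declaration below, xmlns introduces the namespace declaration, bk is the prefix, and the quoted string is the URI (URL):

xmlns:bk="http://www.example.com/bookinfo/"

Namespace declaration examples:

xmlns:bk="http://www.example.com/bookinfo/"

xmlns:bk="urn:mybookstuff.org:bookinfo"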


Namespaces: Examples

<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
</BOOK>

<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
</bk:BOOK>
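A parser resolves each prefix to its URI, so what identifies an element is the pair (URI, local name), not the prefix. The Python sketch below (Python is not part of the original slides) shows this with xml.etree.ElementTree, which writes qualified names as {URI}local. The variable names and the query prefixes b and m are assumptions; they deliberately differ from bk and money to show that the prefix itself is arbitrary.

```python
import xml.etree.ElementTree as ET

# The second example above, with both namespaces declared on the root.
BOOK_XML = """<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
</bk:BOOK>"""

root = ET.fromstring(BOOK_XML)

# Prefixes used only for querying; they map to the same URIs as bk and money.
NS = {"b": "http://www.bookstuff.org/bookinfo", "m": "urn:finance:money"}

root_tag = root.tag                               # '{URI}BOOK', prefix resolved
title = root.findtext("b:TITLE", namespaces=NS)   # lookup by URI, not by 'bk'
price = root.find("b:PRICE", NS)
currency = price.get("{urn:finance:money}currency")  # qualified attribute name

print(root_tag)
print(title, price.text, currency)
```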


Namespaces: Default Namespace

An XML namespace declared without a prefix becomes the default namespace for all sub-elements

All elements without a prefix will belong to the default namespace:

<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
</BOOK>


Namespaces: Scope

Unqualified elements belong to the innermost default namespace

BOOK, TITLE, and AUTHOR belong to the default BOOK namespace

PUBLISHER and NAME belong to the default PUBLISHER namespace

<BOOK xmlns="www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>


Namespaces: Attributes

Unqualified attributes do NOT belong to any namespace, even if there is a default namespace

They don't need to, since the scope of an attribute is only the element on which it appears

This differs from elements, which belong to the default namespace
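Both rules, default-namespace scoping for elements and no namespace for unqualified attributes, can be observed by parsing a small document. In this Python sketch (not part of the original slides) the id attribute on BOOK is a hypothetical addition made only to show the attribute rule; the rest follows the scope example.

```python
import xml.etree.ElementTree as ET

# Scope example; the id="b1" attribute is an assumption added for illustration.
SCOPED_XML = """<BOOK xmlns="www.bookstuff.org/bookinfo" id="b1">
  <TITLE>All About XML</TITLE>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>"""

root = ET.fromstring(SCOPED_XML)

# Element names come back as '{namespace}local'; each unprefixed element
# picks up the innermost default namespace that is in scope.
element_tags = [element.tag for element in root.iter()]

# Attribute names come back without braces: the unprefixed attribute is in
# no namespace even though BOOK declares a default namespace.
attribute_names = list(root.attrib)

for tag in element_tags:
    print(tag)
print(attribute_names)
```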


Entities

Entities provide a mechanism for textual substitution for special characters, e.g.

Entity    Substitution
&lt;      <
&amp;     &

XML parsers normally parse all the text in an XML document

When an XML element is parsed, the text between the XML tags is also parsed

If you place special characters like "<" inside an XML element, it will generate an error because the parser interprets it as the start of a new element

Entity references are used to avoid such errors
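A short Python sketch (not part of the original slides) of the round trip: escape special characters before placing text inside an element, and let the parser substitute them back. It uses escape from xml.sax.saxutils and xml.etree.ElementTree from the standard library; the expr element name and the sample text are assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "a < b & c"

# Writing side: '<' and '&' become entity references.
escaped_text = escape(raw_text)
document = f"<expr>{escaped_text}</expr>"

# Reading side: the parser replaces the entity references with the characters.
parsed_text = ET.fromstring(document).text

# Without escaping, the same text is rejected as malformed markup.
try:
    ET.fromstring(f"<expr>{raw_text}</expr>")
    unescaped_accepted = True
except ET.ParseError:
    unescaped_accepted = False

print(escaped_text)
print(parsed_text)
print(unescaped_accepted)
```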


CDATA

By default, all text inside an XML document is parsed

You can force text to be treated as unparsed character data by enclosing it in <![CDATA[ ... ]]>

Any characters, even & and <, can occur inside a CDATA

Whitespace inside a CDATA is (usually) preserved

The only real restriction is that the character sequence ]]> cannot occur inside a CDATA

CDATA is useful when your text has a lot of illegal characters (for example, if your XML document contains some HTML text)


CDATA

<?xml version='1.0'?>
<myTag>
<![CDATA[
function matchwo(a,b) {
  if (a < b && a < 0) then
    return 1;
  else
    return 0;
}
]]>
</myTag>
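To see that a parser hands CDATA content back unchanged, the following Python sketch (not part of the original slides) parses a shortened, hypothetical variant of the example with xml.etree.ElementTree. When the element is written out again, this library drops the CDATA wrapper and uses entity references instead, which is equivalent as far as XML is concerned.

```python
import xml.etree.ElementTree as ET

# Shortened variant of the slide's example; the one-line body is an assumption.
CDATA_DOC = """<?xml version='1.0'?>
<myTag><![CDATA[if (a < b && a < 0) return 1;]]></myTag>"""

element = ET.fromstring(CDATA_DOC)

# The '<' and '&' characters arrive as ordinary text, with no parse error.
cdata_text = element.text

# Serializing escapes the special characters rather than re-creating CDATA.
serialized = ET.tostring(element, encoding="unicode")

print(cdata_text)
print(serialized)
```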




2. Document Type Definitions (DTDs)

What are DTDs?

Why DTDs?

DTD Syntactic Elements: ELEMENT, ATTRIBUTE, ENTITY, Types

Examples

Validation


What are DTDs?

A Document Type Definition (DTD) is a grammar that describes the structure of a class of XML documents

The structure of the documents is described via element and attribute-list declarations

Element declarations name the allowable set of elements within the document, and specify whether and how declared elements and runs of character data may be contained within each element

Attribute-list declarations name the allowable set of attributes for each declared element, including the type of each attribute value, if not an explicit set of valid value(s)

DTDs are written in EBNF-like notation


Why DTDs?

XML documents are designed to be processed by computer programs

If you can put just any tags in an XML document, it's very hard to write a program that knows how to process the tags

A DTD specifies what tags may occur, when they may occur, and what attributes they may (or must) have

A DTD allows the XML document to be verified (shown to be legal)

A DTD that is shared across groups allows the groups to produce consistent XML documents


Parsers

An XML parser is an API that reads the content of an XML document

Currently popular APIs are DOM (Document Object Model) and SAX (Simple API for XML)

A validating parser is an XML parser that compares the XML document to a DTD and reports any errors
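The two API styles can be compared on the same task. In this Python sketch (not part of the original slides), the DOM version builds the whole document tree and then queries it, while the SAX version reacts to start-tag events as the parser streams through the text. Both use only the standard library, which offers no DTD validation in either module; the sample document and function names are assumptions.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document.
SAMPLE = "<notes><note>one</note><note>two</note><other/></notes>"


def count_with_dom(xml_text, tag):
    """DOM style: load the full tree into memory, then search it."""
    document = minidom.parseString(xml_text)
    return len(document.getElementsByTagName(tag))


class TagCounter(xml.sax.ContentHandler):
    """SAX style: a handler that is called once per start tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        if name == self.tag:
            self.count += 1


def count_with_sax(xml_text, tag):
    """Stream through the document without building a tree."""
    handler = TagCounter(tag)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.count


print(count_with_dom(SAMPLE, "note"), count_with_sax(SAMPLE, "note"))
```

DOM is convenient when the program needs random access to the document; SAX keeps memory use low for large inputs.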


An XML example

  <novel>
    <foreword>
      <paragraph>This is a great novel.</paragraph>
    </foreword>
    <chapter number="1">
      <paragraph>It was a dark and stormy night.</paragraph>
      <paragraph>Suddenly, a shot rang out!</paragraph>
    </chapter>
  </novel>

An XML document contains (and the DTD describes):
  Elements, such as novel and paragraph, consisting of tags and content
  Attributes, such as number="1", consisting of a name and a value
  Entities (not used in this example)


A DTD example

  <!DOCTYPE novel [
    <!ELEMENT novel (foreword, chapter+)>
    <!ELEMENT foreword (paragraph+)>
    <!ELEMENT chapter (paragraph+)>
    <!ELEMENT paragraph (#PCDATA)>
    <!ATTLIST chapter number CDATA #REQUIRED>
  ]>

A novel consists of a foreword and one or more chapters, in that order
  Each chapter must have a number attribute

A foreword consists of one or more paragraphs

A chapter also consists of one or more paragraphs

A paragraph consists of parsed character data (text that cannot contain any other elements)

PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded. 

CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
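The contrast is easiest to see in element content, using a CDATA section. The following is a small sketch with Python's `xml.etree.ElementTree` (an illustrative choice, not part of the slides): the first paragraph is parsed character data, the second wraps its text in `<![CDATA[ ... ]]>`.

```python
import xml.etree.ElementTree as ET

# Parsed character data: entity references such as &lt; are expanded.
parsed = ET.fromstring("<paragraph>1 &lt; 2 &amp; 3 &gt; 2</paragraph>")

# CDATA section: everything between <![CDATA[ and ]]> is taken literally,
# so "<b>" is not a tag and "&amp;" is not expanded.
literal = ET.fromstring(
    "<paragraph><![CDATA[<b>bold</b> &amp; more]]></paragraph>"
)

print(parsed.text)
print(literal.text)
print(len(literal))  # number of child elements found inside the CDATA paragraph
```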


ELEMENT descriptions

Suffixes:
  ?    optional        foreword?
  +    one or more     chapter+
  *    zero or more    appendix*

Separators:
  ,    both, in order  foreword?, chapter+
  |    or              section|chapter

Grouping:
  ( )  grouping        (section|chapter)+
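These operators behave like regular-expression operators applied to the sequence of child element names. The sketch below makes that analogy concrete by translating an element-content model into a Python regex. It is a teaching aid under simplifying assumptions (ASCII-style names, no #PCDATA, EMPTY or ANY), not how a real validating parser is implemented; `model_to_regex` and `children_match` are hypothetical helper names.

```python
import re

NAME = r"[A-Za-z_][\w.-]*"  # simplified pattern for an element name


def model_to_regex(model):
    """Translate a DTD element-content model into a regex over child names."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that matches "name " (name plus a space).
    with_names = re.sub(
        NAME, lambda m: "(?:" + re.escape(m.group()) + " )", compact
    )
    # A comma means "followed by", which is plain concatenation in a regex.
    # The suffixes ?, +, * and the | separator keep their meaning unchanged.
    return re.compile(with_names.replace(",", ""))


def children_match(model, children):
    """Check a list of child element names against a content model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(foreword?, chapter+)", ["foreword", "chapter", "chapter"]))
print(children_match("(foreword?, chapter+)", ["chapter", "foreword"]))
print(children_match("(section|chapter)+", ["section", "chapter", "section"]))
```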


Elements without children

The syntax is <!ELEMENT name category>
  The name is the element name used in start and end tags
  The category may be EMPTY:
    In the DTD: <!ELEMENT br EMPTY>
    In the XML: <br></br> or just <br />

In the XML, an empty element may not have any content between the start tag and the end tag

An empty element may (and usually does) have attributes


Elements with unstructured children

The syntax is <!ELEMENT name category>

The category may be ANY
  This indicates that any content -- character data and any declared elements, in any order -- may be used
  Since the whole point of using a DTD is to define the structure of a document, ANY should be avoided wherever possible

The category may be (#PCDATA), indicating that only character data may be used
  In the DTD: <!ELEMENT paragraph (#PCDATA)>
  In the XML: <paragraph>A shot rang out!</paragraph>
  The parentheses are required!
  Note: In (#PCDATA), whitespace is kept exactly as entered
  Elements may not be used within parsed character data
  Entities are character data, and may be used


Elements with children

A category may describe one or more children:
    <!ELEMENT novel (foreword, chapter+)>
  Parentheses are required, even if there is only one child
  A space must precede the opening parenthesis
  Commas (,) between elements mean that all children must appear, and must be in the order specified
  “|” separators mean that any one child may be used
  All child elements must themselves be declared
  Children may have children
  Parentheses can be used for grouping:
    <!ELEMENT novel (foreword, (chapter+|section+))>


Elements with mixed content

#PCDATA describes elements with only character data

#PCDATA can be used in an “or” grouping:
    <!ELEMENT note (#PCDATA|message)*>
  This is called mixed content
  Certain (rather severe) restrictions apply:
    #PCDATA must be first
    The separators must be “|”
    The group must be starred (meaning zero or more)


Names and namespaces

All names of elements, attributes, and entities, in both the DTD and the XML, are formed as follows:
  The name must begin with a letter or underscore
  The name may contain only letters, digits, dots, hyphens, underscores, and colons

The DTD doesn’t know about namespaces -- as far as it knows, a colon is just part of a name
  The following are different (and both legal):
    <!ELEMENT chapter (paragraph+)>
    <!ELEMENT myBook:chapter (myBook:paragraph+)>
  Avoid colons in names, except to indicate namespaces


An expanded DTD example

  <!DOCTYPE novel [
    <!ELEMENT novel (foreword, chapter+, biography?, criticalEssay*)>
    <!ELEMENT foreword (paragraph+)>
    <!ELEMENT chapter (section+|paragraph+)>
    <!ELEMENT section (paragraph+)>
    <!ELEMENT biography (paragraph+)>
    <!ELEMENT criticalEssay (section+)>
    <!ELEMENT paragraph (#PCDATA)>
  ]>


Attributes and entities

In addition to elements, a DTD may declare attributes and entities

An attribute describes information that can be put within the start tag of an element
  In XML: <car name="Toyota" model="2001"></car>
  In DTD:
    <!ATTLIST car
        name  CDATA #REQUIRED
        model CDATA #IMPLIED>

An entity describes text to be substituted
  In XML: &copyright;
  In the DTD: <!ENTITY copyright "Copyright KFUPM">


Attributes

The format of an attribute is:
    <!ATTLIST element-name
        name type requirement
        name type requirement>
  where the name-type-requirement may be repeated as many times as desired
  Note that only spaces separate the parts, so careful counting is essential
  The element-name tells which element may have these attributes
  The name is the name of the attribute
  Each attribute has a type, such as CDATA (character data)
  Each attribute may be required, optional, or “fixed”
  In the XML, attributes may occur in any order


Important attribute types

There are ten attribute types

These are the most important ones:
  CDATA  The value is character data
  (man|woman|child)  The value is one from this list
  ID  The value is a unique identifier
    ID values must be legal XML names and must be unique within the document
  NMTOKEN  The value is a name token: one or more name characters (letters, digits, dots, hyphens, underscores, colons)
    This is sometimes used to disallow whitespace in the value
    Unlike an XML name, a name token may begin with a digit

The other six, less frequently used, are: IDREF, IDREFS, NMTOKENS, ENTITY, ENTITIES, NOTATION


Requirements

Recall that an attribute has the form
    <!ATTLIST element-name name type requirement>

The requirement is one of:
  A default value, enclosed in quotes
    Example: <!ATTLIST element-name degree CDATA "PhD">
  #REQUIRED
    The attribute must be present
  #IMPLIED
    The attribute is optional
  #FIXED "value"
    The attribute always has the given value
    If specified in the XML, the same value must be used
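A default value is visible even without full validation. In the sketch below, Python's expat-based `xml.etree.ElementTree` (a non-validating parser, chosen here only for illustration) reads an inline DTD and supplies the declared default. The `person` element and `country` attribute are invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE person [
  <!ELEMENT person (#PCDATA)>
  <!ATTLIST person
      degree  CDATA "PhD"
      country CDATA #IMPLIED>
]>
<person>A. Student</person>"""

person = ET.fromstring(DOC)

# "degree" was not written in the start tag; the parser fills in the default.
print(person.get("degree"))
# "country" is #IMPLIED and absent, so there is no value at all.
print(person.get("country"))
```

Note that nothing here enforces #REQUIRED or #FIXED; that check needs a validating parser.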


Entities

There are exactly five predefined entities: &lt;, &gt;, &amp;, &quot;, and &apos;

Additional entities can be defined in the DTD:
    <!ENTITY copyright "Copyright KFUPM">

Entities can be defined in another document:
    <!ENTITY copyright SYSTEM "MyURI">

Example of use in the XML:
    This document is &copyright; 2002.

Entities are a way to include fixed text (sometimes called “boilerplate”)

Entities should not be confused with character references, which are numerical values written between &# and ;
  Example: &#233; (decimal) or &#xE9; (hexadecimal) to indicate the character é
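Both mechanisms can be tried out with a short sketch. It uses Python's `xml.etree.ElementTree` as an illustrative parser; the `notice` element is invented, and the entity is declared in an inline DTD so that the example is self-contained. Character references are written as `&#233;` (decimal) or `&#xE9;` (hexadecimal).

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE notice [
  <!ENTITY copyright "Copyright KFUPM">
]>
<notice>This document is &copyright; 2002. Caf&#233; and caf&#xE9; are the same word.</notice>"""

notice = ET.fromstring(DOC)

# The entity reference is replaced by its declared text, and both
# character references become the single character U+00E9.
print(notice.text)
```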


Another example: XML

  <?xml version="1.0"?>
  <!DOCTYPE weatherReport SYSTEM "http://www.mysite.com/mydoc.dtd">
  <weatherReport>
    <date>05/29/2002</date>
    <location>
      <city>Philadelphia</city>
      <state>PA</state>
      <country>USA</country>
    </location>
    <temperature-range>
      <high scale="F">84</high>
      <low scale="F">51</low>
    </temperature-range>
  </weatherReport>


The DTD for this example

  <!ELEMENT weatherReport (date, location, temperature-range)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT location (city, state, country)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!ELEMENT temperature-range ((low, high)|(high, low))>
  <!ELEMENT low (#PCDATA)>
  <!ELEMENT high (#PCDATA)>
  <!ATTLIST low scale (C|F) #REQUIRED>
  <!ATTLIST high scale (C|F) #REQUIRED>


Inline DTDs

If a DTD is used only by a single XML document, it can be put directly in that document:

  <?xml version="1.0"?>
  <!DOCTYPE myRootElement [
    <!-- DTD content goes here -->
  ]>
  <myRootElement>
    <!-- XML content goes here -->
  </myRootElement>

An inline DTD can be used only by the document in which it occurs


External DTDs

An external DTD (a DTD that is a separate document) is declared with a SYSTEM or a PUBLIC command:
    <!DOCTYPE myRootElement SYSTEM "http://www.mysite.com/mydoc.dtd">
  The name that appears after DOCTYPE (in this example, myRootElement) must match the name of the XML document’s root element
  Use SYSTEM for external DTDs that you define yourself, and use PUBLIC for official, published DTDs

The file extension for an external DTD is .dtd

External DTDs can only be referenced with a URL

External DTDs are almost always preferable to inline DTDs, since they can be used by more than one document


Limitations of DTDs

DTDs are a very weak specification language
  You can’t put any restrictions on element contents
  It’s difficult to specify:
    All the children must occur, but may be in any order
    This element must occur a certain number of times
  There are only ten data types for attribute values

But most of all: DTDs aren’t written in XML!
  If you want to do any validation, you need one parser for the XML and another for the DTD
  This makes XML parsing harder than it needs to be
  There is a newer and more powerful technology: XML Schemas
  However, DTDs are still very much in use


Validators

Opera 5 and Internet Explorer 5 can validate your XML against an internal DTD
  IE provides (slightly) better error messages
  Opera apparently just ignores external DTDs
  IE considers an external DTD to be an error

jEdit with the XML plugin will check for well-formedness and (if the DTD is inline) will validate your XML each time you do a Save
  http://www.jedit.org/

Validate [Using Inline DTD]
  http://www.stg.brown.edu/service/xmlvalid/




3. XML Schema Definition (XSD)

What is XSD?

An XML Document with Its Schema

Referencing a Schema from an XML Document

Simple and Complex Elements

Predefined Types
  Numeric types
  Date and Time types
  String types

Defining Schema Components
  Simple Elements
  Attributes
  Restrictions or Facets
  Enumeration
  Complex Elements


What is XML Schema?

The origin of schema
  XML Schema documents are used to define and validate the content and structure of XML data
  XML Schema was originally proposed by Microsoft, but became an official W3C recommendation in May 2001
  http://www.w3.org/XML/Schema


Why Schema?

Separating Information from Structure and Format

[Figure: two document models]
  Traditional document: information, structure, and format are all clumped together
  “Fashionable” document: information, structure, and format are broken into discrete parts, which can be treated separately


Why Schema?

[Figure: Schema workflow]


DTD vs. Schema

DTD: No constraints on character data
XSD: Can constrain character data, for example by requiring a string to have a fixed number of characters

DTD: Does not use XML syntax
XSD: Uses XML syntax, which frees the developer from the need to learn another language; XML transformations can be applied, too

DTD: No support for namespaces
XSD: Supports namespaces

DTD: Very limited reusability and extensibility
XSD: Can be reused in other schemas, can define derived data types, and multiple schemas can be referenced from the same document

DTD: DTD-based validators are easier to write: they may only need to check the existence of content such as PCDATA
XSD: Schema-based validators are more difficult to write because they may have to validate content in detail

DTD: Easier to understand
XSD: More complex: the notion of “type” adds an extra layer of confusing complexity


XML.org Registry

XML.coverpages.org is a comprehensive online reference collection supporting the XML family of markup language standards, XML vocabularies, and related structured information standards.


Example 1: An XML Document Instance

  <?xml version="1.0" encoding="utf-8"?>
  <book isbn="0836217462">
    <title> … </title>
    <author> … </author>
    <qualification> … </qualification>
  </book>


Schema for Example 1 (book.xsd)

  <?xml version="1.0" encoding="utf-8"?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="book">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="title" type="xs:string"/>
          <xs:element name="author" type="xs:string"/>
          <xs:element name="qualification" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="isbn" type="xs:string"/>
      </xs:complexType>
    </xs:element>
  </xs:schema>


Example 2: An XML Document and Its Schema

  <letter>
    Dear Mr.<name>John Smith</name>.
    Your order <orderid>1032</orderid>
    will be shipped on <shipdate>2001-07-13</shipdate>.
  </letter>

  <xs:element name="letter">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="orderid" type="xs:integer"/>
        <xs:element name="shipdate" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>


The XSD Document

Since the XSD is written in XML, it can get confusing which one we are talking about

The file extension is .xsd

The root element is <schema>

The XSD starts like this:
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">


<schema>

The <schema> element may have attributes:
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
    Indicates that the elements used in the schema (schema, element, complexType, etc.) come from this namespace
  elementFormDefault="qualified"
    This means that elements declared in this schema must be namespace-qualified when they appear in an XML instance document (i.e., they belong to the schema’s target namespace)


Referring to a Schema

To refer to a DTD in an XML document, the reference goes before the root element:
    <?xml version="1.0"?>
    <!DOCTYPE rootElement SYSTEM "url">
    <rootElement> ... </rootElement>

To refer to an XML Schema in an XML document, the reference goes in the root element:
    <?xml version="1.0"?>
    <rootElement
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="namespace url.xsd">
      ...
    </rootElement>

  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    Declares the schema instance namespace
  xsi:schemaLocation
    This attribute has two values: the first is the namespace to use, and the second is the location of the XML schema to use for that namespace
    If the document uses no namespace, xsi:noNamespaceSchemaLocation="url.xsd" is used instead
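The value of xsi:schemaLocation is a whitespace-separated list of (namespace, location) pairs. The sketch below reads those pairs with Python's `xml.etree.ElementTree`; the namespace `http://example.org/ns` is a placeholder invented for the example, and the code only extracts the hint, it does not load or apply the schema.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

DOC = """<rootElement
    xmlns="http://example.org/ns"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.org/ns url.xsd"/>"""

root = ET.fromstring(DOC)

# ElementTree spells a qualified attribute name as "{namespace-uri}local-name".
tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()

# Consecutive tokens form (namespace, schema location) pairs.
pairs = list(zip(tokens[0::2], tokens[1::2]))
print(pairs)
```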


“Simple” and “Complex” Elements

A “simple” element is one that contains text and nothing else
  A simple element cannot have attributes
  A simple element cannot contain other elements
  A simple element cannot be empty
  However, the text can be of many different types, and may have various restrictions applied to it

If an element isn’t simple, it’s “complex”
  A complex element may have attributes
  A complex element may be empty, or it may contain text, other elements, or both text and other elements


Predefined Numeric Types

Here are some of the predefined numeric types:
  xs:decimal    xs:positiveInteger
  xs:byte       xs:negativeInteger
  xs:short      xs:nonPositiveInteger
  xs:int        xs:nonNegativeInteger
  xs:long

Allowable restrictions on numeric types:
  enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, fractionDigits, totalDigits, pattern, whiteSpace


Predefined Date and Time Types

xs:date - A date in the format CCYY-MM-DD, for example, 2003-11-05

xs:time - A time in the format hh:mm:ss (hours, minutes, seconds)

xs:dateTime - Format is CCYY-MM-DDThh:mm:ss

Allowable restrictions on dates and times:
  enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, pattern, whiteSpace
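These lexical formats are ISO 8601 style, so they can be approximated with Python's `datetime` module. The helper `parses_as` is a hypothetical name, and the check is only a rough stand-in for a schema validator: it ignores the optional time-zone suffix that XSD allows, and newer Python versions accept a few extra ISO spellings.

```python
from datetime import date, datetime, time


def parses_as(parser, text):
    """Return True if the given fromisoformat-style parser accepts the text."""
    try:
        parser(text)
        return True
    except ValueError:
        return False


print(parses_as(date.fromisoformat, "2003-11-05"))               # CCYY-MM-DD
print(parses_as(time.fromisoformat, "13:20:00"))                 # hh:mm:ss
print(parses_as(datetime.fromisoformat, "2003-11-05T13:20:00"))  # date, T, time
print(parses_as(date.fromisoformat, "05/29/2002"))               # US-style date
```

A date written as 05/29/2002 would therefore have to be declared as xs:string, or rewritten as 2002-05-29 to be typed as xs:date.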


Predefined String Types

Recall that a simple element is defined as:
    <xs:element name="name" type="type" />

Here are a few of the possible string types:
  xs:string - a string
  xs:normalizedString - a string that doesn’t contain tabs, newlines, or carriage returns
  xs:token - a string that doesn’t contain any whitespace other than single spaces

Allowable restrictions on strings:
  enumeration, length, maxLength, minLength, pattern, whiteSpace


Defining a Simple Element

A simple element is defined as
    <xs:element name="name" type="type" />
  where:
    name is the name of the element
    the most common values for type are
      xs:boolean    xs:integer
      xs:date       xs:string
      xs:decimal    xs:time

Other attributes a definition of a simple element may have:
  default="default value"    if no other value is specified
  fixed="value"              no other value may be specified


Defining an Attribute

Attributes themselves are always declared as simple types

An attribute is defined as
    <xs:attribute name="name" type="type" />
  where:
    name and type are the same as for xs:element

Other attributes an attribute definition may have:
  default="default value"    if no other value is specified
  fixed="value"              no other value may be specified
  use="optional"             the attribute is not required (default)
  use="required"             the attribute must be present


Restrictions, or “Facets”

The general form for putting a restriction on a text value is:

  <xs:element name="name">    (or xs:attribute)
    <xs:simpleType>
      <xs:restriction base="type">
        ... the restrictions ...
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

For example:

  <xs:element name="age">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="20"/>
        <xs:maxInclusive value="100"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>


Restrictions, or “Facets”

The "age" element is a simple type with a restriction. The acceptable values are: 20 to 100

The example above could also have been written like this:

  <xs:element name="age" type="ageType"/>
  <xs:simpleType name="ageType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="20"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
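To see what the two facets mean in practice, the sketch below reads them out of the schema text and applies them by hand. This is a toy check, not an XSD validator (Python's standard library does not include one), and `integer_range` and `is_valid_age` are hypothetical helper names. It also assumes that Python's `int()` is a close enough stand-in for the xs:integer lexical rules.

```python
import xml.etree.ElementTree as ET

XS = {"xs": "http://www.w3.org/2001/XMLSchema"}

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="age" type="ageType"/>
  <xs:simpleType name="ageType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="20"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def integer_range(schema_text, type_name):
    """Read the minInclusive/maxInclusive facets of a named simple type."""
    root = ET.fromstring(schema_text)
    restriction = root.find(
        f"xs:simpleType[@name='{type_name}']/xs:restriction", XS
    )
    low = int(restriction.find("xs:minInclusive", XS).get("value"))
    high = int(restriction.find("xs:maxInclusive", XS).get("value"))
    return low, high


def is_valid_age(text):
    """Apply the facets of ageType to the text content of an <age> element."""
    low, high = integer_range(SCHEMA, "ageType")
    try:
        return low <= int(text) <= high
    except ValueError:
        return False


print(integer_range(SCHEMA, "ageType"))
print([is_valid_age(v) for v in ["20", "100", "19", "101", "abc"]])
```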


Restrictions on numbers

minInclusive      number must be ≥ the given value
minExclusive      number must be > the given value
maxInclusive      number must be ≤ the given value
maxExclusive      number must be < the given value
totalDigits       number must have no more than value digits in total
fractionDigits    number must have no more than value digits after the decimal point


Restrictions on strings

length        the string must contain exactly value characters
minLength     the string must contain at least value characters
maxLength     the string must contain no more than value characters
pattern       the value is a regular expression that the string must match
whiteSpace    not really a “restriction” - tells what to do with whitespace
  value="preserve"    Keep all whitespace
  value="replace"     Change all whitespace characters to spaces
  value="collapse"    Remove leading and trailing whitespace, and replace all sequences of whitespace with a single space
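The three whiteSpace settings can be mimicked in a few lines of Python. This is a sketch of the normalization rules as described above; `apply_whitespace` is a hypothetical function name, and a real schema processor performs this step internally before checking the other facets.

```python
import re


def apply_whitespace(value, mode):
    """Normalize a string the way the whiteSpace facet describes."""
    if mode == "preserve":
        return value
    # replace: every tab, line feed, and carriage return becomes one space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # collapse: after replacing, squeeze runs of spaces and trim both ends.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace mode: {mode}")


raw = "  a\tb\n\n c  "
for mode in ("preserve", "replace", "collapse"):
    print(mode, repr(apply_whitespace(raw, mode)))
```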


Restriction with Regular Expression Patterns

  <xs:element name="letter">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="([a-z])*"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <xs:element name="password">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[a-zA-Z0-9]{8}"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

Test these and find out whether the semantics of regular expressions is the same as that in JavaScript
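One difference worth knowing when comparing: an XSD pattern must match the entire value, whereas a JavaScript or Python regex search succeeds on any matching substring unless it is anchored. The sketch below emulates the XSD behavior with Python's `re.fullmatch`. The function name `matches_xsd_pattern` is made up, and only these two patterns are claimed to behave alike; XSD regular expressions have features that Python's `re` does not share.

```python
import re


def matches_xsd_pattern(pattern, value):
    """Emulate XSD's implicit anchoring: the pattern must cover the whole value."""
    return re.fullmatch(pattern, value) is not None


LETTER = "([a-z])*"
PASSWORD = "[a-zA-Z0-9]{8}"

print(matches_xsd_pattern(LETTER, "abc"))          # only lowercase letters
print(matches_xsd_pattern(LETTER, "abC"))          # contains an uppercase letter
print(re.search(LETTER, "abC") is not None)        # unanchored search still matches
print(matches_xsd_pattern(PASSWORD, "abc12345"))   # exactly eight characters
print(matches_xsd_pattern(PASSWORD, "abc123456"))  # nine characters
```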


Enumeration

An enumeration restricts the value to be one of a fixed set of values

Example:

  <xs:element name="season">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="Spring"/>
        <xs:enumeration value="Summer"/>
        <xs:enumeration value="Autumn"/>
        <xs:enumeration value="Fall"/>
        <xs:enumeration value="Winter"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>


Complex Elements

A complex element is defined as

  <xs:element name="name">
    <xs:complexType>
      ... information about the complex type...
    </xs:complexType>
  </xs:element>

Example:

  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstName" type="xs:string" />
        <xs:element name="lastName" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>


Complex Elements

Another example – using a type attribute

  <xs:element name="employee" type="personinfo"/>
  <xs:complexType name="personinfo">
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>


xs:sequence

We’ve already seen an example of a complex type whose elements must occur in a specific order:

  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstName" type="xs:string" />
        <xs:element name="lastName" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>


xs:all

xs:all allows elements to appear in any order

  <xs:element name="person">
    <xs:complexType>
      <xs:all>
        <xs:element name="firstName" type="xs:string" />
        <xs:element name="lastName" type="xs:string" />
      </xs:all>
    </xs:complexType>
  </xs:element>

Despite the name, the members of an xs:all group can occur once or not at all

You can use minOccurs="n" and maxOccurs="n" to specify how many times an element may occur (default value is 1)
  In this context, n may only be 0 or 1
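The difference between xs:sequence and xs:all comes down to whether the order of the children matters. The sketch below checks both rules by hand on two `person` documents, assuming every member occurs exactly once (the default minOccurs and maxOccurs of 1). The helper names `follows_sequence` and `follows_all` are invented, and this is an illustration of the rule, not a schema validator.

```python
import xml.etree.ElementTree as ET

MEMBERS = ["firstName", "lastName"]


def follows_sequence(element, names):
    """xs:sequence: the children must appear in exactly the declared order."""
    return [child.tag for child in element] == list(names)


def follows_all(element, names):
    """xs:all: the same children must appear, but in any order."""
    return sorted(child.tag for child in element) == sorted(names)


in_order = ET.fromstring(
    "<person><firstName>John</firstName><lastName>Smith</lastName></person>"
)
swapped = ET.fromstring(
    "<person><lastName>Smith</lastName><firstName>John</firstName></person>"
)

print(follows_sequence(in_order, MEMBERS), follows_sequence(swapped, MEMBERS))
print(follows_all(in_order, MEMBERS), follows_all(swapped, MEMBERS))
```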


Extensions

You can base a complex type on another complex type

  <xs:complexType name="newType">
    <xs:complexContent>
      <xs:extension base="otherType">
        ...new stuff...
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>


Text Element with Attributes

If a text element has attributes, it is no longer a simple type

  <xs:element name="population">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:integer">
          <xs:attribute name="year" type="xs:integer"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>


Empty Elements

Empty elements are (ridiculously) complex

  <xs:complexType name="counter">
    <xs:complexContent>
      <xs:restriction base="xs:anyType">
        <xs:attribute name="count" type="xs:integer"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>


Mixed Elements

Mixed elements may contain both text and elements

We add mixed="true" to the xs:complexType element

The text itself is not declared in the schema and may appear anywhere among the child elements (validation basically ignores it)

<xs:complexType name="paragraph" mixed="true">
  <xs:sequence>
    <xs:element name="someName" type="xs:anyType"/>
  </xs:sequence>
</xs:complexType>

See Example 2 at the start of this section

4 XPath

What is XPath?

Sample Syntactic Elements
  Paths
  Slashes
  Brackets
  Stars

Arithmetic Expressions

Some XPath Functions

What is XPath?

XPath is a syntax used for selecting parts of an XML document

The way XPath describes paths to elements is similar to the way an operating system describes paths to files

XPath is almost a small programming language; it has functions, tests, and expressions

XPath is a W3C standard http://www.w3.org/TR/xpath

XPath is not itself written as XML, but is used heavily in XSLT

Terminology

<library>
  <book>
    <chapter>
    </chapter>
    <chapter>
      <section>
        <paragraph/>
        <paragraph/>
      </section>
    </chapter>
  </book>
</library>

library is the parent of book; book is the parent of the two chapters

The two chapters are the children of book, and the section is the child of the second chapter

The two chapters of the book are siblings (they have the same parent)

library, book, and the second chapter are the ancestors of the section

The two chapters, the section, and the two paragraphs are the descendants of the book

Page 108: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Paths

Operating system: / = the root directory
XPath: /library = the root element (if named library)

Operating system: /users/dave/foo = the file named foo in dave in users
XPath: /library/book/chapter/section = every section element in a chapter in every book in the library

Operating system: . = the current directory
XPath: . = the current element

Operating system: .. = the parent directory
XPath: .. = parent of the current element

Operating system: /users/dave/* = all the files in /users/dave
XPath: /library/book/chapter/* = all the elements in /library/book/chapter

Operating system: foo = the file named foo in the current directory
XPath: section = every section element that is a child of the current element

Page 109: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Slashes

A path that begins with a / represents an absolute path, starting from the top of the document
  Example: /email/message/header/from
  Note that even an absolute path can select more than one element
  A slash by itself means “the whole document”

A path that does not begin with a / represents a path starting from the current element
  Example: header/from

A path that begins with // can start from anywhere in the document
  Example: //header/from selects every from element that is a child of a header element
  This can be expensive, since it involves searching the entire document
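The path forms above can be tried out with Python's standard xml.etree.ElementTree module. This is only a minimal sketch under stated assumptions: the sample document is made up for illustration, and ElementTree implements just a subset of XPath and evaluates paths relative to an element, so the absolute path /library/book/chapter/section is written here relative to the root element library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, modelled on the Terminology slide.
DOC = """
<library>
  <book>
    <chapter/>
    <chapter>
      <section><paragraph/><paragraph/></section>
    </chapter>
  </book>
</library>
"""

root = ET.fromstring(DOC)  # root is the <library> element

# XPath /library/book/chapter/section, written relative to <library>.
sections = root.findall("book/chapter/section")

# A relative path starts from the current element (here: the first book).
book = root.find("book")
chapters = book.findall("chapter")

# XPath //paragraph: search the whole tree below the starting element.
paragraphs = root.findall(".//paragraph")

# ".." steps up to the parent of the selected element.
parent_of_section = root.find(".//section/..")

print(len(sections), len(chapters), len(paragraphs), parent_of_section.tag)
```

A full XPath engine would accept the absolute forms directly; the relative spelling is a limitation of this particular module, not of XPath.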

Page 110: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Brackets and last()

A number in brackets selects a particular matching child
  Example: /library/book[1] selects the first book of the library
  Example: //chapter/section[2] selects the second section of every chapter in the XML document
  Example: //book/chapter[1]/section[2]
  Only matching elements are counted; for example, if a book has both sections and exercises, the latter are ignored when counting sections

The function last() in brackets selects the last matching child
  Example: /library/book/chapter[last()]

You can even do simple arithmetic
  Example: /library/book/chapter[last()-1]
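As a rough illustration of positional brackets, the sketch below uses Python's xml.etree.ElementTree, which supports [n], [last()] and [last()-n]. The document and the id attributes are invented for this example, and paths are written relative to the root element because this module does not accept absolute paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the id attributes only make the results readable.
DOC = """
<library>
  <book>
    <chapter id="c1"/>
    <exercise id="e1"/>
    <chapter id="c2"/>
    <chapter id="c3"/>
  </book>
  <book>
    <chapter id="c4"/>
  </book>
</library>
"""
root = ET.fromstring(DOC)

# /library/book[1]: the first book.
first_book = root.find("book[1]")

# book/chapter[2]: the second chapter of every book; the <exercise>
# between the chapters is not counted.
second = [c.get("id") for c in root.findall("book/chapter[2]")]

# book/chapter[last()]: the last chapter of every book.
last = [c.get("id") for c in root.findall("book/chapter[last()]")]

# book/chapter[last()-1]: next-to-last chapter; a book with a single
# chapter contributes nothing.
next_to_last = [c.get("id") for c in root.findall("book/chapter[last()-1]")]

print(second, last, next_to_last)
```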

Page 111: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Stars

A star, or asterisk, is a “wild card”: it means “all the elements at this level”
  Example: /library/book/chapter/* selects every child of every chapter of every book in the library
  Example: //book/* selects every child of every book (chapters, tableOfContents, index, etc.)
  Example: /*/*/*/paragraph selects every paragraph that has exactly three ancestors
  Example: //* selects every element in the entire document
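The wildcard can be explored the same way. The following is a small sketch using Python's xml.etree.ElementTree with an invented document; since paths there start at the root element, the leading /* of an absolute path (which matches the root element itself) is dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical document for illustration only.
DOC = """
<library>
  <book>
    <tableOfContents/>
    <chapter><section><paragraph/></section></chapter>
    <index/>
  </book>
</library>
"""
root = ET.fromstring(DOC)

# //book/* : every child element of every book.
book_children = [e.tag for e in root.findall(".//book/*")]

# /*/*/*/paragraph (three ancestors) is "*/*/paragraph" relative to the
# root element. Here the paragraph has four ancestors, so nothing matches.
three_ancestors = root.findall("*/*/paragraph")

# /*/*/*/*/paragraph (four ancestors) does match.
four_ancestors = root.findall("*/*/*/paragraph")

# //* : every element in the document, including the root element.
all_elements = [e.tag for e in root.iter()]

print(book_children, len(three_ancestors), len(four_ancestors), len(all_elements))
```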

Page 112: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Attributes I

You can select attributes by themselves, or elements that have certain attributes
  Remember: an attribute consists of a name-value pair; for example, in <chapter num="5">, the attribute is named num

To choose the attribute itself, prefix the name with @
  Example: @num will choose the attribute named num of the current element (//@num chooses every such attribute in the document)
  Example: //@* will choose every attribute, everywhere in the document

To choose elements that have a given attribute, put the attribute name in square brackets
  Example: //chapter[@num] will select every chapter element (anywhere in the document) that has an attribute named num

Page 113: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Attributes II

//chapter[@num] selects every chapter element with an attribute num

//chapter[not(@num)] selects every chapter element that does not have a num attribute

//chapter[@*] selects every chapter element that has any attribute

//chapter[not(@*)] selects every chapter element with no attributes

Page 114: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Values of attributes

//chapter[@num='3'] selects every chapter element with an attribute num with value 3

The normalize-space() function can be used to remove leading and trailing spaces from a value before comparison
  Example: //chapter[normalize-space(@num)="3"]
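The attribute tests from the last few slides can be sketched with Python's xml.etree.ElementTree. Only [@num] and [@num='3'] are supported by that module directly; not(), @* and normalize-space() are emulated below in plain Python, so treat those lines as an approximation rather than real XPath. The sample document is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; note the padded value " 3 " on the second chapter.
DOC = """
<book>
  <chapter num="1"/>
  <chapter num=" 3 "/>
  <chapter num="3" title="Paths"/>
  <chapter/>
</book>
"""
root = ET.fromstring(DOC)
chapters = root.findall(".//chapter")

# //chapter[@num] and //chapter[@num='3'] (exact string comparison).
with_num = root.findall(".//chapter[@num]")
num_is_3 = root.findall(".//chapter[@num='3']")

# Emulations of //chapter[not(@num)], //chapter[@*], //chapter[not(@*)].
without_num = [c for c in chapters if "num" not in c.attrib]
with_any = [c for c in chapters if c.attrib]
without_any = [c for c in chapters if not c.attrib]

def normalize_space(text):
    """Strip leading/trailing whitespace and collapse inner runs."""
    return " ".join(text.split())

# Emulation of //chapter[normalize-space(@num)='3'].
normalized_3 = [c for c in chapters if normalize_space(c.get("num", "")) == "3"]

print(len(with_num), len(num_is_3), len(without_num), len(normalized_3))
```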

Page 115: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Arithmetic Expressions

+ add

- subtract

* multiply

div (not /) divide

mod modulo (remainder)
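XPath needs the word div because / is already the path separator. As a rough sketch of how div and mod behave in XPath 1.0, the Python helpers below (hypothetical names, not part of any library) mirror them: div is floating-point division, and mod keeps the sign of the left operand, which differs from Python's own % operator.

```python
import math

def xpath_div(a, b):
    """Mimic XPath 'div': floating-point division, never integer division.

    Division by zero is not handled here; XPath would give Infinity or NaN,
    while this sketch simply lets Python raise ZeroDivisionError.
    """
    return float(a) / float(b)

def xpath_mod(a, b):
    """Mimic XPath 'mod': remainder of truncating division (sign of a)."""
    return math.fmod(a, b)

print(xpath_div(7, 2))   # 3.5
print(xpath_mod(5, 2))   # 1.0
print(xpath_mod(-5, 2))  # -1.0, whereas Python's -5 % 2 is 1
```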

Page 116: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Equality Tests

= “equals” (Notice it’s not ==)

!= “not equals”

But it’s not that simple!

value = node-set will be true if the node-set contains any node with a value that matches value

value != node-set will be true if the node-set contains any node with a value that does not match value

Hence, value = node-set and value != node-set may both be true at the same time!
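The "any node" rule is easy to check by hand. The sketch below models a node-set as a plain Python list of string values (an assumption made for illustration; the helper names are hypothetical) and shows both comparisons being true at once.

```python
def xpath_eq(value, node_set):
    """value = node-set: true if ANY node's value equals value."""
    return any(node == value for node in node_set)

def xpath_ne(value, node_set):
    """value != node-set: true if ANY node's value differs from value."""
    return any(node != value for node in node_set)

authors = ["Gregory Brill", "Brett Scott"]

both_true = xpath_eq("Brett Scott", authors) and xpath_ne("Brett Scott", authors)

# To say "no node matches", negate the equality test instead of using !=.
no_match = not xpath_eq("Brett Scott", authors)

# With an empty node-set both comparisons are false.
empty_eq = xpath_eq("Brett Scott", [])
empty_ne = xpath_ne("Brett Scott", [])

print(both_true, no_match, empty_eq, empty_ne)
```

In XPath itself the safe spelling of "no node matches" is not(value = node-set).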

Page 117: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Other Boolean Operators

and (infix operator)

or (infix operator)

Example: count = 0 or count = 1

not() (function)

The following are used for numerical comparisons only:
  < “less than”
  <= “less than or equal to”
  > “greater than”
  >= “greater than or equal to”

Page 118: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Some XPath Functions

XPath contains a number of functions on node sets, numbers, and strings; here are a few of them:

count(elem) counts the number of selected elements
  Example: //chapter[count(section)=1] selects chapters with exactly one section child

name() returns the name of the element
  Example: //*[name()='section'] is the same as //section

starts-with(arg1, arg2) tests if arg1 starts with arg2
  Example: //*[starts-with(name(), 'sec')]

contains(arg1, arg2) tests if arg1 contains arg2
  Example: //*[contains(name(), 'ect')]

Examples http://www.zvon.org/xxl/XPathTutorial/General/examples.html
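Python's xml.etree.ElementTree does not implement these XPath functions, but their effect can be imitated with ordinary Python over the element tree. The sketch below is such an imitation, on an invented document, and is meant only to make the four examples concrete.

```python
import xml.etree.ElementTree as ET

# Hypothetical document for illustration only.
DOC = """
<book>
  <chapter id="c1"><section/></chapter>
  <chapter id="c2"><section/><section/></chapter>
  <chapter id="c3"><secret/></chapter>
</book>
"""
root = ET.fromstring(DOC)

# //chapter[count(section)=1]
one_section = [c.get("id") for c in root.iter("chapter")
               if len(c.findall("section")) == 1]

# //*[name()='section'] selects the same elements as //section.
by_name = [e for e in root.iter() if e.tag == "section"]
by_path = root.findall(".//section")

# //*[starts-with(name(), 'sec')]
starts_with_sec = sorted({e.tag for e in root.iter() if e.tag.startswith("sec")})

# //*[contains(name(), 'ect')]
contains_ect = sorted({e.tag for e in root.iter() if "ect" in e.tag})

print(one_section, len(by_name), starts_with_sec, contains_ect)
```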

Page 119: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Page 120: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

5 XSL / XSLT

What is XSL?

Some XSLT Constructs
  xsl:value-of
  xsl:for-each
  xsl:if
  xsl:choose
  xsl:sort
  xsl:text
  xsl:attribute

Templates

XSL on the Client

XSL on the Server

Page 121: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

What is XSL?

XSL stands for eXtensible Stylesheet Language
  a standard recommended by the W3C: http://www.w3.org/TR/xsl/

CSS was designed for styling HTML pages, and can be used to style XML pages

XSL was designed specifically to style XML pages, and is much more sophisticated than CSS

XSL consists of three languages:
  XSLT (XSL Transformations) is a language used to transform XML documents into other kinds of documents (most commonly HTML, so they can be displayed)
  XPath is a language to select parts of an XML document to transform with XSLT
  XSL-FO (XSL Formatting Objects) is a replacement for CSS
    The future of XSL-FO as a standard is uncertain, because much of its functionality overlaps with that provided by cascading style sheets (CSS) and the HTML tag set

Page 122: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

How does it work?

The XML source document is parsed into an XML source tree

You use XPath to define templates that match parts of the source tree

You use XSLT to transform the matched part and put the transformed information into the result tree

The result tree is output as a result document

Parts of the source document that are not matched by any template are handled by XSLT's built-in rules: the markup is dropped and only the text content is copied to the result

Page 123: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Simple XPath

Here’s a simple XML document:

<?xml version="1.0"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett Scott</author>
  </book>
</library>

XPath expressions look a lot like paths in a computer file system

/ means the document itself (but no specific elements)

/library selects the root element

/library/book selects every book element

//author selects every author element, wherever it occurs

Page 124: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Simple XSLT

<xsl:for-each select="//book"> loops through every book element, everywhere in the document

<xsl:value-of select="title"/> chooses the content of the title element at the current location

<xsl:for-each select="//book">
  <xsl:value-of select="title"/>
</xsl:for-each>
chooses the content of the title element for each book in the XML document

Page 125: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Using XSL to Create HTML

Our goal is to turn this:

<?xml version="1.0"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett Scott</author>
  </book>
</library>

Into HTML that displays something like this:

Book Titles:
  • XML
  • Java and XML
Book Authors:
  • Gregory Brill
  • Brett Scott

Note that we’ve grouped titles and authors separately

Page 126: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

What we need to do

We need to save our XML into a file (let’s call it books.xml)

We need to create a file (say, books.xsl) that describes how to select elements from books.xml and embed them into an HTML page
  We do this by intermixing the HTML and the XSL in the books.xsl file

We need to add a line to our books.xml file to tell it to refer to books.xsl for formatting information

Page 127: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

books.xml, revised

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett Scott</author>
  </book>
</library>

The <?xml-stylesheet ...?> line tells you where to find the XSL file

Page 128: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Desired HTML

<html>
  <head>
    <title>Book Titles and Authors</title>
  </head>
  <body>
    <h2>Book titles:</h2>
    <ul>
      <li>XML</li>
      <li>Java and XML</li>
    </ul>
    <h2>Book authors:</h2>
    <ul>
      <li>Gregory Brill</li>
      <li>Brett Scott</li>
    </ul>
  </body>
</html>

On the original slide, red text marks data extracted from the XML document (here, the contents of the <li> elements)

Blue text marks our HTML template (everything else)

We don’t necessarily know how much data we will have

Page 129: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

XSL Outline

<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">

<html> ... </html>

</xsl:template>

</xsl:stylesheet>

Page 130: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Selecting Titles and Authors

<h2>Book titles:</h2>
<ul>
  <xsl:for-each select="//book">
    <li>
      <xsl:value-of select="title"/>
    </li>
  </xsl:for-each>
</ul>
<h2>Book authors:</h2>
...same thing, replacing title with author

Notice that XSL can rearrange the data; the HTML result can present information in a different order than the XML

Notice the xsl:for-each loop

Page 131: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

All of books.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett Scott</author>
  </book>
</library>

Note: if you do View Source, this is what you will see, not the resultant HTML

Page 132: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

All of books.xsl

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head>
        <title>Book Titles and Authors</title>
      </head>
      <body>
        <h2>Book titles:</h2>
        <ul>
          <xsl:for-each select="//book">
            <li>
              <xsl:value-of select="title"/>
            </li>
          </xsl:for-each>
        </ul>
        <h2>Book authors:</h2>
        <ul>
          <xsl:for-each select="//book">
            <li>
              <xsl:value-of select="author"/>
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
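To make the effect of this stylesheet concrete, here is a rough Python imitation of what an XSLT processor does with books.xml and books.xsl. It is not XSLT and not how any real processor is implemented; the function names are hypothetical, and only the standard library is used. Each helper is commented with the stylesheet construct it stands in for.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

BOOKS_XML = """<?xml version="1.0"?>
<library>
  <book><title>XML</title><author>Gregory Brill</author></book>
  <book><title>Java and XML</title><author>Brett Scott</author></book>
</library>"""

def bullet_list(books, field):
    """Stand-in for <xsl:for-each select="//book"> wrapping
    <li><xsl:value-of select="FIELD"/></li>."""
    items = "".join(f"<li>{escape(b.findtext(field, ''))}</li>" for b in books)
    return f"<ul>{items}</ul>"

def transform(xml_text):
    """Stand-in for the single <xsl:template match="/"> in books.xsl."""
    root = ET.fromstring(xml_text)
    books = root.findall(".//book")  # //book
    return (
        "<html><head><title>Book Titles and Authors</title></head><body>"
        "<h2>Book titles:</h2>" + bullet_list(books, "title") +
        "<h2>Book authors:</h2>" + bullet_list(books, "author") +
        "</body></html>"
    )

html = transform(BOOKS_XML)
print(html)
```

The literal HTML strings play the role of the template, and the two bullet_list calls play the role of the two xsl:for-each loops, which is why titles and authors end up grouped separately.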

Page 133: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

How to use it

In a modern browser, such as Netscape 6, Internet Explorer 6, or Mozilla 1.0, you can just open the XML file
  Older browsers will ignore the XSL and just show you the XML contents as continuous text

You can use a program such as Xalan, MSXML, or Saxon to create the HTML as a file
  This can be done on the server side, so that all the client side browser sees is plain HTML
  The server can create the HTML dynamically from the information currently in XML

Page 134: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

The result (in IE)

Page 135: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

XSLT

XSLT stands for eXtensible Stylesheet Language Transformations

XSLT is used to transform XML documents into other kinds of documents, usually (but not necessarily) XHTML

XSLT uses two input files:
  The XML document containing the actual data
  The XSL document containing both the “framework” in which to insert the data, and XSLT commands to do so

Page 136: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Understanding the XSLT Process

Page 137: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

The XSLT Processor

Page 138: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

The .xsl file

An XSLT document conventionally has the .xsl extension

The XSLT document begins with:
  <?xml version="1.0"?>
  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

Contains one or more templates, such as:
  <xsl:template match="/"> ... </xsl:template>

And ends with:
  </xsl:stylesheet>

The template <xsl:template match="/"> says select the entire file
  You can think of this as selecting the root node of the XML tree

Page 139: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Where XSLT can be used

A server can use XSLT to change XML files into HTML files before sending them to the client

A modern browser can use XSLT to change XML into HTML on the client side
  This is what we will mostly be doing here

Most users seldom update their browsers
  If you want “everyone” to see your pages, do any XSL processing on the server side
  Otherwise, think about what best fits your situation

Page 140: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

xsl:value-of

<xsl:value-of select="XPath expression"/> selects the contents of an element and adds it to the output stream
  The select attribute is required
  Notice that xsl:value-of is not a container tag, hence it needs to end with a slash

Page 141: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

xsl:for-each

xsl:for-each is a kind of loop statement

The syntax is
  <xsl:for-each select="XPath expression">
    Text to insert and rules to apply
  </xsl:for-each>

Example: to select every book (//book) and make an unordered list (<ul>) of their titles (title), use:
  <ul>
    <xsl:for-each select="//book">
      <li>
        <xsl:value-of select="title"/>
      </li>
    </xsl:for-each>
  </ul>

Page 142: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Filtering Output

You can filter (restrict) output by adding a criterion to the select attribute’s value:
  <ul>
    <xsl:for-each select="//book">
      <li>
        <xsl:value-of select="title[../author='Brett Scott']"/>
      </li>
    </xsl:for-each>
  </ul>

This will select book titles by Brett Scott

Page 143: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Filter Details

Here is the filter we just used:
  <xsl:value-of select="title[../author='Brett Scott']"/>

author is a sibling of title, so from title we have to go up to its parent, book, then back down to author

This filter requires a quote within a quote, so we need both single quotes and double quotes

Legal filter operators are: = != &lt; &gt;
  Numbers should be quoted

Page 144: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

But it doesn’t work right!

Here’s what we did:
  <xsl:for-each select="//book">
    <li>
      <xsl:value-of select="title[../author='Brett Scott']"/>
    </li>
  </xsl:for-each>

This will output <li> and </li> for every book, so we will get empty bullets for authors other than Brett Scott

There is no obvious way to solve this with just xsl:value-of

Page 145: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

xsl:if

xsl:if allows us to include content if a given condition (in the test attribute) is true

Example:
  <xsl:for-each select="//book">
    <xsl:if test="author='Brett Scott'">
      <li>
        <xsl:value-of select="title"/>
      </li>
    </xsl:if>
  </xsl:for-each>

This does work correctly!
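The difference between the two approaches can be imitated in a few lines of Python. This is only an analogy, with hypothetical helper names and the two-book library from the earlier slides: the first list behaves like the filter inside xsl:value-of (one list item per book, some empty), the second like xsl:if wrapped around the list item.

```python
import xml.etree.ElementTree as ET

LIBRARY = """
<library>
  <book><title>XML</title><author>Gregory Brill</author></book>
  <book><title>Java and XML</title><author>Brett Scott</author></book>
</library>
"""
books = ET.fromstring(LIBRARY).findall(".//book")

def title_if_by(book, author):
    """Imitates select="title[../author='...']": the title, or nothing."""
    return book.findtext("title") if book.findtext("author") == author else ""

# Filter inside xsl:value-of: the <li> is emitted for every book.
with_value_of = [f"<li>{title_if_by(b, 'Brett Scott')}</li>" for b in books]

# xsl:if around the <li>: the item is emitted only when the test holds.
with_if = [f"<li>{b.findtext('title')}</li>"
           for b in books if b.findtext("author") == "Brett Scott"]

print(with_value_of)
print(with_if)
```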

Page 146: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

xsl:choose

The xsl:choose ... xsl:when ... xsl:otherwise construct is XSLT’s equivalent of Java’s switch ... case ... default statement

The syntax is:
  <xsl:choose>
    <xsl:when test="some condition">
      ... some code ...
    </xsl:when>
    <xsl:otherwise>
      ... some code ...
    </xsl:otherwise>
  </xsl:choose>

xsl:choose is often used within an xsl:for-each loop

Page 147: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

xsl:sort

You can place an xsl:sort inside an xsl:for-each

The select attribute of xsl:sort tells what field to sort on

Example:
  <ul>
    <xsl:for-each select="//book">
      <xsl:sort select="author"/>
      <li>
        <xsl:value-of select="title"/> by
        <xsl:value-of select="author"/>
      </li>
    </xsl:for-each>
  </ul>

This example creates a list of titles and authors, sorted by author
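As an analogy only, the same ordering can be reproduced in Python with sorted() and a key function in place of xsl:sort. One assumption to flag: Python compares strings by code point, while the collation used by an XSLT processor may depend on language settings, so results can differ for non-ASCII names.

```python
import xml.etree.ElementTree as ET

LIBRARY = """
<library>
  <book><title>XML</title><author>Gregory Brill</author></book>
  <book><title>Java and XML</title><author>Brett Scott</author></book>
</library>
"""
books = ET.fromstring(LIBRARY).findall(".//book")

# Stand-in for <xsl:sort select="author"/> inside the for-each loop.
by_author = sorted(books, key=lambda b: b.findtext("author"))

# Stand-in for the <li> body: "title by author".
lines = [f"{b.findtext('title')} by {b.findtext('author')}" for b in by_author]

print(lines)
```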

Page 148: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

xsl:text

Used inside templates to indicate that its contents should be output as text

Its contents are pure text, not elements, and white space is not collapsed

<xsl:text>...</xsl:text> helps deal with two common problems:
  XSL isn’t very careful with whitespace in the document
    This doesn’t matter much for HTML, which collapses all whitespace anyway
    <xsl:text> gives you much better control over whitespace; it acts like the <pre> element in HTML
  Since XML defines only five entities, you cannot readily put other entities (such as &nbsp;) in your XSL
    These are &amp; (&), &lt; (<), &gt; (>), &quot; ("), &apos; (')
    Others can be inserted using their decimal or hexadecimal number forms (for example, &#160; for a non-breaking space)
    You may use the following formula for entities:
      <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
    A “yes” value means special characters like “<” should be output as is; “no” indicates that “<” should be output as “&lt;”. The default is “no”

Page 149: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Creating Tags from XML Data

Suppose the XML contains
  <name>Dr. Scott's Home Page</name>
  <url>http://www.kfupm.edu/~scott</url>

And you want to turn this into
  <a href="http://www.kfupm.edu/~scott">Dr. Scott's Home Page</a>

We need additional tools to do this
  It doesn’t even help if the XML directly contains <a href="http://www.kfupm.edu/~scott">Dr. Scott's Home Page</a>; with the constructs seen so far we still can’t move it to the output

The same problem occurs with images in the XML

One reason is that attribute values in XML may not contain reserved characters like <, so an XSL element such as xsl:value-of cannot be written inside an attribute value

Page 150: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Creating Tags - solution 1

Suppose the XML contains
  <name>Dr. Scott's Home Page</name>
  <url>http://www.kfupm.edu/~scott</url>

<xsl:attribute name="..."> adds the named attribute to the enclosing tag

The value of the attribute is the content of this tag

Example:
  <a>
    <xsl:attribute name="href">
      <xsl:value-of select="url"/>
    </xsl:attribute>
    <xsl:value-of select="name"/>
  </a>

Result: <a href="http://www.kfupm.edu/~scott">Dr. Scott's Home Page</a>

Page 151: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Creating Tags - solution 2

Suppose the XML contains
  <name>Dr. Scott's Home Page</name>
  <url>http://www.kfupm.edu/~scott</url>

An attribute value template (AVT) consists of braces { } inside the attribute value

The content of the braces is replaced by its value

Example:
  <a href="{url}">
    <xsl:value-of select="name"/>
  </a>

Result: <a href="http://www.kfupm.edu/~scott">Dr. Scott's Home Page</a>
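Both solutions amount to "create an element, then fill in its attribute and its text from the source data". A minimal Python analogy, using the standard ElementTree API rather than XSLT, is sketched below; the wrapping <link> element is an assumption added only so that the fragment is a well-formed document.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <link> around the two source elements.
SOURCE = (
    "<link>"
    "<name>Dr. Scott's Home Page</name>"
    "<url>http://www.kfupm.edu/~scott</url>"
    "</link>"
)
src = ET.fromstring(SOURCE)

# Like <a href="{url}">: the attribute value is taken from the <url> element.
anchor = ET.Element("a", {"href": src.findtext("url")})

# Like <xsl:value-of select="name"/>: the text is taken from <name>.
anchor.text = src.findtext("name")

html = ET.tostring(anchor, encoding="unicode")
print(html)
```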

Page 152: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Modularization

Modularization: breaking up a complex program into simpler parts (is an important programming tool)
  In programming languages modularization is often done with functions or methods
  In XSL we can do something similar with xsl:apply-templates

For example, suppose we have a DTD for book with parts titlePage, tableOfContents, chapter, and index
  We can create separate templates for each of these parts

Template rules are used to control what output is created from what input

Page 153: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

…Modularization

A template rule is represented by an <xsl:template> element

The <xsl:template> element has
  A match attribute that contains an XPath pattern identifying the input it matches
  A template that is instantiated and output when the pattern is matched

Template skeleton:
  <xsl:template match="person">
    A Person
  </xsl:template>

The above says that every time a <person> element is seen, the stylesheet processor should emit the text “A Person”

Page 154: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Book example

<xsl:template match="/">
  <html>
    <body>
      <xsl:apply-templates/>
    </body>
  </html>
</xsl:template>

<xsl:template match="tableOfContents">
  <h1>Table of Contents</h1>
  <xsl:apply-templates select="chapterNumber"/>
  <xsl:apply-templates select="chapterName"/>
  <xsl:apply-templates select="pageNumber"/>
</xsl:template>

Etc.

Page 155: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

xsl:apply-templates

The <xsl:apply-templates> element applies a template rule to the current element or to the current element’s child nodes

If we add a select attribute, it applies the template rule only to the child that matches

If we have multiple <xsl:apply-templates> elements with select attributes, the child nodes are processed in the same order as the <xsl:apply-templates> elements

Page 156: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

When templates are ignored

Templates aren’t used unless they are applied
  Exception: Processing always starts at the root node, with the template that has match="/"
  If it didn’t, nothing would ever happen

If your templates are ignored, you probably forgot to apply them

If you apply a template to an element that has child elements, templates are not automatically applied to those child elements

Page 157: COSC 843: Application Development for Internet Based Services Data Description and Transformation 1. XML 2. DTD 3. XSD 4. Xpath 5. XSL /XSLT

COSC 843: Application Development for Internet Based Services

Applying templates to children

<book>
  <title>XML</title>
  <author>Gregory Brill</author>
</book>

<xsl:template match="/">
  <html>
    <head></head>
    <body>
      <b><xsl:value-of select="/book/title"/></b>
      <xsl:apply-templates select="/book/author"/>
    </body>
  </html>
</xsl:template>

<xsl:template match="/book/author">
  by <i><xsl:value-of select="."/></i>
</xsl:template>

With the <xsl:apply-templates select="/book/author"/> line: XML by Gregory Brill

Without that line: XML


Built-in Templates

XSLT has a couple of built-in templates, which say:

when you apply templates to an element, process its child elements
when you apply templates to a text node, give its value

Together, it means that if you apply templates to an element but don't have an explicit template for that element, then its content gets processed and eventually you end up with the text that the element contains.

Here are the built-in template rules for each of the seven XPath node types:

Node type                  Built-in rule
Elements                   Apply templates to children
Text                       Copy text to the result tree
Comments                   Do nothing
Processing instructions    Do nothing
Attributes                 Copy the value of the attribute to the result tree
Namespaces                 Do nothing
Root                       Apply templates to children
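
Written out as explicit rules, the built-in behaviour corresponds to the following sketch, which follows the default rules given in the XSLT 1.0 Recommendation. Namespace nodes have no rule here because no match pattern can select them.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Elements and the root: keep going down into the children. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attributes: copy the value to the result tree. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Processing instructions and comments: do nothing. -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

Note that a plain <xsl:apply-templates/> selects child nodes only, and attributes are not children. The attribute rule therefore fires only when a stylesheet selects attributes explicitly, for example with select="@*".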


XSL - On the Client

If your browser supports XML, XSL can be used to transform the document to XHTML in your browser
Even if this works fine, it is not always desirable to include a style sheet reference in an XML file (for example, it will not work in a browser that is not XSL-aware)
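
For illustration, a style sheet reference is an xml-stylesheet processing instruction placed before the root element. In this sketch the file name books.xsl and the <books> structure are assumptions, not taken from a real file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The next line is the style sheet reference. A browser that
     understands XSL loads books.xsl and shows the transformed result;
     other browsers do not apply the style sheet. -->
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<books>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
</books>
```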

A JavaScript Solution
A more versatile solution is to use JavaScript to do the XML to XHTML transformation

By using JavaScript, we can:
do browser-specific testing
use different style sheets according to browser and user needs

XSL transformation on the client side is bound to be a major part of the browser's work in the future, as we will see a growth in the specialized browser market (Braille, aural browsers, Web printers, handheld devices, etc.)


Transforming XML to XHTML in Your Browser

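<html>
<body>
<script type="text/javascript">
// Note: ActiveXObject and transformNode work only in Internet Explorer (MSXML)

// Load XML
var xml = new ActiveXObject("Microsoft.XMLDOM");
xml.async = false;   // load synchronously so the document is ready below
xml.load("books.xml");

// Load XSL
var xsl = new ActiveXObject("Microsoft.XMLDOM");
xsl.async = false;
xsl.load("books.xsl");

// Transform and write the resulting markup into the page
document.write(xml.transformNode(xsl));
</script>
</body>
</html>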


XSL - On the Server

Since not all browsers support XML and XSL, one solution is to transform the XML to XHTML on the server

To make XML data available to all kinds of browsers, we have to transform the XML document on the SERVER and send it as pure XHTML to the BROWSER

That's another beauty of XSL! One of the design goals for XSL was to make it possible to transform data from one format to another on a server, returning readable data to all kinds of future browsers
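
A sketch of a style sheet written with server-side XHTML output in mind. It assumes a <books> document containing <book> elements with <title> and <author> children (hypothetical names), and it shows only the style sheet; the server-side processor that runs it is not shown. The <xsl:output> element asks the processor to serialize well-formed XML with an XHTML 1.0 Strict document type declaration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml">

  <!-- Serialize as XML and emit the XHTML 1.0 Strict DOCTYPE. -->
  <xsl:output method="xml" indent="yes"
              doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
              doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

  <xsl:template match="/">
    <html>
      <head><title>Books</title></head>
      <body>
        <xsl:apply-templates select="books/book"/>
      </body>
    </html>
  </xsl:template>

  <!-- One paragraph per book. -->
  <xsl:template match="book">
    <p><xsl:value-of select="title"/> by <xsl:value-of select="author"/></p>
  </xsl:template>
</xsl:stylesheet>
```

Because the transformation happens before the response is sent, the browser receives only the XHTML and needs no XSL support of its own.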


Thoughts on XSL

XSL is a programming language, and not a particularly simple one
Expect to spend considerable time debugging your XSL

These slides have been an introduction to XSL and XSLT; there’s a lot more of it we haven’t covered

As with any programming, it’s a good idea to start simple and build it up incrementally: “Write a little, test a little”
This is especially a good idea for XSLT, because you don’t get a lot of feedback about what went wrong

Try jEdit with the XML plugin: write (or change) a line or two, check for syntax errors, then jump to IE and reload the XML file

