Basics: XML, Namespaces, DTD
XML Techniques for E-Commerce, Budapest 2004
Mag. iur. Dr. techn. Michael Sonntag
Institute for Information Processing and Microprocessor Technology (FIM)
Johannes Kepler University Linz, Austria
E-Mail: [email protected]
http://www.fim.uni-linz.ac.at/staff/sonntag.htm
© Michael Sonntag 2004



Questions?
Please ask immediately!



Michael Sonntag, XML Techniques for E-Commerce: Basics

Content

Introduction
» XML: Structure and principles
» The problems of XML
XML
» Structure
» Elements
» Attributes
» Other concepts
DTDs
» What is it and why is it not enough?
» Defining content with DTD
Short but important: Namespaces


Introduction

What is this record?

1000101101101101011001000110101100101100111111000110110010000001000011101101110100110101010011010100111010010011010110011011101101111010101101101000000111001111

This is an order from a customer for a new laptop and an external FireWire hard disk.

Isn't this obvious???


Introduction

1000101101101101011001000110101100101100111111000110110010000001000011101101110100110101010011010100111010010011010110011011101101111010101101101000000111001111

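[Figure: the same bit string, annotated with its fields: order number, ordering date, customer name, the items (laptop, Firewire harddisk) with their quantity (1 piece), and the delivery date]

Usage by recipient: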

if((orderBytes[15] & 0x3f) == 0x17) ...
String customerName = extractBytes(orderBytes, 4, 20);
...


Introduction

This is NOT a very good design!
» Where is the description of the record?
» How do we write a parser to extract all the data?
» What if we want to add a "comment" field?
» ...

Better:
<Order>
  <OrderNr>139</OrderNr>
  <OrderingDate>11.11.2003</OrderingDate>
  <CustomerName>Michael Sonntag</CustomerName>
  <Items>
    <Item count="1" Nr="0815">Laptop</Item>
    <Item count="1" Nr="4711">Firewire harddisk (External)</Item>
  </Items>
  <DeliveryDate>2.1.2004, 10:30</DeliveryDate>
</Order>
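To contrast this with the byte-offset code of the previous slide, here is a minimal sketch (not part of the original slides) of how a recipient might read this order with the generic DOM parser that ships with the JDK (`javax.xml.parsers`). The class and method names (`OrderReader`, `field`, `totalItems`) are hypothetical. Fields are located by element name rather than by byte position, so an added "comment" element would not break it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Reads the order from the slide with the JDK's generic DOM parser (sketch). */
final class OrderReader {
    static final String SAMPLE =
        "<Order><OrderNr>139</OrderNr><OrderingDate>11.11.2003</OrderingDate>"
        + "<CustomerName>Michael Sonntag</CustomerName><Items>"
        + "<Item count=\"1\" Nr=\"0815\">Laptop</Item>"
        + "<Item count=\"1\" Nr=\"4711\">Firewire harddisk (External)</Item>"
        + "</Items><DeliveryDate>2.1.2004, 10:30</DeliveryDate></Order>";

    /** Parses XML text into a DOM tree; the checked parser exceptions are wrapped. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse order", e);
        }
    }

    /** Text of the first element with the given name: found by name, not by byte offset. */
    static String field(Document doc, String tagName) {
        return doc.getElementsByTagName(tagName).item(0).getTextContent();
    }

    /** Sums the "count" attribute over all Item elements. */
    static int totalItems(Document doc) {
        NodeList items = doc.getElementsByTagName("Item");
        int total = 0;
        for (int i = 0; i < items.getLength(); i++) {
            total += Integer.parseInt(((Element) items.item(i)).getAttribute("count"));
        }
        return total;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(field(doc, "CustomerName") + " ordered " + totalItems(doc) + " items");
    }
}
```

Running `main` prints `Michael Sonntag ordered 2 items`.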


Reasons & Goals for XML (1)

Format for storage of data
» Independent of presentation (XSLT is used for that)
Extensibility
» To accommodate all the different needs without a formal process
Exact definition
» Strict set of rules without alternatives for the same thing, but still complete
» Possibility to check correctness on different levels
– E. g. without knowing about the content
Platform independence
» Very simple basic character set (others optional; see below)
» Everything is text (no binary data, which can lead to problems)
Full character set
» XML supports Unicode


Reasons & Goals for XML (2)

Open definition
» Not proprietary (a proprietary format would allow revocation/changes at any time)
» Can be standardized
Simplicity
» Should be easy to understand
– Parts of a document can be used even without knowing the definition
– Human readable: no special tools needed for basic information
» Easier to implement
Application independence
» Not only for web pages but for any application
Terseness unimportant
» Rely on text compression programs for this


Reasons & Goals for XML (3)

History: Binary formats
» Pro: Very small, rather simple to create
» Con: Unintelligible without the proper format description, complicated to parse, a special parser for every message/format, etc.; works for only one type of computer (or only with great difficulty on others)
XML: Textual format
» ONE parser for all documents
» Description of structure can be contained
– Or can be easily exchanged
» Automatic checking whether a document conforms to the structure
» Human readable, easily understandable
» Completely portable (systems, character sets, etc.)
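One concrete reason why a binary format may work "for only one type of computer" is byte order. This small sketch (class name hypothetical) uses `java.nio.ByteBuffer` to read the same four bytes once as a big-endian and once as a little-endian 32-bit integer; the order number 139 turns into a large negative number on the "wrong" machine. A textual `<OrderNr>139</OrderNr>` has no such ambiguity.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Reads the same four bytes as an int using two different byte orders (sketch). */
final class ByteOrderDemo {
    /** Interprets four bytes as a 32-bit signed integer in the given byte order. */
    static int readInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        // Order number 139 (0x8B) as written by a big-endian machine
        byte[] orderNr = {0x00, 0x00, 0x00, (byte) 0x8B};
        System.out.println(readInt(orderNr, ByteOrder.BIG_ENDIAN));    // 139
        System.out.println(readInt(orderNr, ByteOrder.LITTLE_ENDIAN)); // -1962934272
    }
}
```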


XML vs. HTML

HTML: Describes the visual representation
XML: Describes the structure of the data
Example: A poem
» HTML: A line break defines where a new line on the screen starts
– Might be off if the line is long and the screen small
What's in a name? that which we call a rose<br>By any other name would smell as sweet;<br>
» XML: A line break defines where one line of the poem ends
– Whether a single line of the poem is printed on one or two lines, the second perhaps indented, … is NOT specified!
<line>What's in a name? that which we call a rose</line>
<line>By any other name would smell as sweet;</line>

http://the-tech.mit.edu/Shakespeare/romeo_juliet/romeo_juliet.2.2.html


What XML is not

No programming language
» XML is for data; programs can be stored as data, but not executed
No successor/replacement of HTML
» XML + ..... (e. g. stylesheets) could replace HTML
» This is often, but not always, a good idea!
No database
» File format: a lot of data can be stored, but there is e. g. no (efficient) query language
– XPath and XQuery work, but their performance is currently nowhere near SQL
» Efficiency as a database is very bad (slow, no transactions, ...)
Not reserved for special applications ("only" Web, EDI, ...)
» Universal, can be used everywhere
No "jack of all trades"
» Suitable for many applications, but obviously not for all!
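XPath queries are indeed available for XML files; the JDK ships an XPath 1.0 engine in `javax.xml.xpath`. A minimal sketch (class name and sample data made up) that evaluates two expressions against a small item list; it says nothing about performance compared to SQL.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Evaluates XPath 1.0 expressions against an XML string with the JDK's built-in engine (sketch). */
final class XPathQuery {
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Returns the XPath string value of the result
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException e) {
            throw new IllegalArgumentException("Query failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String items = "<Items><Item Nr='0815'>Laptop</Item>"
                + "<Item Nr='4711'>Firewire harddisk</Item></Items>";
        System.out.println(evaluate(items, "count(/Items/Item)"));      // 2
        System.out.println(evaluate(items, "/Items/Item[@Nr='4711']")); // Firewire harddisk
    }
}
```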


Of the origin of XML (1)

(Grand)parent: SGML (ISO 8879; 15.10.1986)
» Rather complicated; easy implementation was NOT a design goal!
Initial idea: SGML Editorial Review Board (1996)
» Participation from the SGML Working Group
» All in the scope of the W3C
Name changed to XML Working Group
» Participation from the XML Special Interest Group
Now: XML Core Working Group
First version: 10.2.1998 (W3C Recommendation)
» Extensible Markup Language (XML) 1.0
Second version: 6.10.2000 (W3C Recommendation)
» Extensible Markup Language (XML) 1.0 (Second Edition)
– Not a new version, just includes all the errata since 1998!

Timeline: GML (1969), SGML (1985), HTML (1993), XML (1998)


Of the origin of XML (2)

Third version: 4.2.2004 (W3C Recommendation)
» Extensible Markup Language (XML) 1.0 (Third Edition)
– Not a new version, just includes all the errata since 2000!
New version: 4.2.2004 (W3C Recommendation)
» Extensible Markup Language (XML) 1.1
» Official encouragement to use 1.0 if the new features are not required!
Changes/new features:
» New Unicode characters can now also be used in names
– In text content this was already possible!
» Names are more "loose": everything not forbidden is allowed
– 1.0: Everything not allowed is forbidden
» Additional line termination characters (important for mainframes only)
– XML files are then plain text (instead of binaries) on these computers, too
» Normalization: Allows binary comparison even for Unicode characters
– Uses "Unicode Normalization Form C"
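What normalization buys: the same word can be stored with a precomposed "Ü" (U+00DC) or as "U" followed by a combining diaeresis (U+0308), and a plain binary comparison treats these as different. A sketch using `java.text.Normalizer` (class name hypothetical) shows that both compare equal after conversion to Normalization Form C; it only illustrates NFC itself, not how an XML 1.1 processor checks normalization.

```java
import java.text.Normalizer;

/** Compares strings after Unicode Normalization Form C (NFC) (sketch). */
final class NfcCompare {
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** True if both strings are identical once normalized to NFC. */
    static boolean equalAfterNfc(String a, String b) {
        return nfc(a).equals(nfc(b));
    }

    public static void main(String[] args) {
        String precomposed = "\u00DCbung"; // single character U+00DC
        String decomposed = "U\u0308bung"; // "U" followed by COMBINING DIAERESIS U+0308
        System.out.println(precomposed.equals(decomposed));         // false
        System.out.println(equalAfterNfc(precomposed, decomposed)); // true
    }
}
```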


Some XML technologies

[Figure: XML and related technologies: HTML, FO, XML Schema, XML Namespaces, XSLT, XPath, Java, and applications such as ebXML, SOAP, Security, Metadata, ...]


Some XML technologies

[Figure: the same diagram, with the technologies divided into "Vital" and "Optional" ones]


Basic XML

[Figure: the same diagram of XML technologies, focusing on basic XML]


Structure of XML: Elements

A "tag" is a name within angle brackets ("<", ">")
Each "element" consists of a start tag and an end tag
» Empty elements are "fused" together (modified end tag alone)
» Between start and end tag there may be some content
» Start tags and empty-element tags may contain attributes (see later)
Restrictions for tag names:
» Case-sensitive (the Unicode character number counts, glyphs don't matter)
» Names starting with xml (or XML, xML, xMl, …) are reserved for standardization
» Name must start with a letter, "_" or ":"
» Within a name: letters, digits, ".", "-", "_", ":" + some other Unicode characters
– If namespaces are used, ":" is only allowed once, as the separator between prefix and local name!
Examples:
» <address> … </address>; <surname> … </surname>; <address/>
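The naming rules above can be approximated in a few lines. This is a simplified sketch, not a conforming checker: `Character.isLetter` and `Character.isLetterOrDigit` stand in for the specification's exact character tables (which differ in detail), only characters of the Basic Multilingual Plane are handled, and the class name `XmlNames` is hypothetical.

```java
import java.util.Locale;

/** Simplified checks of the XML naming rules on this slide (approximation, BMP only). */
final class XmlNames {
    /** First character: letter, '_' or ':'; then letters, digits, '.', '-', '_', ':'. */
    static boolean isName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (!(Character.isLetter(first) || first == '_' || first == ':')) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '.' || c == '-' || c == '_' || c == ':')) {
                return false;
            }
        }
        return true;
    }

    /** Names starting with "xml" in any capitalization are reserved for standardization. */
    static boolean isReserved(String name) {
        return name.toLowerCase(Locale.ROOT).startsWith("xml");
    }

    /** With namespaces: no colon, or exactly one colon between a non-empty prefix and local name. */
    static boolean isNamespaceName(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return isName(name);
        }
        return colon == name.lastIndexOf(':')
                && isName(name.substring(0, colon))
                && isName(name.substring(colon + 1));
    }

    public static void main(String[] args) {
        for (String n : new String[] {"address", "_id", "1st", "my tag", "xsl:template", "a:b:c"}) {
            System.out.println(n + " -> name: " + isName(n) + ", with namespaces: " + isNamespaceName(n));
        }
    }
}
```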


Structure of XML: Well-formedness

Name of the start tag must exactly match the name of the end tag
No "interleaving"
» <a>…<b>…</a>…</b>: NOT well-formed!
» <a>…<b>…</b>…</a>: well-formed!
» <a>…<b/>…</a>: well-formed!
At the top level there may be only a single element
» The "document element" (its tag name is irrelevant, however!)
» <a>…</a>…<b>…</b>: NOT well-formed!
Any attribute may occur only once in a single tag
+ Several rules for entities and entity references
» See specification for details!
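The rules can be tried out with the JDK's built-in, non-validating parser: anything it rejects with a fatal error is not well-formed. A minimal sketch (class name hypothetical) using the slide's examples, with "…" replaced by letters:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Checks well-formedness by letting the JDK's non-validating parser read the text (sketch). */
final class WellFormedness {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to System.err
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // every well-formedness violation is a fatal error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<a>x<b>y</a>z</b>",  // interleaved
            "<a>x<b>y</b>z</a>",  // properly nested
            "<a>x<b/>y</a>",      // empty element inside
            "<a>x</a><b>y</b>",   // two top-level elements
            "<a x='1' x='2'/>"    // duplicate attribute
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (isWellFormed(s) ? "well-formed" : "NOT well-formed"));
        }
    }
}
```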


Structure of XML: Attributes

Attribute: name "=" value
» Value MUST be quoted
– Contrast to HTML!
» Quotes either single or double: "…" or '…'
Restrictions:
» No attributes without a value allowed
– Value can be the empty string
» Must be in a start tag or an empty-element tag
» Order within the element is unimportant
Examples:
» <ring OwnerID="#1">, <Time daylight="">, <cake type="honey" expires="11.11.2002">
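A minimal sketch (class name hypothetical) using the JDK DOM parser: both quote styles yield the same kind of attribute value, an empty value is fine, and an attribute without a value is rejected as not well-formed. The attributes are collected into a sorted map because their order in the tag carries no meaning.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Reads the attributes of a single element into a sorted map (sketch). */
final class AttributeReader {
    static Map<String, String> attributesOf(String elementXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fatal errors are rethrown silently
            Element root = builder.parse(new InputSource(new StringReader(elementXml)))
                    .getDocumentElement();
            NamedNodeMap attributes = root.getAttributes();
            Map<String, String> result = new TreeMap<>();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                result.put(attribute.getNodeName(), attribute.getNodeValue());
            }
            return result;
        } catch (SAXException e) {
            throw new IllegalArgumentException("Not well-formed: " + elementXml, e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attributesOf("<cake type='honey' expires=\"11.11.2002\"/>"));
        System.out.println(attributesOf("<Time daylight=\"\"/>"));
    }
}
```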


General structure / Data model

Hierarchical ordering of elements in a tree
» Only a single root node (a tree, not a forest)
Attributes are possible for each element
» Text cannot contain attributes!
Elements and attributes are both "nodes"
Each element may possess (OR):
» (Child) elements
» Textual content
Good design: Child elements XOR text! (One or the other, but not both; mixing them is bad!)
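The "child elements XOR text" guideline can be checked mechanically. A small sketch (class name hypothetical) walks the DOM tree and reports every element that has both child elements and non-whitespace text; whitespace used for indentation is deliberately not counted as text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Lists elements that mix child elements with non-whitespace text (sketch). */
final class MixedContentFinder {
    static List<String> mixedElements(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            List<String> found = new ArrayList<>();
            collect(root, found);
            return found;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    private static void collect(Element element, List<String> found) {
        boolean hasChildElement = false;
        boolean hasText = false;
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            switch (child.getNodeType()) {
                case Node.ELEMENT_NODE:
                    hasChildElement = true;
                    collect((Element) child, found); // check the subtree as well
                    break;
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    // Indentation whitespace between tags does not count as content
                    if (!child.getNodeValue().trim().isEmpty()) {
                        hasText = true;
                    }
                    break;
                default:
                    break; // comments, processing instructions, ...
            }
        }
        if (hasChildElement && hasText) {
            found.add(element.getTagName());
        }
    }

    public static void main(String[] args) {
        System.out.println(mixedElements("<a><b>chars1</b>chars3<b>chars2</b></a>")); // [a]
        System.out.println(mixedElements("<a><b>chars1</b><b>chars2</b></a>"));       // []
    }
}
```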


Structure of XML: General layout

There may be an XML declaration at the start of the file
» Example: <?xml version="1.0"?>
It may be followed by a reference to a DTD (see later)
» Example: <!DOCTYPE order SYSTEM "Frying_Pan_Orders.dtd">
After that, the single document element follows
At the end, only PIs, comments and whitespace may follow
Examples (each a complete XML file):
» <?xml version="1.0"?> <location>The Waters</location>
» <CompanyCount>12</CompanyCount>


Structure of XML: Example

<?xml version="1.0" encoding="UTF-8"?>
<message>
  <Recipient>
    <email>[email protected]</email>
  </Recipient>
  <Subject>Übungsabgabe</Subject>
  <Sender>
    <email>[email protected]</email>
  </Sender>
  <CC>
    <email>[email protected]</email>
  </CC>
  <BCC/>
  <Body>Bis wann ist die neue Übung abzugeben?</Body>
</message>
(Subject: "Exercise submission"; Body: "By when does the new exercise have to be handed in?")

Message.xml

UTF-8 encoding: ASCII characters take 8 bits (one byte), all other characters two to four bytes (e. g. "Ü" takes 16 bits)
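The byte counts can be checked with the JDK. In this short sketch (class name hypothetical), "A" needs one byte in UTF-8, "Ü" two, the euro sign three, and a character outside the Basic Multilingual Plane four. Decoding the two bytes of "Ü" with a single-byte charset yields two wrong characters; under Windows-1252 they display as "Ãœ", the kind of corruption that turns "Übung" into "Ãœbung". The code uses ISO-8859-1 instead, because that charset is guaranteed to be available; there, the second byte maps to an invisible control character rather than "œ".

```java
import java.nio.charset.StandardCharsets;

/** UTF-8 byte lengths, and UTF-8 bytes decoded with the wrong charset (sketch). */
final class Utf8Demo {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Encodes as UTF-8 but decodes as ISO-8859-1: every byte becomes one (wrong) character. */
    static String misdecoded(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));               // 1 byte (ASCII)
        System.out.println(utf8Length("\u00DC"));          // 2 bytes: U+00DC, U with diaeresis
        System.out.println(utf8Length("\u20AC"));          // 3 bytes: euro sign
        System.out.println(utf8Length("\uD83D\uDE00"));    // 4 bytes: U+1F600 (a surrogate pair in Java)
        System.out.println(misdecoded("\u00DC").length()); // 2 characters instead of 1
    }
}
```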


Structure of XML: Example

Create a well-formed XML file according to the following specification:
» The file should contain your contact information
» Your student number as well as other study-related information should be included
Important: Think about the structure before writing!
» How should it be split up?
– E. g. street + house number together, or as two separate elements?
» What is an element, what an attribute?
– Is the ZIP code an attribute of the city or a separate element?
» Which order/containment?
– E. g. split the "address" into separate parts below it, or make them children of the document element?

Persondata.xml


Structure of XML: Characters

Within and between tags there may be text
» Text consists of any Unicode characters
– Exceptions: control characters below 0x20 (except tab, carriage return and line feed), the surrogate blocks, 0xFFFE (the byte-swapped BOM) and 0xFFFF
» What the text means depends on the definition
Usually characters appear only within the innermost elements
– Example: <a><b>chars1</b><b>chars2</b></a>
– But not: <a><b>chars1</b>chars3<b>chars2</b></a> (although this is allowed by the specification!)
Text may never contain a literal < or &
» They must be represented as "&lt;" and "&amp;" (or numerically)
» ">" may be written as "&gt;", and must be in the sequence "]]>"
» Predefined are only "&lt;", "&gt;", "&amp;", "&apos;" and "&quot;"; no other special characters are defined by default (unlike HTML!)
Whitespace: space, carriage return, line feed, tab
» Special processing possible (e. g. automatically removed)
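A minimal escaping helper following these rules (a sketch; the class name is hypothetical). It escapes `&` and `<`, which is mandatory, and also `>`, which is the simplest way to never produce "]]>". Characters that XML 1.0 forbids altogether (e.g. most control characters) cannot be escaped at all and are not handled here.

```java
/** Escapes text for XML content and for double-quoted attribute values (sketch). */
final class XmlEscaper {
    /** '&' and '<' must be escaped; '>' is escaped too, so "]]>" can never appear. */
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Additionally escapes '"' for values written as attr="...". */
    static String escapeAttribute(String value) {
        return escapeText(value).replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println("<name>" + escapeText("Acme GesmbH & Co KG") + "</name>");
        System.out.println("<cond test=\"" + escapeAttribute("a < \"b\"") + "\"/>");
    }
}
```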


Characters: Examples

E-Mail address: &quot;Michael Sonntag&quot; &lt;[email protected]&gt;
» Results in: "Michael Sonntag" <[email protected]>
» A "<" cannot be contained literally in XML text content!
» The quotes could also be contained literally, except within an attribute value delimited by the same kind of quote!
Copyright notice: &copy; 2003 by Michael Sonntag for FIM
» Web browsers know this HTML entity by default and display: © 2003 by Michael Sonntag for FIM
» In XML, &copy; must first be declared in a DTD; the character reference &#xA9; works everywhere
» Typing © directly is difficult unless a Unicode editor with a character table for graphically picking characters is available!
Company name: Acme GesmbH &amp; Co KG
» Results in: "Acme GesmbH & Co KG"
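Which of these references a parser resolves on its own can be tested with the JDK's DOM parser. In this sketch (class name hypothetical; "address" stands in for the e-mail address), the predefined entities and the numeric character reference work, while `&copy;` without a DTD declaring it makes the document not well-formed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Shows which references a plain XML parser resolves without a DTD (sketch). */
final class ReferenceDemo {
    /** Text content of the document element, or null if the input is not well-formed. */
    static String textOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fatal errors rethrown, nothing printed
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf("<mail>&quot;Michael Sonntag&quot; &lt;address&gt;</mail>"));
        System.out.println(textOf("<name>Acme GesmbH &amp; Co KG</name>"));
        System.out.println(textOf("<c>&#xA9; 2003 by Michael Sonntag for FIM</c>"));
        System.out.println(textOf("<c>&copy; 2003 by Michael Sonntag for FIM</c>")); // null: not declared
    }
}
```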


Characters: Example

List of commonly used (HTML 4.01) entities: http://www.w3.org/TR/html401/sgml/entities.html
» Also understood by browsers!
» In XML they must be declared (in a DTD) before use
Should this encoding be done extensively (wherever possible), or only if there is no other choice?
Extensively:
» Text gets harder to read
» References require no special editor
» Makes you think whether this is really needed
Sparse:
» Text can be read everywhere, at least to some degree
» No table of entities needed
» Files are (a bit) shorter


Structure of XML: Whitespace handling

Spaces, tabs, blank lines: whitespace
» Often useful for visual layout (especially indentation)
» Usually unimportant for the content / not intended for actual handling
» Sometimes, however, important
– Poetry, source code, encoded content, ...
All characters must be passed to the application
» This includes the whitespace mentioned above
» A validating processor must additionally tell the application which whitespace occurs in element content (i.e. is ignorable)
» Regardless of the value of xml:space (see next slide)!
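The JDK's SAX parser shows this distinction when validation is switched on: whitespace between elements whose content model allows only elements arrives via `ignorableWhitespace`, everything else via `characters`. A minimal sketch with a made-up internal DTD (class name hypothetical):

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Separates character data from whitespace a validating parser reports as ignorable (sketch). */
final class WhitespaceReport extends DefaultHandler {
    final StringBuilder text = new StringBuilder();
    final StringBuilder ignorable = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorable.append(ch, start, length); // whitespace in element-only content
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        throw e; // treat validity errors as failures
    }

    static WhitespaceReport parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // the DTD tells which elements have element-only content
            WhitespaceReport report = new WhitespaceReport();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), report);
            return report;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE a [<!ELEMENT a (b*)><!ELEMENT b (#PCDATA)>]>\n"
                + "<a>\n  <b>x</b>\n</a>";
        WhitespaceReport report = parse(xml);
        System.out.println("text: [" + report.text + "]");
        System.out.println("ignorable characters: " + report.ignorable.length());
    }
}
```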


Structure of XML: Whitespace handling

Special attribute as a signal for the application: "xml:space"
» Applies to the element it is an attribute of and all contained elements, unless overridden there
Can be used always and everywhere
» For valid documents it must be declared, however (see later)!
Values: "default" and "preserve"
» default: The application's default whitespace handling is acceptable (e. g. ignoring these whitespace characters)
» preserve: The application should consider whitespace as important and preserve it


Structure of XML: Line break handling

To ease use, all line breaks in certain parts must be unified
» &#xD;&#xA; (CR LF) and a lone &#xD; (CR) are converted to &#xA; (LF)
– XML 1.1: Additionally &#xD;&#x85;, &#x85; and &#x2028; are converted (special line breaks from Unicode)
Happens before parsing: the application sees only &#xA; ('\n')
» Applies only to "external parsed entities" and the XML file itself (the document entity), nothing else
– See later: Entities (DTD)!
» This is almost all content
Except line breaks encoded explicitly
» E. g. by using '&#xD;&#xA;'
– This will stay the same and not be unified!
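The effect is easy to observe with the JDK's DOM parser. In this sketch (class name hypothetical), a literal CR LF and a lone CR both arrive as LF, while the escaped `&#xD;&#xA;` survives unchanged:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** End-of-line normalization: literal CR LF / CR become LF, character references stay (sketch). */
final class LineBreakDemo {
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Makes CR and LF visible for printing. */
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Literal CR LF after "a", literal CR after "b", escaped CR LF after "c"
        String xml = "<t>a\r\nb\rc&#xD;&#xA;d</t>";
        System.out.println(visible(textOf(xml))); // a\nb\nc\r\nd
    }
}
```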


Structure of XML: Language identification

Special attribute for identifying the natural language used: "xml:lang"
» Applies to all attributes and all subelements of the element it is defined in (unless there is a new declaration below!)
Can be used always and everywhere
» For valid documents it must be declared, however (see later)!
Values are language identifiers according to RFC 1766 (superseded by RFC 3066)
» "en", "en-us", "en-gb", "de", "de-at", "de-de", ...
Examples:
» <p xml:lang="en-us">of golden color</p>
» <p xml:lang="en-gb">of golden colour</p>
» <title xml:lang="de" desc="short">Hin und wieder zurück</title>
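Because xml:lang is inherited, an application has to look at the element and its ancestors. A small sketch (class and method names hypothetical): with a namespace-aware DOM parser the `xml` prefix is bound to `XMLConstants.XML_NS_URI`; the matching is a simple case-insensitive prefix comparison ("en" matches "en-us" and "en-gb"), not the full matching rules of later language-tag RFCs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Selects elements by their effective (inherited) xml:lang value (sketch). */
final class LanguageFilter {
    /** Nearest xml:lang on the element or one of its ancestors; "" if none is given. */
    static String languageOf(Element element) {
        for (Node n = element; n instanceof Element; n = n.getParentNode()) {
            Element e = (Element) n;
            if (e.hasAttributeNS(XMLConstants.XML_NS_URI, "lang")) {
                return e.getAttributeNS(XMLConstants.XML_NS_URI, "lang");
            }
        }
        return "";
    }

    /** Simple case-insensitive prefix match: "en" matches "en", "en-us" and "en-gb". */
    static boolean matches(String tag, String range) {
        String t = tag.toLowerCase(Locale.ROOT);
        String r = range.toLowerCase(Locale.ROOT);
        return t.equals(r) || t.startsWith(r + "-");
    }

    /** Texts of all elements named tagName whose effective language matches the range. */
    static List<String> textsIn(String xml, String tagName, String range) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // binds the "xml" prefix to XMLConstants.XML_NS_URI
            NodeList nodes = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName(tagName);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                if (matches(languageOf(element), range)) {
                    result.add(element.getTextContent());
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><p xml:lang='en-us'>of golden color</p>"
                + "<p xml:lang='en-gb'>of golden colour</p><p xml:lang='de'>golden</p></doc>";
        System.out.println(textsIn(xml, "p", "en"));
        System.out.println(textsIn(xml, "p", "en-GB"));
    }
}
```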


Structure of XML: CDATA sections

Allows including any text within an XML file
» Start: "<![CDATA[", end: "]]>"
» Allowed in between: anything except "]]>" (the end marker)
– Therefore no nesting of CDATA sections is possible!
Tags within are treated as simple text, not as tags!
» During parsing the start and end markers are removed and only the contained data is returned as text
Example: "<![CDATA[<spider></spiders>]]>" is well-formed
» Parsing returns "<spider></spiders>" as text; NOT as elements or tags, and NO error (note the 's', which would cause an error if interpreted as a tag)!


CDATA sections: Example

Embedded HTML (unparsed, because it is not valid XML):
<![CDATA[<p>A short (and incorrect) HTML snippet<br><!-- The ending p-tag is missing here for example -->You can also do any other uncommon (&) and strange (<&gt;) things! ]]>
Result is pure text:
<p>A short (and incorrect) HTML snippet<br><!-- The ending p-tag is missing here for example -->You can also do any other uncommon (&) and strange (<&gt;) things!

Avoiding many entity references:
<![CDATA[int a = b << 1; val&=0x7F; if(b>=0) b=b>>1;]]>
Instead of (alternative version of the above with the predefined entity references):
int a = b &lt;&lt; 1; val&amp;=0x7F; if(b&gt;=0) b=b&gt;&gt;1;
Result is pure text (exactly the same for both!):
int a = b << 1; val&=0x7F; if(b>=0) b=b>>1;
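Both notations indeed produce identical text, which can be checked with the JDK's DOM parser (sketch; class name hypothetical). `setCoalescing(true)` merges CDATA sections into ordinary text nodes, so the application cannot even tell which notation was used; the spider example produces no child elements.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Compares content given as a CDATA section with the same content escaped (sketch). */
final class CdataDemo {
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(true); // merge CDATA sections into the surrounding text nodes
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String textOf(String xml) {
        return parseRoot(xml).getTextContent();
    }

    static int childElementCount(String xml) {
        int count = 0;
        for (Node n = parseRoot(xml).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String cdata = "<code><![CDATA[int a = b << 1; val&=0x7F; if(b>=0) b=b>>1;]]></code>";
        String escaped = "<code>int a = b &lt;&lt; 1; val&amp;=0x7F; if(b&gt;=0) b=b&gt;&gt;1;</code>";
        System.out.println(textOf(cdata).equals(textOf(escaped)));                   // true
        System.out.println(childElementCount("<x><![CDATA[<spider></spiders>]]></x>")); // 0
    }
}
```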


CDATA sections: Example

Integrate a complete mini XML file (header, document element, an element with an attribute) into the file as ordinary text (therefore unparsed!)
» Content: A very brief visiting card in XML format!
» First create the data to be inserted as a separate file
» In the second step, embed it into the file as a CDATA section

VisitingCard.xml, Persondata_2.xml


Structure of XML: Comments

May appear anywhere outside of other markup (but not before the XML declaration)
» Starts with "<!--" and ends with "-->"
» The string "--" may not occur within the comment
May be suppressed by a parser
» So perhaps not available to the application!
» Only for humans directly reading the XML file
Example: "<!-- Prints: 'Arrival date: <Insert birthday>' -->"
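Whether comments reach the application depends on the parser configuration. In this sketch (class name hypothetical), the JDK's `DocumentBuilderFactory.setIgnoringComments` decides whether comment nodes appear in the DOM at all, and a comment containing "--" is rejected as not well-formed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Counts the comments a DOM parser hands to the application (sketch). */
final class CommentCounter {
    static int countComments(String xml, boolean ignoreComments) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(ignoreComments); // true: comments are dropped while parsing
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fatal errors rethrown silently
            return count(builder.parse(new InputSource(new StringReader(xml))));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts comment nodes in the whole tree, including those outside the document element. */
    private static int count(Node node) {
        int n = node.getNodeType() == Node.COMMENT_NODE ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            n += count(child);
        }
        return n;
    }

    public static void main(String[] args) {
        String xml = "<!-- File header: name, e-mail, date -->"
                + "<order><!-- Prints: 'Arrival date: <Insert birthday>' --><item/></order>";
        System.out.println(countComments(xml, false)); // 2
        System.out.println(countComments(xml, true));  // 0
    }
}
```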


Comments: Example

Extend the personal data file with some comments:
» On a few elements
» A file header with name, e-mail address, date of creation, ...

Persondata_3.xml


Structure of XML: Character & entity references

Character reference: represents a certain character
» To be used e. g. if the storage does not support Unicode
– E. g. in ASCII text files
» Syntax: "&#" decimal number ";" or "&#x" hex number ";"
Entity reference: allows substitution
» Syntax: "&" name ";" (general entities) or "%" name ";" (parameter entities, only within the DTD)
» No recursion allowed (an entity may not reference itself, directly or indirectly)
» Declaration of entities and the different types: see DTD!
Example: "Weight is &#x3e; &value-in-gold; g pure gold."
» Might return: "Weight is > 351,7 g pure gold."
– With the definition: <!ENTITY value-in-gold "351,7">
Example: &#160; is the same as &nbsp; (no-break space), provided &nbsp; is declared (as in HTML)
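A minimal sketch (class name hypothetical) using the JDK's DOM parser and an internal DTD subset: the general entity from the example is expanded, a `%name;` in element content is just text, and a self-referencing entity makes the parse fail.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Expands character and entity references declared in an internal DTD subset (sketch). */
final class EntityExpansion {
    /** Text content of the document element after expansion, or null if parsing fails. */
    static String textOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fatal errors rethrown silently
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE weight [<!ENTITY value-in-gold \"351,7\">]>"
                + "<weight>Weight is &#x3e; &value-in-gold; g pure gold.</weight>";
        System.out.println(textOf(xml)); // Weight is > 351,7 g pure gold.
    }
}
```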


Character & entity references: Example

In anticipation of DTDs (entity definition; see later):
<!ENTITY copyright "&#xA9; Michael Sonntag 2003">
Usage:
<footer>&copyright;</footer>
Results in the same as:
<footer>&#xA9; Michael Sonntag 2003</footer>
Results in the same as:
<footer>© Michael Sonntag 2003</footer>


Structure of XML: Processing instructions

Processing instructions (PIs) allow documents to contain instructions for applications/parsers
» Syntax: "<?" name parameters "?>"
– name: any name except "xml", "XmL", … (names starting with "xml" are reserved)
– parameters: any text not containing "?>"
PIs are just passed through to the application; it should (but is not required to) know about them
» Their content is NOT returned as character data!
Example: <?php mysql_query("SELECT * FROM recipes WHERE duration<30", $id); ?>
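Processing instructions are handed to the application as separate nodes. This sketch (class name hypothetical; the PHP snippet is made up and simpler than the slide's) collects target and data of each PI with the JDK's DOM parser and shows that PIs do not contribute to the element's text; the XML declaration is not reported as a PI.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Lists the processing instructions (target and data) passed on by the DOM parser (sketch). */
final class PiReader {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static List<String> instructions(String xml) {
        List<String> found = new ArrayList<>();
        collect(parse(xml), found);
        return found;
    }

    private static void collect(Node node, List<String> found) {
        if (node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
            ProcessingInstruction pi = (ProcessingInstruction) node;
            found.add(pi.getTarget() + ": " + pi.getData().trim());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, found);
        }
    }

    /** Character data of the document element; PIs do not contribute to it. */
    static String textOf(String xml) {
        return parse(xml).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><recipes><?php echo 'hi'; ?><name>Soup</name></recipes>";
        System.out.println(instructions(xml)); // [php: echo 'hi';]
        System.out.println(textOf(xml));       // Soup
    }
}
```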


Problems of XML: Is it enough?

Now we can write content, but we still cannot define it!
» Use DTDs; they are part of the XML specification (see next)
Entities can still be complicated
» Parsed/unparsed ones, which are allowed in which position, …
Attributes vs. elements: When to use which?
» Highly debated
– Result: Designer's choice!
» Elements can be refined, attributes cannot
– Use attributes sparingly!
Assumes everything should (and can) be expressed as a tree
» Structure of tags within tags
» Everything else must be plain text or be expressed as a (sub-)tree
No viewer: Must always be processed


Syntax - Semantics - Pragmatics (- Politics?)

Syntax: How the individual elements may be put together
Semantics: What this particular combination of elements means
Pragmatics: What the result of this semantics is (application to problems)
» Debate: Can it exist only in humans or also in computers?
Politics: Where do we want to get to?
XML (up to here) does not even possess an explicit syntax
» DTDs provide some syntax
» Schemata provide extensive syntax
» RDF, OWL, ebXML, ... provide some semantics
» Pragmatics: User / parser / application (programs) (or not at all)


DTD - The idea behind it

DTD = Document Type Definition
» Contained in or referenced by the document type declaration (<!DOCTYPE ...>)
Defines the elements to be used in XML documents
Lists the attributes allowed (or required) for elements
Describes the allowed structural relationships between elements
» Which elements may be children of an element
» How often they may/must occur
Specifies the sequence (if any) in which elements must appear
» In which sequence the children may appear
Can be included in the document or reside externally (*.dtd)
» Including an external DTD:
– <!DOCTYPE message SYSTEM "message.dtd">
Specifies the grammar (= syntax) for a certain application
» Documents must follow this grammar exactly to be valid
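A minimal sketch of DTD validation with the JDK's DOM parser (class name hypothetical). Since no external `message.dtd` is available here, the DTD is given as an internal subset; validity errors are only reported as "errors" by default, so the error handler turns them into failures. The message elements are a made-up subset of the earlier e-mail example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Checks a document against its DTD with the JDK's validating DOM parser (sketch). */
final class DtdValidator {
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check against the DTD, not only well-formedness
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validity errors are only "errors"; turn them into failures
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE message [<!ELEMENT message (Subject, Body)>"
                + "<!ELEMENT Subject (#PCDATA)><!ELEMENT Body (#PCDATA)>]>";
        System.out.println(isValid(dtd + "<message><Subject>Hi</Subject><Body>Text</Body></message>"));
        System.out.println(isValid(dtd + "<message><Body>Text</Body><Subject>Hi</Subject></message>"));
    }
}
```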


General structure of DTD

The document type declaration must appear before the first element
» I.e. after the XML declaration and before the document element
Two kinds possible:
» External: <!DOCTYPE message SYSTEM "message.dtd">
» Internal: <!DOCTYPE message [ <!ELEMENT message ANY> ]>
» External + internal: Also possible; better avoid it!
– Internal declarations take precedence over external ones
The name in the declaration (e. g. "message") must be identical to the name of the document element!

Hello.xml (trivial; with internal DTD)


Validity of XML

Only applicable if a DTD or schema is used
» Well-formed documents need not match any specification of their structure
» If they do match one, they are also "valid"
Schemas: Many additional rules
» E. g. textual content must match the specified datatype
Checking validity verifies the syntax of the document on a higher level of abstraction
» "Basic" syntax: Well-formedness (correct form & naming & containment of tags)
» "Extended" syntax: Validity (correct names & correct content)
– Schemas: Correct datatype, ...


Defining elements

"<!ELEMENT" name ("EMPTY" | "ANY" | mixed | children) ">"
EMPTY: No content allowed
» Must always be an empty element (<br/> or <br></br>)
– Example: The <br> element from HTML
ANY: Any content is allowed
» Text and (declared) elements in any order and combination
mixed: Repeated choice of "#PCDATA" and other elements
» #PCDATA: Parsed character data, i. e. any text
– (#PCDATA) alone: text only; no elements allowed!
» Order or number of occurrences of the children CANNOT be defined, only their type!
– Similar to HTML (text with arbitrary tags/elements in between)
» Syntax: (#PCDATA | ... | ... | ...)*
– #PCDATA must be the first entry, and the trailing "*" is required!

Defining elements

children: May use parentheses for grouping
» Choice of children ("|")
» Sequence of children (",")
» May NOT contain #PCDATA!
» Only elements (and perhaps whitespace in between them)
Qualifiers for children:
» "?": Optional (child may occur 0 or 1 times)
» "+": Child may occur 1 or more times
» "*": Child may occur 0 or more times
» None: Must occur exactly once
Special version of mixed content: #PCDATA only
» "<!ELEMENT" name "(#PCDATA)" ">"
» Only here is the "*" optional!

Defining elements: Examples

<!ELEMENT barrel (volume,content?,label*)>
Correct:
» <barrel><volume/></barrel>
» <barrel><volume/><content/></barrel>
» <barrel><volume/><label/><label/></barrel>
Incorrect:
» <barrel><content/><volume/></barrel>: Wrong sequence
» <barrel><content/><label/></barrel>: "volume" is missing
Annotations: volume exactly 1 time, content 0 or 1 times, label 0 to N times
» First volume, then content (or not), and at the end perhaps several labels

<!ELEMENT content (#PCDATA|goodsID)*>
Correct:
» <content>A lot of garbage</content>
» <content>Garbage<goodsID/>Oil<goodsID/><goodsID/></content>
Incorrect:
» <content><goodsDesc/></content>: "goodsDesc" is an element (not text) that is not listed in the content model
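A children content model behaves much like a regular expression over the names of the child elements. The sketch below makes that analogy concrete by translating a content model into a Python regular expression; it is an illustration under stated assumptions, not how real validators are built, and it handles only children models (no #PCDATA, no whitespace text):

```python
import re

# Tokens of a DTD content model: element names or one of ( ) , | ? * +
TOKEN = re.compile(r"[A-Za-z_][\w.\-]*|[(),|?*+]")


def content_model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate a children content model such as "(volume,content?,label*)"
    into a regular expression over child names, each followed by one space.
    Sketch only: mixed content (#PCDATA) is not supported."""
    parts = []
    for token in TOKEN.findall(model):
        if token == "(":
            parts.append("(?:")          # group without capturing
        elif token == ",":
            continue                     # sequence = plain concatenation
        elif token in {")", "|", "?", "*", "+"}:
            parts.append(token)          # same meaning as in regular expressions
        else:
            parts.append("(?:" + re.escape(token) + " )")  # one child element
    return re.compile("".join(parts))


def children_valid(model: str, children: list[str]) -> bool:
    """Check a sequence of child element names against the content model."""
    pattern = content_model_to_regex(model)
    return pattern.fullmatch("".join(name + " " for name in children)) is not None


BARREL = "(volume,content?,label*)"
print(children_valid(BARREL, ["volume", "label", "label"]))  # True
print(children_valid(BARREL, ["content", "volume"]))         # False: wrong sequence
print(children_valid(BARREL, ["content", "label"]))          # False: volume missing
```

The appended space keeps names separate, so that e.g. "label" cannot accidentally match part of a longer name.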

Defining elements: Example

Define an internal DTD for the visiting card from the previous example

If you used attributes, ignore them for now!

VisitingCard_2.xml

Defining attributes

"<!ATTLIST" name attributes ">"
» name: The element the attribute list belongs to
attributes: name type default-value
» name: Name of this attribute
» type: String, enumerated or token (see next page)
– String: "CDATA": Character data (unparsed)
– Enumerated: List of allowed strings
default-value: "#REQUIRED", "#IMPLIED", "#FIXED" plus a value, or just a value
» #REQUIRED: Attribute must be present; no default value
» #FIXED "value": Attribute may be omitted (the value is then supplied as default); if present, it must have exactly this value
» #IMPLIED: Attribute is optional; no default value
» Just a value: Attribute is optional; this value is the default

If more than one attribute list exists for an element:
» The attributes are merged (all attributes of all lists are allowed, in any order)

Types of attributes (Tokens)

ID: Defines a unique (within the document) identifier for this element
» An element may have only one ID attribute (declared #IMPLIED or #REQUIRED)
» The value must be a valid XML name, e. g. it may not start with a digit
IDREF: Reference to another ID in this document
» This ID must exist (but may occur later!)
IDREFS: Several valid IDREFs (separated by whitespace)
ENTITY: Must be the name of an unparsed entity
» More or less a reference to something external (typically non-XML data, e. g. an image)
ENTITIES: Several valid ENTITY names
NMTOKEN: A name token, i. e. only name characters (letters, digits, ".", "-", "_", ":", …)
» More restrictive than CDATA, which might be anything
» Unlike an ID value, it may start with a digit
NMTOKENS: Several valid NMTOKENs

Important ones
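The two ID/IDREF rules (IDs unique, every IDREF pointing to an existing ID, possibly a later one) can be sketched in Python. Assumption, labeled loudly: ElementTree does not read attribute types from a DTD, so this sketch hard-codes the hypothetical rule that attributes named "id" are of type ID and attributes named "owner" are of type IDREF; IDREFS and the name syntax are not checked:

```python
import xml.etree.ElementTree as ET

# Hypothetical: the DTD is assumed to declare "id" as ID and "owner" as IDREF.
ID_ATTR, IDREF_ATTR = "id", "owner"


def check_ids(xml_text: str) -> list[str]:
    """Report duplicate IDs and IDREFs without a matching ID (empty list = OK)."""
    root = ET.fromstring(xml_text)
    errors, seen = [], set()
    for elem in root.iter():
        value = elem.get(ID_ATTR)
        if value is not None:
            if value in seen:
                errors.append(f"duplicate ID {value!r}")
            seen.add(value)
    # Second pass, because an IDREF may point to an ID that occurs later
    for elem in root.iter():
        ref = elem.get(IDREF_ATTR)
        if ref is not None and ref not in seen:
            errors.append(f"IDREF {ref!r} has no matching ID")
    return errors


print(check_ids('<band><instrument owner="p2"/><player id="p2"/></band>'))  # []
print(check_ids('<band><instrument owner="p9"/></band>'))
# ["IDREF 'p9' has no matching ID"]
```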

Attribute value normalization

Before being passed to the application, all attribute values must be normalized by the parser

Character references (see entities below) are taken "as-is"
» Replaced, but no normalization takes place on their replacement text
Entity references are replaced, and their replacement text is normalized in turn
Whitespace characters are replaced by a space
» I. e. the characters #xD (CR), #xA (LF) and #x9 (TAB) when they appear literally (a literal CR LF pair is first turned into a single LF)
If the attribute type is CDATA (or no declaration is available), the result is final
Otherwise, leading and trailing spaces are removed and each sequence of spaces is collapsed into a single space

Example (results listed character by character; #x20 = space character):
a=" &#xd;&#xd;A&#xa;&#xa; B &#xd;&#xa; "
» a is CDATA: "#x20 #xD #xD A #xA #xA #x20 B #x20 #xD #xA #x20"
» a is NMTOKEN: "#xD #xD A #xA #xA #x20 B #x20 #xD #xA"

Attention: Well-formed, but invalid as an NMTOKEN (it contains whitespace characters)!
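The normalization rules can be written down as a small function. This is a simplified sketch of XML 1.0 section 3.3.3, assuming that only literal whitespace and numeric character references occur; general entity references (&name;) are deliberately not supported:

```python
import re

CHAR_REF = re.compile(r"&#(x[0-9a-fA-F]+|[0-9]+);")


def _literal(text: str) -> str:
    # Line-end normalization first (CR LF / CR -> LF), then every literal
    # whitespace character becomes a single space
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return re.sub(r"[\n\t]", " ", text)


def normalize_attribute(raw: str, is_cdata: bool = True) -> str:
    """Simplified attribute-value normalization (sketch).
    Handles literal whitespace and numeric character references only."""
    out, pos = [], 0
    for match in CHAR_REF.finditer(raw):
        out.append(_literal(raw[pos:match.start()]))
        ref = match.group(1)
        # Character references are appended as-is: no whitespace mapping
        out.append(chr(int(ref[1:], 16) if ref.startswith("x") else int(ref)))
        pos = match.end()
    out.append(_literal(raw[pos:]))
    value = "".join(out)
    if not is_cdata:
        # Non-CDATA types: strip leading/trailing spaces, collapse runs of spaces
        value = re.sub(" +", " ", value.strip(" "))
    return value


RAW = " &#xd;&#xd;A&#xa;&#xa; B &#xd;&#xa; "
print(repr(normalize_attribute(RAW)))                  # ' \r\rA\n\n B \r\n '
print(repr(normalize_attribute(RAW, is_cdata=False)))  # '\r\rA\n\n B \r\n'
```

Only the literal spaces (#x20) are affected by the non-CDATA step; the CR and LF characters produced by character references survive, which is why the value is still invalid as an NMTOKEN.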

Attribute examples

<!ATTLIST instrument
  attuned (yes|no) "yes"
  type CDATA #REQUIRED
  owner IDREF #IMPLIED>
Annotations: (yes|no) = possible values, "yes" = default value; type must be specified; owner can be specified (optional)

Correct:
» <instrument type="saxophone"/>: Is considered attuned (default!)
» <instrument owner="p12" type="horn"/>: If an element with ID "p12" exists
» <instrument type="flute" attuned="no" owner="Me"/>: If an element with ID "Me" exists

Incorrect:
» <instrument/>: type is missing
» <instrument type="guitar" attuned/>: attuned must have some value (not even well-formed)
» <instrument type="" attuned="maybe"/>: attuned has an illegal value
» <instrument type="<to_be_determined>" owner="14"/>
– "<" is not allowed in attribute values (not even well-formed)
– "14" is not a valid name, so it can never match an ID
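Default values are supplied by the parser, not by the application. Python's expat binding (non-validating) shows this: by default it also reports attributes defaulted from declarations in the internal DTD. A minimal sketch; the wrapper element "band" is a hypothetical addition:

```python
import xml.parsers.expat

# The ATTLIST from the slide, placed in an internal DTD ("band" is hypothetical)
DOC = """<?xml version="1.0"?>
<!DOCTYPE band [
  <!ELEMENT band (instrument*)>
  <!ELEMENT instrument EMPTY>
  <!ATTLIST instrument
            attuned (yes|no) "yes"
            type    CDATA    #REQUIRED
            owner   IDREF    #IMPLIED>
]>
<band>
  <instrument type="saxophone"/>
  <instrument type="flute" attuned="no"/>
</band>"""


def instrument_attributes(xml_text: str) -> list[dict]:
    """Collect the attributes expat reports for each <instrument> element.

    expat applies defaults from the internal DTD unless the parser's
    specified_attributes flag is set; it does not check #REQUIRED etc.
    """
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        if name == "instrument":
            found.append(dict(attrs))

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return found


# The saxophone receives attuned="yes" from the DTD; the flute keeps "no"
print(instrument_attributes(DOC))
```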

Attribute examples

Extend your visiting card by a version number and the date of the last change

Model both as attributes!
» Version number: Optional, default value "1.0"
» Last change: Obligatory

Afterwards extend the DTD by those two attributes!
» And by all others you used before

VisitingCard_3.xml

DTD example

Create a DTD for the message example
» Include it as an external DTD
» Same file as before!

Compare it with an automatically created DTD:
» E. g. XMLSpy
» Questions:
– Empty elements?
– Multiplicity?
– Optional elements which are currently missing?

Therefore: Suitable as a starting point, but exact checking is needed afterwards!

Message.dtd, Message.xml

DTD example

Create a complete internal DTD for your personal data file
» Don't forget to also update the contained visiting card!
» How to define the visiting card? There is no CDATA content type for elements
– Hint: Think about what datatype this is, and why we use "CDATA" at all!
» Attention: Do you have to include the already created DTD for the visiting card into the DTD for the personal data? Or somewhere else? Or not at all?

You can assume that every student selects at least one course of study, but that some pursue several
The visiting card is optional
Add optional pager information

Persondata_4.xml

Conditional sections

Conditional sections allow including/excluding parts of the DTD
» May only be part of the DTD (and only of its external subset), not of the document content!

"<![INCLUDE[" definitions-to-include "]]>"
» Parse the contained part

"<![IGNORE[" definitions-to-ignore "]]>"
» Ignore everything until the end of the conditional section; sections may be nested
» Contained INCLUDE conditional sections are still ignored!

Example: Used together with parameter entities
<!ENTITY % draft 'INCLUDE' >
<!ENTITY % final 'IGNORE' >
<![%draft;[ <!ELEMENT book (comments*, title, body, supplements?)> ]]>
<![%final;[ <!ELEMENT book (title, body, supplements?)> ]]>

Used (and to be used) rarely if at all!
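To see the mechanism at work, the sketch below is a toy preprocessor (hypothetical, not part of any parser API) that substitutes parameter-entity references and then keeps INCLUDE sections and drops IGNORE sections. Nested conditional sections are NOT handled:

```python
import re

BOOK_DTD = """<![%draft;[ <!ELEMENT book (comments*, title, body, supplements?)> ]]>
<![%final;[ <!ELEMENT book (title, body, supplements?)> ]]>"""

PE_REF = re.compile(r"%([A-Za-z_][\w.\-]*);")
SECTION = re.compile(r"<!\[\s*(INCLUDE|IGNORE)\s*\[(.*?)\]\]>", re.DOTALL)


def expand_conditional_sections(dtd: str, params: dict[str, str]) -> str:
    """Replace %name; by its value (KeyError if undeclared), then keep the
    body of INCLUDE sections and remove IGNORE sections completely."""
    dtd = PE_REF.sub(lambda m: params[m.group(1)], dtd)
    return SECTION.sub(lambda m: m.group(2) if m.group(1) == "INCLUDE" else "", dtd)


draft = expand_conditional_sections(BOOK_DTD, {"draft": "INCLUDE", "final": "IGNORE"})
print(draft.strip())  # <!ELEMENT book (comments*, title, body, supplements?)>
```

Switching the two parameter values selects the final version instead, without touching the element declarations.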

Entities (1)

Three kinds exist:
Internal entities
» For defining commonly used text in a single location
» Always a parsed entity

External entities
» For including other XML files
» For defining valid unparsed external components
» Almost always a parsed entity (unless a notation is given with NDATA)

Parameter entities
» Are expanded within the DTD (also within other entity values)
» Used in conditional sections
» Can be internal or external (but CANNOT have a notation!)
» Always a parsed entity

Entities (2)

Two types:
Parsed entities
» Will be handled as an included XML file
» Can possess a different character encoding (UTF-16, …)
» If external, may begin with a text declaration: "<?xml … ?>" (needed if the encoding is neither UTF-8 nor UTF-16)
» Must be well-formed
– No start tag in one file and the matching end tag in another!
– Internal parameter entities are well-formed by definition

Unparsed entities
» Are always just a reference to something external
» Will NOT be included or parsed; no line-break handling
» E. g. for referencing images
» Can only be referenced by name in ENTITY or ENTITIES attribute values
» Notations and more on this are not explained here!

Defining entities

"<!ENTITY" name definition ">"
Parameter entities: "<!ENTITY %" name definition ">"
» Special replacement rules (e. g. replaced within "normal" entity values)
Definition: A string value or an external reference

External references: Either SYSTEM with a URI, or PUBLIC with a public identifier followed by a URI
» Public identifier: May be used to generate alternative URIs (need not itself be a URI!)
» See examples!
» Contains a URI to some external information
» Is a kind of include: May contain text and other elements
– Declarations are NOT allowed
– Referenced content MUST be well-formed XML
– OR (unparsed entity) refer to a declared notation via NDATA

Entity examples

External entities
<!ENTITY map SYSTEM "http://www.mountain.org/private/map.xml">
<!ENTITY map PUBLIC "-//MOUNTAIN//MAP Special map of location" "http://www.mountain.org/private/map.xml">
» If PUBLIC is used, the system URI follows as the second literal, WITHOUT the SYSTEM keyword!
<!ENTITY door-pic SYSTEM "../imgs/OpenDoor.gif" NDATA gif>
» Unparsed entity; "gif" must be a declared notation

Parameter and internal entities
<!ENTITY % YN '"Yes"' >
<!ENTITY WhatHeSaid "He said %YN;" >
… &WhatHeSaid;
» Result: He said "Yes"

Parameter entities are replaced within other entities!
» Only in the external DTD subset: in the internal subset, parameter-entity references may not occur inside declarations

Predefined entities

Several characters pose problems when used in text: <, >, &, ', "

Both entity and character references can be used for escaping: Then they are treated as text

All XML processors MUST know them
» Valid documents SHOULD declare them anyway (for interoperability)

<!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
<!ENTITY amp "&#38;#38;">
<!ENTITY apos "&#39;">
<!ENTITY quot "&#34;">

"Double escaping": &#38; = '&', therefore '&#38;#60;' = '&#60;'

Predefined entities: Double escaping

<!ENTITY lt "&#38;#60;">
» After processing the declaration, 'lt' references the string '&#60;'
» After replacing '&lt;' in the text, the text contains the string '&#60;'
» This returns '<' on parsing the text

» Entity declarations are parsed twice: Once on their definition, again when their content is encountered in the text

<!ENTITY lt "&#60;">
» After processing the declaration, 'lt' references the string '<'
» After replacing '&lt;' in the text, the text contains '<'
» On parsing the text, a tag would be expected to start!
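When generating XML from a program, escaping is normally done with a library call instead of hand-written references. A short sketch with Python's `xml.sax.saxutils`: `escape` replaces `&`, `<` and `>`, while quotes are only replaced when passed in the optional mapping:

```python
from xml.sax.saxutils import escape, unescape

text = 'if a < b && c > "d"'

# "&" is escaped first, so existing ampersands are not escaped twice by mistake
escaped = escape(text, {'"': "&quot;", "'": "&apos;"})
print(escaped)  # if a &lt; b &amp;&amp; c &gt; &quot;d&quot;

# The reverse direction needs the same extra mapping
print(unescape(escaped, {"&quot;": '"', "&apos;": "'"}) == text)  # True
```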

Complicated entity example

<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>" >
After parsing, "example" references the following string:
"<p>An ampersand (&#38;) may be escaped numerically (&#38;#38;) or with a general entity (&amp;amp;).</p>"
Referencing this in the document using "&example;" results in a "p" element containing the following text:
"An ampersand (&) may be escaped numerically (&#38;) or with a general entity (&amp;)."

Entity declaration: Character references are replaced, but entity references are not
» See "&#38;#38;" converted to "&#38;", but "&amp;amp;" remaining "&amp;amp;"
Entity usage: Character and entity references are replaced
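The two-stage processing can be observed with Python's ElementTree, which (through expat) expands internal entities declared in the internal DTD. A sketch, assuming a hypothetical wrapper element "doc" around the entity reference:

```python
import xml.etree.ElementTree as ET

# The entity declaration from the slide, placed in an internal DTD.
# The wrapper element <doc> is a hypothetical addition.
DOC = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc (p)>
  <!ELEMENT p (#PCDATA)>
  <!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>">
]>
<doc>&example;</doc>"""

root = ET.fromstring(DOC)
# The replacement text is parsed as content, so <p> becomes a real element
print(root.find("p").text)
# An ampersand (&) may be escaped numerically (&#38;) or with a general entity (&amp;).
```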

Validating and Non-Validating Processors

Validating processors check the document's syntax (validity)
» Must read and process the entire DTD
» Must read and process all external parsed entities
» If not found, not readable, ...: Error!

Non-validating processors have limitations:
» Obviously: No errors on validity violations!
» But also: Information returned may vary: Parameter or external entities may or may not have been read
– Therefore avoid relying on this!
» Complicated definition of how far they must read and process an internal DTD

Both must report well-formedness errors
» Non-validating: Only as far as they have read external parts!
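Python's expat parser is an example of a non-validating processor: it rejects documents that are not well-formed, but accepts a document that violates its own internal DTD. A minimal sketch:

```python
import xml.parsers.expat

PROLOG = '<?xml version="1.0"?>\n<!DOCTYPE message [\n  <!ELEMENT message (#PCDATA)>\n]>\n'


def expat_verdict(body: str) -> str:
    """Parse PROLOG + body with a fresh (non-validating) expat parser."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(PROLOG + body, True)
    except xml.parsers.expat.ExpatError as err:
        return "not well-formed: " + str(err)
    return "accepted"


print(expat_verdict("<message>Hello</message>"))     # accepted
print(expat_verdict("<message><extra/></message>"))  # accepted, although invalid
print(expat_verdict("<message><b></message>"))       # not well-formed: mismatched tag ...
```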

DTD example: Offline converter configuration

Contains the complete DTD
» PCDATA, sequence, optional elements
» Attribute lists

Contains the complete data
Missing:
» Entities (external, internal)
» Character references

converter_config.xml

Previous version of the program for converting IMS manifests into static web pages. The new version uses a schema!

XML … yes, nice! But this looks like ….!

Pure XML can be viewed with a web browser, but doesn't look too nice and isn't easy to read
Remedy: CSS (small solution) or XSLT (large solution)
» Both are complex standards, especially XSLT
» Including a stylesheet results in a better display

Message.dtd, Message.css, Message_CSS.xml
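A common way to attach a CSS stylesheet is the "xml-stylesheet" processing instruction placed before the document element. Whether Message_CSS.xml does exactly this is an assumption; the element names below are hypothetical. Sketch using minidom:

```python
from xml.dom import minidom

# Hypothetical document; only the stylesheet link matters here
doc = minidom.parseString("<message><text>Hello</text></message>")
pi = doc.createProcessingInstruction("xml-stylesheet", 'type="text/css" href="Message.css"')
doc.insertBefore(pi, doc.documentElement)  # PI must precede the document element

out = doc.toxml()
print(out)
```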

XML … yes, nice! But this looks like ….!

[Screenshots: display before and after including the stylesheet]

Example: Create XML for DTD and CSS

Create an XML file for the DTD below and check its well-formedness and validity with a tool

Now you can easily write dramas like Shakespeare!
» All you need is the DTD (and perhaps a little bit of inspiration!)

Test the presentation with the stylesheet provided!

Book.dtd, Book.css, Book.xml

Is DTD the DDT for getting rid of all bugs?

DTD does not support different namespaces
» Two companies define the same tag with different meanings
» There can never be a combined document!

Very weak datatype system
» Applies only to attributes
DTD only provides syntax

DTD is itself not written in XML
» The description of an XML document should be written in XML
» Then it can be used by parsers without requiring special handling

Very simple constraints
» Only: 0..1, 1, 0..n, 1..n
» Not directly expressible: exactly 3, 5-9, … (only by spelling out repetitions)
» Choice is limited: (A|B|C) selects exactly one; "one or two of them" requires listing all combinations

DTDs should not really be used any more!

Because of the weaknesses mentioned before, DTDs should only be supported for backwards compatibility

Biggest problem: DTD is not written in (well-formed) XML itself
» Difficult to handle for parsers, programs, etc.
– See part on programming later: Writing internal DTDs or manipulating them rarely works as expected or isn't supported at all!

When designing new systems: Use schemas!
» More complicated to learn
» Much stronger and more extensive language
» Can be easily handled by parsers

Some parts are however still important:
» Especially (external) entities
» If such things are needed!

XML Namespaces

[Figure: XML namespaces within the XML family of standards: HTML, XML, FO, Schema, XSLT, XPath, ebXML, SOAP, security, metadata, …, Java]

One more reason for XML… (1)

Suppose data exchange in a binary format works…
» Then you want to add additional information within it
» Each and every parser using this data MUST be changed

Alternative: Extensive versioning support must be built in right from the beginning
» Extension fields/codes, etc.
» Keeping those codes unique: Registry needed, ...
» Rather difficult!

Alternative: XML
» Additional data can always be added without any problems
» If the parser doesn't know it, it will just be ignored (or warned about)
» Only those programs needing this information must be changed, while the other programs and all parsers stay exactly the same!

One more reason for XML… (2)

However, there might still be one problem with extending XML documents…

What if an additional element should be introduced, where an element with exactly this name already exists?
» Example: Merging customer data and order data together
– E. g. title of the person + title of the book = ????

For this we need different "regions of naming"!
» These are called "namespaces" in XML
» Practically all XML standards require them for exactly this reason!
» Each element is additionally "qualified" by a unique name

Namespace "example"

[Figure: three people arguing about "the ID": "The ID is alphanumeric" / "I thought, the ID is only numbers" / "The ID is the name and a number"]

Namespace "example"

[Figure: the same discussion, now asking "What is the context of ID?"; the three IDs are distinguished by the namespaces "invoice", "customer" and "order" as invoice:IDNr, customer:IDNr and order:IDNr]

A brief interlude: XML Namespaces

Intention: Reusing markup structure ("vocabulary")
Problems:
» Recognition: How do we know which namespace an element should be in?
» Collision: What if tags are named the same in different namespaces?

Namespace = Collection of names identified by a URI
» Its names may be used as element types and attribute names

Qualified name (= includes the namespace; also called 'QName'): namespace-prefix ":" local-name
» namespace-prefix: Mapped to the URI of this namespace
– Interpretation by the parser according to the URI, not the prefix itself!
– Conceptually, the complete URI stands wherever the prefix is used (in the document itself, however, only the prefix may be written)
» local-name (= like an "ordinary" name): See elements above!
– Excluding the character ":", obviously!
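That only the URI counts, not the prefix, is visible in APIs that report "expanded" names. Python's ElementTree writes them as "{URI}local-name" (an API convention, not XML syntax). A short sketch with two arbitrarily chosen prefixes bound to the same URI:

```python
import xml.etree.ElementTree as ET

URI = "http://organizations.org/company"

doc_a = f'<party:x xmlns:party="{URI}"/>'
doc_b = f'<org:x xmlns:org="{URI}"/>'   # different prefix, same namespace

tag_a = ET.fromstring(doc_a).tag
tag_b = ET.fromstring(doc_b).tag
print(tag_a)           # {http://organizations.org/company}x
print(tag_a == tag_b)  # True: the prefix is irrelevant, only the URI matters
```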

URI, URL, URN

Namespace names are strings
» They must be URIs, but usually they are URLs

URI = Uniform Resource Identifier
» For unique identification of objects
» URL and URN are applications (two subtypes) of URIs

URN = Uniform Resource Name
» Unique naming of objects, independent of location
» Need not be retrievable in any way!
» Example: urn:www-fim-jku-at/Converter/etc/ConfigFile.xsd

URL = Uniform Resource Locator
» For addressing resources in the Internet
» Need not be unique (several URLs for one resource, several resources for one URL); specifies a "location" (= for retrieval)
» Example: http://www.fim.uni-linz.ac.at/Converter/ConfigFile.xsd
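The difference between the two examples shows up when they are split into URI components: the URL carries a host (a location for retrieval), the URN does not. A small sketch with Python's `urllib.parse`; note that this only splits strings and does not check URN syntax, and XML parsers compare namespace names purely as strings anyway:

```python
from urllib.parse import urlparse

URN = "urn:www-fim-jku-at/Converter/etc/ConfigFile.xsd"
URL = "http://www.fim.uni-linz.ac.at/Converter/ConfigFile.xsd"

for name in (URN, URL):
    parts = urlparse(name)
    print(parts.scheme, repr(parts.netloc), parts.path)
# urn '' www-fim-jku-at/Converter/etc/ConfigFile.xsd
# http 'www.fim.uni-linz.ac.at' /Converter/ConfigFile.xsd
```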

XML Namespaces: Defining a namespace

Namespace declarations are attributes of elements
Two versions are possible:
» Unnamed (default namespace): "xmlns='" URI "'"
– Scope: This element and all elements within it (unless overridden); see also defaulting below
» Named: "xmlns:" name "='" URI "'"
– Scope: This element and all elements within it; it applies only where the prefix is actually used
– The name itself has no meaning and can be chosen arbitrarily

Scope ≠ Where it applies! Scope = Where it can be used!
URI: Defines the namespace; need not be retrievable
» Can therefore be a URN
name: May not start with "xml", "XML", "xMl", …
Example: <party:x xmlns:party="http://organizations.org/company">…</party:x>
» The "x" element, and every contained element or attribute that also uses the prefix "party", is part of the namespace "http://organizations.org/company"
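A prefix declaration by itself does not pull the unprefixed content into the namespace. The sketch below uses ElementTree's "{URI}local-name" notation; the child names "y" and "z" are hypothetical:

```python
import xml.etree.ElementTree as ET

DOC = ('<party:x xmlns:party="http://organizations.org/company">'
       '<party:y/><z/>'
       '</party:x>')

root = ET.fromstring(DOC)
print(root.tag)                       # {http://organizations.org/company}x
print([child.tag for child in root])  # ['{http://organizations.org/company}y', 'z']
```

Only "y", which uses the prefix, is in the namespace; "z" has no namespace at all.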

Are we now qualified?

Qualification is possible for both elements and attributes:
» The qualified name replaces the "ordinary" name in both cases
» Attributes: Name may be the same IF the namespace is different!

Examples:
<g xmlns:party="http://organizations.org/company">
  <party:leader title="Mr.">...</party:leader>
  <business party:registerNo="exempt">Headhunter</business>
</g>
» "leader" and "registerNo" are in NS "http://organizations.org/company"
» "g" is NOT in this NS
» "business", "title": See "defaulting" below!

<tree biology:type="fir" age:type="young" state:type="weak"/>
» Three times the same attribute ("type"); the three prefixes are assumed to be declared
» But each time in a different namespace; therefore allowed!
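The tree example can be tried out once the three prefixes are declared. The namespace URIs below are hypothetical placeholders, since the slide leaves the declarations out. The second document binds two prefixes to the SAME URI, so both attributes have the same namespace and the same local name, which the parser rejects:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs for the three prefixes
TREE = ('<tree xmlns:biology="urn:example:biology" '
        'xmlns:age="urn:example:age" xmlns:state="urn:example:state" '
        'biology:type="fir" age:type="young" state:type="weak"/>')

attrs = ET.fromstring(TREE).attrib
print(sorted(attrs))
# ['{urn:example:age}type', '{urn:example:biology}type', '{urn:example:state}type']

# Different prefixes, but the same URI: same NS AND same local name
CLASH = '<tree xmlns:a="urn:example:x" xmlns:b="urn:example:x" a:type="1" b:type="2"/>'
try:
    ET.fromstring(CLASH)
    clash_rejected = False
except ET.ParseError:
    clash_rejected = True
print(clash_rejected)  # True
```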

Uniqueness of namespaces

Namespace names must be unique (worldwide)
» Otherwise they bring no advantage at all!

Therefore usually constructed using domain names
» These are worldwide unique because of ICANN and the registrars

Below (after) the domain name, the organization itself is responsible for unique names
Examples:
» http://www.fim.uni-linz.ac.at/Konverter/ConfigFile.xsd
» http://www.fim.uni-linz.ac.at/Emerald/2003/1.0Alpha
» Domain part (www.fim.uni-linz.ac.at): Uniqueness guaranteed by ICANN & registrar
» Path part: Uniqueness guaranteed by FIM (hopefully!)

Defaulting: For the lazy ones!

A default namespace (declared as xmlns="URI") applies to the element it is declared on and ALL unprefixed elements within it (unless overridden by another declaration!)

Previous example: "business" is not part of namespace party: it has no prefix, and no default namespace is declared
<g xmlns="http://organizations.org/company"><business ...>...</business></g>
» "g" and "business" are both within this namespace, because it is the default namespace
» Attention: A prefixed ancestor (e. g. <party:g>) does NOT put unprefixed children into its namespace!

Specifying an empty default namespace (xmlns="") removes the default one for this element and all its descendants
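Defaulting and its removal with xmlns="" can be observed with ElementTree. In this sketch the element "note" is a hypothetical addition to the slide's example:

```python
import xml.etree.ElementTree as ET

NS = "http://organizations.org/company"
DOC = (f'<g xmlns="{NS}">'
       '<business>Headhunter</business>'
       '<note xmlns="">outside the default namespace</note>'
       '</g>')

root = ET.fromstring(DOC)
print(root.tag)                       # {http://organizations.org/company}g
print([child.tag for child in root])  # ['{http://organizations.org/company}business', 'note']
```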

Defaulting: Attributes

Default namespaces do NOT apply to attributes, only to elements!
<g xmlns:party="http://organizations.org/company">
  <party:leader title="Mr.">...</party:leader>
  <business party:registerNo="exempt">Headhunter</business>
</g>
» "registerNo" must explicitly specify its NS to be within it!
» "title" is NOT in namespace party!

Different attributes of one element can use different NS
» No defaulting for attributes, therefore always explicit specification!
» Possible: Same NS, different local name
» Possible: Different NS, same local name
» NOT possible: Same NS AND same local name!

» Similar to "ordinary" XML: No two identical attributes» See tree example above!
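The attribute rules of the example above can be checked with ElementTree: the unprefixed "title" has no namespace even though its element "leader" is prefixed, while "registerNo" is qualified explicitly. A short sketch:

```python
import xml.etree.ElementTree as ET

NS = "http://organizations.org/company"
DOC = (f'<g xmlns:party="{NS}">'
       '<party:leader title="Mr.">...</party:leader>'
       '<business party:registerNo="exempt">Headhunter</business>'
       '</g>')

leader, business = list(ET.fromstring(DOC))
print(leader.attrib)    # {'title': 'Mr.'}
print(business.attrib)  # {'{http://organizations.org/company}registerNo': 'exempt'}
print(business.tag)     # business (no namespace)
```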

XML + Namespace example: IMS Manifest

Describes the organization of an online course
» Actually, it only contains all possible metadata, but no real course!

No DTD: Uses schemas (see later)
See especially:
» Namespace declaration (document element)
» schema vs. lom: Namespace use
» title – langstring: xml:lang attribute
» Strange characters, e. g. keyword: UTF-8 encoding of umlauts
» "validity" element: Alternate namespace
– But "datetime" within it: again in "imsmd"!
» vcard: Structured data within
– Could also be modelled in XML, but follows another standard

imsmanifest.xml

Literature: Specifications

XML 1.0 Specification (Third Edition): http://www.w3.org/TR/REC-xml
XML 1.1 Specification: http://www.w3.org/TR/xml11
Tim Bray: Commented XML specification: http://www.xml.com/axml/testaxml.htm
» For the original 1.0 specification only!
XML Namespaces Specification: http://www.w3.org/TR/REC-xml-names/
Markup Languages Cover Pages: http://www.oasis-open.org/cover/

Literature: Other

Knobloch/Knopp: Web-Design mit XML. Heidelberg: dpunkt, 2001
Stanek: XML Pocket Consultant. Redmond: MS Press, 2002
Harold: The XML Bible (2nd edition): http://www.ibiblio.org/xml/books/bible2/
Microsoft XML information: http://msdn.microsoft.com/xml/
Lots of XML information:
» http://www.xml.org/
» http://www.xml.com/
» http://www.devx.com/xml/
» http://xmlfiles.com/
W3 Schools: http://www.w3schools.com/