Page 1: XML Grammars

Internet Technologies 1

XML Grammars

95-733 Internet Technologies

Page 2: XML Grammars

XML Grammars: Three Major Uses

1. Validation

2. Code Generation

3. Communication

Page 3: XML Grammars

XML Validation

Sources for this lecture:

“Data on the Web” Abiteboul, Buneman and Suciu
“XML in a Nutshell” Harold and Means
“The XML Companion” Bradley

The validation examples were originally tested with an older parser, so the specific outputs may differ from those shown.

Page 4: XML Grammars

XML Validation

A batch validating process involves comparing the DTD against a complete document instance and producing a report containing any errors or warnings.

Consider batch validation to be analogous to program compilation, with similar errors detected.

Interactive validation involves constant comparison of the DTD against a document as it is being created.
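A minimal sketch of a batch validating process, assuming the parser bundled with the JDK rather than the Xerces setup used later in this lecture. The class name BatchReport and the small note DTD are hypothetical and exist only for illustration. The whole document is parsed once and every warning or error is collected into a report, much like compiler output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the lecture code.
class BatchReport {

    // Validates a complete document against its DTD and returns every problem found.
    static List<String> report(String xml) {
        final List<String> problems = new ArrayList<>();
        DefaultHandler collector = new DefaultHandler() {
            @Override public void warning(SAXParseException e) {
                problems.add("warning, line " + e.getLineNumber() + ": " + e.getMessage());
            }
            @Override public void error(SAXParseException e) {
                // validity errors are recoverable, so record them and keep going
                problems.add("error, line " + e.getLineNumber() + ": " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                problems.add("fatal, line " + e.getLineNumber() + ": " + e.getMessage());
                throw e; // parsing cannot continue after a well-formedness error
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // request DTD validation
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), collector);
        } catch (SAXParseException e) {
            // already recorded by fatalError above
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Illustrative DTD carried in the internal subset so the example needs no files.
        String dtd = "<!DOCTYPE note [ <!ELEMENT note (to, body)> "
                   + "<!ELEMENT to (#PCDATA)> <!ELEMENT body (#PCDATA)> ]>";
        System.out.println(report(dtd + "<note><to>Ann</to><body>Hi</body></note>"));
        System.out.println(report(dtd + "<note><body>Hi</body><extra/></note>"));
    }
}
```

The first call prints an empty report. The second lists the problems the parser found; the exact message text depends on the parser.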

Page 5: XML Grammars

XML Validation

The benefits of validating documents against a DTD include:

• Programmers can write extraction and manipulation filters without fear of their software ever processing unexpected input.

• Using an XML-aware word processor, authors and editors can be guided and constrained to produce conforming documents. Consider how Netbeans allows you to edit web.xml files.

Page 6: XML Grammars

XML Validation Examples

XML elements may contain further, embedded elements, and the entire document must be enclosed by a single document element.

These are recursive hierarchical structures.

A Document Type Definition (DTD) contains rules for each element allowed within a specific class of documents.

Page 7: XML Grammars

Things the DTD does not do:

• Specify the document root.
• Specify the number of instances of each kind of element. (Or, it’s rather hard to do.)
• Describe the character data inside an element (the precise syntax).
• DTD’s don’t naturally handle namespaces.
• The XML schema language is much more recent and improves on DTD’s. We have “programmer level” type specifications.
• To see a real DTD, view source on http://www.silmaril.ie/software/rss2.dtd

Page 8: XML Grammars

We’ll run this program against several XML files with DTD’s. We’ll study the code soon.

// Validate.java using Xerces

import java.io.*;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.helpers.DefaultHandler;

This slide shows the imported classes.

Page 9: XML Grammars

public class Validate {
    public static boolean valid = true;

    public static void main (String argv []) {
        if (argv.length != 1) {
            System.err.println ("Usage: java Validate filename.xml");
            System.exit (1);
        }

Here we check if the command line is correct.

Page 10: XML Grammars

        try {
            // get a parser
            XMLReader reader = XMLReaderFactory.createXMLReader(
                "org.apache.xerces.parsers.SAXParser");

            // request validation
            reader.setFeature("http://xml.org/sax/features/validation", true);

            // associate an InputSource object with the file name
            InputSource inputSource = new InputSource(argv[0]);

            // go ahead and parse
            reader.parse(inputSource);
        }

Page 11: XML Grammars

        catch(org.xml.sax.SAXException e) {
            System.out.println("Error in parsing " + e);
            valid = false;
        }
        catch(java.io.IOException e) {
            System.out.println("Error in I/O " + e);
            System.exit(0);
        }
        System.out.println("Valid Document is " + valid);
    }
}

// Catch any errors or fatal errors here.
// The parser will handle simple warnings.

Page 12: XML Grammars

XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

Valid document is true

Page 13: XML Grammars

XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "http://localhost:8001/dtd/FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD on the Web? Very nice.

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

Valid document is true

Page 14: XML Grammars

XML Document with an internal subset

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap [
  <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
  <!ELEMENT Notional (#PCDATA) >
  <!ELEMENT Fixed_Rate (#PCDATA) >
  <!ELEMENT NumYears (#PCDATA) >
  <!ELEMENT NumPayments (#PCDATA) >
]>
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

Valid document is true

Page 15: XML Grammars

XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

Valid document is false

Page 16: XML Grammars

XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Swaps SYSTEM "FixedFloatSwap.dtd">
<Swaps>
  <FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
  </FixedFloatSwap>

  <FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
  </FixedFloatSwap>
</Swaps>

Page 17: XML Grammars

DTD

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT Swaps (FixedFloatSwap+) >
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

C:\McCarthy\www\examples\sax>java Validate FixedFloatSwap.xml
Valid document is true

Quantity Indicators

?   0 or 1 time
+   1 or more times
*   0 or more times
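The three quantity indicators can be tried out with a short sketch like the one below. It assumes the validating parser bundled with the JDK; the class OccurrenceDemo and the nickname element are hypothetical, added only so that "?" has something to apply to.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the lecture code.
class OccurrenceDemo {

    // name+ : one or more, nickname? : zero or one, profession* : zero or more
    static final String DTD =
          "<!DOCTYPE person [\n"
        + "  <!ELEMENT person (name+, nickname?, profession*)>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "  <!ELEMENT nickname (#PCDATA)>\n"
        + "  <!ELEMENT profession (#PCDATA)>\n"
        + "]>\n";

    // Returns true when the person element given in body is valid against DTD.
    static boolean isValid(String body) {
        DefaultHandler strict = new DefaultHandler() {
            // validity errors are ignored by default, so turn them into exceptions
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(DTD + body)), strict);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // one name, no nickname, no profession
        System.out.println(isValid("<person><name>Alan Turing</name></person>"));            // true
        // several professions are fine because of *
        System.out.println(isValid("<person><name>Alan Turing</name>"
            + "<profession>computer scientist</profession>"
            + "<profession>cryptographer</profession></person>"));                           // true
        // + requires at least one name
        System.out.println(isValid("<person><profession>cryptographer</profession></person>")); // false
        // ? allows at most one nickname
        System.out.println(isValid("<person><name>Alan Turing</name>"
            + "<nickname>Prof</nickname><nickname>Al</nickname></person>"));                 // false
    }
}
```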

Page 18: XML Grammars

Is this a valid document?

<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person (name+, profession*)>
  <!ELEMENT profession (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
]>
<person>
  <name>Alan Turing</name>
  <profession>computer scientist</profession>
  <profession>cryptographer</profession>
</person>

Sure!

Page 19: XML Grammars

The locations where document text data is allowed are indicated by the keyword ‘#PCDATA’ (Parsed Character Data).

XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>
    <StartYear>2000</StartYear>
    <EndYear>2002</EndYear>
  </NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

Page 20: XML Grammars

DTD

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

Output

C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Element "NumYears" does not allow "StartYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "StartYear" is not declared.
org.xml.sax.SAXParseException: Element "NumYears" does not allow "EndYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "EndYear" is not declared.
Valid document is false

Page 21: XML Grammars

Mixed Content

There are strict rules which must be applied when an element is allowed to contain both text and child elements.

The PCDATA keyword must be the first token in the group, and the group must be a choice group (using “|” not “,”).

The group must be optional and repeatable.

This is known as a mixed content model.
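The mixed content rules can be checked with a sketch along these lines. It assumes the parser bundled with the JDK; the class MixedContentDemo and the page/para/sub element names are hypothetical. One DTD follows the rules and one breaks them by using “,” in the group.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the lecture code.
class MixedContentDemo {

    // Follows the rules: #PCDATA first, choice group, optional and repeatable.
    static final String GOOD_DTD =
          "<!DOCTYPE page [ <!ELEMENT page (para+)> "
        + "<!ELEMENT para (#PCDATA | sub)*> <!ELEMENT sub (#PCDATA)> ]>";

    // Breaks the rules: a sequence (",") is not allowed in a mixed content model.
    static final String BAD_DTD =
          "<!DOCTYPE page [ <!ELEMENT page (para+)> "
        + "<!ELEMENT para (#PCDATA, sub)*> <!ELEMENT sub (#PCDATA)> ]>";

    // Returns true when the document parses and validates without any error.
    static boolean isValid(String document) {
        DefaultHandler strict = new DefaultHandler() {
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(document)), strict);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // text and child elements interleaved inside para
        System.out.println(isValid(GOOD_DTD + "<page><para>H<sub>2</sub>O is water.</para></page>")); // true
        // page has element-only content, so loose text is not allowed there
        System.out.println(isValid(GOOD_DTD + "<page>stray text<para>ok</para></page>"));             // false
        // the declaration itself is rejected
        System.out.println(isValid(BAD_DTD + "<page><para>H<sub>2</sub>O</para></page>"));            // false
    }
}
```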

Page 22: XML Grammars

DTD

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT Mixed (emph) >
<!ELEMENT emph (#PCDATA | sub | super)* >
<!ELEMENT sub (#PCDATA)>
<!ELEMENT super (#PCDATA)>

XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Mixed SYSTEM "Mixed.dtd">
<Mixed>
  <emph>H<sub>2</sub>O is water.</emph>
</Mixed>

Valid document is true

Page 23: XML Grammars

Is this a valid document?

<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page (paragraph+)>
  <!ELEMENT paragraph ( #PCDATA | profession | bold)*>
  <!ELEMENT profession (#PCDATA)>
  <!ELEMENT bold (#PCDATA)>
]>
<page>
  <paragraph>
    Alan Turing broke codes during <bold>World War II</bold>.
    He very precisely defined the notion of "algorithm".
    And so he had several professions:
    <profession>computer scientist</profession>
    <profession>cryptographer</profession>
    And <profession>mathematician</profession>
  </paragraph>
</page>

Sure!

Page 24: XML Grammars

How about this one?

<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page (paragraph+)>
  <!ELEMENT paragraph ( #PCDATA | profession | bold)*>
  <!ELEMENT profession (#PCDATA)>
  <!ELEMENT bold (#PCDATA)>
]>
<page>
  The following is a paragraph marked up in XML.
  <paragraph>
    Alan Turing broke codes during <bold>World War II</bold>.
    He very precisely defined the notion of "algorithm".
    And so he had several professions:
    <profession>computer scientist</profession>
    <profession>cryptographer</profession>
    And <profession>mathematician</profession>
  </paragraph>
</page>

java Validate mixed.xml
org.xml.sax.SAXParseException: The content of element type "page" must match "(paragraph)+".
Valid document is false

Page 25: XML Grammars

CDATA Section

XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
  <Note>
    <![CDATA[This is text that <b>will not be parsed for markup]]>
  </Note>
</FixedFloatSwap>

DTD

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap ( Notional, Fixed_Rate, NumYears, NumPayments, Note ) >
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ELEMENT Note (#PCDATA) >

Page 26: XML Grammars

Recursion

<?xml version="1.0"?>
<!DOCTYPE tree [
  <!ELEMENT tree (node)>
  <!ELEMENT node (leaf | (node,node))>
  <!ELEMENT leaf (#PCDATA)>
]>
<tree>
  <node>
    <leaf>A DTD is a context-free grammar</leaf>
  </node>
</tree>

java Validate recursive1.xml
Valid document is true

Page 27: XML Grammars

How about this one?

<?xml version="1.0"?>
<!DOCTYPE tree [
  <!ELEMENT tree (node)>
  <!ELEMENT node (leaf | (node,node))>
  <!ELEMENT leaf (#PCDATA)>
]>
<tree>
  <node>
    <leaf>Alan Turing would like this</leaf>
  </node>
  <node>
    <leaf>Alan Turing would like this</leaf>
  </node>
</tree>

java Validate recursive1.xml
org.xml.sax.SAXParseException: The content of element type "tree" must match "(node)".
Valid document is false

Page 28: XML Grammars

Relational Databases and XML

Consider the relational database r1(a,b,c), r2(c,d)

r1:  a   b   c          r2:  c   d
     a1  b1  c1              c2  d2
     a2  b2  c2              c3  d3
                             c4  d4

How can we represent this database with an XML DTD?

Page 29: XML Grammars

Relations

<?xml version="1.0"?>
<!DOCTYPE db [
  <!ELEMENT db (r1*, r2*)>
  <!ELEMENT r1 (a,b,c)>
  <!ELEMENT r2 (c,d)>
  <!ELEMENT a (#PCDATA)>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT c (#PCDATA)>
  <!ELEMENT d (#PCDATA)>
]>
<db>
  <r1><a> a1 </a> <b> b1 </b> <c> c1 </c> </r1>
  <r1><a> a2 </a> <b> b2 </b> <c> c2 </c> </r1>
  <r2><c> c2 </c> <d> d2 </d> </r2>
  <r2><c> c3 </c> <d> d3 </d> </r2>
  <r2><c> c4 </c> <d> d4 </d> </r2>
</db>

java Validate Db.xml
Valid document is true

There is a small problem….

Page 30: XML Grammars

Relations

<?xml version="1.0"?>
<!DOCTYPE db [
  <!ELEMENT db (r1|r2)* >
  <!ELEMENT r1 ((a,b,c) | (a,c,b) | (b,a,c) | (b,c,a) | (c,a,b) | (c,b,a))>
  <!ELEMENT r2 ((c,d) | (d,c))>
  <!ELEMENT a (#PCDATA)>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT c (#PCDATA)>
  <!ELEMENT d (#PCDATA)>
]>
<db>
  <r1><a> a1 </a> <b> b1 </b> <c> c1 </c> </r1>
  <r1><a> a2 </a> <b> b2 </b> <c> c2 </c> </r1>
  <r2><c> c2 </c> <d> d2 </d> </r2>
  <r2><c> c3 </c> <d> d3 </d> </r2>
  <r2><c> c4 </c> <d> d4 </d> </r2>
</db>

The order of the relations should not count and neither should the order of columns within rows.

Page 31: XML Grammars

Attributes

An attribute is associated with a particular element by the DTD and is assigned an attribute type.

The attribute type can restrict the range of values it can hold.

Example attribute types include:

• CDATA indicates a simple string of characters
• NMTOKEN indicates a word or token
• A named token group such as (left | center | right)
• ID an element id that holds a unique value (among other element ID’s in the document)
• IDREF attributes refer to an ID

Page 32: XML Grammars

DTD

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>

XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Attribute value for "currency" is #REQUIRED.
Valid document is false

Page 33: XML Grammars

DTD

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>

XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional currency = "Pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

Valid document is true

Page 34: XML Grammars

XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional currency = "Pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>

Valid document is true

#IMPLIED means optional

Page 35: XML Grammars

XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap note = "For your eyes only">
  <Notional currency = "Pounds">100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>

Valid document is true

Page 36: XML Grammars

ID and IDREF Attributes

We can represent complex relationships within an XML document using ID and IDREF attributes.

Page 37: XML Grammars

An Undirected Graph

[Figure: an undirected graph on the vertices u, v, w, x, y and z, with one edge and one vertex labelled.]

Page 38: XML Grammars

A Directed Graph

[Figure: a directed graph on the vertices u, v, w, x and y.]

Page 39: XML Grammars

[Figure: a prerequisite graph over the courses Math100, Geom100, Calc100, Calc200, Calc300, CS1, CS2 and Philo45.]

This is called a DAG (Directed Acyclic Graph)

Page 40: XML Grammars

<?xml version="1.0"?>
<!DOCTYPE Course_Descriptions SYSTEM "course_descriptions.dtd">
<Course_Descriptions>
  <Course>
    <Course-ID id = "Math100" />
    <Title>Algebra I</Title>
    <Description> Students in this course study
      introductory algebra.
    </Description>
    <Prerequisites/>
  </Course>

This course has an ID but no prerequisites.

Page 41: XML Grammars

  <Course>
    <Course-ID id = "Geom100" />
    <Title>Geometry I</Title>
    <Description> Students in this course study how to
      prove several theorems in geometry.
    </Description>
    <Prerequisites/>
  </Course>

The DTD will force this ID to be unique.

Page 42: XML Grammars

  <Course>
    <Course-ID id="Calc100" />
    <Title>Calculus I</Title>
    <Description> Students in this course study the derivative.
    </Description>
    <Prerequisites pre="Math100 Geom100" />
  </Course>
  <Course>

These are references to ID’s. (IDREFS)

Page 43: XML Grammars

    <Course-ID id = "Calc200" />
    <Title>Calculus II</Title>
    <Description> Students in this course study the integral.
    </Description>
    <Prerequisites pre="Calc100" />
  </Course>

The DTD requires that this name be a unique id defined within this document. Otherwise, the document is invalid.

Page 44: XML Grammars

  <Course>
    <Course-ID id = "Calc300" />
    <Title>Calculus III</Title>
    <Description> Students in this course study the derivative
      and the integral (in 3-space).
    </Description>
    <Prerequisites pre="Calc200" />
  </Course>

Prerequisites is an EMPTY element. It’s used only for its attributes.

Page 45: XML Grammars

  <Course>
    <Course-ID id = "CS1" />
    <Title>Introduction to Computer Science I</Title>
    <Description> In this course we study Turing machines.
    </Description>
    <Prerequisites pre="Calc100" />
  </Course>
  <Course>

IDREF to ID: a one-to-one link

Page 46: XML Grammars

    <Course-ID id = "CS2" />
    <Title>Introduction to Computer Science II</Title>
    <Description> In this course we study basic data structures.
    </Description>
    <Prerequisites pre="Calc200 CS1"/>
  </Course>
  <Course>

IDREFS to ID, ID: one-to-many links

Page 47: XML Grammars

    <Course-ID id = "Philo45" />
    <Title>Ethical Implications of Information Technology</Title>
    <Description> TBA
    </Description>
    <Prerequisites/>
  </Course>
</Course_Descriptions>

Page 48: XML Grammars

The Course_Descriptions.dtd

<?xml version="1.0" encoding="UTF-8"?>

<!-- Course Description DTD -->
<!ELEMENT Course_Descriptions (Course)+>
<!ELEMENT Course (Course-ID,Title,Description,Prerequisites)>
<!ELEMENT Course-ID EMPTY>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
<!ELEMENT Prerequisites EMPTY>

<!ATTLIST Course-ID id ID #REQUIRED>

<!ATTLIST Prerequisites pre IDREFS #IMPLIED>
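A sketch of how a validating parser enforces ID and IDREFS, assuming the parser bundled with the JDK. The class IdRefDemo is hypothetical, and the DTD is a cut-down variant of the course DTD (the id sits directly on Course and Title/Description are left out) so that the example fits in a few lines.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the lecture code.
class IdRefDemo {

    // Simplified course DTD (an assumption for brevity, not the lecture's DTD).
    static final String DTD =
          "<!DOCTYPE Courses [\n"
        + "  <!ELEMENT Courses (Course)+>\n"
        + "  <!ELEMENT Course (Prerequisites)>\n"
        + "  <!ELEMENT Prerequisites EMPTY>\n"
        + "  <!ATTLIST Course id ID #REQUIRED>\n"
        + "  <!ATTLIST Prerequisites pre IDREFS #IMPLIED>\n"
        + "]>\n";

    // Returns true when the Courses element given in body is valid against DTD.
    static boolean isValid(String body) {
        DefaultHandler strict = new DefaultHandler() {
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(DTD + body)), strict);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String course(String id, String pre) {
        String attr = pre == null ? "" : " pre='" + pre + "'";
        return "<Course id='" + id + "'><Prerequisites" + attr + "/></Course>";
    }

    public static void main(String[] args) {
        // every IDREF names an ID that exists in the document
        System.out.println(isValid("<Courses>" + course("Math100", null)
            + course("Geom100", null) + course("Calc100", "Math100 Geom100") + "</Courses>")); // true
        // dangling reference: no element carries the ID Math999
        System.out.println(isValid("<Courses>" + course("Math100", null)
            + course("Calc100", "Math999") + "</Courses>"));                                  // false
        // the same ID used twice
        System.out.println(isValid("<Courses>" + course("Math100", null)
            + course("Math100", null) + "</Courses>"));                                       // false
    }
}
```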

Page 49: XML Grammars

General Entities &

General entities are used to place text into the XML document.

They may be declared in the DTD and referenced in the document.

They may also be declared in the DTD as residing in a file. They may then be referenced in the document.
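A small sketch of general entity replacement, assuming the DOM parser bundled with the JDK. The class EntityDemo is hypothetical. The entity is declared in the internal subset and referenced inside the Bank element; by the time the application reads the element, the reference has already been replaced by its text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the lecture code.
class EntityDemo {

    // Parses the document and returns the text content of its first Bank element.
    static String bankName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("Bank").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<!DOCTYPE FixedFloatSwap [\n"
            + "  <!ENTITY bankname \"Mellon National Bank and Trust\">\n"
            + "]>\n"
            + "<FixedFloatSwap><Bank>&bankname;</Bank></FixedFloatSwap>";
        System.out.println(bankName(xml)); // Mellon National Bank and Trust
    }
}
```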

Page 50: XML Grammars

Document using a General Entity

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
  <!ENTITY bankname "Mellon National Bank and Trust" >
]>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

DTD

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Bank,Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Bank (#PCDATA) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

Validate is true

Page 51: XML Grammars

XSLT Program

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match = "Bank">
    <WML>
      <CARD>
        <xsl:apply-templates/>
      </CARD>
    </WML>
  </xsl:template>

  <xsl:template match = "Notional | Fixed_Rate | NumYears | NumPayments">
  </xsl:template>

</xsl:stylesheet>

The general entity is replaced before XSLT sees it.

Page 52: XML Grammars

C:\McCarthy\www\46-928\examples\sax>java -Dcom.jclark.xsl.sax.parser=com.jclark.xml.sax.CommentDriver com.jclark.xsl.sax.Driver FixedFloatSwap.xml FixedFloatSwap.xsl FixedFloatSwap.wml

C:\McCarthy\www\46-928\examples\sax>type FixedFloatSwap.wml

<?xml version="1.0" encoding="utf-8"?>

<WML><CARD>Mellon National Bank and Trust</CARD></WML>

XSLT OUTPUT

Page 53: XML Grammars

An external text entity

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
  <!ENTITY bankname SYSTEM "JustAFile.dat" >
]>
<FixedFloatSwap>
  <Bank>&bankname;</Bank>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

Page 54: XML Grammars

JustAFile.dat

Mellon Bank And Trust Corporation
Pittsburgh PA

XSLT Output

<?xml version="1.0" encoding="utf-8"?>
<WML><CARD>Mellon Bank And Trust Corporation
Pittsburgh PA</CARD></WML>

Page 55: XML Grammars

Parameter Entities %

While general entities are used to place text into the XML document, parameter entities are used to modify the DTD.

We want to build modular DTD’s so that we can create new DTD’s using existing ones.

We’ll look at a slide from www.fpml.org and then see some examples.
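A sketch of a parameter entity at work, assuming the DOM parser bundled with the JDK. The class ParameterEntityDemo, the entity name text and the file name swap.dtd are hypothetical. Inside a declaration, a parameter entity reference is only allowed in an external DTD, so the DTD text is kept in a string and handed to the parser through an EntityResolver instead of being read from disk.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the lecture code.
class ParameterEntityDemo {

    // External DTD text: %text; is declared once and reused in two declarations.
    static final String DTD =
          "<!ENTITY % text \"(#PCDATA)\">\n"
        + "<!ELEMENT Swap (Notional, Fixed_Rate)>\n"
        + "<!ELEMENT Notional %text;>\n"
        + "<!ELEMENT Fixed_Rate %text;>\n";

    // Returns true when the Swap element given in body is valid against DTD.
    static boolean isValid(String body) {
        String document = "<!DOCTYPE Swap SYSTEM \"swap.dtd\">\n" + body;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Serve the DTD from memory whenever the parser asks for an external entity.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader(DTD)));
            builder.setErrorHandler(new DefaultHandler() {
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(document)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<Swap><Notional>100</Notional><Fixed_Rate>5</Fixed_Rate></Swap>"));
        System.out.println(isValid("<Swap><Notional>100</Notional></Swap>"));
    }
}
```

The first document should validate; the second should not, because Fixed_Rate is missing.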

Page 56: XML Grammars

FpML is a Complete Description of the Trade

[Figure: a pool of modular components grouped into separate namespaces. Components shown: Trade, Trade ID, Party, Product, Rate, Notional, Adjustable Period, Date Schedule, Money, Date. Products listed: Vanilla Swap, Vanilla Fixed Float Swap, Cancellable Swaption, FX Spot, FX Outright, FX Swap, Forward Rate Agreement, ...]

Page 57: XML Grammars

Internal Parameter Entities

DTD

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ENTITY % parsedCharacterData "(#PCDATA)">
<!ELEMENT Notional %parsedCharacterData; >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >

XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
  <Notional>100</Notional>
  <Fixed_Rate>5</Fixed_Rate>
  <NumYears>3</NumYears>
  <NumPayments>6</NumPayments>
</FixedFloatSwap>

Page 58: XML Grammars

External Parameter Entities and DTD Components

Order.xml

<?xml version="1.0" encoding = "UTF-8"?>
<!DOCTYPE ORDER SYSTEM "order.dtd">
<!-- example order form from “XML A Manager’s Guide” -->
<ORDER SOURCE ="web" CUSTOMERTYPE="consumer" CURRENCY="USD">
  <addresses>
    <address ADDTYPE="billship">
      <firstname>Kevin</firstname>
      <lastname>Dick</lastname>
      <street ORDER="1">123 Anywhere Lane</street>
      <street ORDER="2">Apt 1b</street>
      <city>Palo Alto</city>
      <state>CA</state>
      <postal>94303</postal>
      <country>USA</country>
    </address>

Page 59: XML Grammars

    <address ADDTYPE="bill">
      <firstname>Kevin</firstname>
      <lastname>Dick</lastname>
      <street ORDER="1">123 Not The Same Lane</street>
      <street ORDER="2">Work Place</street>
      <city>Palo Alto</city>
      <state>CA</state>
      <postal>94300</postal>
      <country>USA</country>
    </address>
  </addresses>

An order may have more than one address.

Page 60: XML Grammars

  <lineitems>
    <lineitem ID="line1">
      <product CAT="MBoard">440BX Motherboard</product>
      <quantity>1</quantity>
      <unitprice>200</unitprice>
    </lineitem>
    <lineitem ID="line2">
      <product CAT = "RAM">128 MB PC-100 DIMM</product>
      <quantity>2</quantity>
      <unitprice>175</unitprice>
    </lineitem>
    <lineitem ID="line3">
      <product CAT="CDROM">40x CD-ROM</product>
      <quantity>1</quantity>
      <unitprice>50</unitprice>
    </lineitem>
  </lineitems>

Several products may be purchased.

Page 61: XML Grammars

  <payment>
    <card CARDTYPE="VISA">
      <cardholder>Kevin S. Dick</cardholder>
      <cardnumber>11111-22222-33333</cardnumber>
      <expiration>01/01</expiration>
    </card>
  </payment>
</ORDER>

The payment is with a Visa card.

We want this document to be validated.

Page 62: XML Grammars

order.dtd

<?xml version="1.0" encoding="UTF-8"?>

<!-- Example Order form DTD adapted from XML: A Manager's Guide -->

<!-- Define an ORDER element -->

<!ELEMENT ORDER (addresses, lineitems, payment)>
<!ATTLIST ORDER
    SOURCE (web | phone | retail) #REQUIRED
    CUSTOMERTYPE (consumer | business) "consumer"
    CURRENCY CDATA "USD">

Define an order based on other elements.

Page 63: XML Grammars

<!ENTITY % anAddress SYSTEM "address.dtd" >
%anAddress;

<!-- Collection of Addresses -->
<!ELEMENT addresses (address+)>

<!ENTITY % aLineItem SYSTEM "lineitem.dtd" >
%aLineItem;

<!-- Collection of LineItems -->
<!ELEMENT lineitems (lineitem+)>

<!ENTITY % aPayment SYSTEM "payment.dtd" >
%aPayment;

External parameter entity declaration: <!ENTITY % anAddress SYSTEM "address.dtd" >

External parameter entity reference: %anAddress;

Page 64: XML Grammars

address.dtd

<!-- Address Structure -->
<!ELEMENT address (firstname, middlename?, lastname, street+, city, state,postal,country)>

<!ELEMENT firstname (#PCDATA)>
<!ELEMENT middlename (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT postal (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ATTLIST address ADDTYPE (bill | ship | billship) "billship">
<!ATTLIST street ORDER CDATA #IMPLIED>

Page 65: XML Grammars

lineitem.dtd

<!ELEMENT lineitem (product,quantity,unitprice)>
<!ATTLIST lineitem ID ID #REQUIRED>

<!ELEMENT product (#PCDATA)>
<!ATTLIST product CAT (CDROM|MBoard|RAM) #REQUIRED>

<!ELEMENT quantity (#PCDATA)>
<!ELEMENT unitprice (#PCDATA)>

Page 66: XML Grammars

payment.dtd

<!ELEMENT payment (card | PO)>
<!ELEMENT card (cardholder, cardnumber, expiration)>
<!ELEMENT cardholder (#PCDATA)>
<!ELEMENT cardnumber (#PCDATA)>
<!ELEMENT expiration (#PCDATA)>
<!ELEMENT PO (number,authorization*)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT authorization (#PCDATA)>

<!ATTLIST card CARDTYPE (VISA|MasterCard|Amex) #REQUIRED>

Page 67: XML Grammars

XML Schemas Improve on DTD’s

• XML Schema is the official name
• XSDL (XML Schema Definition Language) is the language used to create schema definitions
• XML Syntax
• Can be used to more tightly constrain a document instance
• Supports namespaces
• Permits type derivation
• Harder than DTD’s
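A sketch of the tighter constraints a schema allows, assuming the javax.xml.validation API that ships with the JDK. The class SchemaDemo and the small item schema are hypothetical. A DTD could only declare quantity as (#PCDATA); here it must be a positive integer.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the lecture code.
class SchemaDemo {

    // Illustrative schema: an item has a name and a positive integer quantity.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='item'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='quantity' type='xs:positiveInteger'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Returns true when the document is valid against XSD.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<item><name>pen</name><quantity>5</quantity></item>"));    // true
        System.out.println(isValid("<item><name>pen</name><quantity>five</quantity></item>")); // false
        System.out.println(isValid("<item><name>pen</name><quantity>0</quantity></item>"));    // false
    }
}
```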

Page 68: XML Grammars

Other Grammars Include

• RELAX
• TREX (James Clark - Tree Regular Expressions for XML)
• RELAX NG (RELAX and TREX combined to Relax Next Generation)
• Schematron (“Rule based” rather than “grammar based” see www.ascc.net/xml/schematron) Based on XSLT and XPath

Page 69: XML Grammars

XSDL - A Simple Purchase Order

<?xml version="1.0" encoding="UTF-8"?>
<!-- po.xml -->

<purchaseOrder orderDate="07.23.2001"
    xmlns="http://www.cds-r-us.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.cds-r-us.com po.xsd">

Page 70: XML Grammars

  <recipient country="USA">
    <name>Dennis Scannel</name>
    <street>175 Perry Lea Side Road</street>
    <city>Waterbury</city>
    <state>VT</state>
    <postalCode>15216</postalCode>
  </recipient>

  <order>
    <cd artist="Brooks Williams" title="Little Lion" />
    <cd artist="David Wilcox" title="What you whispered" />
  </order>

</purchaseOrder>

Page 71: XML Grammars

Purchase Order XSDL

<?xml version="1.0" encoding="utf-8"?>
<!-- po.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.cds-r-us.com"
    targetNamespace="http://www.cds-r-us.com" >

Page 72: XML Grammars

  <xs:element name="purchaseOrder">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="recipient" />
        <xs:element ref="order" />
      </xs:sequence>
      <xs:attribute name="orderDate" type="xs:string" />
    </xs:complexType>
  </xs:element>

Page 73: XML Grammars

  <xs:element name = "recipient">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name" />
        <xs:element ref="street" />
        <xs:element ref="city" />
        <xs:element ref="state" />
        <xs:element ref="postalCode" />
      </xs:sequence>
      <xs:attribute name="country" type="xs:string" />
    </xs:complexType>
  </xs:element>

Page 74: XML Grammars

  <xs:element name = "name" type="xs:string" />
  <xs:element name = "street" type="xs:string" />
  <xs:element name = "city" type="xs:string" />
  <xs:element name = "state" type="xs:string" />
  <xs:element name = "postalCode" type="xs:short" />

  <xs:element name = "order">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cd" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

Page 75: XML Grammars

  <xs:element name="cd">
    <xs:complexType>
      <xs:attribute name="artist" type="xs:string" />
      <xs:attribute name="title" type="xs:string" />
    </xs:complexType>
  </xs:element>

</xs:schema>

Page 76: XML Grammars

Validate.java

// Validate.java using Xerces

import java.io.*;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.helpers.DefaultHandler;

Page 77: XML Grammars

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.SAXException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

Page 78: XML Grammars

public class Validate extends DefaultHandler {
    public static boolean valid = true;

    public void error(SAXParseException exception) {
        System.out.println("Received notification of a recoverable error." + exception);
        valid = false;
    }

    public void fatalError(SAXParseException exception) {
        System.out.println("Received notification of a non-recoverable error." + exception);
        valid = false;
    }

    public void warning(SAXParseException exception) {
        System.out.println("Received notification of a warning." + exception);
    }

Page 79: XML Grammars

Internet Technologies 79

public static void main (String argv []) { if (argv.length != 1) { System.err.println ("Usage: java Validate filename.xml"); System.exit (1); } try { // get a parser XMLReader reader = XMLReaderFactory.createXMLReader( "org.apache.xerces.parsers.SAXParser"); // request validation reader.setFeature("http://xml.org/sax/features/validation",true); reader.setFeature( "http://apache.org/xml/features/validation/schema",true); reader.setErrorHandler(new Validate()); // associate an InputSource object with the file name InputSource inputSource = new InputSource(argv[0]);

// go ahead and parse reader.parse(inputSource);

Page 80: XML Grammars

Internet Technologies 80

} catch(org.xml.sax.SAXException e) { System.out.println("Error in parsing " + e); valid = false; } catch(java.io.IOException e) { System.out.println("Error in I/O " + e); System.exit(0); } System.out.println("Valid Document is " + valid); }}
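Validate.java depends on Xerces-specific class and feature names. As a possible alternative, the minimal sketch below uses the standard javax.xml.validation API that ships with the JDK. The class name SchemaValidationSketch, the method validate, and the tiny inline schema are assumptions made for illustration; they are not part of the lecture code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class SchemaValidationSketch {

    // Tiny illustrative schema (an assumption, not one of the lecture files).
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='quantity' type='xsd:short'/>"
        + "</xsd:schema>";

    // Returns the validation problems found; an empty list means the
    // document is valid. A schema that fails to compile, or a document
    // that is not well formed, is also reported as a problem.
    public static List<String> validate(String xsd, String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {
                    // warnings do not affect validity
                }
                public void error(SAXParseException e) {
                    // recoverable: record it and keep validating
                    problems.add(e.getMessage());
                }
                public void fatalError(SAXParseException e) throws SAXException {
                    // non-recoverable: stop; the catch below records it
                    throw e;
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add(e.getMessage());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate(XSD, "<quantity>5</quantity>").isEmpty());
        for (String problem : validate(XSD, "<quantity>five</quantity>")) {
            System.out.println(problem);
        }
    }
}
```

As in Validate.java, recoverable errors are collected rather than thrown, so one run can report every validity problem in the document.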

XML Document

<?xml version="1.0" encoding="utf-8"?>
<itemList xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
          xsi:noNamespaceSchemaLocation="itemList.xsd">
  <item>
    <name>pen</name>
    <quantity>5</quantity>
  </item>
  <item>
    <name>eraser</name>
    <quantity>7</quantity>
  </item>
  <item>
    <name>stapler</name>
    <quantity>2</quantity>
  </item>
</itemList>


XSDL Grammar itemList.xsd

<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>

  <xsd:element name="itemList">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="item" minOccurs="0" maxOccurs="3"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name="item">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="name"/>
        <xsd:element ref="quantity"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name="name" type="xsd:string"/>
  <xsd:element name="quantity" type="xsd:short"/>

</xsd:schema>


D:..95-733\examples\XSDL\testing>ant run

Buildfile: build.xml

run:

Running Validate.java on itemList-xsd.xml

Valid Document is true
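The document passes because it has exactly three item elements, and itemList.xsd allows between zero and three (minOccurs="0" maxOccurs="3"). The sketch below probes that limit with the standard javax.xml.validation API. The class name OccurrenceLimitSketch and the helpers itemList and isValid are hypothetical, and the schema is inlined as a string only so the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class OccurrenceLimitSketch {

    // Same declarations as itemList.xsd, inlined for self-containment.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='itemList'><xsd:complexType><xsd:sequence>"
        + "<xsd:element ref='item' minOccurs='0' maxOccurs='3'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "<xsd:element name='item'><xsd:complexType><xsd:sequence>"
        + "<xsd:element ref='name'/><xsd:element ref='quantity'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "<xsd:element name='name' type='xsd:string'/>"
        + "<xsd:element name='quantity' type='xsd:short'/>"
        + "</xsd:schema>";

    // Builds an itemList document containing the requested number of items.
    static String itemList(int count) {
        StringBuilder xml = new StringBuilder("<itemList>");
        for (int i = 0; i < count; i++) {
            xml.append("<item><name>item").append(i)
               .append("</name><quantity>1</quantity></item>");
        }
        return xml.append("</itemList>").toString();
    }

    // With no ErrorHandler installed, the Validator throws on the first
    // validity error, so reaching the return statement means "valid".
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (int count = 0; count <= 4; count++) {
            System.out.println(count + " items valid? " + isValid(itemList(count)));
        }
    }
}
```

Documents with zero to three items should validate, while a fourth item exceeds maxOccurs and is rejected.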


Another Example

<?xml version="1.0" encoding="UTF-8"?>
<!-- po.xml -->
<myns:purchaseOrder orderDate="07.23.2001"
    xmlns:myns="http://www.cds-r-us.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.cds-r-us.com po.xsd">

<myns:recipient country="USA">

<myns:name>Dennis Scannel</myns:name>

<myns:street>175 Perry Lea Side Road</myns:street>

<myns:city>Waterbury</myns:city>

<myns:state>VT</myns:state>

<myns:postalCode>05675A</myns:postalCode>

</myns:recipient>

Note that there is a problem with this document: the postalCode value 05675A is not a number, but po.xsd declares postalCode as xs:short.


<myns:order>

<myns:cd artist="Brooks Williams" title="Little Lion" />

<myns:cd artist="David Wilcox" title="What you whispered" />

</myns:order>

</myns:purchaseOrder>


XSDL Grammar po.xsd

<?xml version="1.0" encoding="utf-8"?>
<!-- po.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.cds-r-us.com"
           targetNamespace="http://www.cds-r-us.com">

  <xs:element name="purchaseOrder">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="recipient" />
        <xs:element ref="order" />
      </xs:sequence>
      <xs:attribute name="orderDate" type="xs:string" />
    </xs:complexType>
  </xs:element>

  <xs:element name="recipient">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name" />
        <xs:element ref="street" />
        <xs:element ref="city" />
        <xs:element ref="state" />
        <xs:element ref="postalCode" />
      </xs:sequence>
      <xs:attribute name="country" type="xs:string" />
    </xs:complexType>
  </xs:element>

  <xs:element name="name" type="xs:string" />
  <xs:element name="street" type="xs:string" />
  <xs:element name="city" type="xs:string" />
  <xs:element name="state" type="xs:string" />
  <xs:element name="postalCode" type="xs:short" />

  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cd" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="cd">
    <xs:complexType>
      <xs:attribute name="artist" type="xs:string" />
      <xs:attribute name="title" type="xs:string" />
    </xs:complexType>
  </xs:element>

</xs:schema>


Running Validate

D:..\examples\XSDL\testing>ant run
Buildfile: build.xml

run:
Running Validate.java on po.xml
Received notification of a recoverable error.org.xml.sax.SAXParseException: cvc-datatype-valid.1.2.1: '05675A' is not a valid 'integer' value.
Received notification of a recoverable error.org.xml.sax.SAXParseException: cvc-type.3.1.3: The value '05675A' of element 'myns:postalCode' is not valid.

Valid Document is false
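The failure comes from declaring postalCode as xs:short. That type accepts only integers from -32768 to 32767, so it rejects 05675A, and it would also reject an all-digit code such as 90210. The sketch below compares that declaration with one possible alternative, a string restricted by a five-digit pattern. The class name PostalCodeSketch, both inline schemas, and the choice of pattern are assumptions for illustration, and the namespace used by po.xsd is left out to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class PostalCodeSketch {

    // postalCode declared as in po.xsd (without the target namespace).
    static final String SHORT_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='postalCode' type='xs:short'/>"
        + "</xs:schema>";

    // Hypothetical alternative: a string limited to exactly five digits.
    static final String PATTERN_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='postalCode'><xs:simpleType>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:pattern value='[0-9]{5}'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:schema>";

    // Validates a single postalCode element holding the given value.
    static boolean accepts(String xsd, String value) {
        String xml = "<postalCode>" + value + "</postalCode>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            // No ErrorHandler is set, so any validity error is thrown.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String value : new String[] {"05675", "05675A", "90210"}) {
            System.out.println(value
                + "  xs:short: " + accepts(SHORT_XSD, value)
                + "  pattern: " + accepts(PATTERN_XSD, value));
        }
    }
}
```

Whether a pattern-restricted string is the better declaration depends on which postal codes the application must accept; the sketch only shows how the two declarations differ.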


Fix the error and run again

D:\..\XSDL\testing>ant run

Buildfile: build.xml

run:

Running Validate.java on po.xml

Valid Document is true


Introduce a Namespace Error

<?xml version="1.0" encoding="UTF-8"?>
<!-- po.xml -->
<myns:purchaseOrder orderDate="07.23.2001"
    xmlns:myns="http://www.cds-r-us.edu"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.cds-r-us.com po.xsd">

  <myns:recipient country="USA">
    <myns:name>Dennis Scannel</myns:name>
    <myns:street>175 Perry Lea Side Road</myns:street>
    <myns:city>Waterbury</myns:city>
    <myns:state>VT</myns:state>
    <myns:postalCode>05675</myns:postalCode>
  </myns:recipient>


<myns:order>

<myns:cd artist="Brooks Williams" title="Little Lion" />

<myns:cd artist="David Wilcox" title="What you whispered" />

</myns:order>

</myns:purchaseOrder>


And run validate

run:
Running Validate.java on po.xml
Received notification of a recoverable error.org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the declaration of element 'myns:purchaseOrder'.

Valid Document is false
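The declaration cannot be found because an element is identified by its namespace URI plus its local name, not by its prefix. The document now binds myns to http://www.cds-r-us.edu, while po.xsd declares purchaseOrder in the target namespace http://www.cds-r-us.com. The sketch below makes the mismatch visible with a namespace-aware DOM parse. The class name NamespaceMismatchSketch and its helper methods are hypothetical, and the documents are cut down to an empty root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceMismatchSketch {

    // The targetNamespace declared by po.xsd.
    static final String TARGET_NAMESPACE = "http://www.cds-r-us.com";

    // Returns the namespace URI of the document element.
    static String rootNamespace(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Without this call the parser does not resolve prefixes.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    // True when the root element lives in the schema's target namespace.
    static boolean matchesTarget(String xml) {
        return TARGET_NAMESPACE.equals(rootNamespace(xml));
    }

    public static void main(String[] args) {
        String original =
            "<myns:purchaseOrder xmlns:myns='http://www.cds-r-us.com'/>";
        String broken =
            "<myns:purchaseOrder xmlns:myns='http://www.cds-r-us.edu'/>";
        System.out.println(rootNamespace(original) + " matches? " + matchesTarget(original));
        System.out.println(rootNamespace(broken) + " matches? " + matchesTarget(broken));
    }
}
```

Both documents use the same prefix and local name, yet only the first refers to an element that po.xsd declares.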


Code Generation

• Run JAXB against the .xsd file

• The generated code will present an API allowing us to process that style of document


itemList.xsd again

<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>

  <xsd:element name="itemList">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="item" minOccurs="0" maxOccurs="3"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name="item">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="name"/>
        <xsd:element ref="quantity"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name="name" type="xsd:string"/>
  <xsd:element name="quantity" type="xsd:short"/>

</xsd:schema>


Run xjc

D:..XSDL\testing>xjc itemList.xsd

D:\McCarthy\www\95-733\examples\XSDL\testing>java -jar D:\jwsdp-1.1\jaxb-1.0\lib\jaxb-xjc.jar itemList.xsd

parsing a schema...
compiling a schema...
generated\impl\ItemImpl.java
generated\impl\ItemListImpl.java
generated\impl\ItemListTypeImpl.java
generated\impl\ItemTypeImpl.java
generated\impl\NameImpl.java
generated\impl\QuantityImpl.java
generated\Item.java
generated\ItemList.java
generated\ItemListType.java
generated\ItemType.java
generated\Name.java
generated\ObjectFactory.java
generated\Quantity.java
generated\bgm.ser
generated\jaxb.properties

Write Java code that uses the new API


The build script used for these examples

<?xml version="1.0"?>

<project basedir="." default="compile">

  <path id="classpath">
    <fileset dir="D:/jwsdp-1.1/saaj-1.1.1/lib" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/jaxb-1.0/lib" includes="*.jar"/>
    <fileset dir="d:/jwsdp-1.1/common/lib" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/jaxm-1.1.1/lib" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/bin" includes="*.jar" />
    <fileset dir="D:/jwsdp-1.1/jaxp-1.2.2/lib" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/jaxp-1.2.2/lib/endorsed" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/jwsdp-shared/lib" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/jaxr-1.0_03/lib" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/jakarta-ant-1.5.1/lib" includes="*.jar"/>
    <fileset dir="D:/j2sdk1.4.1_01/lib" includes="*.jar"/>
    <pathelement location="."/>
  </path>

  <!-- compile Java source files -->
  <target name="compile">
    <!-- compile all of the java sources -->
    <echo message="Compiling the java source files..."/>
    <javac srcdir="." destdir="." debug="on">
      <classpath refid="classpath" />
    </javac>
  </target>

  <target name="run">
    <echo message="Running Validate.java on po.xml"/>
    <java classname="Validate" fork="false">
      <arg value="po.xml"/>
      <classpath refid="classpath" />
    </java>
  </target>

</project>