Programming with XML

Written by:

Adam Carmi

Zvika Gutterman

Agenda

• About XML

• Review of XML syntax

• Document Object Model (DOM)

• JAXP

• W3C XML Schema

• Validating Parsers

About XML

• XML – EXtensible Markup Language

• Designed to describe data

– Provides semantic and structural information

– Extensible

• Human readable and computer-manipulable

• Software and hardware independent

• Open and standardized by the W3C (1)

• Ideal for data exchange

1) World Wide Web Consortium (founded in 1994 by Tim Berners-Lee)

offenders.xml

<offenders>
  <!-- Lists all traffic offenders -->
  <offender id="024378449 ">
    <firstName> David </firstName>
    <middleName>Reuven</middleName>
    <lastName>Harel</lastName>
    <violation id='12'>
      <code num="232" category="traffic"/>
      <issueDate>2001-11-02</issueDate>
      <issueTime>10:32:00</issueTime>
      Ran a red light at Arik &amp; Bentz st.
    </violation>
  </offender>
</offenders>

Callouts: Comment, Tag, Character Data

Data is marked up with structural and semantic information.

The characters & and < are reserved and cannot appear literally in character data; write &amp; and &lt; instead. The predefined entities &gt;, &apos; and &quot; are also available for >, ' and ", and are needed mainly inside attribute values delimited by the same quote character.
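A minimal sketch of how a parser treats the escaped characters, assuming the JAXP DOM builder introduced later in these slides. The class name EntityDemo and the <note> element are hypothetical, chosen only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityDemo {
    /** Parses an XML string and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // The parser hands the application a literal '&' in place of &amp;.
        System.out.println(rootText("<note>Arik &amp; Bentz st.</note>"));
        // Likewise &lt; becomes '<'.
        System.out.println(rootText("<note>1 &lt; 2</note>"));
    }
}
```

The application never sees the entity references themselves, only the characters they stand for.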


offenders.xml: Tags

Callouts: Root Tag (<offenders>), Start Tag, End Tag

<code num=... /> is shorthand for: <code num=...></code>

XML tags are not pre-defined and are case sensitive.

An XML document may have only one root tag.
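The two rules above can be checked with a parser. This is a sketch only: the helper name isWellFormed is hypothetical, and it relies on the JAXP DOM builder covered later in the slides.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns true if the given string parses as a well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<code></code>")); // matching tags
        System.out.println(isWellFormed("<code></Code>")); // end tag differs in case
        System.out.println(isWellFormed("<a/><b/>"));      // two root tags
    }
}
```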


offenders.xml: Elements

Callout: Root Element

Elements mark up information.

Element x begins with a start-tag <x> and ends with an end-tag </x>.

XML elements must be properly nested: <x>...<y>...</y>...</x>

XML documents must contain exactly one root element.

offenders.xml: Content

<offenders>
···<!--·Lists·all·traffic·offenders·-->
···<offender id="024378449·">
·····<firstName>··David·</firstName>
·····<middleName>Reuven</middleName>
·····<lastName>Harel</lastName>
·····<violation id='12'>
·······<code num="232" category="traffic"/>
·······<issueDate>2001-11-02</issueDate>
·······<issueTime>10:32:00</issueTime>
·······Ran·a·red·light·at··Arik·&amp;·Benz·st.
·····</violation>
···</offender>
·</offenders>

· whitespace (the leading · on each line stands for the preceding line break)

The content of an element is all the text that lies between its start and end tags.

An XML parser is required to pass all characters in a document, including whitespace characters.
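A small sketch of what "passing all characters" means in practice, assuming a default (non-validating) JAXP DOM builder. The class name WhitespaceDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WhitespaceDemo {
    /** Returns the number of child nodes of the root element. */
    static int childCount(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Line breaks and indentation around <b/> become two Text nodes.
        System.out.println(childCount("<a>\n  <b/>\n</a>"));
        // Without any whitespace, <b/> is the only child.
        System.out.println(childCount("<a><b/></a>"));
    }
}
```

Code that walks child nodes therefore has to expect whitespace-only Text nodes between elements.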

offenders.xml: Attributes


Attributes are used to provide additional information about elements.

Attribute values must always be enclosed in quotes (" or ').
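A sketch of reading attribute values through the DOM, assuming the JAXP builder described later. AttributeDemo and its helper are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AttributeDemo {
    /** Returns the value of the named attribute on the root element. */
    static String rootAttribute(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute(name);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Double and single quotes are equally acceptable delimiters.
        System.out.println("[" + rootAttribute("<offender id=\"024378449 \"/>", "id") + "]");
        System.out.println("[" + rootAttribute("<violation id='12'/>", "id") + "]");
    }
}
```

Note that the trailing space inside the first attribute value is part of the value.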

DOM™

• DOM™ – Document Object Model

• A standard hierarchy of objects, recommended by the W3C, that corresponds to XML documents.

• Each element, attribute, comment, etc., in an XML document is represented by a Node in the DOM tree.

• The DOM API (1) allows data in an XML document to be accessed and modified by manipulating the nodes in a DOM tree.

1) Application Programming Interface
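A minimal sketch of accessing and modifying data through DOM nodes. The class DomEditDemo, the helper name and the replacement value "Cohen" are illustrative assumptions; the helper assumes the document contains at least one lastName element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomEditDemo {
    /** Replaces the text of the first lastName element and returns all text in the document. */
    static String replaceLastName(String xml, String newName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Access: locate the node. Modify: overwrite its text content.
            Node lastName = doc.getElementsByTagName("lastName").item(0);
            lastName.setTextContent(newName);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<offender><firstName>David</firstName><lastName>Harel</lastName></offender>";
        System.out.println(replaceLastName(xml, "Cohen"));
    }
}
```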

Example: offenders DOM

DOM tree of offenders.xml. Text nodes marked (whitespace) hold only line breaks and indentation. The element "middleName" was skipped.

:Document
  :Element offenders
    :Text (whitespace)
    :Comment Lists all traffic offenders
    :Text (whitespace)
    :Element offender
      :Attribute id
        :Text 024378449
      :Text (whitespace)
      :Element firstName
        :Text David
      :Text (whitespace)
      :Element lastName
        :Text Harel
      :Text (whitespace)
      :Element violation
        :Attribute id
          :Text 12
        :Text (whitespace)
        :Element code
          :Attribute num
            :Text 232
          :Attribute category
            :Text traffic
        :Text (whitespace)
        :Element issueDate
          :Text 2001-11-02
        :Text (whitespace)
        :Element issueTime
          :Text 10:32:00
        :Text Ran a red light at Arik & Benz st.
      :Text (whitespace)
    :Text (whitespace)

DOM Class Hierarchy (1)

<<interface>> Node
    <<interface>> Document
    <<interface>> Element
    <<interface>> CharacterData
        <<interface>> Text
        <<interface>> Comment

<<interface>> NodeList

<<interface>> NamedNodeMap

1) A partial class hierarchy is presented in this slide.
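The inheritance relations in the hierarchy can be observed with instanceof. This is a sketch; HierarchyDemo and its helpers are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class HierarchyDemo {
    /** Creates an empty DOM document to act as a node factory. */
    static Document newDoc() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True for nodes whose interface derives from CharacterData (Text, Comment). */
    static boolean isCharacterData(Node node) {
        return node instanceof CharacterData;
    }

    public static void main(String[] args) {
        Document doc = newDoc();
        System.out.println(isCharacterData(doc.createTextNode("David")));      // Text
        System.out.println(isCharacterData(doc.createComment("a comment")));   // Comment
        System.out.println(isCharacterData(doc.createElement("offender")));    // Element
    }
}
```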

JAXP

• JAXP – Java™ API for XML Processing

• JAXP enables applications to parse and transform XML documents using an API that is independent of a particular XML processor implementation.

• JAXP provides two parser types:

– SAX (1) parser: event driven

– DOM document builder: constructs DOM trees by parsing XML documents.

1) Simple API for XML
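The rest of these slides use the DOM builder. For contrast, a minimal sketch of the event-driven SAX style, which reacts to each start tag as it is read instead of building a tree. The class SaxCountDemo is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxCountDemo {
    /** Counts the elements in a document by listening for start-tag events. */
    static int countElements(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++; // called once per start tag, in document order
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countElements(
                "<offenders><offender><firstName>David</firstName></offender></offenders>"));
    }
}
```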

Creating a DOM Builder

1. Create a DocumentBuilderFactory object:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

2. Configure the factory object:

dbf.setIgnoringComments(true);

3. Create a builder instance using the factory:

DocumentBuilder docBuilder = dbf.newDocumentBuilder();

A ParserConfigurationException is thrown if a DocumentBuilder cannot be created which satisfies the configuration requested.
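The three steps above can be sketched end to end as follows. The class BuilderConfigDemo and the helper commentCount are hypothetical; the sketch only illustrates the effect of the setIgnoringComments setting from step 2.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class BuilderConfigDemo {
    /** Counts Comment nodes directly under the root element. */
    static int commentCount(String xml, boolean ignoreComments) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); // step 1
            dbf.setIgnoringComments(ignoreComments);                           // step 2
            DocumentBuilder docBuilder = dbf.newDocumentBuilder();             // step 3
            Document doc = docBuilder.parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == Node.COMMENT_NODE) {
                    count++;
                }
            }
            return count;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<offenders><!-- Lists all traffic offenders --><offender/></offenders>";
        System.out.println(commentCount(xml, false)); // comment kept in the tree
        System.out.println(commentCount(xml, true));  // comment dropped by the builder
    }
}
```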

Building a DOM Document

• A DOM document can be built manually from within the application:

Document doc = docBuilder.newDocument();
Element offenders = doc.createElement("offenders");
doc.appendChild(offenders);
Element offender = doc.createElement("offender");
offender.setAttribute("id", "024378449 ");
offenders.appendChild(offender);
Element firstName = doc.createElement("firstName");
Text text = doc.createTextNode(" David ");
firstName.appendChild(text);
...

A DOMException is raised if an illegal character appears in a name, an illegal child is appended to a node etc.
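A self-contained sketch of manual construction, including the DOMException case. BuildDomDemo and isLegalName are hypothetical names, and attaching firstName to offender is an assumption about the part the slide elides.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class BuildDomDemo {
    /** Builds the first few nodes of the offenders document by hand. */
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element offenders = doc.createElement("offenders");
        doc.appendChild(offenders);
        Element offender = doc.createElement("offender");
        offender.setAttribute("id", "024378449 ");
        offenders.appendChild(offender);
        Element firstName = doc.createElement("firstName");
        firstName.appendChild(doc.createTextNode(" David "));
        offender.appendChild(firstName); // assumed continuation of the slide's "..."
        return doc;
    }

    /** Returns false when the DOM rejects the name with a DOMException. */
    static boolean isLegalName(Document doc, String name) {
        try {
            doc.createElement(name);
            return true;
        } catch (DOMException e) {
            return false; // e.g. an illegal character such as a space in the name
        }
    }

    public static void main(String[] args) {
        Document doc = build();
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(isLegalName(doc, "first name"));
    }
}
```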

Building a DOM Document

• A DOM representation of an XML document can be built automatically by parsing the XML document:

Document doc = docBuilder.parse(new File(xmlFile));

A SAXParseException or SAXException is raised to report parse errors.

DumpDom.java (1 of 5)

Creating and traversing a DOM document

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import java.io.File;
import java.io.IOException;

DumpDom.java (2 of 5)

public class DumpDom {
    private int indent = 0; // text indentation level

    public DumpDom(String xmlFile) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = dbf.newDocumentBuilder();
            Document doc = docBuilder.parse(new File(xmlFile));
            recursiveDump(doc);
        } catch (ParserConfigurationException pce) {
            System.err.println("Failed to create document builder");
        } catch (SAXParseException spe) {
            System.err.println("Error: Line=" + spe.getLineNumber() + ": " +
                               spe.getMessage());
        } catch (SAXException se) {
            System.err.println("Parse error found: " + se);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

DumpDom.java (3 of 5)

    private void recursiveDump(Node node) {
        switch (node.getNodeType()) {
        case Node.DOCUMENT_NODE:
            dumpNode("document", node);
            break;
        case Node.COMMENT_NODE:
            dumpNode("comment", node);
            break;
        case Node.ATTRIBUTE_NODE:
            dumpNode("attribute", node);
            break;
        case Node.TEXT_NODE:
            dumpNode("text", node);
            break;
        case Node.ELEMENT_NODE:
            dumpNode("element", node);
            indent += 2;

DumpDom.java (4 of 5)

            NamedNodeMap atts = node.getAttributes();
            for (int i = 0; i < atts.getLength(); ++i)
                recursiveDump(atts.item(i));
            indent -= 2;
            break;
        default:
            System.err.println("Unknown node: " + node);
            System.exit(1);
        }
        // print children of the input node (if there are any)
        indent += 2;
        for (Node child = node.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            recursiveDump(child);
        }
        indent -= 2;
    }

DumpDom.java (5 of 5)

    private void dumpNode(String type, Node node) {
        for (int i = 0; i < indent; ++i)
            System.out.print(" ");
        System.out.print("[" + type + "]: ");
        System.out.print(node.getNodeName());
        if (node.getNodeValue() != null)
            System.out.print("=\"" + node.getNodeValue() + "\"");
        System.out.print("\n");
    }

    public final static void main(String[] args) {
        DumpDom dumper = new DumpDom(args[0]);
    }
}

XML Schema

• The purpose of an XML Schema is to define a class of XML documents.

• An XML document that is syntactically correct is considered well formed. If it also conforms to an XML schema, it is considered valid.

• An XML document is not required to have a corresponding schema.

• XML Schemas are expected to replace the DTD (1) as the primary means of describing document structure.

1) Document Type Definition (uses EBNF form)

XML Schema (cont.)

• XML Schema documents are themselves XML documents.

– Can be manipulated as such

– XML Schema is a language with an XML syntax.

• An XML document may explicitly reference the schema document that validates it.

– A schema language is validated by a DTD.

• Several schema models exist. In this course we will use the W3C XML Schema (1).

1) W3C recommendation since 2001

W3C XML Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  ...
</xsd:schema>

• A W3C XML Schema consists of a schema element and a variety of sub-elements which determine the appearance of elements and their content in instance documents.

• Each of the elements (and predefined simple types) in the schema has (by convention) the prefix xsd:, which is associated with the W3C XML Schema namespace.

Elements & Attribute Declarations

• Elements are declared using the element element:

<xsd:element name="firstName" type="xsd:NMTOKEN"/>
<xsd:element name="offenders" type="Offenders"/>

• Attributes are declared using the attribute element:

<xsd:attribute name="id" type="xsd:positiveInteger"/>

Callout: A pre-defined (simple) type (e.g. xsd:NMTOKEN)

Element & Attribute Types

• Elements that contain sub-elements or carry attributes are said to have complex types.

• Elements that contain only text (e.g. numbers, strings, dates etc.) and do not contain any sub-elements or carry attributes are said to have simple types.

• Attributes always have simple types.

• Many simple types (e.g. string, date, integer etc.) are pre-defined.

A Few Built-in Simple Types

Simple Type    Examples
-----------    --------
string         any textual value (white space preserved)
NMTOKEN (1)    student, 342
ID (1)         s1, _4
integer        -126789, -1, 0, 1, 126789, 03485
float          -INF, -1E4, -0, 0, 12.78, 12.78E-2, NaN
time           13:24:12, 02:15:34.879
date           2002-11-23
boolean        true, false, 0, 1

1) Should only be used as attribute types

Derived Simple Types

• New simple types may be defined by deriving them from existing simple types (built-in and derived).

• New simple types are derived by restricting the range of permitted values for an existing simple type.

• A new simple type is defined using the simpleType element.

Derived Simple Types (cont.)

• Example: Numeric Restriction

<xsd:simpleType name="ViolationID">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="100"/>
  </xsd:restriction>
</xsd:simpleType>

• Example: Enumeration

<xsd:simpleType name="ViolationCategory">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="traffic"/>
    <xsd:enumeration value="criminal"/>
    <xsd:enumeration value="civil"/>
  </xsd:restriction>
</xsd:simpleType>
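A sketch of how these two derived types accept or reject values. It uses the javax.xml.validation API, which these slides do not cover, purely as a convenient way to run a schema from Java. The class SimpleTypeDemo, the helper isValid and the wrapper elements num and category are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SimpleTypeDemo {
    // Hypothetical schema: two global elements typed with the derived simple types above.
    static final String SCHEMA =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='num' type='ViolationID'/>"
        + "<xsd:element name='category' type='ViolationCategory'/>"
        + "<xsd:simpleType name='ViolationID'>"
        + "<xsd:restriction base='xsd:integer'><xsd:minInclusive value='100'/></xsd:restriction>"
        + "</xsd:simpleType>"
        + "<xsd:simpleType name='ViolationCategory'>"
        + "<xsd:restriction base='xsd:string'>"
        + "<xsd:enumeration value='traffic'/><xsd:enumeration value='criminal'/>"
        + "<xsd:enumeration value='civil'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "</xsd:schema>";

    /** Returns true if xml is valid against xsd; false on any schema or parse error. */
    static boolean isValid(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(SCHEMA, "<num>232</num>"));               // at least 100
        System.out.println(isValid(SCHEMA, "<num>99</num>"));                // below minInclusive
        System.out.println(isValid(SCHEMA, "<category>traffic</category>")); // enumerated value
        System.out.println(isValid(SCHEMA, "<category>parking</category>")); // not enumerated
    }
}
```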

Complex Types

• Complex types are defined using the complexType element.

• Elements with complex types may carry attributes.

• The content of elements with complex types is categorized as follows:

– Empty: no content is allowed.

– Simple: content must be of simple type.

– Element: content must include only child elements.

– Mixed: both element and character content is allowed.

Complex Types: Attributes

• Attributes may be declared, using the use attribute, as required, optional (default) or prohibited.

• Default values for attributes are declared using the default attribute.

– Allowed only for optional attributes

• The fixed attribute is used to ensure that an attribute is set to a particular value.

– Appearance of the attribute is optional.

– fixed and default are mutually exclusive.

Complex Types: Attributes (cont.)

• Example: use, fixed

<xsd:complexType name="Code">
  <xsd:attribute name="num" type="ViolationID" use="required"/>
  <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/>
</xsd:complexType>

• Example: use, default

<xsd:complexType name="IssueTime">
  ...
  <xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/>
  ...
</xsd:complexType>

Complex Types: Empty Content

• Example: schema

<xsd:complexType name="Code">
  <xsd:attribute name="num" type="ViolationID" use="required"/>
  <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/>
</xsd:complexType>

• Example: instance document

<code num="232" category="traffic"/>
<code num="232" category="traffic"></code>
<code num="232"/>
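A sketch that checks the instance forms above, plus a few that should be rejected. It uses the javax.xml.validation API, which is not part of these slides. To stay short, the hypothetical schema below types num as xsd:integer and category as xsd:string instead of ViolationID and ViolationCategory; the class name EmptyContentDemo is also an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class EmptyContentDemo {
    // Simplified stand-in for the Code type: built-in attribute types are an assumption.
    static final String SCHEMA =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='code' type='Code'/>"
        + "<xsd:complexType name='Code'>"
        + "<xsd:attribute name='num' type='xsd:integer' use='required'/>"
        + "<xsd:attribute name='category' type='xsd:string' fixed='traffic'/>"
        + "</xsd:complexType>"
        + "</xsd:schema>";

    /** Returns true if xml is valid against SCHEMA; false on any validation or parse error. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(SCHEMA)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<code num='232' category='traffic'/>"));
        System.out.println(isValid("<code num='232'></code>"));             // fixed attribute omitted
        System.out.println(isValid("<code category='traffic'/>"));          // required num missing
        System.out.println(isValid("<code num='232' category='civil'/>"));  // differs from fixed value
        System.out.println(isValid("<code num='232'>text</code>"));         // content not allowed
    }
}
```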

Complex Types: Simple Content

• Example: element with no attributes

<xsd:element name="firstName" type="xsd:NMTOKEN"/>

• Example: element with attributes

<xsd:complexType name="IssueTime">
  <xsd:simpleContent>
    <xsd:extension base="xsd:time">
      <xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/>
    </xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>

Callout: Simple type

Complex Types: Element Content

• Element Occurrence Constraints

– The minimum number of times an element may appear is specified by the value of the optional attribute minOccurs.

– The maximum number of times an element may appear is specified by the value of the optional attribute maxOccurs.

• The value unbounded indicates that the maximum number of occurrences is unbounded.

– The default value of minOccurs and maxOccurs is 1.

Complex Types: Element Content (cont.)

• The element sequence is used to specify a sequence of sub-elements.

– Elements must appear in the same order that they are declared.

<xsd:complexType>
  <xsd:sequence>
    <xsd:element name="firstName" type="xsd:NMTOKEN"/>
    <xsd:element name="middleName" type="xsd:NMTOKEN" minOccurs="0"/>
    <xsd:element name="lastName" type="xsd:NMTOKEN"/>
    <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/>
    ...
  </xsd:sequence>
  ...
</xsd:complexType>
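A sketch of how sequence order and occurrence constraints play out, again using the javax.xml.validation API as an assumption outside these slides. The hypothetical schema keeps only the three name elements of the example, wrapped in an offender element; SequenceDemo is an invented class name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SequenceDemo {
    // Reduced version of the sequence above (the violation element is left out).
    static final String SCHEMA =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='offender'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='firstName' type='xsd:NMTOKEN'/>"
        + "<xsd:element name='middleName' type='xsd:NMTOKEN' minOccurs='0'/>"
        + "<xsd:element name='lastName' type='xsd:NMTOKEN'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    /** Returns true if xml is valid against SCHEMA; false on any validation or parse error. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(SCHEMA)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String first = "<firstName>David</firstName>";
        String middle = "<middleName>Reuven</middleName>";
        String last = "<lastName>Harel</lastName>";
        System.out.println(isValid("<offender>" + first + middle + last + "</offender>"));
        System.out.println(isValid("<offender>" + first + last + "</offender>"));  // minOccurs=0
        System.out.println(isValid("<offender>" + last + first + "</offender>"));  // wrong order
        System.out.println(isValid("<offender>" + first + middle + middle + last + "</offender>")); // maxOccurs=1
    }
}
```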

Complex Types: Mixed Content

• The optional Boolean attribute mixed is used to specify mixed content:

<xsd:complexType name="Violation" mixed="true">
  <xsd:sequence>
    <xsd:element name="code" type="Code"/>
    <xsd:element name="issueDate" type="xsd:date"/>
    <xsd:element name="issueTime" type="IssueTime"/>
  </xsd:sequence>
  ...
</xsd:complexType>

Global Elements/Attributes

• Global elements and global attributes are created by declarations that appear as the children of the schema element.

• A global element is allowed to appear as the root element of an instance document.

• The attribute ref of element/attribute elements may be used (instead of the name attribute) to reference a global element/attribute.

• Cardinality constraints cannot be placed on global declarations, although they can be placed on local declarations that reference global declarations.

Global Elements/Attributes (cont.)

• Example: global declarations

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<xsd:element name="offenders" type="Offenders"/>

<xsd:element name="comment" type="xsd:string"/>

<xsd:attribute name="id" type="xsd:positiveInteger"/>

...

• Example: ref attribute

<xsd:element ref="comment" minOccurs="0"/>

<xsd:attribute ref="id" use="required"/>

Anonymous Type Definitions

• When a type is referenced only once, or contains very few constraints, it can be more succinctly defined as an anonymous type.

• Saves the overhead of naming the type and explicitly referencing it.

Anonymous Type Definitions (cont.)

<xsd:element name="offender" maxOccurs="unbounded">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="firstName" type="xsd:NMTOKEN"/>
      <xsd:element name="middleName" type="xsd:NMTOKEN" minOccurs="0"/>
      <xsd:element name="lastName" type="xsd:NMTOKEN"/>
      <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="comment" minOccurs="0"/>
    </xsd:sequence>
    <xsd:attribute ref="id" use="required"/>
  </xsd:complexType>
</xsd:element>

Callouts: "Is this a global declaration?" (on the offender element), "Anonymous" (on the complexType)

offenders.xsd (1 of 4)

Schema for offenders XML documents

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="offenders" type="Offenders"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:attribute name="id" type="xsd:positiveInteger"/>

  <xsd:complexType name="IssueTime">
    <xsd:simpleContent>
      <xsd:extension base="xsd:time">
        <xsd:attribute name="accuracy" type="Accuracy" use="optional" default="accurate"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

  <xsd:complexType name="Code">
    <xsd:attribute name="num" type="ViolationID" use="required"/>
    <xsd:attribute name="category" type="ViolationCategory" fixed="traffic"/>
  </xsd:complexType>

offenders.xsd (2 of 4)

  <xsd:complexType name="Offenders">
    <xsd:sequence>
      <xsd:element name="offender" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="firstName" type="xsd:NMTOKEN"/>
            <xsd:element name="middleName" type="xsd:NMTOKEN" minOccurs="0"/>
            <xsd:element name="lastName" type="xsd:NMTOKEN"/>
            <xsd:element name="violation" type="Violation" minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element ref="comment" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute ref="id" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

offenders.xsd (3 of 4)

  <xsd:complexType name="Violation" mixed="true">
    <xsd:sequence>
      <xsd:element name="code" type="Code"/>
      <xsd:element name="issueDate" type="xsd:date"/>
      <xsd:element name="issueTime" type="IssueTime"/>
    </xsd:sequence>
    <xsd:attribute ref="id" use="required"/>
  </xsd:complexType>

  <xsd:simpleType name="ViolationID">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="100"/>
    </xsd:restriction>
  </xsd:simpleType>

offenders.xsd (4 of 4)

  <xsd:simpleType name="ViolationCategory">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="traffic"/>
      <xsd:enumeration value="criminal"/>
      <xsd:enumeration value="civil"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:simpleType name="Accuracy">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="accurate"/>
      <xsd:enumeration value="approx"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>

Validating Parsers

• A validating parser is capable of reading a Schema specification or DTD and determining whether or not XML documents conform to it.

• A non-validating parser is capable of reading a Schema / DTD but cannot check XML documents for conformity.

– Limited to syntax checking

Creating a Validating DOM Parser

1. Create a DocumentBuilderFactory object:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

2. Configure the factory object to produce a validating parser:

dbf.setNamespaceAware(true); // XML Schema validation requires a namespace-aware parser
dbf.setAttribute("http://java.sun.com/xml/jaxp/properties" + "/schemaLanguage",
                 "http://www.w3.org/2001/XMLSchema");
dbf.setAttribute("http://java.sun.com/xml/jaxp/properties" + "/schemaSource",
                 new File(xmlSchema));
dbf.setValidating(true);

3. Create a builder instance and set its error-handler:

DocumentBuilder docBuilder = dbf.newDocumentBuilder();
docBuilder.setErrorHandler(new MyErrorHandler());

Handling Parsing Errors

• By default, JAXP parsers do not throw exceptions when documents are found to be invalid.

• JAXP provides the interface ErrorHandler so that users will be able to implement their own error-handling semantics.
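A compact sketch that puts the validating-parser steps and a custom ErrorHandler together. The class ValidatingParseDemo, its helpers and the one-element DATE_SCHEMA are hypothetical; the schema is written to a temporary file only so that it can be passed as a File, as in the steps above. The handler collects validation errors instead of throwing, so parsing runs to the end of the document.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ValidatingParseDemo {
    static final String SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Hypothetical one-element schema used only for this illustration.
    static final String DATE_SCHEMA =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='issueDate' type='xsd:date'/>"
        + "</xsd:schema>";

    /** Writes schema text to a temporary file so it can be handed over as a File. */
    static File tempSchema(String xsd) {
        try {
            File file = Files.createTempFile("demo", ".xsd").toFile();
            file.deleteOnExit();
            Files.write(file.toPath(), xsd.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses xml with a validating DOM parser and returns the validation error messages. */
    static List<String> validationErrors(File schema, String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setValidating(true);
            dbf.setAttribute(SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
            dbf.setAttribute(SCHEMA_SOURCE, schema);
            DocumentBuilder docBuilder = dbf.newDocumentBuilder();
            docBuilder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            docBuilder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        File schema = tempSchema(DATE_SCHEMA);
        System.out.println(validationErrors(schema, "<issueDate>2001-11-02</issueDate>"));
        System.out.println(validationErrors(schema, "<issueDate>yesterday</issueDate>"));
    }
}
```

Collecting messages in a list keeps the decision about when to abort in the caller's hands; a handler that throws from error() stops parsing at the first invalid construct instead.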

BoundedErrorPrinter.java (1 of 3)

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/**
 * An error handler that prints to the standard error stream a specified
 * number of errors. Once the specified number of errors is detected,
 * parsing is aborted.
 */
public class BoundedErrorPrinter implements ErrorHandler {
    private int errorCount = 0;
    private int errorsToPrint;

    public BoundedErrorPrinter(int errorsToPrint) {
        this.errorsToPrint = errorsToPrint;
    }

BoundedErrorPrinter.java (2 of 3)

    public void warning(SAXParseException spe) throws SAXException {
        System.err.println("Warning: " + getParseExceptionInfo(spe));
    }

    public void error(SAXParseException spe) throws SAXException {
        if (errorCount < errorsToPrint) {
            System.err.println("Error: " + getParseExceptionInfo(spe));
            ++errorCount;
        }
        if (errorCount >= errorsToPrint)
            throw spe; // abort parsing
    }

BoundedErrorPrinter.java (3 of 3)

    public void fatalError(SAXParseException spe) throws SAXException {
        if (errorCount < errorsToPrint)
            System.err.println("Fatal: " + getParseExceptionInfo(spe));
        throw spe;
    }

    public boolean errorsFound() {
        return errorCount > 0;
    }

    private String getParseExceptionInfo(SAXParseException spe) {
        return "Line = " + spe.getLineNumber() + ": " + spe.getMessage();
    }
}