Chapter 3 XML Processors DOM Document Object Model SAX Simple API for XML StAX Streaming API for XML TrAX Transformation API for XML JAXB Java Architecture for XML Binding

XML und Datenbanken - UZH9542471b-a852-4ae2-9489-f... · 2019. 2. 17. · Java API for XML Processing (JAXP) ⚫ Interface for plugging-in and using XML processors in Java applications



Chapter 3

XML Processors

DOM Document Object Model

SAX Simple API for XML

StAX Streaming API for XML

TrAX Transformation API for XML

JAXB Java Architecture for XML Binding


⚫ Tasks/Requirements

– Parse through XML document

– Resolve entities

– Check document validity

– Modify XML document

⚫ Types of XML processors:

– Tree-based (DOM)

– Event-based (SAX, StAX)

– Rule-based (TrAX)

→ all are procedural

→ see Chapter 4 for declarative XML query processing

⚫ XML processor API implementations available in different programming languages, e.g.,

– Java, Python, JavaScript, C, C++, ...

3-2Lecture "XML and Databases" - Dr. Can Türker

[Figure "XML Processing": XML Processor, XML Application]


Java API for XML Processing (JAXP)

⚫ Interface for plugging-in and using XML processors in Java applications

⚫ JDK includes packages for DOM, SAX, and TrAX since version 1.4; the StAX package (javax.xml.stream) was added with JAXP 1.4 in Java SE 6

– org.w3c.dom DOM Level 2 interface

– org.xml.sax SAX 2.0 interface

– javax.xml.parsers parser initialization and use

– javax.xml.stream streamer initialization and use

– javax.xml.transform transformer (XSLT processor) initialization and use
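The plug-in idea can be sketched as follows: each API is reached through a factory, and the factory decides which processor implementation is used. This is a minimal sketch; the class name JaxpFactoriesSketch and the method implementations() are hypothetical, and the implementation class names printed depend on the installed JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: lists which processor implementation JAXP plugs in per API.
final class JaxpFactoriesSketch {
    static Map<String, String> implementations() {
        Map<String, String> impl = new LinkedHashMap<>();
        // Each newInstance() call looks up the configured implementation class.
        impl.put("DOM", DocumentBuilderFactory.newInstance().getClass().getName());
        impl.put("SAX", SAXParserFactory.newInstance().getClass().getName());
        impl.put("StAX", XMLInputFactory.newInstance().getClass().getName());
        impl.put("TrAX", TransformerFactory.newInstance().getClass().getName());
        return impl;
    }

    public static void main(String[] args) {
        implementations().forEach((api, cls) -> System.out.println(api + " -> " + cls));
    }
}
```

The application code only names the abstract factory classes, so a different processor can be swapped in without changing it.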

[Figure: JAXP comprises SAX, DOM, StAX (Cursor API, Iterator API), and TrAX (XSLT)]


Document Object Model (DOM)

⚫ Object-oriented, tree-based processing of XML documents

– XML document represented as a tree with different node types ELEMENT, ATTRIBUTE, etc.

– DOM API provides methods for traversing and manipulating the tree

⚫ Application controls program flow

– uses DOM methods to navigate through DOM tree

[Figure: the XML application calls DOM methods on the XML processor]

* figure taken from Oracle® XML Developer's Kit Programmer's Guide 11g Release 2

DOM Illustration

<?xml version="1.0"?>
<bookstore>
  <book price='8.99'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>Benjamin Franklin</author>
  </book>
</bookstore>

Resulting DOM tree:

Node Type: DOCUMENT
  Node Type: ELEMENT  bookstore
    Node Type: ELEMENT  book
      Node Type: ATTRIBUTE  price
        Node Type: TEXT  8.99
      Node Type: ELEMENT  title
        Node Type: TEXT  The Autobiography of Benjamin Franklin
      Node Type: ELEMENT  author
        Node Type: TEXT  Benjamin Franklin

DOM Navigation Methods

Nodes reachable from the current node (numbers refer to the figure):

1 - getParentNode()

2 - getFirstChild()

3 - getLastChild()

4 - getChildNodes()

5 - getPreviousSibling()

6 - getNextSibling()
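The navigation methods can be tried on a small tree. The following is a minimal sketch: the class DomNavigationSketch, its methods, and the sample document are hypothetical, and the sample contains no whitespace between tags so that no whitespace-only text nodes appear among the siblings.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomNavigationSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
    }

    // Uses <c> as the current node and reports what each navigation method reaches.
    static String neighbours(String xml) {
        Node current = parse(xml).getElementsByTagName("c").item(0);
        return "parent=" + current.getParentNode().getNodeName()
                + " first=" + current.getFirstChild().getNodeName()
                + " last=" + current.getLastChild().getNodeName()
                + " children=" + current.getChildNodes().getLength()
                + " prev=" + current.getPreviousSibling().getNodeName()
                + " next=" + current.getNextSibling().getNodeName();
    }

    public static void main(String[] args) {
        // prints: parent=a first=x last=z children=3 prev=b next=d
        System.out.println(neighbours("<a><b/><c><x/><y/><z/></c><d/></a>"));
    }
}
```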

DOM Manipulation Methods

[Figure: tree with root A and children B and D; the parent node being modified has the children F and G, in this order; K is a new node]

insertBefore(K, G)   children afterwards: F, K, G

appendChild(K)       children afterwards: F, G, K

removeChild(G)       children afterwards: F

replaceChild(K, G)   children afterwards: F, K
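The effect of the four manipulation methods can be checked with a short program. This is a sketch under assumptions: the class DomManipulationSketch and the lowercase element names d, f, g, k are hypothetical stand-ins for the nodes in the figure.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class DomManipulationSketch {
    // Builds parent <d> with children <f>, <g>, applies one operation,
    // and returns the resulting child order as a string such as "fkg".
    static String apply(String operation) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element d = doc.createElement("d");
            Element f = doc.createElement("f");
            Element g = doc.createElement("g");
            Element k = doc.createElement("k"); // new node, not yet in the tree
            doc.appendChild(d);
            d.appendChild(f);
            d.appendChild(g);
            switch (operation) {
                case "insertBefore": d.insertBefore(k, g); break;
                case "appendChild":  d.appendChild(k);     break;
                case "removeChild":  d.removeChild(g);     break;
                case "replaceChild": d.replaceChild(k, g); break;
                default: throw new IllegalArgumentException(operation);
            }
            StringBuilder order = new StringBuilder();
            for (Node n = d.getFirstChild(); n != null; n = n.getNextSibling()) {
                order.append(n.getNodeName());
            }
            return order.toString();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String op : new String[] {"insertBefore", "appendChild", "removeChild", "replaceChild"}) {
            System.out.println(op + " -> " + apply(op));
        }
    }
}
```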

DOM Interfaces

Node (base interface), extended by:

– Document

– DocumentFragment

– DocumentType

– Element

– Attr

– CharacterData, extended by Text (which is extended by CDataSection) and Comment

– Entity

– EntityReference

– Notation

– ProcessingInstruction

Further interfaces: DOMImplementation, NodeList, NamedNodeMap

DOM API: Node, Document, Element, CharacterData

interface Node {
    // Some navigation methods
    NodeList getChildNodes();
    Node getFirstChild();
    Node getLastChild();
    Node getParentNode();
    Node getNextSibling();
    Node getPreviousSibling();
    short getNodeType();
    DOMString getNodeValue();
    // Some modification methods
    Node appendChild(Node newChild);
    Node insertBefore(Node newChild, Node refChild);
    Node removeChild(Node oldChild);
    Node replaceChild(Node newChild, Node oldChild);
    // ... other methods not shown
}

interface Element : Node {
    DOMString getTagName();
    Attr getAttributeNode(DOMString name);
    Attr setAttributeNode(Attr newAttr);
    Attr removeAttributeNode(Attr oldAttr);
    NodeList getElementsByTagName(DOMString name);
    // ... other methods not shown
}

interface CharacterData : Node {
    DOMString substringData(offset, count);
    void appendData(text);
    void replaceData(offset, count, text);
    void insertData(offset, text);
    void deleteData(offset, count);
    // ... other methods not shown
}

interface Document : Node {
    Element getDocumentElement(); // root element of the document
    Element createElement(DOMString name);
    DocumentFragment createDocumentFragment();
    Text createTextNode(DOMString data);
    Comment createComment(DOMString data);
    CDATASection createCDATASection(DOMString data);
    Attr createAttribute(DOMString name);
    EntityReference createEntityReference(DOMString name);
    NodeList getElementsByTagName(DOMString name);
    Element getElementById(DOMString elementId);
    // ... other methods not shown
}
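The factory methods of Document and the accessors of Element and CharacterData can be combined to build the bookstore document in memory. This is a minimal sketch; the class DomBuildSketch and its methods are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

final class DomBuildSketch {
    // Builds <bookstore><book price="8.99"><title>...</title></book></bookstore>.
    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element bookstore = doc.createElement("bookstore");
            Element book = doc.createElement("book");
            Attr price = doc.createAttribute("price");
            price.setValue("8.99");
            book.setAttributeNode(price);
            Element title = doc.createElement("title");
            title.appendChild(doc.createTextNode("The Autobiography of Benjamin Franklin"));
            book.appendChild(title);
            bookstore.appendChild(book);
            doc.appendChild(bookstore);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the tree back through Document, Element, and CharacterData methods.
    static String summary() {
        Document doc = build();
        Element book = (Element) doc.getElementsByTagName("book").item(0);
        Text text = (Text) book.getElementsByTagName("title").item(0).getFirstChild();
        return doc.getDocumentElement().getTagName() + "/" + book.getTagName()
                + "@price=" + book.getAttributeNode("price").getValue()
                + " title=" + text.substringData(0, 17);
    }

    public static void main(String[] args) {
        // prints: bookstore/book@price=8.99 title=The Autobiography
        System.out.println(summary());
    }
}
```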

DOM Demo (1)

// DOM Demo. Expects an XML file as argument
// and prints its elements.
import java.io.File;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class DOMDemo {
    public static void main(String[] argv) {
        // Get DOM builder factory
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            // Get DOM builder
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Parse XML file into a DOM document
            Document document = builder.parse(new File(argv[0]));
            // Print XML elements
            print(document);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Print all elements of the given document
    static void print(Node node) {
        int type = node.getNodeType();
        if (type == Node.DOCUMENT_NODE) {
            print("Start document");
        } else if (type == Node.ELEMENT_NODE) {
            print("Start element " + node.getNodeName());
        } else if (type == Node.TEXT_NODE) {
            print(node.getNodeValue());
        }
        // Recursively visit all children in document order
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            print(child);
        }
        if (type == Node.DOCUMENT_NODE) {
            print("End document");
        } else if (type == Node.ELEMENT_NODE) {
            print("End element " + node.getNodeName());
        }
    }

    static void print(String output) {
        System.out.println(output);
    }
}

DOM Demo (2)

Input XML document: Roses.xml

<?xml version="1.0"?>
<poem xmlns="http://www.uzh.ch/poetry">
  <title>Roses are Red</title>
  <l>Roses are red,</l>
  <l>Violets are blue;</l>
  <l>Sugar is sweet,</l>
  <l>And I love you.</l>
</poem>

Output: java DOMDemo Roses.xml

Start document
Start element poem
Start element title
Roses are Red
End element title
Start element l
Roses are red,
End element l
Start element l
Violets are blue;
End element l
Start element l
Sugar is sweet,
End element l
Start element l
And I love you.
End element l
End element poem
End document


Simple API for XML (SAX)

⚫ Event-driven, stream-based XML processing

– Sequentially reads XML event stream

– Events are occurrences of XML constructs, e.g., start or end element tags, attributes, text nodes, comments, processing instructions, ...

– Event triggers corresponding callback method of document handler

– Application can react to events by implementing the corresponding callback methods

⚫ SAX controls program flow

– no internal XML representation

[Figure: the XML processor invokes callback methods of the XML application]

* figure taken from Oracle® XML Developer's Kit Programmer's Guide 11g Release 2

SAX Illustration

<?xml version="1.0"?>
<book price='8.99'>
  <title>The Autobiography of Benjamin Franklin</title>
  <author>
    <first-name>Benjamin</first-name>
    <last-name>Franklin</last-name>
  </author>
</book>

Resulting event sequence:

startDocument()
startElement("book", AttributeList(length=1, {name='price', type='CDATA', value='8.99'}))
startElement("title", null)
characterData("The Autobiography of Benjamin Franklin", start, length)
endElement("title")
startElement("author", null)
startElement("first-name", null)
characterData("Benjamin", start, length)
endElement("first-name")
startElement("last-name", null)
characterData("Franklin", start, length)
endElement("last-name")
endElement("author")
endElement("book")
endDocument()
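The event sequence can be observed by registering a handler that simply records each callback. This is a minimal sketch; the class SaxEventLogSketch and its log format are hypothetical, and character events are left out to keep the log short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventLogSketch {
    // Parses the XML string and returns the callbacks in the order SAX fired them.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { log.add("startDocument"); }
            @Override public void endDocument() { log.add("endDocument"); }
            @Override public void startElement(String uri, String local, String qName,
                                               Attributes atts) {
                log.add("startElement " + qName + " attrs=" + atts.getLength());
            }
            @Override public void endElement(String uri, String local, String qName) {
                log.add("endElement " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        events("<book price='8.99'><title>T</title></book>").forEach(System.out::println);
    }
}
```

The parser drives the loop; the application only supplies the handler, which is what "SAX controls program flow" means in practice.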

SAX Demo

// SAX Demo. Expects an XML file.
import java.io.File;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;

public class SAXDemo {
    public static void main(String[] args) {
        // Get SAX parser factory
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            // Create SAX parser
            SAXParser parser = factory.newSAXParser();
            // Parse the given XML file
            parser.parse(new File(args[0]), new MySAXHandler());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

// Override event handler/callback methods
class MySAXHandler extends DefaultHandler {
    public MySAXHandler() { super(); }

    public void startDocument() {
        print("Start document");
    }

    public void endDocument() {
        print("End document");
    }

    public void startElement(String uri, String name, String qName, Attributes atts) {
        print("Start element " + name);
    }

    public void endElement(String uri, String name, String qName) {
        print("End element " + name);
    }

    public void characters(char[] ch, int start, int len) {
        // start may be 0, so only the length decides whether there is text
        if (len > 0) {
            print(new String(ch, start, len));
        }
    }

    static void print(String output) {
        System.out.println(output);
    }
}

java SAXDemo Roses.xml produces same output as DOM Demo (2)
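A SAX parser is allowed to deliver the text of one element in several characters() calls, so a handler that needs the complete text usually buffers it until the end tag. The following is a minimal sketch of that pattern; the class SaxTextSketch and its method texts() are hypothetical, and nested elements with the same name are not handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxTextSketch {
    // Returns the complete text content of every element with the given name.
    static List<String> texts(String xml, String name) {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder buffer; // non-null only inside a matching element

            @Override public void startElement(String uri, String local, String qName,
                                               Attributes atts) {
                if (qName.equals(name)) buffer = new StringBuilder();
            }
            @Override public void characters(char[] ch, int start, int length) {
                if (buffer != null) buffer.append(ch, start, length); // may be called repeatedly
            }
            @Override public void endElement(String uri, String local, String qName) {
                if (qName.equals(name) && buffer != null) {
                    result.add(buffer.toString());
                    buffer = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(texts("<poem><l>Roses are red,</l><l>Violets &amp; more</l></poem>", "l"));
    }
}
```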

DOM versus SAX

DOM

⚫ Navigation through document tree

⚫ Context-sensitive access

⚫ Document manipulation

⚫ Can be used as storage structure

⚫ Potentially inefficient for large XML documents

SAX

⚫ Simple access but no XML update

⚫ Useful for simple structured documents

⚫ Can efficiently handle very large XML documents

⚫ Access restricted to certain pieces of XML documents

DOM and SAX

⚫ Both interfaces are standardized

⚫ Portable and platform-independent

⚫ Many implementations are available

⚫ No declarative, i.e., set-oriented access as in SQL (effort required even for simple problems)


Streaming API for XML (StAX)

⚫ Application walks through XML stream and reacts whenever needed

– Cursor/Iterator points to one thing at a time

– Always moves forward, never backward

⚫ Cursor vs. Iterator API:

– Cursor API: stream of strings

– Iterator API: stream of discrete events

⚫ Difference to SAX

– Pull instead of Push API: Application controls program flow

– XML write in addition to XML read

[Figure: the XML application calls StAX methods on the XML processor]
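Because the application pulls the events, it can stop as soon as it has what it needs. The following minimal sketch shows this with the Cursor API; the class StaxFirstTextSketch and its method firstText() are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxFirstTextSketch {
    // Returns the text of the first element with the given name, or null if none exists.
    static String firstText(String xml, String element) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The application decides when to pull the next event ...
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(element)) {
                        // ... and simply stops pulling once the answer is found.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("cannot read XML", e);
        }
    }

    public static void main(String[] args) {
        // prints: Roses are Red
        System.out.println(firstText(
                "<poem><title>Roses are Red</title><l>Roses are red,</l></poem>", "title"));
    }
}
```

With SAX, the parser would keep pushing events for the rest of the document unless the handler aborts by throwing an exception.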

StAX Cursor API versus Iterator API

// Cursor API

interface XMLStreamReader {
    int next();             // moves the cursor to the next event and returns its type
    boolean hasNext();      // returns true if there are more parsing events
    String getText();       // returns the text value of the current event
    String getLocalName();  // returns the local name of the current event
    // ... other methods not shown
}

interface XMLStreamWriter {
    void close();                                    // closes the writer and frees any resources
    void flush();                                    // writes any cached data to the output
    void writeStartDocument();                       // writes the XML declaration into the stream
    void writeStartElement(String name);             // writes a start element tag into the stream
    void writeAttribute(String name, String value);  // writes an attribute into the stream
    void writeCharacters(String text);               // writes text data into the stream
    // ... other methods not shown
}

// Iterator API

interface XMLEventReader extends Iterator {
    String getElementText();  // returns the content of a text-only element
    Object next();            // returns the next event in the stream
    XMLEvent nextEvent();     // returns the next typed XMLEvent
    boolean hasNext();        // returns true if there are more events to process in the stream
    XMLEvent peek();          // returns the next event without advancing to it
    // ... other methods not shown
}

interface XMLEventWriter {
    void add(XMLEvent event);         // adds an event to the output stream
    void add(XMLEventReader reader);  // adds an entire stream to the output stream
    void close();                     // closes the writer and frees any resources
    void flush();                     // flushes any cached events to the output
    // ... other methods not shown
}
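The difference between nextEvent() and peek() can be illustrated with a small look-ahead. This is a sketch under assumptions: the class StaxPeekSketch and the labels "text" and "other" are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class StaxPeekSketch {
    // For each start element, reports whether the event that follows it is character data.
    static String describe(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();      // consumes the event
                if (event.isStartElement()) {
                    XMLEvent upcoming = reader.peek();    // looks ahead without consuming
                    out.append(event.asStartElement().getName().getLocalPart())
                       .append(upcoming != null && upcoming.isCharacters() ? "=text " : "=other ");
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("cannot read XML", e);
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // prints: poem=other title=text l=text
        System.out.println(describe("<poem><title>Roses are Red</title><l>Sugar is sweet,</l></poem>"));
    }
}
```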

StAX Demo : ReadXML (Cursor API)

// StAX Cursor API Demo. Expects an XML
// file as argument. Prints all elements.
import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StAXDemoCursor {
    public static void main(String[] args) {
        // Get XML input factory
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            // Create XML stream reader
            XMLStreamReader reader =
                factory.createXMLStreamReader(new FileInputStream(args[0]));
            // A new reader is positioned on START_DOCUMENT, which next() never returns
            printEvent(reader.getEventType(), reader);
            // Cursor to iterate over XML stream
            while (reader.hasNext()) {
                // Pull next event from the cursor
                int eventType = reader.next();
                // Print event
                printEvent(eventType, reader);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    static void printEvent(int eventType, XMLStreamReader reader) {
        switch (eventType) {
            case XMLStreamConstants.START_DOCUMENT:
                print("Start document"); break;
            case XMLStreamConstants.START_ELEMENT:
                print("Start element " + reader.getLocalName()); break;
            case XMLStreamConstants.CHARACTERS:
                print(reader.getText()); break;
            case XMLStreamConstants.END_ELEMENT:
                print("End element " + reader.getLocalName()); break;
            case XMLStreamConstants.END_DOCUMENT:
                print("End document"); break;
        }
    }

    static void print(String output) {
        System.out.println(output);
    }
}

java StAXDemoCursor Roses.xml produces same output as DOM Demo (2)

StAX Demo : ReadXML (Iterator API)

// StAX Iterator API Demo. Expects an XML
// file as argument. Prints all elements.
import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.events.XMLEvent;

public class StAXDemoIterator {
    public static void main(String[] args) {
        // Get XML input factory
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            // Create XML event reader
            XMLEventReader reader =
                factory.createXMLEventReader(new FileInputStream(args[0]));
            // Iterate over XML events
            while (reader.hasNext()) {
                // Get next event from the iterator
                XMLEvent event = reader.nextEvent();
                // Print event
                printEvent(event);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    static void printEvent(XMLEvent event) {
        if (event.isStartDocument()) {
            print("Start document");
        } else if (event.isStartElement()) {
            print("Start element " + event.asStartElement().getName().getLocalPart());
        } else if (event.isCharacters()) {
            print(event.asCharacters().getData());
        } else if (event.isEndElement()) {
            print("End element " + event.asEndElement().getName().getLocalPart());
        } else if (event.isEndDocument()) {
            print("End document");
        }
    }

    static void print(String output) {
        System.out.println(output);
    }
}

java StAXDemoIterator Roses.xml produces same output as DOM Demo (2)

StAX Demo : WriteXML (Cursor API)

// StAX Cursor API Demo. Expects a name
// for an XML file that is created.
import java.io.FileOutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class StAXDemoCursorWrite {
    public static void main(String[] args) {
        // Get XML output factory
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        try {
            // Create XML stream writer
            XMLStreamWriter writer =
                factory.createXMLStreamWriter(new FileOutputStream(args[0]));
            // Write XML document to stream
            writeXML(writer);
            // Flush XML stream writer
            writer.flush();
            // Close XML stream writer
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void writeXML(XMLStreamWriter writer) {
        try {
            writer.writeStartDocument();
            writer.writeStartElement("lecture");
            writer.writeAttribute("semester", "FS2016");
            writer.writeStartElement("title");
            writer.writeAttribute("lecturer", "Dr. Can Türker");
            writer.writeCharacters("XML and Databases");
            writer.writeEndElement();
            writer.writeEndElement();
            writer.writeEndDocument();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Output: java StAXDemoCursorWrite lecture.xml

<?xml version="1.0" ?><lecture semester="FS2016"><title lecturer="Dr. Can Türker">XML and Databases</title></lecture>
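The same writer calls work against any java.io.Writer, which is convenient for checking the generated markup without touching the file system. The following is a minimal sketch; the class StaxWriteSketch and the ampersand in the text are assumptions chosen to show that writeCharacters() escapes reserved characters.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class StaxWriteSketch {
    // Writes a small document into a string instead of a file.
    static String write() {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement("lecture");
            writer.writeAttribute("semester", "FS2016"); // must follow its start element directly
            writer.writeStartElement("title");
            writer.writeCharacters("XML & Databases");   // '&' is escaped by the writer
            writer.writeEndElement();                     // </title>
            writer.writeEndElement();                     // </lecture>
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("cannot write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(write());
    }
}
```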

StAX Demo : WriteXML (Iterator API)

// StAX Iterator API Demo. Expects a name
// for an XML file that is created.
import java.io.FileOutputStream;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;

public class StAXDemoIteratorWrite {
    public static void main(String[] args) {
        // Get XML output factory
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        try {
            // Create XML event writer
            XMLEventWriter writer =
                factory.createXMLEventWriter(new FileOutputStream(args[0]));
            // Write XML document to stream
            writeXML(writer);
            // Flush XML event writer
            writer.flush();
            // Close XML event writer
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void writeXML(XMLEventWriter writer) {
        XMLEventFactory ef = XMLEventFactory.newInstance();
        try {
            writer.add(ef.createStartDocument());
            writer.add(ef.createStartElement("", "", "lecture"));
            writer.add(ef.createAttribute("semester", "FS2016"));
            writer.add(ef.createStartElement("", "", "title"));
            writer.add(ef.createAttribute("lecturer", "Dr. Can Türker"));
            writer.add(ef.createCharacters("XML and Databases"));
            writer.add(ef.createEndElement("", "", "title"));
            writer.add(ef.createEndElement("", "", "lecture"));
            writer.add(ef.createEndDocument());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Output: java StAXDemoIteratorWrite lecture.xml

<?xml version="1.0" ?><lecture semester="FS2016"><title lecturer="Dr. Can Türker">XML and Databases</title></lecture>

Comparison StAX Cursor vs. Iterator API

Use Cursor API if

⚫ programming in particularly memory-constrained environments like J2ME

⚫ performance is an issue

Use Iterator API if you want to

⚫ create XML processing pipelines

⚫ modify event streams

⚫ handle pluggable stream processing by the application

⚫ Cursor API more efficient, Iterator API more flexible and extensible

⚫ Possible with Iterator API but not with Cursor API:

– Use XMLEvent instances in arrays, lists, and maps and pass them through an application even after the XML processor has moved on to subsequent events

– Create subtypes of XMLEvent that are either completely new information items or extensions of existing items but with additional methods

– Add and remove events from XML streams easily
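The first of these points, keeping XMLEvent instances after the reader has moved on, can be sketched as follows. The class StaxEventListSketch and its methods are hypothetical; the sketch collects all start-element events in a list and inspects them only after the reader is closed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

final class StaxEventListSketch {
    // Collects the start-element events; each event is a self-contained object.
    static List<StartElement> startElements(String xml) {
        List<StartElement> kept = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    kept.add(event.asStartElement());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("cannot read XML", e);
        }
        return kept;
    }

    // Uses the stored events after parsing has finished.
    static String names(String xml) {
        StringBuilder out = new StringBuilder();
        for (StartElement start : startElements(xml)) {
            out.append(start.getName().getLocalPart()).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // prints: poem title l l
        System.out.println(names("<poem><title>t</title><l>a</l><l>b</l></poem>"));
    }
}
```

With the Cursor API the reader itself is the only view of the current event, so the application would have to copy the values it wants to keep.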


TrAX - Transformation API for XML

⚫ Generic transformations from one form to another, e.g., transform XML documents to

– alternative XML representations

– PDF or PostScript

– formatted HTML

⚫ Transformations may be described by XSLT style sheets, Java code, Perl code, other types of script, or by proprietary formats

⚫ Inputs/Outputs may be URL, XML stream, DOM tree, SAX events, or proprietary data structures

* figure taken from Oracle® XML Developer's Kit Programmer's Guide 11g Release 2
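The freedom to combine input and output kinds can be sketched with the identity transformation, which a Transformer performs when it is created without a style sheet. In this minimal sketch the input is a DOM tree and the output is a character stream; the class TraxIdentitySketch and the attribute name "checked" are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class TraxIdentitySketch {
    // Parses XML into a DOM tree, modifies it, and serializes it back to a string.
    static String serialize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            doc.getDocumentElement().setAttribute("checked", "yes"); // change made via DOM
            // newTransformer() without a style sheet copies the source to the result.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot transform XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("<poem><l>Sugar is sweet,</l></poem>"));
    }
}
```

This is also a common way to write a DOM tree to a file, since the DOM Level 2 interfaces shown earlier have no serialization method of their own.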

TrAX Demo (1)

// XSLT Demo. Expects three arguments: inputfile, stylesheetfilename, outfilename.
import java.io.File;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TrAXDemo {
    public static void main(String[] args) {
        // Get XML DocumentBuilderFactory
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            // Parse input file into a document
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(new File(args[0]));
            DOMSource source = new DOMSource(document);

            // Read style sheet file into a transformer
            TransformerFactory tFactory = TransformerFactory.newInstance();
            StreamSource styleSource = new StreamSource(new File(args[1]));
            Transformer transformer = tFactory.newTransformer(styleSource);

            // Stream transformation to output file
            StreamResult result = new StreamResult(new File(args[2]));
            transformer.transform(source, result);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

TrAX Demo (2)

Input XML document: Roses.xml

<?xml version="1.0"?>
<poem xmlns="http://www.uzh.ch/poetry">
  <title>Roses are Red</title>
  <l>Roses are red,</l>
  <l>Violets are blue;</l>
  <l>Sugar is sweet,</l>
  <l>And I love you.</l>
</poem>

Input Style Sheet: Roses.xsl

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html><body>
      <xsl:apply-templates/>
    </body></html>
  </xsl:template>
  <xsl:template match="/*/title">
    <h1> <xsl:apply-templates/> </h1>
  </xsl:template>
  <xsl:template match="/*/l">
    <li> <xsl:apply-templates/> </li>
  </xsl:template>
</xsl:stylesheet>

Output: java TrAXDemo Roses.xml Roses.xsl Roses.html

<html><body>
<h1>Roses are Red</h1>
<li>Roses are red,</li>
<li>Violets are blue;</li>
<li>Sugar is sweet,</li>
<li>And I love you.</li>
</body></html>

Comparison of XML Processors

Feature/Use Case | DOM | SAX | StAX | TrAX
API Type | In-memory tree | Push, streaming | Pull, streaming | XSLT rule
Ease of Use | High | Medium | High | Medium
XPath Capability | Yes | No | No | Yes
CPU and Memory Efficiency | Varies | Good | Good | Varies
Forward Only | No | Yes | Yes | No
Read XML | Yes | Yes | Yes | Yes
Write XML | Yes | No | Yes | Yes
Domain Model (CRUD) | Yes | No | No | No
XML Pipelines | No | Yes (Read-only) | Yes (Read/Write) | No
XML-to-XML Transformations | Good | No | Simple | Best
Transform Arbitrary Structures | No | Yes | Yes | No

* Table adapted from http://www.developer.com/xml/article.php/10929_3397691_2/Does-StAX-Belong-in-Your-XML-Toolbox.htm
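The "XPath Capability" row for DOM can be illustrated with the javax.xml.xpath package, which evaluates XPath expressions against an in-memory DOM tree. This is a minimal sketch; the class DomXPathSketch and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DomXPathSketch {
    // Parses the XML into a DOM tree and returns the string value of the XPath expression.
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("cannot evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<bookstore><book price='8.99'><title>T</title></book></bookstore>";
        // prints: 8.99
        System.out.println(evaluate(xml, "/bookstore/book/@price"));
    }
}
```

Random access of this kind needs the whole tree, which is why the forward-only SAX and StAX processors have no direct counterpart.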


Java Architecture for XML Binding (JAXB)

⚫ Java framework for standard and customized mappings between Java and XML

– Mapping between Java classes and XML schema

– Marshalling Java content trees into XML documents

– Unmarshalling XML documents into Java content trees

– Almost no XML knowledge required to programmatically process XML documents

⚫ Higher level construct than DOM

– Both create in-memory content tree

– DOM based on generic (node) tree

– JAXB content tree is specific to a source schema

◼ Tree is not created dynamically

◼ Data access with Java classes' methods

◼ Data-driven as opposed to XML document-driven

⚫ Application controls program flow

– uses JAXB methods to map to/from XML

– navigation by following Java references

[Figure: the XML application calls JAXB methods on the XML binding]

* Figure taken from https://docs.oracle.com/javase/tutorial/jaxb/intro/arch.html


JAXB Architectural Components

⚫ Schema compiler binds source schema to set of schema-derived program elements

– Binding described by an XML-based binding language

⚫ Schema generator maps set of existing program elements to derived schema

– Mapping described by program annotations

⚫ Binding runtime provides unmarshalling/marshalling operations for accessing, manipulating, and validating XML content using either schema-derived or existing program elements

* Figure taken from https://docs.oracle.com/javase/tutorial/jaxb/intro/arch.html

JAXB Demo (Marshalling)

import javax.xml.bind.annotation.*;

// Access type FIELD maps all Java
// fields/properties to XML elements
@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class Lecture {
    public String title;
    public String lecturer;
    public String semester;
}

// JAXB Demo - Marshal Java to XML
import java.io.File;
import javax.xml.bind.*;

public class JAXBMarshallingDemo {
    public static void main(String[] args) throws Exception {
        Lecture lecture = new Lecture();
        lecture.title = "XML and Databases";
        lecture.lecturer = "Dr. Can Türker";
        lecture.semester = "FS2016";

        File file = new File("lecture.xml");
        JAXBContext jc = JAXBContext.newInstance(Lecture.class);
        Marshaller marshaller = jc.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        marshaller.marshal(lecture, file);
    }
}

Output: java JAXBMarshallingDemo

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<lecture>
    <title>XML and Databases</title>
    <lecturer>Dr. Can Türker</lecturer>
    <semester>FS2016</semester>
</lecture>

JAXB Demo (Unmarshalling)

The Lecture class is the same as in JAXB Demo (Marshalling).

Input: lecture.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<lecture>
    <title>XML and Databases</title>
    <lecturer>Dr. Can Türker</lecturer>
    <semester>FS2016</semester>
</lecture>

// JAXB Demo - Unmarshal XML to Java
import java.io.File;
import javax.xml.bind.*;

public class JAXBUnmarshallingDemo {
    public static void main(String[] args) throws Exception {
        File file = new File("lecture.xml");
        JAXBContext jc = JAXBContext.newInstance(Lecture.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        Lecture lecture = (Lecture) unmarshaller.unmarshal(file);
        System.out.println("Lecture " + lecture.title
            + " is held by " + lecture.lecturer);
    }
}

Output: java JAXBUnmarshallingDemo

Lecture XML and Databases is held by Dr. Can Türker

Page 31: XML und Datenbanken - UZH9542471b-a852-4ae2-9489-f... · 2019. 2. 17. · Java API for XML Processing (JAXP) ⚫ Interface for plugging-in and using XML processors in Java applications

JAXB Demo 2 (Marshalling)

import javax.xml.bind.annotation.*;

// Access type NONE is for custom
// mapping of selected Java
// fields/properties to XML
@XmlRootElement
@XmlAccessorType(XmlAccessType.NONE)
public class Lecture {
  @XmlElement
  public String title;

  @XmlElement(name="teacher")
  public String lecturer;

  @XmlAttribute
  public String semester;
}

// JAXB Demo - Marshal Java to XML
import java.io.File;
import javax.xml.bind.*;

public class JAXBMarshallingDemo {
  public static void main(String[] args) throws Exception {
    Lecture lecture = new Lecture();
    lecture.title = "XML and Databases";
    lecture.lecturer = "Dr. Can Türker";
    lecture.semester = "FS2016";

    File file = new File("lecture.xml");
    JAXBContext jc = JAXBContext.newInstance(Lecture.class);
    Marshaller marshaller = jc.createMarshaller();
    marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
    marshaller.marshal(lecture, file);
  }
}

Output: java JAXBMarshallingDemo

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<lecture semester="FS2016">
  <title>XML and Databases</title>
  <teacher>Dr. Can Türker</teacher>
</lecture>
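For comparison, the following is a minimal sketch of what the annotations in this demo spare the programmer: writing the same document layout by hand with the StAX XMLStreamWriter that ships with the JDK. The class name ManualLectureWriter and the method toXml are hypothetical names chosen for this illustration; they are not part of the lecture code or of JAXB.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper (not from the slides): produces the Demo 2 layout by hand.
class ManualLectureWriter {

    // semester becomes an attribute, title and lecturer become child elements,
    // mirroring @XmlAttribute, @XmlElement and @XmlElement(name="teacher").
    static String toXml(String title, String lecturer, String semester) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        writer.writeStartDocument();
        writer.writeStartElement("lecture");
        writer.writeAttribute("semester", semester);

        writer.writeStartElement("title");
        writer.writeCharacters(title);
        writer.writeEndElement();

        writer.writeStartElement("teacher");
        writer.writeCharacters(lecturer);
        writer.writeEndElement();

        writer.writeEndElement(); // closes <lecture>
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml("XML and Databases", "Dr. Can T\u00fcrker", "FS2016"));
    }
}
```

Every change to the mapping (for example, renaming an element) has to be made in this writer code, whereas with JAXB only the annotation on the Lecture class changes.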


JAXB Demo 2 (Unmarshalling)

import javax.xml.bind.annotation.*;

// Access type NONE is for custom
// mapping of selected Java
// fields/properties to XML
@XmlRootElement
@XmlAccessorType(XmlAccessType.NONE)
public class Lecture {
  @XmlElement
  public String title;

  @XmlElement(name="teacher")
  public String lecturer;

  @XmlAttribute
  public String semester;
}

Input: lecture.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<lecture semester="FS2016">
  <title>XML and Databases</title>
  <teacher>Dr. Can Türker</teacher>
</lecture>

// JAXB Demo - Unmarshal XML to Java
import java.io.File;
import javax.xml.bind.*;

public class JAXBUnmarshallingDemo {
  public static void main(String[] args) throws Exception {
    File file = new File("lecture.xml");
    JAXBContext jc = JAXBContext.newInstance(Lecture.class);
    Unmarshaller unmarshaller = jc.createUnmarshaller();
    Lecture lecture = (Lecture) unmarshaller.unmarshal(file);
    System.out.println("Lecture " + lecture.title
        + " is held by " + lecture.lecturer);
  }
}

Output: java JAXBUnmarshallingDemo

Lecture XML and Databases is held by Dr. Can Türker
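To make the mapping concrete, here is a minimal sketch that reads the same Demo 2 document layout by hand with the JDK's DOM parser instead of JAXB. The class ManualLectureBinding, its nested Lecture holder, and the helper textOfChild are hypothetical names introduced only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical hand-written counterpart of the annotated Lecture class (no JAXB involved).
class ManualLectureBinding {

    // Plain data holder; the field names follow the Lecture class on the slide.
    static final class Lecture {
        String title;
        String lecturer;
        String semester;
    }

    // Reads the Demo 2 layout: semester is an attribute of the root element,
    // title and teacher are child elements.
    static Lecture fromXml(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();

        Lecture lecture = new Lecture();
        lecture.semester = root.getAttribute("semester");
        lecture.title = textOfChild(root, "title");
        lecture.lecturer = textOfChild(root, "teacher"); // element name differs from field name
        return lecture;
    }

    // Returns the text of the first descendant element with the given name, or null.
    static String textOfChild(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<lecture semester=\"FS2016\">"
                + "<title>XML and Databases</title>"
                + "<teacher>Dr. Can T\u00fcrker</teacher>"
                + "</lecture>";
        Lecture lecture = fromXml(xml);
        System.out.println("Lecture " + lecture.title + " is held by " + lecture.lecturer);
    }
}
```

The three assignments in fromXml correspond one-to-one to the @XmlAttribute, @XmlElement, and @XmlElement(name="teacher") annotations of the JAXB version.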


JAXB Annotations for Java-XML Mapping

Annotation Description

@XmlSchema Maps a package to an XML target namespace

@XmlAccessorType Controls default serialization of fields and properties

@XmlAccessorOrder Controls the default ordering of properties and fields mapped to XML elements

@XmlSchemaType Allows a customized mapping to an XML Schema built-in type

@XmlSchemaTypes A container annotation for defining multiple @XmlSchemaType annotations

@XmlType Maps a Java class to a schema type

@XmlRootElement Associates a global element with the schema type to which the class is mapped

@XmlEnum Maps a Java enum type to an XML simple type

@XmlEnumValue Maps an enum constant to its XML representation

@XmlElement Maps a JavaBeans property or field to an XML element derived from a property or field name

@XmlElements A container annotation for defining multiple @XmlElement annotations

@XmlElementRef Maps a JavaBeans property or field to an XML element derived from a property or field’s type

@XmlElementRefs A container annotation for defining multiple @XmlElementRef annotations

@XmlElementWrapper Creates a wrapper element around an XML representation (typically around collections)

@XmlAnyElement Maps a JavaBeans property to an XML infoset representation or a JAXB element

@XmlAttribute Maps a JavaBeans property to an XML attribute

@XmlAnyAttribute Maps a JavaBeans property to a map of wildcard attributes

@XmlTransient Prevents the mapping of a JavaBeans property to an XML representation

@XmlValue Maps a class to an XML Schema complex type with a simpleContent or an XML Schema simple type

@XmlID Maps a JavaBeans property to an XML ID

@XmlIDREF Maps a JavaBeans property to an XML IDREF

@XmlList Maps a property to a list simple type

@XmlMixed Marks a JavaBeans multi-valued property to support mixed content

@XmlMimeType Associates the MIME type that controls the XML representation of the property

@XmlAttachmentRef Marks a field/property whose XML form is a URI reference to MIME content

@XmlInlineBinaryData Disables consideration of XOP encoding for data types that are bound to base64-encoded binary data in XML


JAXB Annotations for Java-XML Mapping

XmlAccessType Description

FIELD Every non-static, non-transient field in a JAXB-bound class will be bound to XML, unless annotated by XmlTransient

NONE None of the fields/properties is bound to XML unless they are annotated with some of the JAXB annotations

PROPERTY Every getter/setter pair in a JAXB-bound class will be bound to XML, unless annotated by XmlTransient

PUBLIC_MEMBER Every public getter/setter pair and every public field will be bound to XML, unless annotated by XmlTransient
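The FIELD and PROPERTY rules in the table above can be approximated with plain reflection. The sketch below is not JAXB and does not call any JAXB API; it only lists which member names each rule, as worded in the table, would select for a sample class. The names AccessTypeSketch, Sample, fieldAccess, and propertyAccess are hypothetical, and the property detection is simplified (it looks only at getX/setX pairs declared in the class itself and ignores isX getters).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the access-type rules; plain reflection, not JAXB.
class AccessTypeSketch {

    static class Sample {
        public String title;             // public field without accessors
        private String lecturer;         // private field with getter/setter pair
        private transient String cache;  // transient: skipped by the FIELD rule
        static int instances;            // static: skipped by the FIELD rule

        public String getLecturer() { return lecturer; }
        public void setLecturer(String lecturer) { this.lecturer = lecturer; }
        public String getSummary() { return title + " / " + lecturer; } // getter only
    }

    // FIELD rule: every non-static, non-transient field.
    static Set<String> fieldAccess(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Field field : type.getDeclaredFields()) {
            int mod = field.getModifiers();
            if (!Modifier.isStatic(mod) && !Modifier.isTransient(mod) && !field.isSynthetic()) {
                names.add(field.getName());
            }
        }
        return names;
    }

    // PROPERTY rule (simplified): every getX() that has a matching setX(type).
    static Set<String> propertyAccess(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method getter : type.getDeclaredMethods()) {
            String name = getter.getName();
            boolean looksLikeGetter = name.startsWith("get") && name.length() > 3
                    && getter.getParameterCount() == 0
                    && getter.getReturnType() != void.class
                    && !Modifier.isStatic(getter.getModifiers());
            if (!looksLikeGetter) {
                continue;
            }
            try {
                type.getDeclaredMethod("set" + name.substring(3), getter.getReturnType());
                names.add(Character.toLowerCase(name.charAt(3)) + name.substring(4));
            } catch (NoSuchMethodException e) {
                // getter without a setter: not a getter/setter pair
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("FIELD:    " + fieldAccess(Sample.class));
        System.out.println("PROPERTY: " + propertyAccess(Sample.class));
    }
}
```

For the Sample class, the FIELD rule selects title and lecturer, while the PROPERTY rule selects only lecturer, because title has no accessors and getSummary has no matching setter.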

XmlAccessOrder Description

ALPHABETICAL Alphabetical ordering of fields/properties in a class

UNDEFINED Ordering of fields/properties in a class is undefined


JAXB Mapping between Java and XML

XML Schema Type Java Data Type

xsd:string java.lang.String

xsd:integer java.math.BigInteger

xsd:int int

xsd:long long

xsd:short short

xsd:decimal java.math.BigDecimal

xsd:float float

xsd:double double

xsd:boolean boolean

xsd:byte byte

xsd:QName javax.xml.namespace.QName

xsd:dateTime javax.xml.datatype.XMLGregorianCalendar

xsd:base64Binary byte[]

xsd:hexBinary byte[]

xsd:unsignedInt long

xsd:unsignedShort int

xsd:unsignedByte short

xsd:time javax.xml.datatype.XMLGregorianCalendar

xsd:date javax.xml.datatype.XMLGregorianCalendar

xsd:g* (gYear, gYearMonth, gMonth, gMonthDay, gDay) javax.xml.datatype.XMLGregorianCalendar

xsd:anySimpleType java.lang.Object

xsd:anySimpleType java.lang.String

xsd:duration javax.xml.datatype.Duration

xsd:NOTATION javax.xml.namespace.QName


Java Class XML Schema Type

java.lang.String xs:string

java.math.BigInteger xs:integer

java.math.BigDecimal xs:decimal

java.util.Calendar xs:dateTime

java.util.Date xs:dateTime

javax.xml.namespace.QName xs:QName

java.net.URI xs:string

javax.xml.datatype.XMLGregorianCalendar xs:anySimpleType

javax.xml.datatype.Duration xs:duration

java.lang.Object xs:anyType

java.awt.Image xs:base64Binary

javax.activation.DataHandler xs:base64Binary

javax.xml.transform.Source xs:base64Binary

java.util.UUID xs:string
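Two rows of the XML-Schema-to-Java table are easier to see in code: all date and time types share the single class XMLGregorianCalendar, and xsd:unsignedInt needs a Java long because its range exceeds int. The sketch below uses only the JDK's javax.xml.datatype package, without JAXB; the class name SchemaTypeSketch and its method names are hypothetical.

```java
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;
import javax.xml.datatype.XMLGregorianCalendar;

// Hypothetical illustration of selected rows of the type-mapping table.
class SchemaTypeSketch {

    // One Java class covers xsd:dateTime, xsd:date, xsd:time and the xsd:g* types;
    // the instance remembers which schema type its lexical form corresponds to.
    static String schemaTypeOf(String lexical) throws Exception {
        XMLGregorianCalendar value =
                DatatypeFactory.newInstance().newXMLGregorianCalendar(lexical);
        return value.getXMLSchemaType().getLocalPart();
    }

    // xsd:duration maps to javax.xml.datatype.Duration.
    static Duration parseDuration(String lexical) throws Exception {
        return DatatypeFactory.newInstance().newDuration(lexical);
    }

    // xsd:unsignedInt ranges up to 4294967295, which does not fit into a Java int.
    static long parseUnsignedInt(String lexical) {
        return Integer.toUnsignedLong(Integer.parseUnsignedInt(lexical));
    }

    public static void main(String[] args) throws Exception {
        for (String lexical : new String[] {"2016-02-22T10:15:00", "2016-02-22", "10:15:00", "2016"}) {
            System.out.println(lexical + " -> xsd:" + schemaTypeOf(lexical));
        }
        Duration d = parseDuration("P1DT2H");
        System.out.println("days=" + d.getDays() + ", hours=" + d.getHours());
        System.out.println(parseUnsignedInt("4294967295"));
    }
}
```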


Other Java APIs for XML

⚫ JDOM

– http://www.jdom.org

– Variant of W3C DOM; closer to Java object-orientation

⚫ DOM4J

– http://www.dom4j.org

– roughly similar to JDOM

⚫ JAX-WS (Java API for XML Web Services)

– https://jax-ws.java.net

– Java to WSDL mapping

⚫ XSL-FO (XSL Formatting Objects)

– XSL variant most often used to generate PDFs


Conclusions

⚫ We know how XML documents can be processed in a procedural way!

– DOM (Document Object Model)

◼ Tree-based XML read/write using DOM methods

– SAX (XML Streaming)

◼ Event-based XML read using SAX callback methods

– StAX (XML Streaming)

◼ Event-based XML read/write using StAX methods

– TrAX (XML Transformation)

◼ Arbitrary transformations using XSLT

– JAXB (Java XML Binding)

◼ Customizable Java-XML mappings using JAXB annotations

⚫ We now want to know how XML documents can be processed declaratively

– XML query languages
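As a compact recap of the three reading styles listed above, the following sketch extracts the title of a small lecture document once with DOM (tree), once with SAX (callbacks), and once with StAX (pull). It uses only parsers bundled with the JDK; the class name ProcessorComparison and the three method names are hypothetical, and the sample document is a shortened version of the lecture.xml used in the JAXB demos.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical recap class: the same task solved with DOM, SAX, and StAX.
class ProcessorComparison {

    static final String XML =
            "<lecture><title>XML and Databases</title><semester>FS2016</semester></lecture>";

    // DOM: build the whole tree in memory, then navigate it.
    static String titleViaDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("title").item(0).getTextContent();
    }

    // SAX: the parser pushes events into callback methods.
    static String titleViaSax(String xml) throws Exception {
        StringBuilder title = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTitle;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equals(qName)) inTitle = true;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) inTitle = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTitle) title.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return title.toString();
    }

    // StAX: the application pulls events from the parser and can stop early.
    static String titleViaStax(String xml) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    return reader.getElementText();
                }
            }
            return null; // no title element found
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM:  " + titleViaDom(XML));
        System.out.println("SAX:  " + titleViaSax(XML));
        System.out.println("StAX: " + titleViaStax(XML));
    }
}
```

All three variants are procedural: the application spells out how to find the title, which is the contrast to the declarative query languages announced in the last bullet.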
