5 - 2
Overview
Parsing XML documents
Document Object Model (DOM)
Simple API for XML (SAX)
Class generation
5 - 3
What's the Problem?
<?xml version="1.0"?>
<books>
  <book>
    <title>The XML Handbook</title>
    <author>Goldfarb</author>
    <author>Prescod</author>
    <publisher>Prentice Hall</publisher>
    <pages>655</pages>
    <isbn>0130811521</isbn>
    <price currency="USD">44.95</price>
  </book>
  <book>
    <title>XML Design</title>
    <author>Spencer</author>
    <publisher>Wrox Press</publisher>
    ...
  </book>
</books>
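[Figure: the XML text, a question mark, and a "Book" object. How does an application get from the text to the object?]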
5 - 4
Parsing XML Documents
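[Figure: the parser reads the document and its DTD / Schema.
DOM: the parser builds a document tree, which the application then processes.
SAX: the application implements DocumentHandler, and the parser calls startDocument, startElement, endElement and endDocument while it reads.]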
5 - 5
Parser
Project X (Sun Microsystems)
Ælfred (Microstar Software)
XML4J (IBM)
Lark (Tim Bray)
MSXML (Microsoft)
XJ (Data Channel)
Xerces (Apache)
...
5 - 6
The Document Object Model
XML Document Structure
[Figure: the books document drawn as a tree. The root "books" has "book" children; each "book" has the children title, author, publisher, pages and isbn, whose text leaves include "The XML Handbook", "Goldfarb", "Prescod", "Prentice Hall" and "655".]
5 - 7
The Document Object Model
Provides a standard interface for access to and manipulation of XML structures.
Represents documents in the form of a hierarchy of nodes.
Is platform- and programming-language-neutral.
Is a W3C Recommendation (DOM Level 1, October 1, 1998).
Is implemented by many parsers.
5 - 8
DOM - Structure Model
[Figure: the same books tree annotated with the DOM types Document, Node, NodeList and Element.]
5 - 9
The Document Interface
Method / Property               Result
doctype                         DocumentType
implementation                  DOMImplementation
documentElement                 Element
getElementsByTagName(String)    NodeList
createTextNode(String)          Text
createComment(String)           Comment
createElement(String)           Element
createCDATASection(String)      CDATASection
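The table above can be tried out with any DOM implementation. The following is a minimal sketch using Python's standard `xml.dom.minidom`, which is an assumption of this example and not one of the parsers named in these slides; the one-book sample string and the comment text are made up for illustration.

```python
from xml.dom.minidom import parseString

# Compact sample without whitespace between tags (illustrative only).
doc = parseString("<books><book><title>XML Design</title></book></books>")

root = doc.documentElement                    # Element
print(root.tagName)
print(doc.doctype)                            # None here: the sample has no DOCTYPE
titles = doc.getElementsByTagName("title")    # NodeList
print(titles.length)

# The create* factory methods return new nodes that are not yet in the tree.
author = doc.createElement("author")          # Element
text = doc.createTextNode("Spencer")          # Text
note = doc.createComment("added by the example")  # Comment

author.appendChild(text)
book = root.firstChild
book.appendChild(author)
book.appendChild(note)
print(root.toxml())
```

Note that the factory methods only create nodes; a node becomes part of the document once it is attached with a method such as `appendChild`.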
5 - 10
The Node Interface
Method / Property                     Result
nodeName                              String
nodeValue                             String
nodeType                              short
parentNode                            Node
childNodes                            NodeList
firstChild                            Node
lastChild                             Node
previousSibling                       Node
nextSibling                           Node
attributes                            NamedNodeMap
insertBefore(Node new, Node ref)      Node
replaceChild(Node new, Node old)      Node
removeChild(Node)                     Node
hasChildNodes()                       boolean
5 - 11
Node Types / Node Names
Result: nodeType / nodeName

Node type (field)              nodeType   nodeName
ELEMENT_NODE                   1          tagName
ATTRIBUTE_NODE                 2          name of attribute
TEXT_NODE                      3          "#text"
CDATA_SECTION_NODE             4          "#cdata-section"
ENTITY_REFERENCE_NODE          5          name of entity referenced
ENTITY_NODE                    6          entity name
PROCESSING_INSTRUCTION_NODE    7          target
COMMENT_NODE                   8          "#comment"
DOCUMENT_NODE                  9          "#document"
DOCUMENT_TYPE_NODE             10         document type name
DOCUMENT_FRAGMENT_NODE         11         "#document-fragment"
NOTATION_NODE                  12         notation name
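To see a few rows of this table in practice, the sketch below prints `nodeType` and `nodeName` for several kinds of node. It assumes Python's standard `xml.dom.minidom` as the DOM implementation; the sample string and the CDATA content are illustrative.

```python
from xml.dom.minidom import parseString

doc = parseString('<books><!--catalog--><price currency="USD">44.95</price></books>')
root = doc.documentElement
comment, price = root.childNodes   # the root has exactly these two children

nodes = [
    doc,                                  # the document itself
    root,                                 # an element
    comment,                              # a comment
    price.getAttributeNode("currency"),   # an attribute
    price.firstChild,                     # the text inside <price>
    doc.createCDATASection("a < b"),      # a CDATA section created on the fly
]
for node in nodes:
    print(node.nodeType, node.nodeName)
```

For elements and attributes `nodeName` is the tag or attribute name; the other node kinds report the fixed "#..." names from the table.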
5 - 13
The Element Interface
Method / Property                           Result
tagName                                     String
getAttribute(String)                        String
setAttribute(String name, String value)
removeAttribute(String)
getAttributeNode(String)                    Attr
setAttributeNode(Attr)                      Attr
removeAttributeNode(Attr)                   Attr
getElementsByTagName(String)                NodeList
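The attribute methods of the Element interface can be exercised on the `price` element from the sample document. This is a sketch under the assumption that Python's standard `xml.dom.minidom` stands in for the parser; the `vat` attribute and the value "EUR" are invented for the example.

```python
from xml.dom.minidom import parseString

doc = parseString('<price currency="USD">44.95</price>')
price = doc.documentElement

print(price.tagName)
original = price.getAttribute("currency")      # attribute value as a string
print(original)

price.setAttribute("currency", "EUR")          # overwrite an existing attribute
attr = price.getAttributeNode("currency")      # the Attr node itself
print(attr.name, attr.value)

price.setAttribute("vat", "included")          # hypothetical attribute
price.removeAttribute("vat")                   # remove it again by name

price.removeAttributeNode(attr)                # remove by Attr node
after_removal = price.getAttribute("currency") # empty string when absent
print(repr(after_removal))
```

`getAttribute` returns plain strings, while the `...AttributeNode` variants work with `Attr` nodes, which is useful when an attribute has to be moved or inspected as a node.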
5 - 14
DOM Methods for Navigation
firstChild, lastChild
nextSibling, previousSibling
parentNode
getElementsByTagName
childNodes (length, item())
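The navigation properties listed above can be combined to walk a tree. The sketch below assumes Python's standard `xml.dom.minidom` and a compact version of the books sample; whitespace between tags is left out because this implementation would otherwise report it as extra text nodes.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<books>"
    "<book><title>The XML Handbook</title>"
    "<author>Goldfarb</author><author>Prescod</author></book>"
    "<book><title>XML Design</title><author>Spencer</author></book>"
    "</books>"
)
root = doc.documentElement
first_book = root.firstChild
last_book = root.lastChild

# Sibling and parent links connect the same nodes in both directions.
print(first_book.nextSibling is last_book)
print(last_book.previousSibling is first_book)
print(first_book.parentNode is root)

# Walk the children of the first book with firstChild / nextSibling.
names = []
child = first_book.firstChild
while child is not None:
    names.append(child.nodeName)
    child = child.nextSibling
print(names)

# getElementsByTagName searches the whole subtree in document order.
authors = doc.getElementsByTagName("author")
author_names = [authors.item(i).firstChild.data for i in range(authors.length)]
print(author_names)
```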
5 - 15
DOM Methods for Manipulation
appendChild, insertBefore, replaceChild, removeChild
createElement, createAttribute, createTextNode
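As a rough illustration of the manipulation methods, the following sketch edits the second book of the sample. It assumes Python's standard `xml.dom.minidom`; the replacement title text and the `lang` attribute are hypothetical values chosen only to show each call.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<books><book><title>XML Design</title>"
    "<publisher>Wrox Press</publisher></book></books>"
)
book = doc.documentElement.firstChild
title, publisher = book.childNodes

# createElement + createTextNode + appendChild: build <author>Spencer</author>
author = doc.createElement("author")
author.appendChild(doc.createTextNode("Spencer"))

# insertBefore: place the author in front of the publisher
book.insertBefore(author, publisher)

# replaceChild: swap the title for a new one (hypothetical text)
new_title = doc.createElement("title")
new_title.appendChild(doc.createTextNode("XML Design (draft)"))
book.replaceChild(new_title, title)

# removeChild: drop the publisher
book.removeChild(publisher)

# createAttribute: attach a hypothetical lang attribute to the new title
lang = doc.createAttribute("lang")
lang.value = "en"
new_title.setAttributeNode(lang)

print(book.toxml())
```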
5 - 16
Example
[Figure: tree with root "books", two "book" children, and "author" children holding the texts "Goldfarb", "Prescod" and "Spencer".]

doc.documentElement.childNodes.item(0).getElementsByTagName("author").item(1).childNodes.item(0).data

doc                                DOM object
.documentElement                   root node
.childNodes                        books
.item(0)                           first book
.getElementsByTagName("author")    authors
.item(1)                           second author
.childNodes                        text subnodes
.item(0)                           first thereof
.data                              text
5 - 17
Script
<HTML>
<HEAD><TITLE>DOM Example</TITLE></HEAD>
<BODY>
<H1>DOM Example</H1>
<SCRIPT LANGUAGE="JavaScript">
  // Runs in Internet Explorer only: MSXML is created through ActiveX.
  var doc, root, book1, authors, author2;
  doc = new ActiveXObject("Microsoft.XMLDOM");
  doc.async = false;                  // load synchronously
  doc.load("books.xml");
  if (doc.parseError.errorCode != 0) {
    // Parsing failed: show the reason reported by the parser.
    alert(doc.parseError.reason);
  } else {
    root = doc.documentElement;
    document.write("Name of Root node: " + root.nodeName + "<BR>");
    document.write("Type of Root node: " + root.nodeType + "<BR>");
    book1 = root.childNodes.item(0);  // first book
    authors = book1.getElementsByTagName("author");
    document.write("Number of authors: " + authors.length + "<BR>");
    author2 = authors.item(1);        // second author
    document.write("Name of second author: " + author2.childNodes.item(0).data);
  }
</SCRIPT>
</BODY>
</HTML>
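The script depends on Internet Explorer and ActiveX. For readers who want to run the same steps elsewhere, here is a rough equivalent using Python's standard `xml.dom.minidom`; this is an assumption of the example, not the parser used in the slide. The function name `describe` and the inline sample string are hypothetical. The sample is written without whitespace between tags, because this implementation keeps whitespace as text nodes and `childNodes.item(0)` would otherwise not be the first book.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

BOOKS_XML = (
    "<books>"
    "<book><title>The XML Handbook</title>"
    "<author>Goldfarb</author><author>Prescod</author></book>"
    "<book><title>XML Design</title><author>Spencer</author></book>"
    "</books>"
)

def describe(xml_text):
    """Return the lines the slide's script would write, or a parse error."""
    try:
        doc = parseString(xml_text)
    except ExpatError as err:
        return ["Parse error: " + str(err)]
    root = doc.documentElement
    book1 = root.childNodes.item(0)                  # first book
    authors = book1.getElementsByTagName("author")
    author2 = authors.item(1)                        # second author
    return [
        "Name of Root node: " + root.nodeName,
        "Type of Root node: " + str(root.nodeType),
        "Number of authors: " + str(authors.length),
        "Name of second author: " + author2.childNodes.item(0).data,
    ]

for line in describe(BOOKS_XML):
    print(line)
```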
5 - 18
SAX - Simple API for XML
[Figure: the parser reads the document and its DTD and calls the application's handler methods in order: startDocument, startElement, startElement, endElement, endElement, endDocument.]
5 - 19
SAX - Simple API for XML
Event-driven parsing model
"Don't call the DOM, the parser calls you."
Developed by the members of the XML-DEV mailing list
Released on May 11, 1998
Supported by many parsers ...
... but Ælfred is the saxon king.
5 - 20
Procedure
DOM
  Create a parser instance
  Parse the whole document
  Process the DOM tree
SAX
  Create a parser instance
  Register event handlers with the parser
  The parser calls the event handlers during parsing
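The two procedures can be compared side by side. The sketch below assumes Python's standard `xml.sax` and `xml.dom.minidom` modules in place of the parsers named earlier; the class name `AuthorHandler` and the shortened sample string are hypothetical. Python's `ContentHandler` plays the role of the handler interface that the application implements.

```python
import io
import xml.sax
from xml.dom.minidom import parseString

BOOKS_XML = "<books><book><author>Goldfarb</author><author>Prescod</author></book></books>"

class AuthorHandler(xml.sax.ContentHandler):
    """Records every event and collects the text of each <author>."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.authors = []
        self._text = None   # list of text chunks while inside <author>

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append("startElement " + name)
        if name == "author":
            self._text = []

    def characters(self, content):
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        self.events.append("endElement " + name)
        if name == "author":
            self.authors.append("".join(self._text))
            self._text = None

# SAX: create a parser, register the handler, let the parser call back.
handler = AuthorHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(BOOKS_XML.encode("utf-8")))
print(handler.events)
print(handler.authors)

# DOM: parse the whole document first, then query the finished tree.
dom_authors = [n.firstChild.data
               for n in parseString(BOOKS_XML).getElementsByTagName("author")]
print(dom_authors)
```

With SAX the application has to keep its own state between callbacks (here the `_text` buffer), whereas with DOM the complete tree is available for queries after parsing.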
5 - 21
Namespace Support
<?xml version="1.0"?>
<order xmlns="http://www.net-standard.com/namespaces/order"
       xmlns:bk="http://www.net-standard.com/namespaces/books"
       xmlns:cust="http://www.net-standard.com/namespaces/customer">
  ...
  <bk:book>
    <bk:title>XML Handbook</bk:title>
    <bk:isbn>0130811521</bk:isbn>
  </bk:book>
  ....
</order>
5 - 22
Access to Qualified Elements
Node "book"

Value                                           DOM Level 2 (interface "Node")   SAX 2.0 (startElement)
bk:book                                         nodeName                         qName
http://www.net-standard.com/namespaces/books    namespaceURI                     uri
bk                                              prefix
book                                            localName                        localName
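The sketch below reads the same four values from the `bk:book` element. It assumes Python's standard `xml.dom.minidom` and `xml.sax`; the class name `NamespaceHandler` and the shortened order document are hypothetical. In Python's SAX binding the namespace-aware callback is `startElementNS`, which receives the pair `(uri, localname)`; its `qname` argument may be `None`, so the example does not rely on it.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces
from xml.dom.minidom import parseString

BOOKS_NS = "http://www.net-standard.com/namespaces/books"
ORDER_NS = "http://www.net-standard.com/namespaces/order"
ORDER_XML = (
    '<order xmlns="' + ORDER_NS + '" xmlns:bk="' + BOOKS_NS + '">'
    "<bk:book><bk:title>XML Handbook</bk:title></bk:book>"
    "</order>"
)

# DOM Level 2: the four namespace-related properties of a node.
book = parseString(ORDER_XML).getElementsByTagName("bk:book")[0]
print(book.nodeName, book.namespaceURI, book.prefix, book.localName)

# SAX 2: namespace processing must be switched on explicitly.
class NamespaceHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.seen = []

    def startElementNS(self, name, qname, attrs):
        uri, localname = name      # qname may be None with this parser
        self.seen.append((uri, localname))

handler = NamespaceHandler()
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
parser.setContentHandler(handler)
parser.parse(io.BytesIO(ORDER_XML.encode("utf-8")))
print(handler.seen)
```

Comparing elements by URI and local name, not by prefix, keeps the code working when a document binds the same namespace to a different prefix.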
5 - 23
Generation of Data Structures
DTD / Schema 'yacht'  --Generation-->  Class

01 yacht
  05 name
  05 details
    10 type

XML document  --Processing-->  Object

<?xml version="1.0"?>
<yacht yachtid='147'>
  <name>Mona Lisa</name>
  <image file='yacht147.jpg'/>
  <description> Any text describing this yacht 147</description>
  <details>
    <type>GULFSTAR 55</type>
    <length>1700</length>
    <width>480</width>
    <draft>170</draft>
    <sailsurface>112</sailsurface>
    <motor>84</motor>
    <headroom>202</headroom>
    <bunks>8</bunks>
  </details>
</yacht>

01 yacht
  05 VENTANA
  05 details
    10 GULFSTAR 55
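To make the idea concrete, the sketch below shows hand-written classes of the kind a generator might emit from the 'yacht' DTD, plus the processing step that fills an object from a document. Everything here is an assumption for illustration: the class names `Yacht` and `Details`, the `from_xml` method, the choice of Python's standard `xml.dom.minidom`, and the shortened sample document. No actual generator tool is implied.

```python
from dataclasses import dataclass
from xml.dom.minidom import parseString

@dataclass
class Details:
    type: str

@dataclass
class Yacht:
    yachtid: str
    name: str
    details: Details

    @classmethod
    def from_xml(cls, xml_text):
        """Processing step: build one object from one yacht document."""
        root = parseString(xml_text).documentElement

        def text_of(parent, tag):
            # Text of the first <tag> element below parent.
            return parent.getElementsByTagName(tag)[0].firstChild.data.strip()

        details = root.getElementsByTagName("details")[0]
        return cls(
            yachtid=root.getAttribute("yachtid"),
            name=text_of(root, "name"),
            details=Details(type=text_of(details, "type")),
        )

YACHT_XML = (
    "<yacht yachtid='147'><name>Mona Lisa</name>"
    "<details><type>GULFSTAR 55</type><length>1700</length></details></yacht>"
)

yacht = Yacht.from_xml(YACHT_XML)
print(yacht)
```

The application then works with typed fields such as `yacht.details.type` and no longer touches the DOM tree directly.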
5 - 24
Summary
To avoid expensive text processing, applications use an XML parser that creates a DOM tree of a document.
The DOM provides a standardized API to access the content of documents and to manipulate them.
Alternatively or additionally, applications can work event-based using the SAX interface, which is provided by many parsers.