Apache DOM Parser

Wenzhong Zhao

Department of Computer Science

The University of Kentucky

October 24, 2002 ©zwz


Overview

org.w3c.dom

– Node

• Document

• Element

• Attr

• Text

– NodeList

– NamedNodeMap

org.apache.xerces.parsers

– DOMParser

Sample Code Segment

Xerces Parser Info

Overview (cont'd)

Sample XML document:

<?xml version="1.0"?>
<root> <a id="1">This</a> is <b/> <c> a test</c> </root>

The corresponding DOM tree, labeled with the type of each node:

Document

– Element <root>

• Element <a>, with Attr id="1" and Text "This"

• Text "is"

• Element <b>

• Element <c>, with Text "a test"
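The tree above can be reproduced by walking a parsed document and printing the type of each node. The following is a minimal sketch, not the deck's own code: it assumes the JAXP DocumentBuilder bundled with the JDK as a stand-in for the Xerces DOMParser, and the names TreeDump, parse, typeName, describe and dump are made up for illustration. Note that a real parser also reports the whitespace between elements as Text nodes, which the slide's diagram leaves out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeDump {
    // The sample document from the Overview slide.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?><root> <a id=\"1\">This</a> is <b/> <c> a test</c> </root>";

    // Assumption: JAXP (bundled with the JDK) stands in for the Xerces DOMParser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Maps the numeric node type to the interface names used on the slides.
    static String typeName(Node n) {
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE:  return "Document";
            case Node.ELEMENT_NODE:   return "Element";
            case Node.ATTRIBUTE_NODE: return "Attr";
            case Node.TEXT_NODE:      return "Text";
            default:                  return "Other";
        }
    }

    // Appends one line for a node: type, name and (if present) value.
    static void describe(Node n, String indent, StringBuilder out) {
        out.append(indent).append(typeName(n)).append(' ').append(n.getNodeName());
        if (n.getNodeValue() != null) {
            out.append(" = \"").append(n.getNodeValue()).append('"');
        }
        out.append('\n');
    }

    // Depth-first walk: the node itself, then its attributes, then its children.
    static void dump(Node n, String indent, StringBuilder out) {
        describe(n, indent, out);
        NamedNodeMap attrs = n.getAttributes(); // non-null only for Element nodes
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                describe(attrs.item(i), indent + "  ", out);
            }
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            dump(c, indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        dump(parse(SAMPLE), "", out);
        System.out.print(out);
    }
}
```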


DOM Interface

Represents XML documents as objects

– Extract information

– Manipulate XML documents

A set of Interfaces

– Defined by the W3C

– Platform- and language-neutral

– Must be implemented by classes that hold the actual data

– The DOM parser is responsible for providing those implementation classes


DOM Interface Hierarchy

Node

– Attr

– CharacterData

• Comment

• Text (extended by CDATASection)

– Document

– DocumentFragment

– DocumentType

– Element

– Entity

– EntityReference

– Notation

– ProcessingInstruction

DOMImplementation

NamedNodeMap

NodeList


Interface Node

Represents a single node

The primary datatype for the entire DOM

Methods defined here are available in all sub-interfaces

Methods

– public NodeList getChildNodes()

– public Node getFirstChild()

– public Node getNextSibling()

– public Node getParentNode()

– public NamedNodeMap getAttributes()


Interface Node (cont'd)

– public String getNodeName()

– public String getNodeValue()

– public short getNodeType()

– public Node appendChild(Node newChild)

• throws DOMException

– public Node removeChild(Node oldChild)

• throws DOMException
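The navigation and mutation methods of Node can be combined as in the sketch below. It is illustrative only: the JAXP DocumentBuilder from the JDK is assumed as the parser, the class NodeEdit and its helpers are hypothetical names, and Document.createElement (a standard DOM method not listed on these slides) is used to build the new child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeEdit {
    // Assumption: JAXP (bundled with the JDK) stands in for the Xerces DOMParser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Lists the names of all children by following the sibling chain.
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(c.getNodeName());
        }
        return names;
    }

    // Appends a new <d/> element, then removes the original first child <a/>.
    static List<String> demo() {
        Document doc = parse("<root><a/><b/></root>");
        Node root = doc.getDocumentElement();
        root.appendChild(doc.createElement("d"));
        root.removeChild(root.getFirstChild());
        return childNames(root);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [b, d]
    }
}
```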


Interface Document

Represents an entire XML document

Methods

– public Element getDocumentElement()

• Returns the root element of the document

– public NodeList getElementsByTagName(String tag)

• Returns a list of all elements with the specified tag name

• In the order of a preorder traversal of the document
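The preorder guarantee is easiest to see with nested elements of the same name. This is a small sketch under assumptions: the JDK's JAXP DocumentBuilder is used as the parser, the class TagSearch, the helper idsOf and the sample markup are invented for illustration, and Element.getAttribute (standard DOM, not listed on these slides) reads the id values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TagSearch {
    // Hypothetical sample: one <item> nested inside another, plus a later sibling.
    static final String XML =
        "<list id='0'><item id='1'><item id='2'/></item><item id='3'/></list>";

    // Assumption: JAXP (bundled with the JDK) stands in for the Xerces DOMParser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Collects the id attribute of every matching element, in the order returned.
    static List<String> idsOf(Document doc, String tag) {
        List<String> ids = new ArrayList<>();
        NodeList hits = doc.getElementsByTagName(tag);
        for (int i = 0; i < hits.getLength(); i++) {
            Element e = (Element) hits.item(i);
            ids.add(e.getAttribute("id"));
        }
        return ids;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(doc.getDocumentElement().getTagName()); // prints list
        System.out.println(idsOf(doc, "item"));                    // prints [1, 2, 3]
    }
}
```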


Interface Element

Represents an element in a document

Methods

– public NodeList getElementsByTagName(String tag)

• Returns a list of all descendant elements with the specified tag name

• In the order of a preorder traversal of this element's subtree

– public String getTagName()

– public Attr getAttributeNode(String name)

• Returns the attribute node with the specified name, or null if there is none


Interface Attr

Represents an attribute in an Element object

Methods

– public String getName()

– public String getValue()

– public void setValue(String value)

• throws DOMException
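Element and Attr work together: the element hands out the Attr node, and changes made through the Attr are visible on the element. A minimal sketch, assuming the JDK's JAXP DocumentBuilder as the parser; the class AttrEdit and the method rename are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttrEdit {
    // Assumption: JAXP (bundled with the JDK) stands in for the Xerces DOMParser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Appends "-new" to the root element's id attribute and reports the change.
    // Returns null when the root element has no id attribute.
    static String rename(String xml) {
        Element root = parse(xml).getDocumentElement();
        Attr id = root.getAttributeNode("id"); // null if the attribute is absent
        if (id == null) {
            return null;
        }
        String old = id.getValue();
        id.setValue(old + "-new");
        // Reading through the element shows the value set on the Attr node.
        return id.getName() + ": " + old + " -> " + root.getAttribute("id");
    }

    public static void main(String[] args) {
        System.out.println(rename("<a id='1'>This</a>")); // prints id: 1 -> 1-new
    }
}
```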


Interface Text

Represents the textual content of an Element or Attr

Methods

– public String getData()

• throws DOMException

– public void setData(String data)

• throws DOMException

– public int getLength()

• Returns the number of characters in the Text node
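The three Text methods can be tried on the <c> element of the sample document, whose content starts with a space. This is a sketch only: JAXP from the JDK is assumed as the parser, and TextEdit and demo are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class TextEdit {
    // Assumption: JAXP (bundled with the JDK) stands in for the Xerces DOMParser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Trims the text content of <c> and reports the length before and after.
    static String demo() {
        Document doc = parse("<c> a test</c>");
        Text text = (Text) doc.getDocumentElement().getFirstChild();
        int before = text.getLength();       // counts the leading space
        text.setData(text.getData().trim()); // replace the content in place
        return before + " -> " + text.getLength() + " [" + text.getData() + "]";
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 7 -> 6 [a test]
    }
}
```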


Interface NodeList

Represents an ordered collection of nodes

Methods

– public Node item(int index)

• Returns the node with the specified index

• Note: index starts with 0

– public int getLength()

• Returns the size of the node list
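One detail worth knowing when using item() and getLength(): in the DOM, a NodeList is live, so it changes as the tree changes. The sketch below removes whitespace-only Text children and walks the list backwards so that removals do not shift the items still to be visited. It assumes the JDK's JAXP DocumentBuilder as the parser; LiveList and stripWhitespace are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LiveList {
    // The sample document from the Overview slide.
    static final String SAMPLE =
        "<root> <a id=\"1\">This</a> is <b/> <c> a test</c> </root>";

    // Assumption: JAXP (bundled with the JDK) stands in for the Xerces DOMParser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Removes whitespace-only Text children and returns how many were removed.
    static int stripWhitespace(Element parent) {
        NodeList kids = parent.getChildNodes();
        int removed = 0;
        // Walk backwards: the list is live, so removing item i shifts later items down.
        for (int i = kids.getLength() - 1; i >= 0; i--) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.TEXT_NODE
                    && kid.getNodeValue().trim().isEmpty()) {
                parent.removeChild(kid);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Element root = parse(SAMPLE).getDocumentElement();
        System.out.println(root.getChildNodes().getLength()); // prints 7
        System.out.println(stripWhitespace(root));            // prints 3
        System.out.println(root.getChildNodes().getLength()); // prints 4
    }
}
```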


Interface NamedNodeMap

Represents a collection of nodes

– Can be accessed by name

– Not maintained in any particular order

Methods

– public Node getNamedItem(String name)

• Returns the node with the specified name

– public Node item(int index)

• Returns the node with the specified ordinal index

• Note: index starts with 0

– public int getLength()

• Returns the size of the node collection

– public Node setNamedItem(Node newNode)

• throws DOMException

• Returns the node it replaced, or null if there was none
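A typical use of NamedNodeMap is reading and extending the attributes of an element. The sketch below is an illustration under assumptions: JAXP from the JDK is the parser, AttrMap and its methods are made-up names, and Document.createAttribute (standard DOM, not listed on these slides) creates the new attribute. Because the map promises no order, the result is copied into a sorted map before printing.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttrMap {
    // Assumption: JAXP (bundled with the JDK) stands in for the Xerces DOMParser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Copies the attributes into a sorted map, since NamedNodeMap has no fixed order.
    static Map<String, String> attributesOf(Element e) {
        Map<String, String> out = new TreeMap<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            out.put(a.getNodeName(), a.getNodeValue());
        }
        return out;
    }

    // Adds lang="en" through the map and returns the resulting attributes.
    static Map<String, String> demo() {
        Document doc = parse("<p id='7' class='x'/>");
        Element p = doc.getDocumentElement();
        Attr lang = doc.createAttribute("lang");
        lang.setValue("en");
        p.getAttributes().setNamedItem(lang);
        return attributesOf(p);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {class=x, id=7, lang=en}
    }
}
```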


Class DOMParser

A software library (or software package)

– Provides clear APIs for client applications to manipulate XML documents

Is tree-based

– Produces a W3C DOM tree in memory, which is called a Document object

Client applications access or modify the information stored in the original XML document by

– Invoking methods on the Document object

– Invoking methods on the other objects it contains

Vendors

– Apache, Oracle, IBM, Microsoft, Sun, Tibco


DOMParser (cont'd)

Constructor

– public DOMParser()

• Uses the DTD/Schema parser configuration

Methods

– public void parse(String systemId)

• throws SAXException and IOException

• Builds a DOM tree if successful

– public void parse(InputSource is)

• throws SAXException and IOException

• Builds a DOM tree if successful

– public Document getDocument()

• Returns the Document object built by the last parse


DOMParser (cont'd)

– void setFeature(String featureId, boolean state)

• throws SAXNotRecognizedException and SAXNotSupportedException

• Sets the state of the feature in the SAX2 parser

• Example feature ID: http://apache.org/xml/features/validation/schema

– void setProperty(String propertyId, Object value)

• throws SAXNotRecognizedException and SAXNotSupportedException

• Sets the value of the property in the SAX2 parser

• Example property ID: http://apache.org/xml/properties/schema/external-schemaLocation

• Example schema location: http://dblab.csr.uky.edu/~wzhao0/schema/spo/spo.xsd


DOMParser (cont'd)

– protected void setValidation(boolean validation)

• throws SAXNotRecognizedException and SAXNotSupportedException

• Sets whether the parser validates (by default, against the DTD)

• Equivalent to setFeature() with feature ID http://xml.org/sax/features/validation

– protected void setValidationSchema(boolean schema)

• throws SAXNotRecognizedException and SAXNotSupportedException

• Sets schema support on or off

• Equivalent to setFeature() with feature ID http://apache.org/xml/features/validation/schema
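Turning validation on only has a visible effect if an error handler is registered, because validity errors are reported to the handler rather than thrown. The sketch below shows this with DTD validation. It is not Xerces DOMParser code: it uses the JAXP DocumentBuilderFactory from the JDK, whose setValidating(true) switch is assumed here to play the role of the http://xml.org/sax/features/validation feature. The class DtdCheck, the method countErrors and the inline DTD are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdCheck {
    // Hypothetical DTD: <root> must contain exactly one <a> holding text.
    static final String DTD =
        "<!DOCTYPE root [<!ELEMENT root (a)><!ELEMENT a (#PCDATA)>]>";

    // Parses with DTD validation on and counts the validity errors reported.
    static int countErrors(String xml) {
        final int[] errors = {0};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // assumed JAXP counterpart of the validation feature
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors[0]++; // validity errors are recoverable, so parsing continues
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        System.out.println(countErrors(DTD + "<root><a>This</a></root>")); // prints 0
        // <b> is not declared in the DTD, so at least one error is reported.
        System.out.println(countErrors(DTD + "<root><b/></root>") > 0);    // prints true
    }
}
```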


Sample Code Segment

DOMParser parser = new DOMParser(); // Create a Xerces DOM parser
// Turn on validation against the DTD
parser.setFeature("http://xml.org/sax/features/validation", true);

/* Prepare the systemId for the XML document here … */
parser.parse(systemId); // Parse the input XML document
Document doc = parser.getDocument(); // Obtain the Document object
Element root = doc.getDocumentElement(); // Obtain the root element
NodeList nl = doc.getElementsByTagName("person"); // Get the matching elements by tag name
int len = nl.getLength(); // Get the length of the NodeList
for (int i = 0; i < len; i++) {
    Node node = nl.item(i); // Get the Node by index
    if (node.hasChildNodes()) {
        // Do something for this Node's children
        Node firstChild = node.getFirstChild();
        …
    }
}


Xerces Parser Info

Apache Xerces Parser API: http://xml.apache.org/xerces-j/apiDocs/overview-summary.html

Feature IDs: http://xml.apache.org/xerces2-j/features.html

Property IDs: http://xml.apache.org/xerces2-j/properties.html

Location of the Xerces parser on the cslab machine: /usr/local/xml-xerces2

Be sure to import the following in your Java source:

– org.apache.xerces.parsers.DOMParser

– org.xml.sax.helpers.*

– org.xml.sax.*

– org.w3c.dom.*

– Others as needed