DOM and SAX
Jussi Pohjolainen, TAMK University of Applied Sciences

DOM and SAX

• DOM and SAX
  – Platform- and language-independent APIs for manipulating or reading XML documents
• API: Application Programming Interface, a set of functions, procedures, methods, classes and interfaces
• DOM and SAX are implemented in most programming languages: Java, PHP, etc.

Differences between DOM and SAX

• Standardization
  – DOM: W3C Recommendation
  – SAX: no formal specification
• Manipulation
  – DOM: reading and writing (manipulation)
  – SAX: reading only
• Memory consumption
  – DOM: depends on the size of the source XML file; can be large
  – SAX: very low
• XML handling
  – DOM: tree-based
  – SAX: event-based

SAX

Overview of SAX

• SAX: Simple API for XML
• Originally a Java-only API
  – Nowadays SAX is supported in almost all programming languages
• Uses an event-driven model
• Uses little memory
• Only for reading XML documents

Event-driven?

• SAX uses an event-driven model for reading XML documents
• The basic idea is that the SAX parser reads the XML document sequentially, "one piece at a time", instead of loading the whole document into memory
• Handler functions react when the parser finds elements and other parts of the XML document
  – When the parser finds a start tag, a certain function is called; when it finds an end tag, another function is called
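The "one piece at a time" idea can be seen by feeding PHP's xml parser a document in small, arbitrary chunks. A minimal sketch: elementEvents() and the 3-byte chunk size are illustrative assumptions, not part of any API. Only start and end events are recorded, because character data may arrive split differently depending on where the chunks are cut.

```php
<?php
// Sketch: feed XML to the parser in arbitrary chunks and record the
// start/end element events. elementEvents() is a hypothetical helper.

function elementEvents(string $xml, int $chunkSize): array
{
    $events = [];
    $parser = xml_parser_create();
    // Case folding is on by default; turn it off to keep names as written
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$events): void {
            $events[] = "start $name";
        },
        function ($p, string $name) use (&$events): void {
            $events[] = "end $name";
        }
    );

    $chunks = str_split($xml, $chunkSize);
    $last = count($chunks) - 1;
    foreach ($chunks as $i => $chunk) {
        // Third argument: true only for the final piece of input
        if (!xml_parse($parser, $chunk, $i === $last)) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    }
    return $events;
}

$xml = '<a><b>text</b><c/></a>';
// 3-byte chunks cut tags in half, yet the events are the same
var_dump(elementEvents($xml, 3) === elementEvents($xml, strlen($xml)));
print_r(elementEvents($xml, 3));
```

The parser keeps its own state between calls, so the caller never has to find tag boundaries; this is what keeps memory use independent of document size.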

Example (Wikipedia)

<?xml version="1.0" encoding="UTF-8"?>
<RootElement param="value">
  <FirstElement>
    Some Text
  </FirstElement>
  <SecondElement param2="something">
    Pre-Text <Inline>Inlined text</Inline> Post-text.
  </SecondElement>
</RootElement>

Example (Wikipedia)

• XML processing instruction, named xml, with attributes version equal to "1.0" and encoding equal to "UTF-8"
• XML element start, named RootElement, with an attribute param equal to "value"
• XML element start, named FirstElement
• XML text node, with data equal to "Some Text" (note: text processing with regard to spaces can be changed)
• XML element end, named FirstElement
• ...
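A sketch that logs these events for the example document with PHP's xml extension. logEvents() is a hypothetical helper; text is trimmed and whitespace-only text is skipped, which is one way the handling of spaces can be changed. PHP's parser does not report the XML declaration as an event, so the log starts at RootElement.

```php
<?php
// Sketch: log SAX-style events for the Wikipedia example document.
// logEvents() is a hypothetical helper, not part of the xml extension.

function logEvents(string $xml): array
{
    $log = [];
    $text = '';
    // Character data is buffered and flushed at each tag, because the
    // parser may deliver one text node in several pieces
    $flush = function () use (&$log, &$text): void {
        $trimmed = trim($text);
        if ($trimmed !== '') {               // skip whitespace-only text
            $log[] = "text \"$trimmed\"";
        }
        $text = '';
    };

    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$log, $flush): void {
            $flush();
            $entry = "start $name";
            foreach ($attrs as $attr => $value) {
                $entry .= " $attr=\"$value\"";
            }
            $log[] = $entry;
        },
        function ($p, string $name) use (&$log, $flush): void {
            $flush();
            $log[] = "end $name";
        }
    );
    xml_set_character_data_handler($parser, function ($p, string $data) use (&$text): void {
        $text .= $data;
    });

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $log;
}

$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<RootElement param="value">
  <FirstElement>
    Some Text
  </FirstElement>
  <SecondElement param2="something">
    Pre-Text <Inline>Inlined text</Inline> Post-text.
  </SecondElement>
</RootElement>
XML;

foreach (logEvents($xml) as $line) {
    echo $line, PHP_EOL;
}
```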

PHP and SAX

// Create an XML parser
$xml_parser = xml_parser_create();

// Register the handler functions that react to parser events
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

// Open the XML file ($file holds its path)
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

// Read and parse the file in 4096-byte chunks
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}

fclose($fp);
xml_parser_free($xml_parser);

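PHP and SAX

// Called for each start tag; $attrs maps attribute names to values
function startElement($parser, $name, $attrs) {
    // Do something
}

// Called for each end tag
function endElement($parser, $name) {
    // Do something
}

// Called for text between tags (possibly in several pieces)
function characterData($parser, $data) {
    echo $data;
}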
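The parser setup and the three handlers can be combined into one runnable sketch that parses an inline string instead of a file. The global $events array and the <book> sample are assumptions for illustration. With default options XML_OPTION_CASE_FOLDING is enabled, so the handlers receive upper-cased names such as BOOK.

```php
<?php
// Sketch: the startElement/endElement/characterData handlers filled in.
// $events is a hypothetical global used to collect the output.

$events = [];

function startElement($parser, string $name, array $attrs): void
{
    global $events;
    $events[] = "start $name";
}

function endElement($parser, string $name): void
{
    global $events;
    $events[] = "end $name";
}

function characterData($parser, string $data): void
{
    global $events;
    if (trim($data) !== '') {              // ignore whitespace between tags
        $events[] = "text " . trim($data);
    }
}

$xml_parser = xml_parser_create();
// Case folding stays at its default (on), so names arrive upper-cased
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

if (!xml_parse($xml_parser, "<book><title>Learn Java</title></book>", true)) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($xml_parser)),
        xml_get_current_line_number($xml_parser)));
}

print_r($events);
```

To keep the original spelling of element names, call xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false) before parsing.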

Benefits of SAX

• Excellent API when you only need to read the contents of an XML file
• Easy and clean API
• Does not require many resources (mobile devices!)

DOM

DOM

• The Document Object Model (DOM) is a platform- and language-independent standard object model for representing HTML or XML and related formats
• W3C Recommendation
• Can be used for manipulating XML documents
• Different versions (levels): DOM Level 1, DOM Level 2 and DOM Level 3

Basic Idea behind DOM

• API for manipulating XML documents
• DOM loads the XML document into memory and creates a tree model of the XML data
  – Can consume a lot of memory if documents are large
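A minimal sketch of the tree that DOM builds: loadXML() parses the whole string into memory before anything else happens, and the hypothetical outline() helper then walks the tree recursively, listing element names by depth. The <books> sample stands in for a real file.

```php
<?php
// Sketch: load a small document into memory and print the element tree.
// outline() is a hypothetical helper, not part of PHP's DOM extension.

function outline(DOMNode $node, int $depth = 0): array
{
    $lines = [];
    foreach ($node->childNodes as $child) {
        // Only element nodes are listed; text, comments etc. are skipped
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $lines[] = str_repeat('  ', $depth) . $child->nodeName;
            $lines = array_merge($lines, outline($child, $depth + 1));
        }
    }
    return $lines;
}

$dom = new DOMDocument();
// The whole document is parsed into a tree before any of it is used
$dom->loadXML('<books><book><name>Learn Java</name></book><book><name>Learn PHP</name></book></books>');
echo implode(PHP_EOL, outline($dom)), PHP_EOL;
```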

Tree and Nodes

• A tree consists of nodes
• A node can be
  – Element (Element)
  – Text (Text)
  – Attribute (Attr)
  – CDATA section (CDATASection)
  – Comment (Comment)
  – Etc.
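In PHP's DOM extension each of these node types is a class extending DOMNode (DOMElement, DOMText, DOMAttr, DOMCdataSection, DOMComment). A sketch on a hypothetical <book> document that collects the class of each kind of node; note that attributes are not part of childNodes.

```php
<?php
// Sketch: the PHP classes behind the DOM node types.
// The <book> sample document is hypothetical.

$dom = new DOMDocument();
$dom->loadXML('<book id="1"><!-- draft --><title>Learn Java</title><![CDATA[a < b]]></book>');
$book = $dom->documentElement;

$types = [];
foreach ($book->childNodes as $child) {
    $types[] = get_class($child);            // comment, element, CDATA section
}
// Attributes are not children; they are reached through the element
$types[] = get_class($book->getAttributeNode('id'));
// The text inside <title> is a separate child node of <title>
$types[] = get_class($book->getElementsByTagName('title')->item(0)->firstChild);

print_r($types);
```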

Nodes and Relationships

• A node has references to its
  – first child (firstChild)
  – last child (lastChild)
  – next sibling (nextSibling)
  – previous sibling (previousSibling)
  – parent (parentNode)
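A sketch of these references in PHP, walking the children of the root element forwards and backwards. The sample is hypothetical and written without whitespace between tags; in an indented document the whitespace between elements also becomes text nodes, and nextSibling visits those too.

```php
<?php
// Sketch: traversing with firstChild/nextSibling, lastChild/previousSibling
// and parentNode. The <books> sample document is hypothetical.

$dom = new DOMDocument();
$dom->loadXML('<books><book>A</book><book>B</book><book>C</book></books>');
$root = $dom->documentElement;

$forward = [];
for ($n = $root->firstChild; $n !== null; $n = $n->nextSibling) {
    $forward[] = $n->textContent;
}

$backward = [];
for ($n = $root->lastChild; $n !== null; $n = $n->previousSibling) {
    $backward[] = $n->textContent;
}

// parentNode leads back up the tree
$parentName = $root->firstChild->parentNode->nodeName;

echo implode(',', $forward), ' / ', implode(',', $backward), ' / ', $parentName, PHP_EOL;
// A,B,C / C,B,A / books
```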

Node's contents

• Some nodes have contents (nodeValue)
  – Attribute's value
  – Element's value (text)
  – Comment's value (text)
  – Etc.
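A sketch of nodeValue for several node types in PHP, on a hypothetical document. PHP deviates from the W3C DOM for elements: there nodeValue returns the element's text content, whereas the specification defines an element's nodeValue as null.

```php
<?php
// Sketch: nodeValue of attribute, comment, text and element nodes.
// The sample document is hypothetical.

$dom = new DOMDocument();
$dom->loadXML('<!-- bestseller --><book lang="en"><title>Learn Java</title></book>');
$book = $dom->documentElement;

$values = [
    'attribute' => $book->getAttributeNode('lang')->nodeValue,
    'comment'   => $dom->firstChild->nodeValue,              // the comment before <book>
    'text'      => $book->firstChild->firstChild->nodeValue, // text node inside <title>
    'element'   => $book->nodeValue,                         // PHP: text content (W3C: null)
];

var_dump($values);
```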

Collections

• NodeList (list of nodes)
  – length
  – item(index)
• NamedNodeMap (list of attributes)
  – getNamedItem(name)
  – item(index)
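In PHP these two collections are the classes DOMNodeList and DOMNamedNodeMap. A sketch on a hypothetical <books> document; item() returns null for an index outside the list.

```php
<?php
// Sketch: DOMNodeList and DOMNamedNodeMap in PHP's DOM extension.
// The <books> sample document is hypothetical.

$dom = new DOMDocument();
$dom->loadXML('<books><book id="1" lang="fi"/><book id="2" lang="en"/></books>');

// NodeList: length and item(index)
$books = $dom->getElementsByTagName('book');
$count = $books->length;
$second = $books->item(1);
$missing = $books->item(5);              // out of range: null

// NamedNodeMap: getNamedItem(name) and item(index) on an element's attributes
$attrs = $second->attributes;
$lang = $attrs->getNamedItem('lang')->nodeValue;
$firstAttrName = $attrs->item(0)->nodeName;

printf("%d books, second one: lang=%s, first attribute=%s\n", $count, $lang, $firstAttrName);
```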

Example using PHP DOM

// Load the XML document
$dom = new DOMDocument();
$dom->load("books.xml");

// NodeList of <name> elements
$listOfNodes = $dom->getElementsByTagName("name");

// Browse all nodes
foreach ($listOfNodes as $node) {
    print $node->nodeValue . PHP_EOL;
}
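A self-contained variant of this example that parses an inline string instead of books.xml; the <books>/<book>/<name> layout is an assumption about that file, and bookNames() is a hypothetical helper. It also checks the return value of loadXML(), which is false when the document is not well-formed, and uses libxml_use_internal_errors() to collect parser errors instead of printing warnings.

```php
<?php
// Sketch: collect the text of every <name> element, with error handling.
// bookNames() and the sample layout are assumptions for illustration.

function bookNames(string $xml): array
{
    $dom = new DOMDocument();
    // Collect libxml errors instead of emitting PHP warnings
    libxml_use_internal_errors(true);
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('invalid XML: ' . libxml_get_last_error()->message);
    }

    $names = [];
    foreach ($dom->getElementsByTagName('name') as $node) {
        $names[] = $node->nodeValue;
    }
    return $names;
}

print_r(bookNames('<books><book><name>Learn Java</name></book><book><name>Learn PHP</name></book></books>'));
```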

Example using PHP DOM

// Load the XML document
$dom = new DOMDocument();
$dom->load("books.xml");

// Create element <book></book>
$book = $dom->createElement("book");

// Create element <title>some contents</title>; the text is added with
// createTextNode(), which escapes characters such as &, unlike the
// value argument of createElement()
$title = $dom->createElement("title");
$title->appendChild($dom->createTextNode($_GET['title']));

// <book><title>some contents</title></book>
$book->appendChild($title);

// Add the book under the root element of "books.xml"
$dom->documentElement->appendChild($book);

// Save
$dom->save("books.xml");
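The same steps on an in-memory document, printing the result with saveXML() instead of writing books.xml. addBook() is a hypothetical helper. The title goes in through createTextNode(), which escapes special characters; the PHP manual notes that the value argument of createElement() does not escape &, which matters for user input such as $_GET['title'].

```php
<?php
// Sketch: create <book><title>...</title></book> and append it to the root.
// addBook() is a hypothetical helper; the <books/> document is a stand-in.

function addBook(DOMDocument $dom, string $title): void
{
    $book = $dom->createElement('book');
    $titleElement = $dom->createElement('title');
    // createTextNode() escapes & and < in the title text
    $titleElement->appendChild($dom->createTextNode($title));
    $book->appendChild($titleElement);
    $dom->documentElement->appendChild($book);
}

$dom = new DOMDocument();
$dom->loadXML('<books/>');
addBook($dom, 'Tom & Jerry');
echo $dom->saveXML($dom->documentElement), PHP_EOL;
// <books><book><title>Tom &amp; Jerry</title></book></books>
```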

Removing element

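// $dom holds the document below; set $dom->preserveWhiteSpace = false
// before loading it, otherwise the first child of <kirja> is a
// whitespace text node instead of <nimi>
$elements = $dom->getElementsByTagName("kirja");
$element = $elements->item(0);
$children = $element->childNodes;   // childNodes is a property, not a method
$child = $element->removeChild($children->item(0));   // removes <nimi>Tuntematon Sotilas</nimi>

<kirjat>
  <kirja>
    <nimi>Tuntematon Sotilas</nimi>
  </kirja>
  <kirja>
    <nimi>Learn Java</nimi>
  </kirja>
</kirjat>

(Finnish: kirjat = books, kirja = book, nimi = name)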
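To remove a whole <kirja> element rather than its first child, call removeChild() on the element's parent, reached through parentNode. A sketch on the same sample document; preserveWhiteSpace is set to false before loading so the whitespace between tags does not become text nodes.

```php
<?php
// Sketch: remove the first <kirja> element from <kirjat>.
// removeChild() must be called on the parent of the node being removed.

$dom = new DOMDocument();
// Must be set before loading, so whitespace-only text nodes are dropped
$dom->preserveWhiteSpace = false;
$dom->loadXML(
    '<kirjat> <kirja> <nimi>Tuntematon Sotilas</nimi> </kirja>'
    . ' <kirja> <nimi>Learn Java</nimi> </kirja> </kirjat>'
);

$first = $dom->getElementsByTagName('kirja')->item(0);
$first->parentNode->removeChild($first);

echo $dom->saveXML($dom->documentElement), PHP_EOL;
```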

PHP DOM and Encoding

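• PHP DOM uses UTF-8 internally
• Everything you put into an XML document using PHP DOM must be converted to UTF-8
  – utf8_encode(...) and utf8_decode(...) convert between ISO-8859-1 and UTF-8; both are deprecated as of PHP 8.2, and mb_convert_encoding(...) is the usual replacement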
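A sketch of both directions, assuming the mbstring extension is installed: a document declared as ISO-8859-1 is read back through PHP DOM as UTF-8, and Latin-1 input is converted with mb_convert_encoding() before it is added. The author-name sample and variable names are illustrative.

```php
<?php
// Sketch, assuming mbstring is available (mb_convert_encoding replaces
// utf8_encode, which is deprecated as of PHP 8.2).

// Reading: the document declares ISO-8859-1, where byte E4 is "ä" and F6 is "ö"
$latin1Xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
    . "<author>V\xE4in\xF6 Linna</author>";
$dom = new DOMDocument();
$dom->loadXML($latin1Xml);
// Values read through PHP DOM come back as UTF-8, not as the original bytes
$readName = $dom->documentElement->nodeValue;

// Writing: Latin-1 input must be converted to UTF-8 before it goes into the DOM
$latin1Input = "V\xE4in\xF6 Linna";        // hypothetical input from a legacy source
$utf8Input = mb_convert_encoding($latin1Input, 'UTF-8', 'ISO-8859-1');

$out = new DOMDocument('1.0', 'UTF-8');
$author = $out->appendChild($out->createElement('author'));
$author->appendChild($out->createTextNode($utf8Input));

var_dump($readName === "V\u{e4}in\u{f6} Linna");
var_dump($author->nodeValue === $readName);
```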