Page 1: XML Processing

C# and Windows Programming

XML Processing

Page 2: XML Processing

2

Contents

Markup
XML
DTDs
XML Parsers
DOM

Page 3: XML Processing

3

Markup

When we write text, it is just text
For example:

  John Smith 123 Main St. Toronto Ontario

We can all read this and understand it
A computer cannot and needs additional information

Page 4: XML Processing

4

Markup

Markup is added to documents in the form of tags

A tag consists of text delimited by angle brackets

The name of the tag identifies the tag and the information it conveys

Page 5: XML Processing

5

Markup

Let’s add some semantic markup to our address

  <address>
    <name>John Smith</name>
    <street>123 Main St.</street>
    <city>Toronto</city>
    <province>Ontario</province>
  </address>

This identifies the information in the various parts of the address

Page 6: XML Processing

6

Markup

You will notice
  Tags occur in pairs
    A start tag
    A matching end tag with a "/" before the tag name
  The text that the tags are describing is enclosed between the start tag and the end tag
  A single pair of tags encloses the entire document
  The fact that every start tag has a matching, properly nested end tag makes the document well-formed
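
A quick way to see the well-formedness rule in action is to hand a document to a parser and check whether it is accepted. The sketch below assumes the .NET `System.Xml` classes covered later in this deck; the helper name `IsWellFormed` and both sample strings are made up for illustration.

```csharp
using System;
using System.Xml;

// Returns true when the parser accepts the text as well-formed XML.
static bool IsWellFormed(string xml)
{
    try
    {
        new XmlDocument().LoadXml(xml);
        return true;
    }
    catch (XmlException)
    {
        // Thrown for problems such as a missing or mismatched end tag.
        return false;
    }
}

string good = "<address><name>John Smith</name><city>Toronto</city></address>";
string bad = "<address><name>John Smith</city></address>"; // <name> closed by </city>

Console.WriteLine(IsWellFormed(good)); // True
Console.WriteLine(IsWellFormed(bad));  // False
```

Well-formed only means the tags pair up and nest correctly; it says nothing about whether the tags are the ones you expected.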

Page 7: XML Processing

7

XML

XML is one in a long line of markup languages
It is the eXtensible Markup Language
Unlike markup languages with a fixed set of tags, such as HTML, you can define your own tags
Any meaning associated with those tags is imposed by your program

Page 8: XML Processing

8

Uses of XML

SOAP
  Simple Object Access Protocol: a type of remote procedure call
Configuration files
Web services
Security information
Electronic document exchange

Page 9: XML Processing

9

Defining Documents

If you can define your own tags, how do you know what should be in a document?
Document Type Definition (DTD)
  This defines the allowable tags and their order
Schema
  Like a DTD, it describes the tags and their order
  It also describes the content which can be placed within the tags

Page 10: XML Processing

10

XML Structure

Here is a simple XML document

  <?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
  <!DOCTYPE address SYSTEM "address.dtd">
  <address>
    <name>John Smith</name>
    <street>123 Main St.</street>
    <city>Toronto</city>
    <province>Ontario</province>
  </address>

Page 11: XML Processing

11

Attributes

A tag can also have attributes which provide additional information about the tag

  <city size="large">Toronto</city>

A tag can have zero or more attributes

Page 12: XML Processing

12

The XML Declaration

The first line is the optional XML declaration
It consists of
<?xml
  Identifies this as the XML declaration
version="1.0"
  The version of XML in the document

Page 13: XML Processing

13

The XML Declaration

encoding="ISO-8859-1"
  This is the character set used in the document
  Various character sets can be used, including Unicode (UTF-8), an international character set
standalone="no"
  Determines if the document uses any external entities which are defined in other files
  This will be discussed later in the course
In general, the order of attributes is not important, but it is in the XML declaration

Page 14: XML Processing

14

The DOCTYPE Declaration

The optional DOCTYPE declaration follows the XML declaration

  <!DOCTYPE address SYSTEM "address.dtd">

This declaration is required only if you want to validate the document against a definition of the tags in the document

Page 15: XML Processing

15

The Root Element

This is the <address> element which begins the document
It is the first element in the document
It contains all other elements in the document

Page 16: XML Processing

16

Elements

An element consists of a start tag, character data, and an end tag

  <name>John Smith</name>

A tag name must start with a letter or underscore
A tag name cannot contain spaces, and colons are reserved for namespace prefixes
The end tag must match the start tag exactly, including case

Page 17: XML Processing

17

Mixed Content

If an element contains just text, it has simple content

  <name>John Smith</name>

If it contains a mix of text and elements, it is said to have mixed content

  <sentence>these are nested <adverb>correctly</adverb></sentence>

Page 18: XML Processing

18

Attributes

Attributes are name-value pairs which can be added to elements

Attributes allow you to provide additional information without changing the tag itself

The names for attributes follow the same rules as tag names

Every attribute name within the same tag must be unique

Page 19: XML Processing

19

Attributes

  <employee name="Jones">accountant</employee>
  <employee name="Smith">sales</employee>

Note that these both contain a name attribute

That is OK since the attributes are in separate elements

Attribute values are placed in either single or double quotes
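
The two rules above (either kind of quote is fine, but a name may appear only once per tag) can be checked with a parser. This is a minimal sketch assuming .NET's `XmlDocument`; the `<staff>` wrapper element is hypothetical and is there only so the sample has a single root.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

var doc = new XmlDocument();

// Double and single quotes are both accepted around attribute values.
doc.LoadXml("<staff><employee name=\"Jones\">accountant</employee>" +
            "<employee name='Smith'>sales</employee></staff>");

var names = new List<string>();
foreach (XmlElement employee in doc.GetElementsByTagName("employee"))
{
    names.Add(employee.GetAttribute("name"));
}
Console.WriteLine(string.Join(", ", names)); // Jones, Smith

// Repeating an attribute name inside one tag is rejected by the parser.
bool duplicateRejected = false;
try
{
    new XmlDocument().LoadXml("<employee name='Jones' name='Smith'/>");
}
catch (XmlException)
{
    duplicateRejected = true;
}
Console.WriteLine(duplicateRejected); // True
```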

Page 20: XML Processing

20

Comments

Comments are delimited by special brackets

  <!-- a comment -->

Comments can
  Add explanations
  Remove XML which is not needed for a while

Page 21: XML Processing

21

Entities

The less than and greater than signs delimit tags
What if you want to type these symbols in a document and not have them delimit a tag?
Then, enter them as entities
To enter a less than sign
  &lt;
All entities are referenced using
  &
  The entity name
  ;

Page 22: XML Processing

22

Entities

Entity   Symbol   Description
&lt;     <        Less than
&gt;     >        Greater than
&amp;    &        Ampersand
&quot;   "        Double quote
&apos;   '        Apostrophe

Page 23: XML Processing

23

CDATA

Sometimes using entities is not enough since you have many special characters to type
A CDATA section allows you to enter anything without having special characters interpreted

  <![CDATA[ any characters here ]]>
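
To see that entities and CDATA are two spellings of the same text, the sketch below parses both forms and compares what the program gets back. It assumes .NET's `XmlDocument`; the element names `math`, `a`, `b`, and `c` are invented for the example.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<math>" +
            "<a>1 &lt; 2 &amp;&amp; 3 &gt; 2</a>" +
            "<b><![CDATA[1 < 2 && 3 > 2]]></b>" +
            "</math>");

// The parser has already turned the entities back into plain characters.
string fromEntities = doc.DocumentElement.FirstChild.InnerText;
string fromCData = doc.DocumentElement.LastChild.InnerText;

Console.WriteLine(fromEntities);              // 1 < 2 && 3 > 2
Console.WriteLine(fromEntities == fromCData); // True

// Going the other way, the DOM escapes special characters when it writes text out.
XmlElement c = doc.CreateElement("c");
c.InnerText = "x < y";
Console.WriteLine(c.OuterXml);                // <c>x &lt; y</c>
```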

Page 24: XML Processing

24

Document Type Definitions

The DTD is one way to describe what should be in a valid XML document
There are other ways which we will examine later in the course
A DTD
  Describes each element and the elements which can occur within it
  Describes the attributes for each element
  Describes entities which can be used in the document

Page 25: XML Processing

25

Person DTD

  <!-- The DTD for person -->
  <!DOCTYPE person [
    <!ELEMENT person (first, last, gender, employee-id) >
    <!ELEMENT first (#PCDATA) >
    <!ELEMENT last (#PCDATA) >
    <!ELEMENT gender (#PCDATA) >
    <!ELEMENT employee-id (#PCDATA) >
  ]>

Page 26: XML Processing

26

Reading the DTD

There is an element person containing the elements
  first
  last
  gender
  employee-id
These elements are described below
Each of these contains PCDATA, meaning parsed character data
  This means that these elements only contain text, not nested tags
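
As a rough illustration of what validating against this DTD looks like in .NET, the sketch below turns on DTD validation in an `XmlReader` and counts the errors reported. The DOCTYPE name is written as `person` here so that it matches the root element, which validation requires. The helper `CountValidationErrors` and the sample values (John, Smith, M, 42) are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

const string dtd =
    "<!DOCTYPE person [" +
    "<!ELEMENT person (first, last, gender, employee-id)>" +
    "<!ELEMENT first (#PCDATA)>" +
    "<!ELEMENT last (#PCDATA)>" +
    "<!ELEMENT gender (#PCDATA)>" +
    "<!ELEMENT employee-id (#PCDATA)>" +
    "]>";

// Reads the whole document with DTD validation on and counts the errors reported.
static int CountValidationErrors(string xml)
{
    int errors = 0;
    var settings = new XmlReaderSettings
    {
        DtdProcessing = DtdProcessing.Parse,  // DTDs are prohibited by default
        ValidationType = ValidationType.DTD
    };
    settings.ValidationEventHandler += (sender, e) =>
    {
        if (e.Severity == XmlSeverityType.Error) errors++;
    };

    using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
    {
        while (reader.Read()) { }  // validation happens as the reader advances
    }
    return errors;
}

string valid = dtd + "<person><first>John</first><last>Smith</last>" +
               "<gender>M</gender><employee-id>42</employee-id></person>";

// Well-formed, but the children are in the wrong order and two are missing.
string invalid = dtd + "<person><last>Smith</last><first>John</first></person>";

Console.WriteLine(CountValidationErrors(valid));       // 0
Console.WriteLine(CountValidationErrors(invalid) > 0); // True
```

Both documents are well-formed; only the first is valid, because the DTD fixes the order first, last, gender, employee-id.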

Page 27: XML Processing

27

XML Parsers

There are two types of XML parsers
DOM
  The Document Object Model
  This parses the document into a tree-like structure called a DOM
  The document is parsed all at once
SAX
  Simple API for XML
  This is a sequential parser which executes a callback when each part of the document is recognized
  This is good for very large documents since the entire document does not have to be in memory at once
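
The .NET class library does not ship a SAX parser. Its sequential counterpart is `XmlReader`, which is a pull parser: instead of receiving callbacks, your code asks for the next node. The sketch below is offered as that substitute, not as SAX itself; the sample document is made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

string xml = "<address><name>John Smith</name><city>Toronto</city></address>";
var elementNames = new List<string>();

using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    // Read() advances one node at a time; no tree of the document is built.
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element)
        {
            elementNames.Add(reader.Name);
        }
    }
}

Console.WriteLine(string.Join(" ", elementNames)); // address name city
```

The memory advantage is the same as with SAX: only the current node is held, so the size of the document does not matter.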

Page 28: XML Processing

28

What is DOM?

DOM is an in-memory data structure
It describes an XML document as a tree structure
The nodes in the tree are described by the interface to them
This means that there can be many implementations that implement the interface

Page 29: XML Processing

29

So, how do we make a document into a tree?

  <?xml version="1.0"?>
  <friend>
    <handle degree="close">
      Harold
    </handle>
  </friend>

Document
  friend (root element)
    whitespace (text)
    handle (element)
      degree (attribute)
        close (text)
      Harold (text)
    whitespace (text)
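
The same tree can be printed by walking the DOM recursively. This is a sketch only, assuming .NET's `XmlDocument`; the helper name `Describe` is invented here and this is not the NodeLister example mentioned at the end of the deck. `PreserveWhitespace` is switched on because `XmlDocument` drops whitespace-only nodes by default.

```csharp
using System;
using System.Text;
using System.Xml;

// Appends one indented line per node, then recurses into attributes and children.
static void Describe(XmlNode node, int depth, StringBuilder output)
{
    output.Append(new string(' ', depth * 2))
          .Append(node.NodeType).Append(' ').Append(node.Name).Append('\n');

    if (node.Attributes != null)  // null for everything except elements
    {
        foreach (XmlAttribute attribute in node.Attributes)
        {
            Describe(attribute, depth + 1, output);
        }
    }
    foreach (XmlNode child in node.ChildNodes)
    {
        Describe(child, depth + 1, output);
    }
}

var doc = new XmlDocument { PreserveWhitespace = true };
doc.LoadXml("<?xml version=\"1.0\"?>\n<friend>\n<handle degree=\"close\">\n" +
            "Harold\n</handle>\n</friend>");

var listing = new StringBuilder();
Describe(doc, 0, listing);
Console.Write(listing);
```

The listing shows the Document node at the top, the friend and handle elements below it, the degree attribute with its own text child, and the whitespace nodes between the tags.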

Page 30: XML Processing

30

Nodes

All nodes in a DOM implement the Node interface

All other interfaces in the tree extend the Node interface

This means that every node can be treated as a Node, and maybe more

Page 31: XML Processing

31

XmlNode

Represents every node in the DOM
Properties
  ParentNode
  Name
  FirstChild
  NextSibling
  PreviousSibling
  Value

Page 32: XML Processing

32

XmlNode

Methods
  InsertBefore()
  AppendChild()
  RemoveChild()
  Clone()
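
A short sketch of these properties and methods working together, using only the `XmlNode` type for the variables. The sample address document is borrowed from the earlier slides; nothing else is assumed beyond the standard `System.Xml` classes.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<address><name>John Smith</name><city>Toronto</city></address>");

XmlNode address = doc.DocumentElement;
XmlNode name = address.FirstChild;
XmlNode city = name.NextSibling;

Console.WriteLine(city.Name);                 // city
Console.WriteLine(city.ParentNode.Name);      // address
Console.WriteLine(city.PreviousSibling.Name); // name
Console.WriteLine(city.FirstChild.Value);     // Toronto (Value lives on the text node)

// Insert a new element before <city>, then remove <name>.
XmlNode street = doc.CreateElement("street");
street.AppendChild(doc.CreateTextNode("123 Main St."));
address.InsertBefore(street, city);
address.RemoveChild(name);

// Clone() copies the node together with its subtree.
XmlNode copy = address.Clone();
Console.WriteLine(copy.OuterXml);
// <address><street>123 Main St.</street><city>Toronto</city></address>
```

Note that `Value` is null on an element; the text sits in a child text node, which is why the example reads `city.FirstChild.Value`.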

Page 33: XML Processing

33

XmlDocument

The node above the root node of the document
Can be used to represent an empty document
Properties
  DocumentElement
Methods
  CreateElement()
  CreateTextNode()
  GetElementsByTagName()
  Load()
  Save()
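
Starting from an empty `XmlDocument`, the factory methods above are enough to build a document in memory. The sketch below is an assumption-laden illustration (it is not the DocBuilder example referenced at the end of the deck) and reuses the address data from the earlier slides.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();               // starts out as an empty document

XmlElement address = doc.CreateElement("address");
doc.AppendChild(address);                  // the first element becomes DocumentElement

XmlElement name = doc.CreateElement("name");
name.AppendChild(doc.CreateTextNode("John Smith"));
address.AppendChild(name);

XmlElement city = doc.CreateElement("city");
city.SetAttribute("size", "large");
city.AppendChild(doc.CreateTextNode("Toronto"));
address.AppendChild(city);

Console.WriteLine(doc.DocumentElement.Name);               // address
Console.WriteLine(doc.GetElementsByTagName("city").Count); // 1
Console.WriteLine(doc.OuterXml);
// <address><name>John Smith</name><city size="large">Toronto</city></address>
```

Nodes are always created by the document they will live in, then attached with `AppendChild()`; `Load()` and `Save()` do the equivalent work against a file or stream.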

Page 34: XML Processing

34

XmlElement

This represents an element
An element can have attributes
Properties
  XmlAttributeCollection Attributes
Methods
  GetElementsByTagName()
  SetAttribute(string name, string value)
  string GetAttribute(string name)

Page 35: XML Processing

35

XmlAttribute

This is an attribute
Can have either Text nodes or EntityReferences as children
Name property gets the name
Value gets the value

Page 36: XML Processing

36

XmlText

This is the node representing text
The text has no markup
Even whitespace is represented as a text node

Page 37: XML Processing

37

CDATASection Interface

This is a CDATA section
It is similar to a text node but the content undergoes no interpretation

Page 38: XML Processing

38

Other Node Subinterfaces

Comment
Notation
Entity
EntityReference
ProcessingInstruction
These are all just the same as in XML

Page 39: XML Processing

39

Other Node Subinterfaces

DocumentFragment
  Part of a document tree which can be inserted into another tree
DOMImplementation
  Provides capabilities of the implementation
  Has the method for creating a document

Page 40: XML Processing

40

Other Node Subinterfaces

DOMException
  Something went wrong
NodeList
  A list of nodes which has an iterator
NamedNodeMap
  A map structure holding a collection of nodes

Page 41: XML Processing

41

Common .NET DOM Classes

XmlNode
XmlDocument
XmlElement
XmlText
XmlAttribute

Page 42: XML Processing

42

XmlNodeList

A list of nodes
Returned by GetElementsByTagName()
Properties
  Count -- number of nodes in the list
  Indexer -- retrieves a node
Methods
  Item(int n) -- retrieves a node
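
A brief sketch of the three ways into an `XmlNodeList`. The `<staff>` document is hypothetical sample data.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<staff><employee>Jones</employee><employee>Smith</employee></staff>");

XmlNodeList employees = doc.GetElementsByTagName("employee");

Console.WriteLine(employees.Count);             // 2
Console.WriteLine(employees[0].InnerText);      // Jones (indexer)
Console.WriteLine(employees.Item(1).InnerText); // Smith (Item method)
Console.WriteLine(employees.Item(5) == null);   // True: an out-of-range index yields null
```

The list can also be walked with `foreach`, which is usually the more natural choice when every node is needed.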

Page 43: XML Processing

43

XmlNamedNodeMap

A map of nodes indexed by name
Superclass of XmlAttributeCollection
Returned by the Attributes property
Properties
  Count
Methods
  Item(int n)
  GetNamedItem(string name)
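
The sketch below reads an element's attributes through the `XmlNamedNodeMap` base type, by position and by name. The `country` attribute and the `mayor` lookup are invented for the example.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<city size=\"large\" country=\"Canada\">Toronto</city>");

// Attributes is an XmlAttributeCollection, which derives from XmlNamedNodeMap.
XmlNamedNodeMap attributes = doc.DocumentElement.Attributes;

Console.WriteLine(attributes.Count);                         // 2
Console.WriteLine(attributes.Item(0).Name);                  // size
Console.WriteLine(attributes.GetNamedItem("country").Value); // Canada
Console.WriteLine(attributes.GetNamedItem("mayor") == null); // True: no such attribute
```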

Page 44: XML Processing

44

Examples

* see NodeLister
* see DocBuilder