
XML And XPath


Page 1: XML And XPath

DSA/2006/week 14 1

XML And XPath

DSA Term 2

Week 14

Page 2: XML And XPath

Lecture overview

• Matters arising
  – Character coding
• Well-formed XML
• Creating simple XML files
  • Placename to BBC code
• Introduction to XPath

Page 3: XML And XPath

Character Coding

• Character set
  – ISO 8859 - 1 byte per character
    • 0 - 127 are ASCII
    • 128 - 255 vary depending on the part of the standard
    • 15 different character maps
      – ISO-8859-1 - Latin-1 - the default for HTML
      – ISO-8859-2 - Central European
    • A document must be in one encoding, so mixing characters is a problem, e.g. an Arabic quotation in a Cyrillic text
  – UTF-8 - Unicode, variable length (1 to 4 bytes per character), supports a huge range of international languages in a single code
    • ASCII is included as characters 0-127
    • Allows the internet to be truly multi-lingual
    • Key invention by Ken Thompson: self-synchronisation, allowing character boundaries to be detected
• Character references in HTML
  – Named &deg;
  – Decimal &#176;
  – Hexadecimal &#xB0;
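The points above can be checked with a short sketch. It uses Python rather than the PHP used elsewhere in the module, and the helper name count_characters is made up for illustration. It encodes the degree sign in both encodings, decodes the three kinds of character reference, and uses self-synchronisation (continuation bytes always look like 10xxxxxx) to count characters without fully decoding.

```python
import html

degree = "\u00b0"  # DEGREE SIGN, code point 176 (hex B0)

# The same character is one byte in ISO-8859-1 but two bytes in UTF-8.
latin1_bytes = degree.encode("iso-8859-1")
utf8_bytes = degree.encode("utf-8")

# ASCII characters (0-127) are encoded identically in both encodings.
ascii_same = "A".encode("iso-8859-1") == "A".encode("utf-8")

# Named, decimal and hexadecimal references all denote the same character.
references = ["&deg;", "&#176;", "&#xB0;"]
decoded = [html.unescape(ref) for ref in references]


def count_characters(data: bytes) -> int:
    """Count UTF-8 characters by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


print(latin1_bytes, utf8_bytes, ascii_same, decoded)
print(count_characters("20\u00b0C".encode("utf-8")))
```

Because every continuation byte is recognisable on its own, a reader dropped into the middle of a UTF-8 stream can find the next character boundary by skipping at most three bytes.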

Page 4: XML And XPath

Defining the Encoding

• Encodings in HTML
  – In a meta-tag
    • <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
  – In the XML declaration (written like a processing instruction)
    • <?xml version="1.0" encoding="ISO-8859-1"?>
  – In the HTTP content header
    • Content-Type: text/html; charset=ISO-8859-1
• Setting encoding in PHP
  – header("Content-type: text/html; charset=UTF-8");
• Setting encoding in the browser
  – Firefox
    • View/Character Encoding
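A small sketch of why the declared encoding matters, in Python with the standard-library ElementTree parser. The element name temp and the helper decodes_as_utf8 are made up for illustration. The same bytes parse correctly when the XML declaration names ISO-8859-1, but they are not valid UTF-8.

```python
import xml.etree.ElementTree as ET

# Bytes as they would arrive from a file or HTTP response: 0xB0 is the
# degree sign in ISO-8859-1, and the declaration tells the parser so.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><temp>20\xb0C</temp>'
temp_text = ET.fromstring(latin1_doc).text


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes are a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(temp_text)
print(decodes_as_utf8(b"20\xb0C"))                     # ISO-8859-1 bytes
print(decodes_as_utf8("20\u00b0C".encode("utf-8")))    # UTF-8 bytes
```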

Page 5: XML And XPath

Design a simple XML file

• Design an XML vocabulary to represent pairs of place names and codes
  – Bristol 1263
  – Bath 1123
• First review XML structure

Page 6: XML And XPath

Example

<MapSet>
  <Map id="P2" desc="P Block level 2">
    <room id="2P2">
      <area shape="rect" coords="118,39,138,68"/>
      <type>Staff Room</type>
      <occupant>Tony Solomonides</occupant>
    </room>
    <room id="2P3">
      <area shape="rect" coords="141,40,162,69"/>
      <type>Staff Room</type>
      <occupant>Richard Lawson</occupant>
    </room>
    <room id="2P4">
      <area shape="poly" coords="201,40,234,40,234,118,164,119,163,71,200,71"/>
      <type>Office</type>
      <occupant>Eleanor Gibbons</occupant>
      <occupant>Dee Evans</occupant>
      <occupant>Ali Jack</occupant>
    </room>
    ….
  </Map>
</MapSet>

Page 7: XML And XPath

Well-formed XML documents (1)

Every XML document must be well-formed and must therefore adhere to the following rules (among others):

1. Every start-tag must have a matching end-tag.

2. Elements may nest but must not overlap.
   <name>Anna<em>Coffey</em></name> - √
   <name><em>Anna</name>Coffey</em> - ×

3. There must be exactly one root element.

4. Attribute values must be quoted.

5. An element must not have two attributes with the same name.

6. Comments and processing instructions may not appear inside tags.

7. No unescaped < or & signs may occur in the character data of an element.
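A parser will enforce these rules for you. The sketch below is in Python, using the standard-library ElementTree parser, and the function name is_well_formed is made up for illustration. It treats any parse error as "not well-formed" and tries one small document per rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


checks = {
    "nested": "<name>Anna<em>Coffey</em></name>",
    "overlapping": "<name><em>Anna</name>Coffey</em>",
    "missing end-tag": "<name>Anna",
    "two roots": "<a/><b/>",
    "unquoted attribute": "<a id=P2/>",
    "duplicate attribute": '<a id="1" id="2"/>',
    "unescaped ampersand": "<a>fish & chips</a>",
    "escaped ampersand": "<a>fish &amp; chips</a>",
}

for label, document in checks.items():
    print(f"{label}: {is_well_formed(document)}")
```

Only the "nested" and "escaped ampersand" documents are accepted. Note that a parser stops at the first error, so a file with several mistakes has to be fixed and re-parsed repeatedly.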

Page 8: XML And XPath

Well-formed XML documents (2)

Element names are case sensitive - <NAME>, <name>, <Name> & <NaMe> are four different element types.

No white space in element names - <First Name> not allowed; <First_Name> OK.

Element names cannot start with the letters "xml" in any combination of upper and lower case; these are reserved.

Element names must start with a letter or an underscore. Element names cannot start with a number but numbers may be embedded within an element name - <2you> not allowed; <me2you> is OK.

Attribute names are constrained by the above rules for element names.

Entity references are used to substitute specific characters. There are five predefined entities built into XML:

Entity   Char   Notes
&amp;    &      Do not use inside processing instructions.
&lt;     <      Use in character data and inside attribute values.
&gt;     >      Use after ]] in normal text and inside processing instructions.
&quot;   "      Use inside attribute values quoted with ".
&apos;   '      Use inside attribute values quoted with '.
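As a rough illustration of the predefined entities in use, the following Python sketch escapes text on the way out and lets the parser expand the references on the way in. The element and attribute names n and a are made up for illustration; escape comes from the standard-library module xml.sax.saxutils.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Writing: replace the special characters with entity references.
escaped_text = escape("1 < 2 & 3")
# escape() handles &, < and > by default; extra mappings can be passed in.
escaped_attr = escape('say "hi"', {'"': "&quot;"})

# Reading: the parser expands the references back to plain characters.
parsed_text = ET.fromstring("<n>1 &lt; 2 &amp; 3</n>").text
parsed_attr = ET.fromstring('<n a="say &quot;hi&quot;"/>').get("a")

print(escaped_text)
print(escaped_attr)
print(parsed_text)
print(parsed_attr)
```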

Page 9: XML And XPath


Errors

• Look at the listing of the XML file and identify all the places which prevent this XML from being well-formed

Page 10: XML And XPath

<Map id=P2 desc="P Block level 2'>
  <room id="2P2"> This is a nice big office
    <area rect coords="118,39,138,68">
    <typo>Staff Room</typo>
    <occupant>Tony Solomonides</occupant>
  </Room>
  <room id="2P3">
    <area rect coords="141,40,162,69"></area>
    <typo>Staff Room</typo>
    <occupant>”Richard Lawson”</occupant>
  </Room>
  <room id="2P4">
    <area poly coords="201,40,234,40,234,118,164,119,163,71,200,71"/>
    <typo>Office</typo>
    <occupant>Eleanor Gibbons</occupant>
    <person>Dee Evans</person>
    <occupant>Ali Jack</occupant
  </Room>
---

Page 11: XML And XPath

Task

• Draw the structure
  – Use ER notation
    • Attributes in the entity
    • Crow's-foot notation for one-many, optional
    • Identify any restricted sets of values (enumerated types)
  – In the lab, QSEE will allow you to define the structure and generate the schema definition (XML Schema or DTD)


Page 13: XML And XPath

XPath

• Core language for selecting nodes in XML
• Version 1.0 used in XSLT 1.0
  – Client-side in browsers
  – Xalan engine
  – W3Schools tutorial is for XPath 1.0
  – SimpleXML in PHP
• Version 2.0 used in XSLT 2.0
  – Saxon
  – XQuery 1.0
• Differences
  – Core data structure in 2.0 is a sequence (1.0 uses node-sets)
  – Full support for all XML Schema datatypes
  – Two kinds of equality operators
  – Larger function library

Page 14: XML And XPath

XPath Language

• Not a programming language
• Expressions to be evaluated
• Focus on
  – Navigation in a tree structure
    • Multiple directions or 'axes'
      – Down to children (child axis)
      – Up to parent (parent axis)
      – Down to attributes (attribute axis)
      – Across to siblings (following-sibling and preceding-sibling axes)
  – Operators
  – Functions

Page 15: XML And XPath

                              XPath                          DOM

Style                         Declarative                    Procedural (navigational)
Root                          /                              document
Context node (self;           .
  cf. current directory)
Local name                    n
Child                         *                              n.childNodes
                              *[1]                           n.firstChild
                              *[last()]                      n.lastChild
Parent                        ..                             f.parentNode
Sibling                       ../g                           f.nextSibling
                              ../*                           f.previousSibling
Attribute                     f/@att                         f.getAttribute(att)
Locate node by ID             //*[@id='f1']                  document.getElementById('f1')
Locate nodes by tag           //img                          document.getElementsByTagName('img')
Number of nodes in sequence   count(s)                       s.length
Select by predicate           [@size = 10 and . = 'fred']
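To make the contrast concrete, here is a sketch in Python that reads the same small document both ways. xml.dom.minidom provides the DOM properties from the table; xml.etree.ElementTree accepts a limited subset of XPath, which is why room[1] is used rather than *[1]. The three-room document is a cut-down version of the MapSet example.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString

XML = '<Map id="P2"><room id="2P2"/><room id="2P3"/><room id="2P4"/></Map>'

# DOM: procedural navigation, one step at a time.
doc = parseString(XML)
dom_root = doc.documentElement
dom_first = dom_root.firstChild
dom_ids = [node.getAttribute("id") for node in dom_root.childNodes]
dom_next_id = dom_first.nextSibling.getAttribute("id")
dom_parent_is_root = dom_first.parentNode is dom_root
dom_room_count = len(doc.getElementsByTagName("room"))

# XPath-style: declare what is wanted and let the engine find it.
et_root = ET.fromstring(XML)
et_ids = [elem.get("id") for elem in et_root.findall("*")]
et_first_id = et_root.find("room[1]").get("id")
et_last_id = et_root.find("room[last()]").get("id")
et_by_id = et_root.find(".//*[@id='2P3']")
et_parent = et_root.find("room[@id='2P3']/..")

print(dom_ids, dom_next_id, dom_parent_is_root, dom_room_count)
print(et_ids, et_first_id, et_last_id, et_by_id.tag, et_parent.tag)
```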

Page 16: XML And XPath

XPath operators

• Arithmetic operators
  + - * div idiv mod
• Value comparisons
  eq, ne, lt, le, gt, ge
• Sequence comparisons
  =, !=
  = is true if the two sequences have at least one item in common
  != is true if some item in one sequence differs from some item in the other
  (1,2,3) = (2,3,4) is true
  (1,2,3) != (2,3,4) is also true
  not((1,2,3) = (2,3,4)) is false
• Logical operators
  and, or, not()
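The sequence comparisons are easy to misread, so here is a small Python model of their semantics. It is not an XPath engine, and the function names general_eq and general_ne are made up for illustration. Both operators ask whether any pair of items, one from each sequence, satisfies the comparison.

```python
from itertools import product


def general_eq(left, right) -> bool:
    """Model of XPath '=': true if any pair of items is equal."""
    return any(a == b for a, b in product(left, right))


def general_ne(left, right) -> bool:
    """Model of XPath '!=': true if any pair of items is unequal."""
    return any(a != b for a, b in product(left, right))


print(general_eq((1, 2, 3), (2, 3, 4)))        # the slide's first example
print(general_ne((1, 2, 3), (2, 3, 4)))        # the slide's second example
print(not general_eq((1, 2, 3), (2, 3, 4)))    # the slide's third example
print(general_ne((1, 1), (1,)))                # every pair is equal
```

This is why != is not the negation of =: to ask "no items in common", wrap = in not().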

Page 17: XML And XPath

Large function library

• Sequence functions: count(seq), max(seq), min(seq), avg(seq)
  – count((1,2,3)) = 3
• String functions
  – string-length('abc')
  – tokenize('a,b,c', ',')
  – string-join(('a','b','c'), ', ')

Page 18: XML And XPath

Using the eXist database

• eXist database as an XPath / XQuery engine
  – REST interface
    • ..exist/rest/db/chriswallace/rooms?_query=//Map
  – Java client
  – Sandbox (using Ajax to do dynamic syntax checking)
• Context is the whole database
• The demo database includes
  – the whole text of Romeo and Juliet
  – the Mondial world database

Page 19: XML And XPath

Examples

• All rooms
  – /MapSet/Map/room
  – //room
• Room 2P5
  – //room[@id='2P5']
• The occupants of room 2P4
  – //room[@id='2P4']/occupant
• The room number of the room which Colin Fudge occupies
  – //room[occupant = 'Colin Fudge']/@id
• The number of occupants of 2P4
  – count(//room[@id='2P4']/occupant)
• The floor of Ali Jack's room
  – //room[occupant = 'Ali Jack']/../@desc
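Some of these queries can be tried without eXist. The sketch below is in Python and runs them against the three rooms shown in the MapSet example; the elided rooms are left out, so room 2P5 and Colin Fudge do not appear. The standard-library ElementTree supports only a subset of XPath, so this is an approximation: count() becomes len() and a final /@id step becomes .get("id").

```python
import xml.etree.ElementTree as ET

ROOMS_XML = """
<MapSet>
  <Map id="P2" desc="P Block level 2">
    <room id="2P2">
      <area shape="rect" coords="118,39,138,68"/>
      <type>Staff Room</type>
      <occupant>Tony Solomonides</occupant>
    </room>
    <room id="2P3">
      <area shape="rect" coords="141,40,162,69"/>
      <type>Staff Room</type>
      <occupant>Richard Lawson</occupant>
    </room>
    <room id="2P4">
      <area shape="poly" coords="201,40,234,40,234,118,164,119,163,71,200,71"/>
      <type>Office</type>
      <occupant>Eleanor Gibbons</occupant>
      <occupant>Dee Evans</occupant>
      <occupant>Ali Jack</occupant>
    </room>
  </Map>
</MapSet>
"""

mapset = ET.fromstring(ROOMS_XML)  # context node is the MapSet element

# /MapSet/Map/room and //room
rooms_by_path = mapset.findall("./Map/room")
rooms_anywhere = mapset.findall(".//room")

# //room[@id='2P4']/occupant
occupants_2p4 = [o.text for o in mapset.findall(".//room[@id='2P4']/occupant")]

# //room[occupant = 'Ali Jack']/@id
ali_room_id = mapset.find(".//room[occupant='Ali Jack']").get("id")

# count(//room[@id='2P4']/occupant)
occupant_count = len(occupants_2p4)

# //room[occupant = 'Ali Jack']/../@desc
ali_floor = mapset.find(".//room[occupant='Ali Jack']/..").get("desc")

print([room.get("id") for room in rooms_anywhere])
print(occupants_2p4, ali_room_id, occupant_count, ali_floor)
```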

Page 20: XML And XPath


Notes

• Note how = tests if a person is amongst the occupants

• To ‘serialise’ an attribute use string()

• See how ../ allows navigation to the parent element

Page 21: XML And XPath

Examples for you

• The room number for Richard Lawson

• The coordinates of room 2P2

• All rooms with poly shape

• Who are Ali Jack’s office mates?

Page 22: XML And XPath

XML design

• Rooms is a mixture of text elements and attributes.
• Could be all attributes - what would change?
• Could be no attributes - what would change?
• For the workshop exercise use elements instead of attributes - it's simpler even if more verbose
• Generally, what do the experts recommend?

Page 23: XML And XPath

Workshop

• Create a simple XML file containing pairs of place names and BBC codes

• Change the PHP script to accept a placename

• Read the new XML file and look up the name to get the code, using the PHP SimpleXML interface and its xpath() method
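The workshop itself uses PHP and SimpleXML. As a language-neutral sketch of the lookup logic, here is the same idea in Python. The vocabulary (places, place, name, code) and the function name code_for are assumptions made up for illustration, not the required design; the two name/code pairs are the ones given earlier in the lecture.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical vocabulary: elements only, no attributes.
PLACES_XML = """
<places>
  <place><name>Bristol</name><code>1263</code></place>
  <place><name>Bath</name><code>1123</code></place>
</places>
"""


def code_for(placename: str, xml_text: str = PLACES_XML) -> Optional[str]:
    """Return the code for a place name, or None if it is not listed."""
    root = ET.fromstring(xml_text)
    # Comparing in Python rather than building a predicate string avoids
    # quoting problems if the name contains an apostrophe.
    for place in root.findall("place"):
        if place.findtext("name") == placename:
            return place.findtext("code")
    return None


print(code_for("Bristol"))
print(code_for("Bath"))
print(code_for("Atlantis"))
```

With a full XPath engine the loop collapses to a single expression of the form //place[name = $placename]/code, which is the shape the xpath() call in the workshop would take under this assumed vocabulary.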