
Introduction to XPATH - dingostew.comdingostew.com/download/eoinzy/college/ADA/XPATH_notes.pdf · Introduction XPath is a language for specifying navigation within an XML document



Introduction to XPATH

Adapted from "XML How To Program" by Deitel

Chapter 5: XML Path Language (XPath)

Readings: XML Path Language (XPath)

http://www.w3.org/TR/xpath


Introduction

• XPath is a language for specifying navigation within an XML document
• It also provides basic facilities for manipulating strings, numbers, and booleans
• XPath models an XML document as a tree of nodes
• The most common node types are document (d), element (e), attribute (a), and text (t) nodes
• XPath defines a way to compute a string value for each type of node
• Used by other XML technologies
  – e.g. XSLT, XPointer, XQuery


Nodes

• XML document
  – XML documents are treated as trees of nodes
  – The root of the tree is called the document node (or root node)
  – Each node represents part of the XML document
• Seven node types
  – Root (document node)
  – Element
  – Attribute
  – Text
  – Comment
  – Processing instruction
  – Namespace
• Attributes and namespaces are not children of their parent node
  – They describe their parent node
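The last point can be seen in most XML APIs. The sketch below uses Python's standard `xml.etree.ElementTree`, which is an assumption of this example (the slides do not name a tool). Note that ElementTree does not model text or document nodes as separate objects, so it only approximates the XPath data model.

```python
import xml.etree.ElementTree as ET

# A single course element with two attributes and two child elements.
course = ET.fromstring(
    '<course cid="c2313" year="2008">'
    "<name>IMD</name><lecturer>Jones</lecturer>"
    "</course>"
)

# Iterating over an element yields only its child elements.
child_tags = [child.tag for child in course]

# Attributes are reached through a separate mapping, never by iteration.
attributes = dict(course.attrib)

print(child_tags)
print(attributes)
```

The attributes describe `course` but never appear among its children, which mirrors the XPath rule stated above.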


Partial Faculty.xml

d101 faculty.xml (document node)
  e101 faculty
    a101 name = "Computing"
    e102 course
      a102 cid = "c2313"
      a103 year = "2009"
      e103 name
        t101 "ADA"
      e104 student
        e105 sid
          t102 "x000123"
        e106 grade
          t103 "A"
      e107 student
        e108 sid
          t104 "x0008787"
        e109 grade
          t105 "B+"
    e110 course
    e118 course


Element node continued

e110 course
  a104 cid = "c2314"
  a105 year = "2009"
  e111 name
    t106 "CIS"
  e112 student
    e113 sid
      t107 "x000123"
    e114 grade
      t108 "A"
  e115 student
    e116 sid
      t109 "x0008787"
    e117 grade
      t110 "C"


Element node continued

e118 course
  a106 cid = "c2313"
  a107 year = "2008"
  e119 name
    t111 "IMD"
  e120 lecturer
    t112 "Jones"
  e121 student
    e122 sid
      t113 "x0008787"
    e123 grade
      t114 "C+"
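The tree on the three slides above can be written back as XML text. The following is a reconstruction from the node labels, not the original faculty.xml file, so whitespace and element order inside the real file are assumptions. It is parsed with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the node labels d101..t114; the real file may differ.
FACULTY_XML = """
<faculty name="Computing">
  <course cid="c2313" year="2009"><name>ADA</name>
    <student><sid>x000123</sid><grade>A</grade></student>
    <student><sid>x0008787</sid><grade>B+</grade></student>
  </course>
  <course cid="c2314" year="2009"><name>CIS</name>
    <student><sid>x000123</sid><grade>A</grade></student>
    <student><sid>x0008787</sid><grade>C</grade></student>
  </course>
  <course cid="c2313" year="2008"><name>IMD</name>
    <lecturer>Jones</lecturer>
    <student><sid>x0008787</sid><grade>C+</grade></student>
  </course>
</faculty>
"""

faculty = ET.fromstring(FACULTY_XML)  # element e101

# One (cid, year, name) triple per course element e102, e110, e118.
courses = [(c.get("cid"), c.get("year"), c.findtext("name")) for c in faculty]
print(courses)
```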


XPath Axis

• Within an XPath step, the axis specifies the "direction" in which to navigate through a document
  – For example, in the step child::student, where Axis = child:: and Node-test = student, the step selects all child nodes of the context node that have the name student
• XPath supports 13 different axes for navigation
• The child:: axis is the most commonly used
• Some of the others are:
  – attribute:: (access attributes of a context node),
  – descendant:: (access descendant nodes of a context node),
  – self:: (access the context node itself),
  – descendant-or-self:: (access the context node and its descendants, and return the contents of the nodes that satisfy the node test),
  – parent:: (access the parent node of a context node)


XPath axes

Axis name              Ordering   Description
self::                 none       The context node itself. (See note)
parent::               reverse    The context node's parent, if one exists. (See note)
child::                forward    The context node's children, if they exist. The default if no axis is provided. (See note)
ancestor::             reverse    The context node's ancestors (parent, grandparent, etc.), if they exist.
ancestor-or-self::     reverse    The context node's ancestors and also itself.
descendant::           forward    The context node's descendants (children, grandchildren, etc.).
descendant-or-self::   forward    The context node's descendants and also itself.
following::            forward    Everything in the document after the closing tag of the context node.
following-sibling::    forward    The sibling nodes following the context node.
preceding::            reverse    Everything in the document before the start tag of the context node, excluding its ancestors.
preceding-sibling::    reverse    The sibling nodes preceding the context node.
attribute::            forward    The attribute nodes of the context node. (See note)
namespace::            forward    The namespace nodes of the context node.

NOTE: The axes marked "See note" can be written in an abbreviated form.
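To make the axis directions concrete, here is a minimal sketch of a few axes written by hand in Python. It assumes the standard `xml.etree.ElementTree` module and covers element nodes only. The helper names (`child`, `descendant`, `parent`, `ancestor`, `following_sibling`) are made up for this illustration and are not part of any library.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<course><name>IMD</name><lecturer>Jones</lecturer>"
    "<student><sid>x0008787</sid><grade>C+</grade></student></course>"
)

# ElementTree keeps no parent pointers, so build a child -> parent map once.
PARENT = {c: p for p in root.iter() for c in p}

def child(node):
    """child:: axis, in document order."""
    return list(node)

def descendant(node):
    """descendant:: axis; iter() yields the node itself first, so skip it."""
    return list(node.iter())[1:]

def parent(node):
    """parent:: axis; empty for the root element of this fragment."""
    return [PARENT[node]] if node in PARENT else []

def ancestor(node):
    """ancestor:: axis, a reverse axis: the nearest ancestor comes first."""
    result = []
    while node in PARENT:
        node = PARENT[node]
        result.append(node)
    return result

def following_sibling(node):
    """following-sibling:: axis, in document order."""
    if node not in PARENT:
        return []
    siblings = list(PARENT[node])
    return siblings[siblings.index(node) + 1:]

sid = root.find("student/sid")
print([n.tag for n in descendant(root)])
print([n.tag for n in ancestor(sid)])
print([n.tag for n in following_sibling(sid)])
```

Each function takes one context node and returns a list of nodes, which matches how a step is applied to a single context node.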


Some location-path abbreviations

Location path                  Description
child::                        The default axis if none is supplied; may therefore be omitted.
attribute::                    The attribute axis may be abbreviated as @.
/descendant-or-self::node()/   This location path is abbreviated as two slashes (//).
self::node()                   The context node is abbreviated with a period (.).
parent::node()                 The context node's parent is abbreviated with two periods (..).
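The abbreviated forms are the ones most small XPath implementations accept. As a hedged illustration, Python's standard `xml.etree.ElementTree` supports only a subset of XPath, and that subset happens to include every abbreviation in the table. The XML fragment is a cut-down version of course e118 and is an assumption of this example.

```python
import xml.etree.ElementTree as ET

faculty = ET.fromstring(
    '<faculty name="Computing">'
    '<course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>'
    "<student><sid>x0008787</sid><grade>C+</grade></student></course>"
    "</faculty>"
)

# child:: omitted: "course/student" stands for child::course/child::student
students = faculty.findall("course/student")

# attribute:: written as @
courses_2008 = faculty.findall("course[@year='2008']")

# // reaches descendants at any depth below the context node
grades = faculty.findall(".//grade")

# . is the context node itself
context = faculty.findall(".")

# .. is the parent: here, the parent of every sid element
sid_parents = faculty.findall(".//sid/..")

print([e.tag for e in students + courses_2008 + grades + context + sid_parents])
```

ElementTree rejects the unabbreviated spellings such as `child::course`, so only the short forms are shown.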


Node-set operators

Operator            Description
pipe (|)            Performs the union of two node-sets.
slash (/)           Separates location steps.
double-slash (//)   Abbreviation for the location path /descendant-or-self::node()/
div                 Division
and                 Logical AND
or                  Logical OR

The comparison operators !=, <=, <, =, >=, > are also supported.


Some node-set and string functions

Function          Description
last()            Returns the number of nodes in the node-set being evaluated (the context size).
position()        Returns the position number of the current node in the node-set being tested.
count()           Returns the number of nodes in the node-set argument.
name()            Returns a string containing the name of the node in the node-set argument that is first in document order.
string-length()   Returns the number of characters in the string.
starts-with()     Returns true if the first argument string starts with the second argument string; otherwise returns false.
contains()        Returns true if the first argument string contains the second argument string; otherwise returns false.

See http://www.w3.org/TR/xpath for more functions
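The functions in the table have close Python counterparts, which can help when reasoning about what an expression should return. The sketch below is only an analogy: it uses the standard `xml.etree.ElementTree` module and plain Python operations instead of an XPath engine, and the small `book` fragment is an assumption of this example.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<book><title>Java How to Program</title>"
    '<translation edition="1">Spanish</translation>'
    '<translation edition="2">French</translation></book>'
)
translations = book.findall("translation")  # the node-set under test
title = book.findtext("title")

node_count = len(translations)        # count(translation)
last_position = len(translations)     # last() when evaluated inside translation[...]
# position() is 1-based, so enumerate from 1
positions = [(pos, t.text) for pos, t in enumerate(translations, start=1)]
first_name = translations[0].tag      # name(translation): first node in document order
title_length = len(title)             # string-length(title)
starts = title.startswith("Java")     # starts-with(title, "Java")
has_program = "Program" in title      # contains(title, "Program")

print(node_count, last_position, positions, first_name, title_length, starts, has_program)
```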


The Node-test of an XPath Step

• A node-test specifies a simple test on XML nodes found along the step's axis
• Nodes that pass that test are candidates for the next step
• The node test can be based on the
  – Node name, or
  – Node kind


Node-Test Based on Names

• Each axis has a main node kind
  – the attribute:: axis has attribute
  – the namespace:: axis has namespace
  – all other axes (e.g. child::, descendant::, parent::) have element as the main node kind
• Only node name tests on nodes of the main node kind can be true
• Suppose course e118 is the context node
  – descendant::sid returns (e122),
  – child::* returns (e119, e120, e121),
  – attribute::year returns (a107), whose value is 2008,
  – attribute::name returns () (an empty sequence of nodes)


Node-Test Based on the Node Kind

• The most common node-tests that are based on the node kind are:
  – node() that selects each node, regardless of the kind
  – text() that selects each t-node,
  – element() that selects each e-node,
  – attribute() that selects each a-node, and
  – comment() that selects each c-node
  (element() and attribute() are XPath 2.0 kind tests; XPath 1.0 provides node(), text(), comment() and processing-instruction())
• Suppose student node e121 is the context node, then

    child::grade/child::text()

  returns the sequence (t114) whose string value is C+ (in practice, a query processor returns only the string C+)
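The name-test and kind-test examples from the last two slides can be approximated in Python. This is a sketch under assumptions: it uses the standard `xml.etree.ElementTree` module, which accepts only abbreviated paths and has no `text()` step, so attribute and text access go through `get()` and `findtext()` instead. The fragment is course e118 from the tree slides.

```python
import xml.etree.ElementTree as ET

# Course e118, used as the context node.
course = ET.fromstring(
    '<course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>'
    "<student><sid>x0008787</sid><grade>C+</grade></student></course>"
)

# Name tests
sids = course.findall(".//sid")          # descendant::sid
children = course.findall("*")           # child::*
year = course.get("year")                # attribute::year
name_attribute = course.get("name")      # attribute::name: no such attribute
name_elements = course.findall("name")   # child::name finds the element instead

# Kind test: child::grade/child::text() with student e121 as the context node
student = course.find("student")
grade_text = student.findtext("grade")

print([c.tag for c in children], year, name_attribute, grade_text)
```

The same name, `name`, matches on the child axis but not on the attribute axis, because each axis only tests nodes of its main node kind.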


XPath Location Paths

• Navigation through an XML document is declared using location path expressions
• Location paths can be expressed using either an unabbreviated or an abbreviated syntax
• Location paths are made up of steps


Evaluation of a Location Path

• A location path is evaluated step by step, from left to right
• A step is applied to a single node, the so-called context node
• The application of a step on a context node selects a sequence of result nodes
• Each node of a result sequence is then used as a context node in the following step
• The result of an expression is a concatenation of the node sequences selected by the last step
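The evaluation rule above can be sketched as a short loop. The function names `evaluate` and `child` are hypothetical helpers written for this illustration, and the XML fragment is made up. The sketch concatenates results exactly as the slide describes; an XPath 1.0 processor additionally removes duplicate nodes, which cannot arise with the child steps used here.

```python
import xml.etree.ElementTree as ET

def evaluate(context_nodes, steps):
    """Apply each step to every current context node, from left to right."""
    for step in steps:
        next_nodes = []
        for node in context_nodes:
            # Applying a step to one context node selects a sequence of nodes ...
            next_nodes.extend(step(node))
        # ... and each selected node becomes a context node of the next step.
        context_nodes = next_nodes
    return context_nodes

def child(name):
    """Build a step function equivalent to child::name."""
    return lambda node: [c for c in node if c.tag == name]

faculty = ET.fromstring(
    "<faculty>"
    "<course><student><grade>A</grade></student>"
    "<student><grade>B+</grade></student></course>"
    "<course><student><grade>C+</grade></student></course>"
    "</faculty>"
)

# child::course/child::student/child::grade, starting from the faculty element
result = evaluate([faculty], [child("course"), child("student"), child("grade")])
grades = [g.text for g in result]
print(grades)
```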


Unabbreviated Syntax of Location Paths

• A location path has the following syntax:

    Path ::= Step1/…/Stepn

  where each Step is a triple (Axis, Node-test, Predicate) and is defined as follows:

    Step ::= Axis::Node-test Predicate*

  – The axis specifies the direction to move in the document tree
  – The node test selects nodes along the specified axis, and
  – The predicates (if any) filter the nodes selected
• The separator "/" between two subsequent steps indicates a direct superior-subordinate relationship between the nodes involved in the steps
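To show how a step decomposes into its triple, the following sketch splits an unabbreviated path into (axis, node-test, predicates) with regular expressions. The names `parse_step` and `parse_path` are hypothetical. It is deliberately naive: it assumes no "/" and no nested brackets occur inside a predicate, so it is not a full XPath parser.

```python
import re

# axis "::" node-test, followed by zero or more [predicate] groups
STEP_RE = re.compile(r"^(?P<axis>[a-z-]+)::(?P<test>[^\[\]]+?)(?P<preds>(\[[^\]]*\])*)$")

def parse_step(step):
    """Split one unabbreviated step into (axis, node_test, predicates)."""
    match = STEP_RE.match(step.strip())
    if match is None:
        raise ValueError(f"not an unabbreviated step: {step!r}")
    predicates = re.findall(r"\[([^\]]*)\]", match.group("preds"))
    return match.group("axis"), match.group("test").strip(), predicates

def parse_path(path):
    """Split Step1/.../Stepn on "/" (assumes no "/" inside predicates)."""
    return [parse_step(s) for s in path.strip("/").split("/")]

steps = parse_path("/child::course[attribute::cid='c2313']/child::student")
for axis, node_test, predicates in steps:
    print(axis, node_test, predicates)
```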


What Does an XPath Expression Return?

• A location path expression returns a sequence of result nodes, with their contents, in the form of XML
• This XML does not have to be a well-formed document (for example, it may contain several top-level elements)
• XPath expression:

    /child::faculty/child::course[attribute::cid="c2313"]/child::student[child::sid="x0008787"]

• Result:

    <student>
      <sid>x0008787</sid>
      <grade>B+</grade>
    </student>
    <student>
      <sid>x0008787</sid>
      <grade>C+</grade>
    </student>
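The query on this slide can be tried with Python's standard `xml.etree.ElementTree`, with two caveats that are assumptions of this sketch: the XML is a reconstruction of faculty.xml from the tree slides, and ElementTree only accepts the abbreviated syntax, evaluated relative to the `faculty` root element.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the node tree; the real faculty.xml may differ.
FACULTY_XML = """
<faculty name="Computing">
  <course cid="c2313" year="2009"><name>ADA</name>
    <student><sid>x000123</sid><grade>A</grade></student>
    <student><sid>x0008787</sid><grade>B+</grade></student>
  </course>
  <course cid="c2314" year="2009"><name>CIS</name>
    <student><sid>x000123</sid><grade>A</grade></student>
    <student><sid>x0008787</sid><grade>C</grade></student>
  </course>
  <course cid="c2313" year="2008"><name>IMD</name>
    <lecturer>Jones</lecturer>
    <student><sid>x0008787</sid><grade>C+</grade></student>
  </course>
</faculty>
"""

faculty = ET.fromstring(FACULTY_XML)

# Abbreviated form of
# child::course[attribute::cid="c2313"]/child::student[child::sid="x0008787"]
matches = faculty.findall("course[@cid='c2313']/student[sid='x0008787']")

# Serialise each result node with its contents. The output has several
# top-level elements, so it is a fragment rather than a well-formed document.
for student in matches:
    print(ET.tostring(student, encoding="unicode").strip())
```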


Predicates of a Step

• An XPath step can also include a sequence of predicates in square brackets

    [<predicate>] [<predicate>]

• Predicates are applied to nodes selected by a node-test
• Only nodes that evaluate to true for all predicates will belong to the result of a step
• A predicate compares a node property with a value using operators from the set {=, <, >, <=, >=, !=}
• A node property can be:
  – The value of an attribute,
  – The value of #PCDATA of an element, or
  – The sibling order value of a node (returned by the function position())


Examples of XPath Predicates

• Let faculty e101 be the context node
  – child::course[position()=2] selects the second child element of the context node that has the name course, and returns e110
  – child::course[attribute::cid="c2313"] selects all course children of the context node that have the attribute cid="c2313", and returns (e102, e118)
  – descendant::student[child::sid="x000123"] selects the student descendants of the context node that have a sid child with a string value equal to "x000123", and returns (e104, e112)
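The three predicate examples can be checked with a short script. As before, this is a sketch under assumptions: the XML is reconstructed from the node tree, and Python's standard `xml.etree.ElementTree` is used, so the expressions appear in their abbreviated forms. The comments map each result back to the node labels.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the node tree; the real faculty.xml may differ.
FACULTY_XML = """
<faculty name="Computing">
  <course cid="c2313" year="2009"><name>ADA</name>
    <student><sid>x000123</sid><grade>A</grade></student>
    <student><sid>x0008787</sid><grade>B+</grade></student>
  </course>
  <course cid="c2314" year="2009"><name>CIS</name>
    <student><sid>x000123</sid><grade>A</grade></student>
    <student><sid>x0008787</sid><grade>C</grade></student>
  </course>
  <course cid="c2313" year="2008"><name>IMD</name>
    <lecturer>Jones</lecturer>
    <student><sid>x0008787</sid><grade>C+</grade></student>
  </course>
</faculty>
"""

faculty = ET.fromstring(FACULTY_XML)  # context node e101

# child::course[position()=2]: the second course child (e110)
second_course = faculty.findall("course[2]")

# child::course[attribute::cid="c2313"]: e102 and e118
c2313_courses = faculty.findall("course[@cid='c2313']")

# descendant::student[child::sid="x000123"]: e104 and e112
students = faculty.findall(".//student[sid='x000123']")

print([c.findtext("name") for c in second_course])
print([c.get("year") for c in c2313_courses])
print([s.findtext("grade") for s in students])
```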


Abbreviated Syntax of Location Path (1)

• The most important abbreviation is that the child:: axis can be omitted from a location step
• In fact, child:: is the default axis
• For example,
  – student/sid is short for
  – child::student/child::sid
• There is also an abbreviation for attributes: attribute:: can be abbreviated to @
• For example,
  – course[@year="2009"] is short for
  – child::course[attribute::year="2009"]
    and will select all course children of the context node whose year is "2009" (e102, e110)


Abbreviated Syntax of Location Path (2)

• If a predicate expression evaluates to an integer value, that value is considered to be the position of the node selected
  – For example, the step student[2] would select the second student child of the context node
• An empty step "//" is also a frequently used abbreviation; it specifies that the element that follows may be nested anywhere within the document
  – //student would select all student nodes anywhere within the document
  – course[@cid="c2313"][@year="2008"]//grade will select all grade elements subordinated to the course element with cid="c2313" and year="2008"
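Positional and chained predicates can be tried in the same way. The sketch assumes Python's standard `xml.etree.ElementTree` and a cut-down, made-up fragment of the faculty data. ElementTree does not allow an absolute path on an element, so `//student` is written as `.//student` from the root element.

```python
import xml.etree.ElementTree as ET

faculty = ET.fromstring(
    "<faculty>"
    '<course cid="c2313" year="2009">'
    "<student><sid>x000123</sid><grade>A</grade></student>"
    "<student><sid>x0008787</sid><grade>B+</grade></student></course>"
    '<course cid="c2313" year="2008">'
    "<student><sid>x0008787</sid><grade>C+</grade></student></course>"
    "</faculty>"
)
first_course = faculty.find("course")

# student[2]: an integer predicate is a position among the student children
second_student = first_course.findall("student[2]")

# //student: every student element at any depth
all_students = faculty.findall(".//student")

# Two predicates on one step, followed by // to reach nested grade elements
grades_2008 = faculty.findall("course[@cid='c2313'][@year='2008']//grade")

print(second_student[0].findtext("grade"), len(all_students), [g.text for g in grades_2008])
```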


Abbreviated Syntax of Location Path (3)

• A location step of "." is short for self::node(), where self:: refers to the context node and node() returns nodes of any type
• For example:
  – .//student is short for
  – self::node()/descendant-or-self::node()/child::student
    and will select all student elements that are children of the context node itself or of any of its descendants
• A location step of ".." is short for parent::node()
  – For example,
    • ../lecturer is short for
    • parent::node()/child::lecturer
      and will select all lecturer children of the parent of the context node


<?xml version = "1.0"?>

<!-- books.xml -->
<books>
   <!-- XML book list -->
   <book>
      <title>Java How to Program</title>
      <translation edition = "1">Spanish</translation>
      <translation edition = "1">Chinese</translation>
      <translation edition = "1">Japanese</translation>
      <translation edition = "2">French</translation>
      <translation edition = "2">Japanese</translation>
      <price>75</price>
   </book>

   <book>
      <title>C++ How to Program</title>
      <translation edition = "1">Korean</translation>
      <translation edition = "2">French</translation>
      <translation edition = "2">Spanish</translation>
      <translation edition = "3">Italian</translation>
      <translation edition = "3">Japanese</translation>
      <price>65</price>
   </book>
</books>


Predicate Exercises for books.xml

• Examine the XPath expressions and
  – In your own words explain what will be returned
  – Execute them. Did you get it right?

1. /books/book[2]
2. /child::books/child::book[position()=2]
3. /books/book[price>70]
4. /books/book[price>70]/title/text()
5. /books/book[last()]
6. /books/book/translation[@edition="1" and text()="Chinese"]/preceding-sibling::title/text()
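One possible way to execute some of the exercises is sketched below with Python's standard `xml.etree.ElementTree`. This is an assumption of the example, not the tool the course prescribes, and ElementTree implements only part of XPath: exercises 1 and 5 run directly, the comparison in exercises 3 and 4 has to be applied in Python, and exercises 2 and 6 need a full XPath processor.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """
<books>
  <book>
    <title>Java How to Program</title>
    <translation edition="1">Spanish</translation>
    <translation edition="1">Chinese</translation>
    <translation edition="1">Japanese</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Japanese</translation>
    <price>75</price>
  </book>
  <book>
    <title>C++ How to Program</title>
    <translation edition="1">Korean</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Spanish</translation>
    <translation edition="3">Italian</translation>
    <translation edition="3">Japanese</translation>
    <price>65</price>
  </book>
</books>
"""

books = ET.fromstring(BOOKS_XML)

# Exercises 1 and 5. "/books/book[2]" becomes "book[2]" because `books`
# is already the root element of the document.
second_book = books.findall("book[2]")
last_book = books.findall("book[last()]")

# Exercises 3 and 4. Numeric comparison is outside ElementTree's subset,
# so the predicate [price>70] is applied as a Python filter instead.
expensive = [b for b in books.findall("book") if float(b.findtext("price")) > 70]
expensive_titles = [b.findtext("title") for b in expensive]

for label, nodes in [("1", second_book), ("5", last_book), ("3", expensive)]:
    print(label, [b.findtext("title") for b in nodes])
print("4", expensive_titles)
```

Compare the printed titles with your own predictions before moving on to the remaining exercises.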


Write some XPATH Expressions

• Which books have Japanese translations?
  – Hint
  – Use a predicate
    • Boolean expression for filtering nodes from the search
    • Compare the string value of the current node to the string 'Japanese'
• Find the textbook name that has a 3rd edition and an Italian translation
• What translations of the C++ How To Program textbook are on the first and second editions?


Summary

• XPath is a language for specifying navigation through an XML document
• XPath models an XML document as a tree of nodes
• A location path has the following syntax:

    Path ::= Step1/…/Stepn

  where each Step is a triple (Axis, Node-test, Predicate):
  – The axis specifies the direction to move in the document tree
  – The node test selects nodes along the specified axis, and
  – The predicates (if any) filter the nodes selected
• A location path can be either relative or absolute
• A relative location path is declared with regard to a context node and its evaluation starts from this node