CIS336 Website design, implementation and management
(also Semester 2 of CIS219, CIS221 and IT226)

Lecture 3: XPath
(Based on Møller and Schwartzbach, 2006, Chapter 3)

David Meredith
[email protected]
www.titanmusic.com/teaching/cis336-2006-7.html


What is XPath?
• XPath is a language for selecting parts of, and navigating around, an XML tree
• Used in
  – XML Schema for uniqueness and scope descriptions
  – XSLT for pattern matching and selection
  – XQuery for selection and iteration
  – XLink and XPointer
• XPath can also be used to do computations on data values
• Example of an XPath expression:
    //rcp:ingredient[@amount='0.5' and @unit='cup']/@name
  – This expression uses abbreviations (// and @) and tacit conventions
• XPath 1.0 was a relatively simple language
• Through interaction with XQuery, it has become a much larger language called XPath 2.0
• The XPath 2.0 specification is a W3C proposed recommendation (November 2006), available here: http://www.w3.org/TR/xpath20/
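The example expression can be tried out with a minimal sketch in Python's standard xml.etree.ElementTree module, which implements only a small subset of XPath 1.0. The miniature document below is hypothetical; only the namespace URI comes from the lecture. ElementTree has no and operator and no attribute steps, so the sketch assumes that two chained predicates and a call to get() are an acceptable stand-in.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature recipe collection; only the namespace URI is from the lecture.
NS = {"rcp": "http://www.brics.dk/ixwt/recipes"}
DOC = """
<rcp:collection xmlns:rcp="http://www.brics.dk/ixwt/recipes">
  <rcp:recipe>
    <rcp:ingredient name="sugar" amount="0.5" unit="cup"/>
    <rcp:ingredient name="flour" amount="2" unit="cup"/>
    <rcp:ingredient name="salt" amount="0.5" unit="teaspoon"/>
  </rcp:recipe>
</rcp:collection>
"""

root = ET.fromstring(DOC)

# The two conditions joined by "and" become two chained predicates,
# and the final /@name step becomes .get("name").
matches = root.findall(".//rcp:ingredient[@amount='0.5'][@unit='cup']", NS)
names = [elem.get("name") for elem in matches]
print(names)  # ['sugar']
```

Chaining predicates is only equivalent to and when neither predicate depends on the context position.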

Location steps and paths
• An XPath location path evaluates to a sequence of nodes in an XML tree
  – The sequence never contains duplicates of identical nodes
    • However, it is possible for two or more nodes to contain the same values and therefore be "equal"
• A location path is a sequence of location steps separated by the / character, e.g.,
    child::rcp:recipe[attribute::id='117']/child::rcp:ingredient/attribute::amount
  – This expression selects…
    • … all the amount attributes…
    • … in rcp:ingredient nodes that are children of…
    • … rcp:recipe nodes with an attribute called id with value '117' that are…
    • … children of the context node (which is assumed here to be the root element)
  – The expression consists of three location steps, each with the format
      axis::nodetest[pred1][pred2]…
    where axis is the axis, nodetest is a node test and pred1 and pred2 are predicates, which are XPath expressions
    • Axis, node test and predicates are increasingly specific definitions of the sequence of nodes that the location step selects

Location step maps context onto new sequence of nodes
• A location step is always evaluated relative to a context and always evaluates to a sequence of nodes
  – The context is itself a sequence of nodes
  – Therefore a location step transforms one sequence of XML tree nodes, called the context, into another sequence of XML tree nodes
  – The output sequence is generated by replacing each node, x, in the input sequence with the result of evaluating the location step with x as the context node, and concatenating these results
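The mapping from one node sequence to the next can be sketched in Python. This is an illustration under stated assumptions, not an XPath engine: the tree is hypothetical (it is not the tree drawn on the slides), and the helper names apply_step, descendant and child are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree, not the one drawn on the slides.
root = ET.fromstring(
    "<A><B><C><E><F/></E></C></B>"
    "<B><C><E><F/><F/></E><D/></C></B></A>"
)

def apply_step(context, axis_and_test):
    """Evaluate one location step for every node in the context sequence
    and concatenate the results, dropping duplicates of identical nodes."""
    result = []
    for node in context:
        for selected in axis_and_test(node):
            if not any(selected is seen for seen in result):
                result.append(selected)
    return result

def descendant(tag):
    # descendant::tag, i.e. every tag element below the node, excluding the node itself
    return lambda node: [d for d in node.iter(tag) if d is not node]

def child(tag):
    # child::tag
    return lambda node: node.findall(tag)

context = [root]  # initial context: the A element
for step in (descendant("C"), child("E"), child("F")):
    context = apply_step(context, step)
    print([node.tag for node in context])
```

Each pass through the loop corresponds to one location step of descendant::C/child::E/child::F, and the sequence it produces becomes the context for the next step.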

An Example
[Figure: an XML tree with root node A and descendant nodes labelled B, C, D, E and F. The same tree is shown on five consecutive slides, captioned in turn:]
• (no caption)
• Context node
• descendant::C
• descendant::C/child::E
• descendant::C/child::E/child::F

Contexts
• The context of an XPath evaluation consists of
  – a context node (a node in an XML tree)
  – a context position and size
    • If the location path is
        descendant::C/child::E/child::F
      and the first two location steps have been evaluated, then we have the sequence (E1, E2, E3) shown in the figure
    • The child::F location step is then evaluated on E1, E2 and E3 in turn
    • In each case, the context size is 3 because the input sequence for this step contains 3 nodes: (E1, E2, E3)
    • When child::F is evaluated on E1, the context position is 1; when evaluated on E2, the context position is 2; and so on
  – a set of variable bindings (mapping variable names to values)
  – a function library
    • The XPath specification guarantees that the context provides a set of core functions
  – a set of namespace declarations
    • For example, in our examples, we assume that the namespace
        http://www.brics.dk/ixwt/recipes
      is bound to the namespace prefix rcp
• The application determines the initial context
• If the path starts with '/' then
  – the initial context node is the root node (not the root element)
  – the initial position and size are 1
[Figure: the example tree with the three selected E nodes numbered 1, 2 and 3]

Axes
• An axis is a sequence of nodes evaluated relative to the context node
• It is a first approximation to the sequence of nodes we want to obtain as the result of a location step
• XPath supports 12 different axes
• child
  – Children of the context node
    • NB: excludes attribute nodes
• descendant
  – Descendants of the context node
    • NB: excludes attribute nodes
• parent
  – Unique parent node, or the empty sequence if the context node is the root node
• ancestor
  – All ancestors of the context node, from parent to root node
• following-sibling
  – Right-hand siblings of the context node
    • Empty sequence if the context node is an attribute node
• preceding-sibling
  – Left-hand siblings of the context node
    • Empty sequence if the context node is an attribute node
• following
  – All nodes appearing later in the document than the context node
    • Excludes descendants of the context node
• preceding
  – All nodes appearing before the context node in the document
    • Excludes ancestors of the context node
• attribute
  – Every attribute node whose parent is the context node
    • Order is implementation-dependent, but stable (i.e., always the same ordering for a given input)
• self
  – The context node itself
    • Actually a sequence containing just the context node
• descendant-or-self
  – Concatenation of the self and descendant axes
• ancestor-or-self
  – Concatenation of the self and ancestor axes

Axis Directions
• Each axis has a direction
  – Forwards means document order:
      child, descendant, following-sibling, following, self, descendant-or-self
  – Backwards means reverse document order:
      parent, ancestor, preceding-sibling, preceding, ancestor-or-self
  – Stable (i.e., always the same output for the same input) but dependent on the implementation:
      attribute
• self, ancestor, descendant, preceding and following together form a disjoint partition of all the nodes in an XML tree (ignoring attribute nodes)
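The partition property can be checked with a short Python sketch. The tree and the helper functions are hypothetical, and only element nodes are modelled, so attribute nodes play no part. Nodes are taken in document order from ElementTree's iter().

```python
import xml.etree.ElementTree as ET

# Hypothetical tree; elements only, so attribute nodes play no part.
root = ET.fromstring("<a><b><c/><d><e/><f/></d><g/></b><h><i/></h></a>")

doc_order = list(root.iter())                   # elements in document order
parent = {c: p for p in doc_order for c in p}   # child -> parent map
index = {node: i for i, node in enumerate(doc_order)}

def ancestor(node):
    # Reverse axis: parent first, root element last.
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out

def descendant(node):
    return list(node.iter())[1:]                # iter() yields the node itself first

def following(node):
    skip = set(descendant(node))
    return [n for n in doc_order if index[n] > index[node] and n not in skip]

def preceding(node):
    skip = set(ancestor(node))
    # Reverse axis, so report in reverse document order.
    return [n for n in reversed(doc_order)
            if index[n] < index[node] and n not in skip]

context = root.find(".//d")
axes = {
    "self": [context],
    "ancestor": ancestor(context),
    "descendant": descendant(context),
    "preceding": preceding(context),
    "following": following(context),
}
for name, nodes in axes.items():
    print(name, [n.tag for n in nodes])

# Every element appears in exactly one of the five axes.
all_tags = sorted(n.tag for nodes in axes.values() for n in nodes)
print(all_tags == sorted(n.tag for n in doc_order))
```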

Axis examples
[Figure, repeated on eight consecutive slides: an XML tree whose 28 nodes are numbered in document order. The results listed below are consistent with node 13 being the context node.]
• The parent Axis: (4)
• The child Axis: (14,15)
• The descendant Axis: (14,15,16,17,18)
• The ancestor Axis: (4,2,1)
• The following-sibling Axis: (19)
• The preceding-sibling Axis: (9,5)
• The following Axis: (19,20,21,22,23,24,25,26,27,28)
• The preceding Axis: (12,11,10,9,8,7,6,5,3)

Node tests
• The second part of a location step is the node test:
    child::rcp:recipe[attribute::id='117']/child::rcp:ingredient/attribute::amount
• Types of node test:
  – text() selects only the character data nodes
  – comment() selects only the comment nodes
  – processing-instruction() selects only the processing instruction nodes
  – node() selects all nodes
  – * selects all nodes in the axis preceding the node test
    • If the axis is not attribute, then only element nodes are selected
  – name selects the nodes with the given QName, name
  – *:localname selects the nodes with the given NCName, localname, in any namespace
  – prefix:* selects all nodes in the same way as *, but only in the specified namespace

Resolving names without namespaces
• In XPath 1.0, a missing namespace prefix is interpreted as the empty URI, "", not the default namespace
  – Bug fixed in XPath 2.0, where an empty prefix is interpreted as the default namespace, not the empty URI
• However, most tools implement XPath 1.0
• Suppose we want to select the ref attribute of the subwidget element
  – Could try:
      /child::widget/child::big/child::subwidget/attribute::ref
    • But this won't work because each node name in the expression is interpreted as being from the empty namespace
    • In fact, no XPath expression will work here!
  – In order to work in both XPath 1.0 and 2.0, every element name has to be explicitly qualified with a namespace prefix, as in the lower example, and the XPath expression must be changed to
      /child::wdg:widget/child::wdg:big/child::wdg:subwidget/attribute::ref
[Figure: two versions of the widget example document; in the lower one every element name is qualified with the wdg prefix]
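Python's xml.etree.ElementTree treats unprefixed names in a path much as XPath 1.0 does, so it can illustrate the problem. The sketch below is a hedged illustration: the document, the namespace URI http://example.org/widgets and the ref value are hypothetical stand-ins for the slide's example, and here the document relies on a default namespace while only the path is qualified.

```python
import xml.etree.ElementTree as ET

# Hypothetical document and namespace URI, standing in for the slide's widget example.
URI = "http://example.org/widgets"
root = ET.fromstring(
    f'<widget xmlns="{URI}"><big><subwidget ref="w42"/></big></widget>'
)

# Unprefixed names are looked up in no namespace, so nothing is found.
unqualified = root.find("big/subwidget")

# Binding a prefix to the URI and qualifying every element name works.
# The ref attribute is unprefixed and therefore in no namespace.
qualified = root.find("wdg:big/wdg:subwidget", {"wdg": URI})

print(unqualified)           # None
print(qualified.get("ref"))  # w42
```

The prefix used in the path (wdg here) only has to be bound to the right URI; it does not have to match any prefix used in the document.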

Predicates
• The final part of a location step consists of zero or more predicates:
    child::rcp:recipe[attribute::id='117']/child::rcp:ingredient/attribute::amount
• A predicate is an XPath expression, evaluated as a boolean condition
• XPath expressions are as rich as expressions in a general-purpose programming language like Java
  – Can produce values of different types
    • e.g., numbers, booleans, strings, sequences
• When used as a predicate, the value of an XPath expression is coerced into a boolean value:
  – A number is coerced to true when it is equal to the current context position
  – A string is coerced to true when it has non-zero length
  – A sequence is coerced to true when it has non-zero length
• Boolean conditions can be combined using the operators and and or and the function not
• Variables from the context can be referenced using the syntax $foo, where foo is the variable name
• The usual arithmetic (+, -, *, div) and comparison (=, !=, <, <=, etc.) operators are also available
• Sometimes it is useful to use location paths as predicates:
    /descendant::rcp:recipe[descendant::rcp:ingredient[attribute::name='sugar']]
  selects every recipe node that contains sugar; whereas
    /descendant::rcp:recipe/descendant::rcp:ingredient[attribute::name='sugar']
  selects every ingredient node whose name is sugar.

More on predicates
• The predicates in a location step are evaluated left-to-right
  – i.e., the first predicate is evaluated, producing a sequence of nodes which forms the context for the second predicate, and so on
• This means that changing the order of the predicates can change the value of the result of the expression
• For example:
    /descendant::rcp:ingredient[position()=3][position()=1]
  returns the third ingredient in the document (the first predicate leaves a one-node sequence, whose first node the second predicate then selects); whereas
    /descendant::rcp:ingredient[position()=1][position()=3]
  returns the empty sequence because once the first ingredient has been selected, there is only one node in the context when the second predicate is evaluated (and therefore no third node)
• If we combine two predicates with and, then this also generally gives a different result:
    /descendant::rcp:ingredient[position()=3 and position()=1]
  returns the empty sequence because there is no node whose position within the current context is both 3 and 1
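The effect of predicate order can be imitated in plain Python. This is a sketch under assumptions: apply_predicates is a hypothetical helper, and a list of strings stands in for the sequence of ingredient nodes. Each predicate sees positions renumbered from 1 within the output of the previous one.

```python
def apply_predicates(sequence, predicates):
    """Filter a sequence through predicates left to right; each predicate is a
    function of (item, position, size), with position counted from 1."""
    for predicate in predicates:
        size = len(sequence)
        sequence = [
            item
            for position, item in enumerate(sequence, start=1)
            if predicate(item, position, size)
        ]
    return sequence

# Hypothetical ingredient names standing in for the selected nodes.
ingredients = ["beef", "onion", "pepper", "crumbs"]

third_then_first = apply_predicates(
    ingredients, [lambda i, p, n: p == 3, lambda i, p, n: p == 1])
first_then_third = apply_predicates(
    ingredients, [lambda i, p, n: p == 1, lambda i, p, n: p == 3])
both_at_once = apply_predicates(
    ingredients, [lambda i, p, n: p == 3 and p == 1])

print(third_then_first)  # ['pepper']
print(first_then_third)  # []
print(both_at_once)      # []
```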

Typical location paths
• The XPath language is large, but you usually only use a small part of it
• There are a few patterns that are used particularly often
• The most commonly used axes are child, descendant and attribute
• *, text() and QName are the most commonly used node tests

Some examples
/descendant::rcp:recipe/child::rcp:title
  – Selects every title node in every recipe
/descendant::rcp:recipe/descendant::rcp:ingredient/attribute::name
  – Selects the name of every ingredient in every recipe
/descendant::rcp:*/child::text()
  – Selects every character data node in the collection
[attribute::amount]
  – Selects the nodes in the context that have an attribute called amount
[attribute::amount='0.5']
  – Selects nodes in the context that have an attribute whose name is amount and whose value is '0.5'
[attribute::name!='flour']
  – Selects nodes in the context that have a name attribute whose value is not 'flour' (nodes with no name attribute are not selected)
[attribute::amount<0.5 and attribute::unit='cup']
  – Selects context nodes that have an amount attribute with a value less than 0.5 and a unit attribute whose value is cup
[position()=2]
  – Selects the second node in the context
[descendant::rcp:ingredient]
  – Selects those nodes in the context that contain an ingredient node

Using XPath in other languages
• XPath expressions often appear as attribute values in other XML languages (e.g., XML Schema and XSLT)
• When used in other XML languages, all special characters have to be escaped, e.g.
    <xsl:apply-templates select="descendant::rcp:ingredient[attribute::amount&lt;0.5]" />
  instead of
    <xsl:apply-templates select="descendant::rcp:ingredient[attribute::amount<0.5]" />
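When the host XML document is generated by a program, the escaping can be left to a library. The following Python sketch uses xml.sax.saxutils.escape from the standard library. The unprefixed apply-templates element is a hypothetical host element, used only to show that a parser restores the original expression.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

xpath = "descendant::rcp:ingredient[attribute::amount<0.5]"

# escape() replaces &, < and > with entity references.
escaped = escape(xpath)
print(escaped)  # descendant::rcp:ingredient[attribute::amount&lt;0.5]

# Hypothetical host element; an XML parser undoes the escaping on the way in.
element = ET.fromstring(f'<apply-templates select="{escaped}"/>')
print(element.get("select") == xpath)  # True
```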

Abbreviations
• XPath allows certain abbreviations that make expressions easier to write
• If no axis is given, the default axis, child, is used
    /child::rcp:collection/child::rcp:recipe/child::rcp:ingredient
  is equivalent to
    /rcp:collection/rcp:recipe/rcp:ingredient
• The attribute axis can be replaced with @
    /rcp:collection/rcp:recipe/rcp:ingredient/attribute::amount
  is equivalent to
    /rcp:collection/rcp:recipe/rcp:ingredient/@amount
• /descendant-or-self::node()/ can be replaced with //, e.g.,
    //rcp:recipe[rcp:title='Ricotta Pie']//rcp:ingredient
  selects all ingredient nodes in any recipe whose title is 'Ricotta Pie', no matter how deeply nested within the recipe node the ingredient node might be
• self::node() can be replaced with . and parent::node() can be replaced with .. e.g.,
    /descendant-or-self::node()/child::rcp:nutrition[attribute::calories=349]/parent::node()/child::rcp:title/child::text()
  can be abbreviated to
    //rcp:nutrition[@calories=349]/../rcp:title/text()

Some subtleties with abbreviations
• Any expression beginning with / is evaluated with the root node (not the root element) as the context, so
    //rcp:recipe/rcp:ingredient[//rcp:ingredient]
  returns all ingredients that are not inside composite ingredients (the predicate is evaluated from the root node, so it is true whenever the document contains any ingredient), whereas
    //rcp:recipe/rcp:ingredient[.//rcp:ingredient]
  returns all composite ingredients
• Note also that
    //rcp:ingredient[1]
  selects all ingredient nodes that are first among their siblings; whereas
    /descendant::rcp:ingredient[1]
  selects the first ingredient in the collection, since the predicate selects from the single sequence of nodes that satisfies the axis and node test
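The difference between the two forms can be seen in a small Python sketch. The collection below is hypothetical and uses no namespaces, to keep the paths short. ElementTree has no descendant axis syntax, so slicing the document-order iterator is assumed here as a stand-in for /descendant::ingredient[1].

```python
import xml.etree.ElementTree as ET

# Hypothetical collection without namespaces, to keep the paths short.
root = ET.fromstring(
    "<collection>"
    '<recipe><ingredient name="a"/><ingredient name="b"/></recipe>'
    '<recipe><ingredient name="c"/><ingredient name="d"/></recipe>'
    "</collection>"
)

# //ingredient[1]: the position is counted among the siblings of each parent.
first_among_siblings = [e.get("name") for e in root.findall(".//ingredient[1]")]

# /descendant::ingredient[1]: the position is counted in one document-wide sequence.
first_in_document = [e.get("name") for e in list(root.iter("ingredient"))[:1]]

print(first_among_siblings)  # ['a', 'c']
print(first_in_document)     # ['a']
```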

General expressions
• XPath has grown into a large language for expressing computations on sequences
• XPath 2.0 has many features motivated by its use for selection within XQuery
• Keywords and operators in XPath 2.0:
    $ , to | union intersect except .
    + - * div idiv mod and or
    = != > >= < <=
    eq ne lt le gt ge
    is << >>
    for in if then else some every satisfies

Values and atomization
• Every XPath expression evaluates to a sequence of items
  – The sequence may be empty
  – Each item in the sequence can be
    • A node
    • An atomic value, which can be
      – A number
        » Integer, decimal, float or double
      – A boolean value
      – A string of Unicode characters
      – A datatype defined in XML Schema
• Note that the result of an expression is always a sequence, even if the sequence only contains one item
  – XPath interprets a single atomic value as a singleton sequence containing that value
• Remember that a single node contains all its descendants and therefore denotes the subtree of which it is the root
• Atomizing a sequence means converting it into a sequence of atomic values
  – This is done by converting every node into its string value thus:
    • The string value of a text node is its contents
    • The string value of an element is the concatenation in document order of the string values of all descendant text nodes
    • The string value of an attribute node is the value of the attribute
    • The string value of a comment node is the comment text
    • The string value of a processing instruction node is the processing instruction value
    • The string value of a root node is the concatenation in document order of the string values of all descendant text nodes
• For example, the atomized sequence generated by the XPath location path
    /rcp:collection/rcp:recipe[@id='r101']/rcp:ingredient/@name
  which returns the sequence containing the names of the ingredients in the first recipe in the collection, is
    beef cube steak
    onion, sliced into thin rings
    green bell pepper, sliced in rings
    Italian seasoned bread crumbs
    grated Parmesan cheese
    olive oil
    spaghetti sauce
    shredded mozzarella cheese
    angel hair pasta
    minced garlic
    butter
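The string-value rule for elements can be illustrated with Python's ElementTree, whose itertext() method yields the text of an element and its descendants in document order. The fragment and the string_value helper are hypothetical, and the sketch covers only elements and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with mixed content.
recipe = ET.fromstring(
    '<recipe id="r1"><title>Ricotta <em>Pie</em></title>'
    '<ingredient name="ricotta cheese"/></recipe>'
)

def string_value(element):
    # Concatenate all descendant text nodes in document order.
    return "".join(element.itertext())

title_value = string_value(recipe.find("title"))
name_value = recipe.find("ingredient").get("name")  # an attribute's string value is its value

print(title_value)  # Ricotta Pie
print(name_value)   # ricotta cheese
```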

Literal expressions
• A literal expression is a singleton sequence containing a constant atomic value
• Literal numbers are written in the expected way, thus
    42 is an integer
    3.1415 is a decimal
    6.022E23 is a double
• Literal strings are enclosed in single or double quotes, thus
    'XPath is a lot of fun'
    "XPath is a lot of fun"
  – When the string itself contains quotes, there are two solutions:
      'The cat said "Meow"'
      "The cat said ""Meow"""
• No literal boolean values
  – Use the constant functions true() and false()

Comments
• Insert a comment into an XPath expression by using the following syntax:
    (: this is a comment :)

Variable references
• Variable references are written as follows:
    $foo
  refers to the variable foo
• A variable may be
  – bound within the context of the expression
  – created through a binding in a for expression or quantified expression
• A variable name may be any QName
  – A variable name may belong to a specific namespace:
      $bar:foo

Arithmetic expressions
• For integers, decimals, floats and doubles:
    + - * div
    -n (unary minus)
• For integers:
    idiv (integer division)
    mod (modulo)
• Every argument is actually a sequence
  – if an argument is the empty sequence, then the result is an empty sequence
  – if all arguments are singleton sequences containing numbers of the expected type, then the operation is performed and the result is returned as a sequence
  – otherwise, a runtime error occurs
• Variable names are QNames and can therefore contain a minus sign (-)
  – Thus:
      $foo-17
    is a reference to a variable called foo-17
  – If we want to subtract 17 from foo, then we have to write:
      ($foo)-17
      $foo -17
      $foo+-17
    or anything else that separates the foo from the 17

Sequence expressions
• If each expi is an expression, then
    exp1, exp2, exp3, ..., expn
  constructs a new sequence which is the concatenation of the values of all the expressions, expi
• Sequences are always flattened when they are concatenated, so it is impossible to produce nested sequences
  – Thus,
      (1, (2, 3, 4), ((5)), (), (((6, 7), 8), 9))
    produces the same sequence as
      1, 2, 3, 4, 5, 6, 7, 8, 9
• The expression
    exp1 to exp2
  requires that exp1 and exp2 are both singleton sequences that evaluate to integers, and the whole expression evaluates to the sequence
    exp1, exp1+1, ..., exp2
  – e.g., the sequence above could also be expressed as
      1 to 9
• Node sequences can be combined using set operators:
    union (or |)
    intersect
    except (which means set difference)
  – each performs the set operation and returns the result sequence in document order, containing no duplicates of identical nodes
  – For example, a sequence exp can be sorted into document order, with duplicates of identical nodes removed, using the expression exp | ()
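The absence of nested sequences can be mimicked in Python. This is a sketch only: flatten and to are hypothetical helpers, and Python tuples stand in for parenthesised XPath sequences.

```python
def flatten(*items):
    """Concatenate items into one flat list; nested tuples, lists and ranges
    are spliced in, mirroring how XPath sequences never nest."""
    flat = []
    for item in items:
        if isinstance(item, (tuple, list, range)):
            flat.extend(flatten(*item))
        else:
            flat.append(item)
    return flat

def to(start, end):
    # "start to end" includes both endpoints.
    return range(start, end + 1)

nested = flatten(1, (2, 3, 4), ((5,),), (), (((6, 7), 8), 9))
print(nested)                       # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(nested == flatten(to(1, 9)))  # True
```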

Path expressions
• Location paths are XPath expressions
• A location path is evaluated as a sequence of location steps, starting with a context
• Can also evaluate a location path relative to any arbitrary node sequence returned by some XPath expression
  – result is the concatenation of the results of evaluating the location path with each node in the input sequence in turn
  – context position of each node in the input sequence is its position within the input sequence
  – context size is the length of the input sequence
• For example
    (fn:doc("veggie.xml"), fn:doc("bbq.xml"))//rcp:title
  returns the titles of all recipes in both files
  – the fn:doc function returns the root node of a document

Filter expressions
• A location path predicate is a special type of filter expression
• A filter expression can be applied to any sequence containing nodes and/or atomic values
• Syntax:
    exp[filter]
  where
  – exp is an expression that evaluates to a sequence
  – filter is a filter expression that selects those items in exp for which filter is true
  – inside filter, the current item in exp is referred to by the symbol .
  – current context position is the position within exp
  – current context size is the size of exp
• Example:
    (30 to 60)[. mod 5 = 0 and position()>20]
  has the same result as the expression
    50,55,60
  (remember that , is the symbol for concatenation in XPath)
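The example can be checked with a Python sketch. filter_sequence is a hypothetical helper in which the item plays the role of the symbol . and positions count from 1.

```python
def filter_sequence(sequence, condition):
    """exp[filter]: keep the items for which condition(item, position, size)
    holds, where item plays the role of "." and position counts from 1."""
    size = len(sequence)
    return [
        item
        for position, item in enumerate(sequence, start=1)
        if condition(item, position, size)
    ]

result = filter_sequence(
    list(range(30, 61)),  # 30 to 60
    lambda item, position, size: item % 5 == 0 and position > 20,
)
print(result)  # [50, 55, 60]
```

The item at position 21 is 50, so the position test keeps 50 to 60 and the mod test then keeps the multiples of 5.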

Comparison expressions
• There are three types of comparison expressions in XPath:
  – Value comparisons
    • used to compare atomic values
  – General comparisons
    • can be used to compare all values
  – Node comparisons
    • used to compare nodes on identity and document order

Value comparisons
• Value comparison operators are used to compare atomic values
• Value comparison operators are
    eq ne lt le gt ge
• When applied to two arbitrary values, the following procedure is carried out:
  1. the two values are atomized
  2. if either resulting sequence is empty, the result is the empty sequence
  3. if either sequence has more than one element, the result is a type error
  4. if the two atomic values (represented by two singleton sequences) are not comparable (e.g., 7 and "abc"), a runtime error occurs
  5. otherwise the result is obtained by comparing the two atomic values
• For example, the following expressions all evaluate to true:
    8 eq 4 + 4
    //rcp:description/text() eq "Some recipes used in the XML tutorial."
    (//rcp:ingredient)[1]/@name eq "beef cube steak"
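The five-step procedure can be sketched for the eq operator in Python. This is an illustration under assumptions: value_eq is a hypothetical helper, the operands are assumed to be already atomized Python lists, None stands for the empty sequence, and both kinds of error are modelled as TypeError.

```python
def value_eq(left, right):
    """Sketch of the eq operator on already atomized sequences (Python lists).
    Returns None to stand for the empty sequence."""
    if not left or not right:
        return None  # step 2: an empty operand gives the empty sequence
    if len(left) > 1 or len(right) > 1:
        raise TypeError("operand has more than one item")  # step 3
    a, b = left[0], right[0]
    numeric = (int, float)
    comparable = (isinstance(a, numeric) and isinstance(b, numeric)) or (
        isinstance(a, str) and isinstance(b, str))
    if not comparable:
        raise TypeError("values are not comparable")  # step 4
    return a == b  # step 5

print(value_eq([8], [4 + 4]))  # True
print(value_eq([], [1]))       # None
```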

General comparisons
• General comparison operators are used to compare all values
• General comparison operators are
    = != < <= > >=
• When applied to two values, the following steps are performed:
  – the two arguments are atomized
  – if there exists at least one pair of atomic values, one from each argument, for which the comparison holds, the result is true
  – otherwise the result is false
• For example, the following all evaluate to true:
    8 = 4+4
    (1,2) = (2,4)
    (2,4) = (3,4)
    //rcp:ingredient/@name='salt'
• This type of equality is not transitive:
    (1,2) = (3,4)
  is false even though (1,2) = (2,4) and (2,4) = (3,4) are both true
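The existential meaning of the general comparison operators can be sketched in Python with any(). general_compare is a hypothetical helper that assumes its operands are already atomized lists.

```python
import operator
from itertools import product

def general_compare(op, left, right):
    """True if op holds for at least one pair drawn from the two
    (already atomized) sequences."""
    return any(op(a, b) for a, b in product(left, right))

print(general_compare(operator.eq, [1, 2], [2, 4]))  # True
print(general_compare(operator.eq, [2, 4], [3, 4]))  # True
print(general_compare(operator.eq, [1, 2], [3, 4]))  # False
print(general_compare(operator.ne, [1, 2], [1, 2]))  # True
```

The last line shows a further consequence of the same rule: a sequence with two different items is both = and != to itself.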

Node comparisons
• Node comparison operators are used to compare nodes for identity and document order
• Node comparison operators are:
    is    arguments refer to the same node (identity)
    <<    first argument precedes second in document order
    >>    first argument follows second in document order
• When applied to two arguments, the following steps are performed:
  – if either sequence is empty, returns the empty sequence
  – if both arguments are singleton sequences containing nodes, then the result is the boolean value of the comparison
  – otherwise a runtime error occurs
• For example, the following all evaluate to true:
    (//rcp:recipe)[2] is //rcp:recipe[rcp:title/text() eq "Ricotta Pie"]
    /rcp:collection << (//rcp:recipe)[4]
    (//rcp:recipe)[4] >> (//rcp:recipe[3])
• Note that, in an XSLT stylesheet (or any other XML file), the operators << and >> must be rendered as &lt;&lt; and &gt;&gt;, respectively

Comparison confusions
• Must always carefully consider whether a comparison should be a value, general or node comparison
  – making the wrong decision could lead to an unexpected result
• For example, given that the 40th and 53rd ingredients in the recipes.xml file are different amounts of salt:
    (//rcp:ingredient)[40]/@name eq (//rcp:ingredient)[53]/@name
  returns true but
    (//rcp:ingredient)[40]/@name is (//rcp:ingredient)[53]/@name
  returns false

Boolean expressions
• The operators and and or accept arguments of any type, which are then coerced to effective boolean values
• The following are coerced to the boolean value true:
  – boolean value true
  – a non-empty string
  – a non-zero number
  – a sequence in which the first item is a node
• The following are coerced to the boolean value false:
  – boolean value false
  – empty string
  – 0
  – empty sequence
• Otherwise, the result is undefined or an error
• The boolean values true and false can be constructed using the functions true() and false()
• A boolean value can be negated using the function not(exp)

Functions
• XPath 2.0 and XQuery 1.0 functions are defined in the proposed W3C recommendation which is available here:
    http://www.w3.org/TR/xpath-functions/
• To use functions, the context must contain a declaration of the namespace
    http://www.w3.org/2005/xpath-functions
  – This URI is also the URL of a page that summarises all the available XPath 2.0 functions
  – This namespace is traditionally given the prefix fn in a namespace declaration
• The XML Schema namespace,
    http://www.w3.org/2001/XMLSchema
  also defines some useful functions for coercion and constructing data values
  – The XML Schema namespace is traditionally given the prefix xs

Arithmetic Functions
    fn:abs(-23.4) = 23.4
    fn:ceiling(-23.4) = -23
    fn:floor(23.4) = 23
    fn:round(23.4) = 23
    fn:round(-23.4) = -23
    fn:round(23.5) = 24
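fn:round differs from the rounding built into some programming languages: ties are rounded towards positive infinity. The Python sketch below contrasts the two. xpath_round is a hypothetical helper that ignores floating-point edge cases, whereas Python's own round() sends ties to the nearest even integer.

```python
import math

def xpath_round(x):
    # Round to the nearest integer, with ties going towards positive infinity.
    return math.floor(x + 0.5)

print(xpath_round(23.4), xpath_round(-23.4), xpath_round(23.5))  # 23 -23 24
print(xpath_round(2.5), round(2.5))                              # 3 2
print(math.ceil(-23.4), math.floor(23.4), abs(-23.4))            # -23 23 23.4
```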

Boolean Functions
    fn:not(0) = fn:true()
    fn:not(fn:true()) = fn:false()
    fn:not("") = fn:true()
    fn:not((1)) = fn:false()
      (1) evaluates to the number 1

String Functions
    fn:concat("X","ML") = "XML"
    fn:concat("X","ML"," ","book") = "XML book"
    fn:string-join(("XML","book")," ") = "XML book"
    fn:string-join(("1","2","3"),"+") = "1+2+3"
    fn:substring("XML book",5) = "book"
    fn:substring("XML book",2,4) = "ML b"
    fn:string-length("XML book") = 8
    fn:upper-case("XML book") = "XML BOOK"
    fn:lower-case("XML book") = "xml book"

Regexp Functions
    fn:contains("XML book","XML") = fn:true()
    fn:matches("XML book","XM..[a-z]*") = fn:true()
    fn:matches("XML book",".*Z.*") = fn:false()
    fn:replace("XML book","XML","Web") = "Web book"
    fn:replace("XML book","[a-z]","8") = "XML 8888"
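For comparison, the same five results can be reproduced with Python's re module. This is only an analogy: XPath's regular expression syntax and Python's differ in their details, and the sketch assumes that re.search (an unanchored match) and re.sub are close enough stand-ins for fn:matches and fn:replace on these simple patterns.

```python
import re

text = "XML book"

contains = "XML" in text                              # fn:contains tests for a plain substring
matches_1 = re.search(r"XM..[a-z]*", text) is not None  # fn:matches is not anchored
matches_2 = re.search(r".*Z.*", text) is not None
replaced_1 = re.sub(r"XML", "Web", text)              # fn:replace
replaced_2 = re.sub(r"[a-z]", "8", text)

print(contains, matches_1, matches_2)  # True True False
print(replaced_1)                      # Web book
print(replaced_2)                      # XML 8888
```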

Cardinality Functions
The following decide the cardinality of general sequences
fn:exists returns false iff the argument is the empty sequence
fn:empty is the negation of fn:exists
    fn:exists(()) = fn:false()
    fn:exists((1,2,3,4)) = fn:true()
    fn:empty(()) = fn:true()
    fn:empty((1,2,3,4)) = fn:false()
    fn:count((1,2,3,4)) = 4
    fn:count(//rcp:recipe) = 5

Sequence Functions
The argument sequence must be enclosed in parentheses, otherwise the function treats each element as a separate argument
    makes sense, since the comma is used to separate arguments!
    fn:distinct-values((1, 2, 3, 4, 3, 2)) = (1, 2, 3, 4)
      removes duplicates, comparing atomic values with eq (nodes in the argument are atomized first)
      order of result is implementation-dependent
    fn:insert-before((2, 4, 6, 8), 2, (3, 5)) = (2, 3, 5, 4, 6, 8)
    fn:remove((2, 4, 6, 8, 10), 3) = (2, 4, 8, 10)
    fn:reverse((2, 4, 6, 8)) = (8, 6, 4, 2)
    fn:subsequence((2, 4, 6, 8, 10), 2) = (4, 6, 8, 10)
    fn:subsequence((2, 4, 6, 8, 10), 2, 3) = (4, 6, 8)
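Because XPath positions count from 1, these functions are easy to get wrong by one when reimplemented. The Python sketch below gives hypothetical equivalents that assume integer positions of at least 1. It does not cover the full definitions in the specification.

```python
def insert_before(seq, position, inserts):
    # Positions count from 1; the new items go in front of the item at position.
    i = max(position, 1) - 1
    return seq[:i] + inserts + seq[i:]

def remove(seq, position):
    # Drop the item at the given 1-based position.
    return [item for p, item in enumerate(seq, start=1) if p != position]

def subsequence(seq, start, length=None):
    # Items from the 1-based start position, optionally limited to length items.
    i = start - 1
    return seq[i:] if length is None else seq[i:i + length]

print(insert_before([2, 4, 6, 8], 2, [3, 5]))  # [2, 3, 5, 4, 6, 8]
print(remove([2, 4, 6, 8, 10], 3))             # [2, 4, 8, 10]
print(subsequence([2, 4, 6, 8, 10], 2))        # [4, 6, 8, 10]
print(subsequence([2, 4, 6, 8, 10], 2, 3))     # [4, 6, 8]
```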

Aggregate Functions

fn:avg((2, 3, 4, 5, 6, 7)) = 4.5
fn:max((2, 3, 4, 5, 6, 7)) = 7
fn:min((2, 3, 4, 5, 6, 7)) = 2
fn:sum((2, 3, 4.5, 5, 6, 7)) = 27.5

Node Functions

fn:doc("http://www.brics.dk/ixwt/recipes/recipes.xml")
reads in document and returns root node

fn:position()
returns current context position

fn:last()
returns current context size

Examples:

fn:doc("recipes.xml")//rcp:recipe[fn:position()=2]/rcp:title/text()
returns
Ricotta Pie

fn:doc("recipes.xml")//rcp:recipe[fn:last()]/rcp:title/text()
returns
Cailles en Sarcophages

fn:doc("http://www.brics.dk/ixwt/recipes/recipes.xml") is fn:doc("http://www.brics.dk/ixwt/recipes/recipes.xml")
returns true
if same URI read twice, second call to doc returns initially constructed root node for that URI
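
Positional predicates can be tried out without an XPath 2.0 processor. The sketch below uses Python's standard xml.etree.ElementTree module, which supports only a small subset of XPath 1.0 but does accept [2] (short for [position()=2]) and [last()]. The three-recipe document and the namespace URI are made up for illustration; they are not the real recipes.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and toy document, for illustration only.
NS = {"rcp": "http://example.org/recipes"}

XML = """
<rcp:collection xmlns:rcp="http://example.org/recipes">
  <rcp:recipe><rcp:title>Placeholder Recipe</rcp:title></rcp:recipe>
  <rcp:recipe><rcp:title>Ricotta Pie</rcp:title></rcp:recipe>
  <rcp:recipe><rcp:title>Cailles en Sarcophages</rcp:title></rcp:recipe>
</rcp:collection>
"""

root = ET.fromstring(XML)

# [2] selects the second rcp:recipe child; [last()] selects the final one.
second = root.find("rcp:recipe[2]/rcp:title", NS)
last = root.find("rcp:recipe[last()]/rcp:title", NS)

print(second.text)   # Ricotta Pie
print(last.text)     # Cailles en Sarcophages
```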

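Coercion Functions

xs:integer("5") = 5
xs:integer(7.0) = 7
xs:decimal(5) = 5.0
xs:decimal("4.3") = 4.3
xs:decimal("4") = 4.0
xs:double(2) = 2.0E0
xs:double(14.3) = 1.43E1
xs:boolean(0) = fn:false()
xs:boolean("0") = fn:false()
xs:boolean("false") = fn:false()
xs:string(17) = "17"
xs:string(1.43E1) = "14.3"
xs:string(fn:true()) = "true"

Casting a string to xs:boolean reads its lexical form, so "0" and "false" both give false. By contrast, fn:boolean("0") and fn:boolean("false") return true, because the effective boolean value of any non-empty string is true.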

For Expressions

• The expression

for $r in //rcp:recipe return fn:count($r//rcp:ingredient[fn:not(rcp:ingredient)])

returns the number of simple ingredients in each recipe:

11, 12, 15, 8, 30

• The expression

for $i in (1 to 5), $j in (1 to $i) return $j

returns the value

1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5
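
Because XPath sequences are always flat, iterating over two variables yields one flat sequence rather than a sequence of sequences. A Python list comprehension with two for clauses behaves the same way, so the second example above can be reproduced as follows (an analogy in Python, not XPath):

```python
# Outer variable i runs from 1 to 5; inner variable j runs from 1 to i.
# The results are concatenated into a single flat list.
result = [j for i in range(1, 6) for j in range(1, i + 1)]

print(result)   # [1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5]
```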

Conditional Expressions

if (exp1) then exp2 else exp3

The following returns the average amount in millilitres of all ingredients measured in cups, teaspoons or tablespoons:

fn:avg(
  for $i in //rcp:ingredient return
    if ( $i/@unit = "cup" ) then xs:double($i/@amount) * 237
    else if ( $i/@unit = "teaspoon" ) then xs:double($i/@amount) * 5
    else if ( $i/@unit = "tablespoon" ) then xs:double($i/@amount) * 15
    else ())

If an ingredient has no unit attribute, then $i/@unit evaluates to the empty sequence (), and () = "cup" is false
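
The role of the final else () branch is easiest to see by analogy: ingredients with an unrecognised or missing unit contribute nothing to the sequence, so they do not affect the average. The Python sketch below mirrors that logic on made-up data; the ingredient list is hypothetical, and None stands in for a missing unit attribute.

```python
# Millilitres per unit, using the same conversion factors as the XPath expression.
ML_PER_UNIT = {"cup": 237, "teaspoon": 5, "tablespoon": 15}

# Hypothetical (amount, unit) pairs; None models an ingredient without @unit.
ingredients = [("0.5", "cup"), ("2", "teaspoon"), ("1", None), ("3", "pound")]

# Ingredients with no recognised unit are skipped, like the 'else ()' branch.
volumes = [float(amount) * ML_PER_UNIT[unit]
           for amount, unit in ingredients
           if unit in ML_PER_UNIT]

average = sum(volumes) / len(volumes)
print(volumes)   # [118.5, 10.0]
print(average)   # 64.25
```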

Quantified Expressions

some $r in //rcp:ingredient satisfies $r/@name eq "sugar"

returns true if sugar is an ingredient in any of the recipes

Above expression is equivalent to:

fn:exists(
  for $r in //rcp:ingredient return
    if ($r/@name eq "sugar") then fn:true() else ())
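
The same equivalence can be sketched in Python, where any() plays the role of some ... satisfies and a filtered list plays the role of the for expression wrapped in fn:exists. The list of names is hypothetical sample data standing in for the @name attributes.

```python
names = ["flour", "sugar", "butter"]   # hypothetical @name values

# Counterpart of: some $r in ... satisfies $r/@name eq "sugar"
some_sugar = any(name == "sugar" for name in names)

# Counterpart of the fn:exists(for ... return if ... then fn:true() else ()) form:
# build a list with one entry per match, then test whether it is non-empty.
matches = [True for name in names if name == "sugar"]
exists_sugar = len(matches) > 0

print(some_sugar, exists_sugar)   # True True
```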

XPath 1.0 Restrictions

• Many implementations only support XPath 1.0
– Incorrect handling of default namespaces
– Smaller function library
– Implicit casts of values

• Some expressions change semantics:

"4" < "4.0"

is false in XPath 1.0 (because it implicitly casts the values to numbers) but true in XPath 2.0 (which treats the arguments as strings and compares them in terms of lexicographic order)
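
The two readings of this comparison can be reproduced in Python, purely as an analogy: converting both operands to numbers first gives the XPath 1.0 result, while comparing them as strings gives the XPath 2.0 result. Python compares strings by code point, which agrees with the lexicographic ordering described above for this example.

```python
# XPath 1.0 style: both operands are converted to numbers, and 4 < 4.0 is false.
numeric_result = float("4") < float("4.0")

# XPath 2.0 style: the operands are compared as strings, and "4" is a
# proper prefix of "4.0", so it sorts first.
string_result = "4" < "4.0"

print(numeric_result)   # False
print(string_result)    # True
```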

Summary

• Being able to select and navigate to nodes in an XML tree is essential if we are to perform computations on the data
• This functionality has been factored out into the XPath language
• An XPath location path consists of a sequence of location steps, each of which consists of an axis, a node test and zero or more predicates
• Location paths can be abbreviated using various conventions and special symbols
• A location step maps a sequence of nodes onto another sequence of nodes
• XPath expressions are used in XSLT and XQuery
• XPath expressions evaluate to sequences containing nodes and atomic values
• A large collection of functions has been defined for use in XPath expressions, allowing XPath to be used for carrying out complex computations on sequences