
Introduction to XML Path Language (XPath20)

Page 1: Introduction to XML Path Language (XPath20)

Introduction to XPath

Transparency No. 1

Introduction to XML Path Language (XPath20)

Cheng-Chia Chen

What is XPath?

Latest version: 2.0: http://www.w3.org/TR/xpath20, with the companion specifications XQuery/XPath Data Model (XDM), XQuery/XPath Formal Semantics, and XQuery 1.0 and XPath 2.0 Functions and Operators

1.0: http://www.w3.org/TR/xpath

XPath is a language for addressing parts of an XML document, designed to be used by XSLT, XQuery, XML Schema and XPointer.

References: xfront, W3Schools

TOC

1 Introduction

2 Data Model

3 Location Paths

4 Expressions

5 Core Function Library

1. Introduction

What is XPath? A language used to address parts of an XML document. It provides basic facilities for manipulating strings, numbers and booleans, and it operates on the abstract, logical structure of an XML document rather than on its surface syntax.

XPath(2.0) data model

It provides a tree representation of XML documents. It also provides atomic values such as numbers, strings and booleans, and flat sequences that may contain both references to nodes in an XML document and atomic values.

The result of evaluating an XPath expression is a sequence of items, each of which is either a node from the input document, or an atomic value.

Type systems of XPath

XPath expression: the primary syntactic construct in XPath. An expression is evaluated to yield a value, which is a possibly empty sequence of items.

An item is either a node or an atomic value.

Expression evaluation (XPath 1.0)

Evaluation occurs with respect to a context. XSLT, XQuery and XPointer specify how the context is determined. A context consists of:

1. a node (the context node)
2. a pair of positive integers (the context position and the context size)
3. a set of variable bindings
4. a function library
5. the set of namespace declarations in scope for the expression

Notes: items 3, 4 and 5 do not change when evaluating subexpressions. Only predicates change item 2. Several kinds of expressions change item 1.

Expression evaluation (XPath 2.0)

The expression context consists of all information that can affect the result of evaluating an expression. It is organized into two categories:

static context: information available before evaluation (during static analysis)

dynamic context: information available during evaluation = static context + additional information

Static context

A static context consists of:

1. XPath 1.0 compatibility mode: a boolean
2. Statically known namespaces: a set of (prefix, URI) pairs
3. Default element/type namespace (or none), e.g. <e1 .../>, <pre:e2 xsi:type="aType" />
4. Default function namespace (or none), e.g. max(...), fn:f1(...), ...
5. In-scope schema definitions:
   (1) schema type definitions (global + local)
   (2) element declarations (global + local + substitution groups)
   (3) attribute declarations (global + local)
   Global definitions are identified by expanded QName. Local or anonymous ones are identified by implementation-dependent identifiers.
6. In-scope variables: a set of (EQName, type) pairs, i.e. the set of variables available for reference within an expression. Some constructs (for, some, every) extend the in-scope variables of their subexpressions.

7. Context item static type: the static type of the context item.

8. Function signatures (i.e., callable functions and constructors): the set of functions that can be called from within an expression. Each function is identified by its expanded QName and its arity. Its signature also specifies the static types of its parameters and result.

9. Statically known collations: a set of (URI, collation) pairs. A collation specifies how character strings are compared and ordered; collations are identified by URI strings.

10. Default collation: one of the statically known collations.

11. Base URI: the absolute URI used to resolve relative URIs.

12. Statically known documents: pairs (s: absolute document URI, t: type), where t is the static type of fn:doc(s). The default value of t is document-node()?.

13. Statically known collections: pairs (s: URI, t: type), where t is the static type of fn:collection(s).

14. Statically known default collection type: the static type of fn:collection() with no argument (node()* if not given).

Dynamic context

= static context + the additional components listed below:

1. Focus: the context item, context position and context size, available as ., fn:position() and fn:last()

2. Variable values: pairs (EQName, value), where each value also carries dynamic type information

3. Function implementations: an implementation of each function signature given in the static context

4. Current dateTime: returned by current-dateTime(), current-date(), current-time()

5. Implicit timezone: returned by implicit-timezone()

6. Available documents: Map<uri, document-node>

7. Available collections: Map<uri, node()*>

8. Default collection: the value of collection() with no argument

Location path

The most important kind of expression. It selects a set of nodes relative to a context node.

2. Data Model

Details are in the XQuery/XPath Data Model (XDM). XPath operates on an XML document as a tree of nodes.

Every XPath expression evaluates to a value. In XPath 2.0, a value is always a sequence: an ordered collection of zero or more items. An item is either an atomic value or a node.

An atomic value is a value (in the value space) of an atomic type, as defined in [XML Schema]. Examples:
  123 (xs:integer)
  123.0 (xs:decimal)
  1.23e2 (xs:double)
  xs:date("2011-12-10") (xs:date)
  xs:QName('xs:date') (xs:QName)
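The lexical form of a numeric literal alone fixes its type (123, 123.0 and 1.23e2 above). Below is a minimal Python sketch of that classification. It assumes the IntegerLiteral, DecimalLiteral and DoubleLiteral productions of the XPath 2.0 grammar, and the function name literal_type is hypothetical. Signs are not handled, because XPath treats a leading minus as a unary operator rather than part of the literal.

```python
import re

# Sketch of the XPath 2.0 numeric literal productions (lexical form only).
_DIGITS = r"[0-9]+"
_INTEGER = re.compile(rf"{_DIGITS}")                          # IntegerLiteral
_DECIMAL = re.compile(rf"(\.{_DIGITS})|({_DIGITS}\.[0-9]*)")  # DecimalLiteral
_DOUBLE = re.compile(                                         # DoubleLiteral
    rf"((\.{_DIGITS})|({_DIGITS}(\.[0-9]*)?))[eE][+-]?{_DIGITS}")


def literal_type(text: str) -> str:
    """Return the atomic type denoted by an unsigned XPath numeric literal."""
    if _INTEGER.fullmatch(text):
        return "xs:integer"
    if _DECIMAL.fullmatch(text):
        return "xs:decimal"
    if _DOUBLE.fullmatch(text):
        return "xs:double"
    raise ValueError(f"not an XPath numeric literal: {text!r}")


for lit in ["123", "123.0", "1.23e2", ".5", "6.022E23"]:
    print(lit, literal_type(lit))
```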

XPath 2.0 data model

A node is an instance of one of the seven node kinds defined in the XQuery/XPath Data Model.

Each node has a unique node identity, a typed value, and a string value. Some nodes have a name, which is a value of type xs:QName. The typed value of a node is a sequence of zero or more atomic

values. The string value of a node is a value of type xs:string.

In certain situations a value is said to be undefined (for example, the value of the context item, or the typed value of an element node). This term indicates that the property in question has no value and

that any attempt to use its value results in an error.

Kinds of Atoms

XPath 1.0 has three kinds of atoms:
  number (a double-precision floating-point number)
  boolean (true or false)
  string (a sequence of Unicode characters)

XPath 2.0 generalizes this to all atomic datatypes defined by XML Schema. In particular, numbers are further classified into xs:integer, xs:decimal, xs:float and xs:double.

Atomization

A sequence of items is atomized into a sequence of atomic values by replacing every node with its typed value, as follows:

document (root) node, text node: the string value, as xs:untypedAtomic

comment, processing-instruction and namespace nodes: the string value, as xs:string

attribute node: the value according to its type annotation, or the string value as xs:untypedAtomic if the annotation is xs:untypedAtomic. Examples:
  "12.3e2" annotated as xs:double gives the double 1.23E3
  "s1 s2 s3" annotated as xs:IDREFS gives the sequence ("s1", "s2", "s3") of type xs:IDREF*

element node with simple content: the string value as xs:untypedAtomic if the type is xs:anySimpleType; otherwise the value(s) of the annotated type (e.g., a list type yields a sequence)

element node of type xs:untyped, or of a complex type with mixed content: the string value, as xs:untypedAtomic

element node of a complex type with empty content (or with nilled = 'true'): the empty sequence ()

element node of a complex type with element-only content: undefined (atomizing it raises a type error)

The typed value of a sequence s can be obtained by invoking fn:data(s).
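Here is a minimal Python sketch that makes the untyped atomization rules concrete. It uses a hypothetical toy node model: Node, UntypedAtomic, string_value and atomize are illustrative names, not an XDM API. It assumes no schema validation, so only the untyped cases are covered:
  document, element, text and attribute nodes become xs:untypedAtomic
  comments and PIs become xs:string
  atomic values pass through unchanged

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical toy node; nothing is schema-validated, so all nodes are untyped."""
    kind: str                       # "document", "element", "attribute", "text", "comment" or "pi"
    text: str = ""                  # own content of text, attribute, comment and PI nodes
    children: List["Node"] = field(default_factory=list)


@dataclass(frozen=True)
class UntypedAtomic:
    """Stand-in for an xs:untypedAtomic value."""
    value: str


def string_value(node: Node) -> str:
    """Document and element nodes: concatenation of descendant text nodes."""
    if node.kind in ("document", "element"):
        return "".join(string_value(c) for c in node.children
                       if c.kind in ("element", "text"))
    return node.text


def atomize(items: list) -> list:
    """Replace every node by its typed value; atomic values pass through."""
    result = []
    for item in items:
        if not isinstance(item, Node):
            result.append(item)                               # already atomic
        elif item.kind in ("comment", "pi"):
            result.append(string_value(item))                 # xs:string
        else:
            result.append(UntypedAtomic(string_value(item)))  # xs:untypedAtomic
    return result


para = Node("element", children=[Node("text", text="12"), Node("comment", text="c"),
                                 Node("element", children=[Node("text", text=".5")])])
print(atomize([para, 7, Node("comment", text="hi")]))
# [UntypedAtomic(value='12.5'), 7, 'hi']
```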

Types of nodes in an XML tree

All node kinds except the namespace node are the same as in XPath 1.0.

The tree contains nodes. The node kinds and their possible children are:

root nodes: element (exactly one), comment, PI
element nodes: element, text, PI, comment; attribute and namespace nodes are attached to the element but are not its children
text nodes: leaves
attribute nodes: leaves
namespace nodes: leaves (an XPath 2.0 implementation need not support the namespace axis; XQuery 1.0 does not support it)
processing instruction nodes: leaves
comment nodes: leaves

Basic concepts

See Concepts from XDM: Node Identities, Document Order, Sequence Types.

Node Identity

Every node has a unique identity (like objects in Java): it is identical to itself and not identical to any other node. I.e., node1 is node2 iff node1 and node2 correspond to the same node occurrence.

Notes:
1. Node identity ≠ ID attribute.
2. An element has an identity even if it has no ID attribute.
3. Non-element nodes also have unique identities.

Atomic values do not have identity; every occurrence of "5" as an integer is identical to every other occurrence of "5" as an integer.

Example

<courses>
  <course name="dismath">
    <student idref="Wang" />
    <student idref="chen" /> …
  </course>
  <course name="compiler">
    <student idref="Wang" />
    <student idref="Chang" /> …
  </course>
</courses>

Ex: the XPath expression (/courses/course[@name='dismath']/student[1] is (//student)[3]) returns false.

The XPath expression ((//student)[1]/@idref is (//student)[3]/@idref) also returns false. (Why?)
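Here is one way to see why the second expression returns false. The two idref attributes are distinct nodes that happen to carry the same value "Wang". In the minimal Python analogy below, the class AttributeNode is hypothetical. Object identity plays the role of XPath's is, and value equality plays the role of eq.

```python
class AttributeNode:
    """Hypothetical toy attribute node; every instance has its own identity."""

    def __init__(self, name: str, value: str):
        self.name = name
        self.value = value


# The two idref="Wang" attributes from the example are separate nodes.
first = AttributeNode("idref", "Wang")
third = AttributeNode("idref", "Wang")

print(first is third)              # False: node identity, like XPath 'is'
print(first.value == third.value)  # True: value comparison, like XPath 'eq'
```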

Document order and reverse document order

Same as in XPath 1.0

Example

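<?xml version="1.0" ?>
<a xmlns:ns1="uri1" at1="…" at2="…">
  <a1> data1 </a1>
  <a2> data2 </a2>
  <a3><b3/><!-- comment 1 --> </a3>
  <?pi pidata ?>
</a>

Doc order: root < a < ns1 < {at1, at2} < a1 < ns1(a1) < data1 < … < a3 < ns1(a3) < b3 < ns1(b3) < comment < pi

Notation:
  ns1(x) is the namespace node for prefix ns1 attached to element x; each element in scope of the declaration has its own namespace node.
  {at1, at2} means the relative order of the two attributes is implementation-dependent.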
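The document order listed above is a preorder walk: an element is followed by its namespace nodes, then its attributes, then its children. A small Python sketch over a hypothetical toy tree reproduces the order for the example. Elem and document_order are illustrative names. The sketch ignores whitespace-only text nodes, and it keeps attributes in list order even though their real relative order is implementation-dependent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Elem:
    """Hypothetical toy element: name, namespace prefixes, attribute names, children."""
    name: str
    namespaces: List[str] = field(default_factory=list)
    attributes: List[str] = field(default_factory=list)
    children: list = field(default_factory=list)   # Elem or str (text/comment/PI leaf)


def document_order(root_children: list) -> List[str]:
    """Preorder walk: element, its namespace nodes, its attributes, then its children."""
    order = ["root"]

    def visit(node):
        if isinstance(node, str):                  # leaf: text, comment or PI
            order.append(node)
            return
        order.append(node.name)
        order.extend(f"{ns}({node.name})" for ns in node.namespaces)
        order.extend(node.attributes)
        for child in node.children:
            visit(child)

    for child in root_children:
        visit(child)
    return order


doc = [Elem("a", ["ns1"], ["at1", "at2"], [
    Elem("a1", ["ns1"], children=["data1"]),
    Elem("a2", ["ns1"], children=["data2"]),
    Elem("a3", ["ns1"], children=[Elem("b3", ["ns1"]), "comment"]),
    "pi",
])]
print(" < ".join(document_order(doc)))
```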

Sequences

Every XPath expression evaluates to a sequence of items. A sequence may contain nodes, atomic values, or any mixture of nodes and atomic values. There is no distinction between an item and a singleton sequence containing that item: ('123') = '123'; node2 = (node2).

A node does not lose its identity when it is added to a sequence [i.e., only references to the node are added]. A node may occur in multiple places of one or more sequences.

Sequences are flat and never contain other sequences. Appending (d e) to (a b c) does not produce (a b c (d e)); the result is automatically flattened to (a b c d e).

Notes: sequences replace the node-sets of XPath 1.0. XPath 1.0 node-sets are unordered and do not contain duplicates, whereas sequences are ordered and may contain duplicates.
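Flattening can be sketched in Python by modelling a sequence as a tuple. This is only an illustration, not XDM machinery. The hypothetical helper make_sequence splices nested tuples in and turns a lone item into a one-item sequence. This matches the rule that an item and its singleton sequence are the same.

```python
def make_sequence(*parts) -> tuple:
    """Build a flat sequence: nested tuples are spliced in, single items are appended."""
    result = []
    for part in parts:
        if isinstance(part, tuple):
            result.extend(make_sequence(*part))   # flatten recursively
        else:
            result.append(part)
    return tuple(result)


print(make_sequence(("a", "b", "c"), ("d", "e")))  # ('a', 'b', 'c', 'd', 'e')
```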

Types in XDM

XDM accepts all types defined by XML Schema. This supports XSLT and XQuery, whose type systems are based on XML Schema. The types include:
  the 19 built-in primitive types of XML Schema
  5 additional types defined by XDM
  user- or implementor-defined types

The type system is defined in the XQuery 1.0 and XPath 2.0 Formal Semantics.

Every item in the data model has both a value and a type. Examples:
  a node has a node type
  5 has type xsd:integer
  '5' has type xsd:string
  "Hello World." has type xsd:string

Example: 5 : xsd:int (the value 5 with type xsd:int)

XDM Type Hierarchy

from XDM Type Hierarchy.

Representation of Types

Use expanded-QName (EQName) to represent a type.

Definition: an expanded QName is a triple consisting of:
  a possibly empty prefix
  a possibly empty namespace URI (namespace name)
  a local name
Note: only the namespace URI and the local name are used for identity; the prefix is ignored.

Lexical representation of an expanded QName: [prefix:]localName, where the URI is determined by the context.

A type with target namespace n1 and local name loc1 is represented by an EQName whose URI is n1 and whose local name is loc1.
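Only the namespace URI and the local name count. So the prefixes xs: and xsd: used in these slides name the same types when both are bound to the XML Schema namespace. Below is a minimal Python sketch with a hypothetical ExpandedQName dataclass. It keeps the prefix for display but excludes it from equality and hashing.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExpandedQName:
    """Hypothetical model: identity is (namespace URI, local name); prefix is display-only."""
    namespace_uri: str
    local_name: str
    prefix: str = field(default="", compare=False)   # ignored by == and hash()


XSD = "http://www.w3.org/2001/XMLSchema"
a = ExpandedQName(XSD, "integer", prefix="xs")
b = ExpandedQName(XSD, "integer", prefix="xsd")
print(a == b)  # True
```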

General constraints on nodes

All nodes must satisfy the following general constraints:

1. Every node must have a unique identity, distinct from all other nodes. [unique identity]
2. The children property of a node must not contain two consecutive text nodes. [no adjacent texts]
3. The children property of a node must not contain any empty text nodes. [no empty text]
4. The children and attributes properties of a node must not contain two nodes with the same identity. [no sharing of nodes] I.e., contained nodes are not shared (hence a tree, not a DAG).

Predefined Types

xs:untyped denotes the dynamic type of an element node that has not been validated, or has been validated in skip mode.

xs:untypedAtomic denotes untyped atomic data. Examples are text that has not been assigned a more specific type, or an attribute value that has been validated in skip mode.

xs:anyAtomicType, derived from xs:anySimpleType, is the root of all atomic types (not including list or union types) and the base type of all primitive atomic types.

xs:dayTimeDuration and xs:yearMonthDuration are both derived from xs:duration. Their lexical forms are:
  xs:dayTimeDuration: PnDTnHnMn.nS (e.g., P3DT10H30M1.5S)
  xs:yearMonthDuration: PnYnM (e.g., P1Y2M)

Atomic (typed) value construction

Signature (format), see XPath constructor functions: prefix:TYPE($arg as xs:anyAtomicType?) as prefix:TYPE?

Here prefix:TYPE is the QName of the target type, xs:anyAtomicType? is the input type, and prefix:TYPE? is the output type.

Notes: ? means that the input and the output are sequences of zero or one atomic value. If $arg is the empty sequence (), the output is defined to be the empty sequence () as well.

Possible values of prefix:TYPE: xs:integer, xs:int, xs:dateTime, xs:boolean, …. User-defined atomic types such as bk:ISBN or np:IP can also be used.

List of constructors for built-in types

xs:string($arg as xs:anyAtomicType?) as xs:string?
  xs:string("abc") returns the string "abc"
  xs:string(123) returns "123"

xs:boolean($arg as xs:anyAtomicType?) as xs:boolean?
  xs:boolean("abc"): error
  xs:boolean(""): error
  xs:boolean(10): true
  xs:boolean(): error
  xs:boolean(()): ()
  Note: xs:boolean != fn:boolean (effective boolean value)

xs:decimal($arg as xs:anyAtomicType?) as xs:decimal?
  xs:decimal("123.456789") returns 123.456789

xs:float($arg as xs:anyAtomicType?) as xs:float?

xs:double($arg as xs:anyAtomicType?) as xs:double?

Note: xs:int("1234567891234") raises an error because the value is outside the 32-bit xs:int range. xs:integer("1234567891234") returns 1234567891234.

All others are similar:
  date/time and duration types: xs:duration, xs:dateTime, xs:time, xs:date, xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gDay, xs:gMonth, xs:yearMonthDuration, xs:dayTimeDuration
  binary, URI and QName types: xs:hexBinary, xs:base64Binary, xs:anyURI, xs:QName
  string-derived types: xs:normalizedString, xs:token, xs:language, xs:NMTOKEN, xs:Name, xs:NCName, xs:ID, xs:IDREF, xs:ENTITY
  integer types: xs:integer, xs:long, xs:int, xs:short, xs:byte, xs:nonPositiveInteger, xs:negativeInteger, xs:nonNegativeInteger, xs:unsignedLong, xs:unsignedInt, xs:unsignedShort, xs:unsignedByte, xs:positiveInteger
  xs:untypedAtomic

More Examples

xs:string("abc"), xs:int("123"), xs:float("123.3e10"), xs:date("2006-11-12")

xs:gMonthDay("--11-12"), xs:gMonth("--11"), xs:gDay("---12")

xs:dateTime("2006-11-12T12:00:00")

fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00")) returns xs:dateTime("1999-12-31T12:00:00").

fn:dateTime(xs:date("1999-12-31"), xs:time("24:00:00")) returns xs:dateTime("1999-12-31T00:00:00"), because "24:00:00" is an alternate lexical form for "00:00:00".

Note: 24:00:00 = 00:00:00

String values

Every atomic value has a string representation.

The string representation can be obtained by casting. Ex: (xs:int("123") + 45) cast as xs:string returns "168".

Properties of nodes

String value: every node has a string-value. It is either part of the node or computed from the string-values of its descendant nodes, and it can be obtained by string(.).

Typed value: can be obtained by data(.).

Expanded-name (XPath 1.0; in 2.0 it is replaced by the EQName): expanded-name = namespace URI + local part. The namespace URI is either null or a URI string [RFC2396]. Two expanded-names are equal if they have the same local part and the same namespace URI.

Node relationship

Same as in xpath 1.0

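Properties and relationships of nodes

Notation:
  m(e) is the URI bound to prefix e.
  In the children and parent columns, numbers refer to the node types in the first column, and {} means none.

node type | expanded name | string-value | children | parent
1. root (document) | --- (no value) | concatenation of descendant texts | 2, 5, 6 | {}
2. element (e:local) | m(e) + local, or null + local | concatenation of descendant texts | 2, 3, 5, 6 | 1, 2
3. text | --- | text content | {} | 2
4. attribute (e:attr="…") | m(e) + attr, or null + attr | attribute value (normalized) | {} | 2
5. comment | --- | text of the content | {} | 1, 2
6. PI | null + PITarget | PIData | {} | 1, 2
7. namespace (xmlns:p="uri") | null + p, or null + "" | uri | {} | 2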

3 Location Paths (renamed PathExpr in 2.0)

Same as in XPath 1.0, except for some minor changes.

A location path (LocationPath) is a special kind of expression used to locate a sequence of nodes in the document. The result is sorted in document order and contains no duplicates.

Kinds of Expressions

3.1 Primary Expressions: string and numeric literals, …

3.2 Path Expressions

3.3 Sequence Expressions: , (comma), to, [ … ] (filter), | (union), intersect, except

3.4 Arithmetic Expressions: +, -, *, div, idiv, mod

3.5 Comparison Expressions: is, <, >, =, le, ge, eq, ne, …

3.6 Logical Expressions: and, or, not()

3.7 For Expressions: for

3.8 Conditional Expressions: if

3.9 Quantified Expressions: every, some

3.10 Expressions on SequenceTypes

Primary Expressions

Literals
  string: "abc", 'abc', "He said ""OK""", 'He said "ok"'
  numeric: 123 (xs:integer), 123.4 (xs:decimal), 124.4e5 (xs:double)

Non-literals: xs:int("125") = xs:int(125) = 125 cast as xs:int

Boolean: fn:true(), fn:false()

Variable References: $pre:name, $var-1

Parenthesized Expressions: ( ), ( expr )

Context Item Expression: .
  (1 to 100)[. mod 5 eq 0]
  //book[fn:count(./author) > 1]

Function Calls: pre:fName(arg1, …, argn), e.g. fn:concat("abc", "def")

Literal Expressions

42

3.1415

6.022E23

'XPath is a lot of fun'

"XPath is a lot of fun"

'The cat said "Meow!"'

"The cat said ""Meow!"""

"XPath is just so much fun"

Variable References

$foo, $bar:foo

$foo-17 refers to the variable "foo-17", not to $foo minus 17. Possible fixes:

($foo)-17, $foo -17, $foo+-17

XPath operators and their precedence

Operators are listed in order of increasing precedence. Binary operators that can be chained are left-associative.

# Operator
1 , (comma)
3 for, some, every, if
4 or (logical)
5 and
6 eq, ne, lt, le, gt, ge, =, !=, <, <=, >, >=, is, <<, >> (comparison)
7 to
8 +, - (arithmetic)
9 *, div, idiv, mod
10 union, | (combine node sequences; node sequences only)
11 intersect, except
12 instance of
13 treat
14 castable
15 cast
16 -(unary), +(unary) (unary arithmetic)
17 ?, *(OccurrenceIndicator), +(OccurrenceIndicator)
18 /, // (path step)
19 [ ] (predicate)

Path Expressions

Location paths are expressions. They may be applied to arbitrary sequences; the evaluation rules were discussed before.

Sequence Expressions

Constructing sequences: , (comma) and to
  (1,2,3), (), (3) gives (1,2,3,3)
  2 to 4 gives (2,3,4)
  (10, (1 to 3)) gives (10,1,2,3)
  (1,(2,3,4),((5))) gives (1,2,3,4,5) (nested sequences are flattened)

Filter expressions: PrimaryExpr [ … ]*
  (1 to 30)[. mod 3 = 0][. mod 5 = 0] gives (15, 30)
  (10 to 20)[5] gives 14

Combining node sequences (for nodes only), assuming the document order A < B < C < D < E:
  union: (A,B,A) | (B,C) | (A,C) and (A,B) union (B,C) both give (A,B,C)
  intersect, except: (A,B,C,D) intersect (B,D,A,E) except (B) gives (A,D)
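Here is a minimal Python sketch of the three node-sequence operators. It assumes the nodes are represented by their names A to E, so equal names stand for the same node identity. DOC_ORDER is a hypothetical table encoding A < B < C < D < E. Each operator removes duplicates and returns its result in document order.

```python
# Assumed document order for the hypothetical nodes: A < B < C < D < E.
DOC_ORDER = {name: i for i, name in enumerate(["A", "B", "C", "D", "E"])}


def _in_doc_order(nodes) -> tuple:
    """Drop duplicates and sort by document order, as the node-sequence operators do."""
    return tuple(sorted(set(nodes), key=DOC_ORDER.__getitem__))


def union(*seqs: tuple) -> tuple:
    return _in_doc_order(n for seq in seqs for n in seq)


def intersect(left: tuple, right: tuple) -> tuple:
    return _in_doc_order(n for n in left if n in right)


def except_(left: tuple, right: tuple) -> tuple:
    # Trailing underscore because 'except' is a Python keyword.
    return _in_doc_order(n for n in left if n not in right)


print(union(("A", "B", "A"), ("B", "C"), ("A", "C")))                        # ('A', 'B', 'C')
print(except_(intersect(("A", "B", "C", "D"), ("B", "D", "A", "E")), ("B",)))  # ('A', 'D')
```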

Filter Expressions

Predicates are generalized to arbitrary sequences, and the expression '.' denotes the context item. The expression

(10 to 40)[. mod 5 = 0 and position() > 20]

has the result:

30, 35, 40
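The interplay between the context item and position() in a predicate can be sketched in Python. filter_seq is a hypothetical helper. It passes each item to the predicate together with its 1-based position and the size of the sequence.

```python
def filter_seq(seq: list, predicate) -> list:
    """Keep the items for which predicate(item, position, size) holds; positions are 1-based."""
    size = len(seq)
    return [item for pos, item in enumerate(seq, start=1) if predicate(item, pos, size)]


# (10 to 40)[. mod 5 = 0 and position() > 20]
result = filter_seq(list(range(10, 41)), lambda item, pos, size: item % 5 == 0 and pos > 20)
print(result)  # [30, 35, 40]

# (10 to 20)[5]: a numeric predicate is shorthand for position() = 5
print(filter_seq(list(range(10, 21)), lambda item, pos, size: pos == 5))  # [14]
```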

Arithmetic Expressions

Operators: +, -, *, div, idiv, mod, and unary +, -
  -3 div 2 = -1.5 (decimal)
  -3 idiv 2 = -1 (integer; the quotient is truncated toward zero)
  -3.4 mod 2 (or -3.4 mod -2) = -1.4 (the result has the sign of the dividend)
  rule: x = y * (x idiv y) + (x mod y)

Precedence: {+, -} < {*, mod, div, idiv} < {unary +, -}

Operators are generalized to sequences:
  if either operand is empty, the result is empty: () + 3 gives ()
  if all operands are singleton sequences of numbers, the operation is applied: (3) + (4) + 5 gives 12
  otherwise, a runtime error occurs: (1,3) + (2,4) raises an error
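Python's // and % round toward negative infinity, so they disagree with XPath's idiv and mod on negative operands. Below is a minimal sketch with hypothetical names xp_idiv and xp_mod. It uses truncating division and follows the rule x = y * (x idiv y) + (x mod y). Decimal keeps -3.4 exact. Division by zero raises Python's ZeroDivisionError rather than an XPath error code.

```python
import math
from decimal import Decimal


def xp_idiv(x, y) -> int:
    """XPath idiv: the quotient truncated toward zero (Python's // floors instead)."""
    if isinstance(x, int) and isinstance(y, int):
        q = abs(x) // abs(y)                 # exact for integers of any size
        return q if (x >= 0) == (y > 0) else -q
    return math.trunc(x / y)                 # Decimal or float operands


def xp_mod(x, y):
    """XPath mod: x - y * (x idiv y), so the result takes the sign of x."""
    return x - y * xp_idiv(x, y)


print(xp_idiv(-3, 2), -3 // 2)                                  # -1 -2
print(xp_mod(Decimal("-3.4"), 2), xp_mod(Decimal("-3.4"), -2))  # -1.4 -1.4
```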

Comparison Expressions (boolean results)

Value Comparisons: operators eq, ne, lt, le, gt, ge, used for comparing single values.

General Comparisons (**): operators =, !=, <, <=, >, >=. These are existentially quantified comparisons that may be applied to operand sequences of any length. If no error is raised, the result is true or false.

Node Comparisons: operators is, >>, <<.
  A is B is true if A and B are the same node.
  A << B (equivalently, B >> A) is true if A precedes B in document order.

Value Comparison

Comparison operators: eq (=), ne (≠), lt (<), le (<=), gt (>), ge (>=)

Used on atomic values. When applied to arbitrary values (sequences):
  atomize both operands
  if either operand is empty, the result is ()
  if either operand has length > 1, a type error is raised
  if the two values are incomparable, a runtime error is raised, e.g. 8 lt "abc"
  otherwise, the two atomic values are compared, e.g. 8 eq 4+4, (//rcp:ingredient)[1]/@name eq "beef cube steak"
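The value-comparison steps above can be sketched in Python. The sketch assumes the operands are already atomized and represented as tuples. It simplifies comparability to "numbers with numbers, strings with strings". XPathTypeError is a hypothetical stand-in for the XPath type error.

```python
import operator
from numbers import Number


class XPathTypeError(Exception):
    """Hypothetical stand-in for an XPath type error."""


def _comparable(a, b) -> bool:
    # Simplification: numbers compare with numbers, strings with strings.
    both_numbers = isinstance(a, Number) and isinstance(b, Number)
    return both_numbers or (isinstance(a, str) and isinstance(b, str))


def value_compare(op, left: tuple, right: tuple):
    """Sketch of eq/ne/lt/le/gt/ge on already-atomized operand sequences."""
    if not left or not right:
        return ()                                    # empty operand: empty result
    if len(left) > 1 or len(right) > 1:
        raise XPathTypeError("value comparison needs singleton operands")
    a, b = left[0], right[0]
    if not _comparable(a, b):
        raise XPathTypeError(f"cannot compare {a!r} and {b!r}")
    return op(a, b)


print(value_compare(operator.eq, (8,), (4 + 4,)))  # True
print(value_compare(operator.eq, (), (1,)))        # ()
```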

Node Comparison

Operators: is, <<, >>, used to compare nodes on identity and order:
  is tests node identity
  >> and << test node ordering

When applied to arbitrary values:
  if either operand is empty, the result is empty
  if both are singleton nodes, the nodes are compared
  otherwise, a runtime error is raised, e.g. //book[1] is "abc"

Ex: (//student)[2] is //student[@id = "s9527"]
    /rcp:collection << (//rcp:recipe)[4]
    (//rcp:recipe)[4] >> (//rcp:recipe)[3]

General Comparison (use with care!!)

Operators: =, !=, <, <=, >, >=, used on general sequences:
  atomize both operands
  if there exist two values, one from each operand, whose value comparison holds, the result is true (note: an error may be raised during a value comparison)
  otherwise, the result is false

Examples: 8 = 4+4; (1,2) = (2,4); //rcp:ingredient/@name = "salt"

() = () is false!!
(2) != ("2"): runtime error (false in 1.0 mode)
(1,2) = (1, "2"): true (the matching pair 1 = 1 is found before the error)
(1,2) = ("2", 1): runtime error (true in 1.0 mode)

I.e., seq1 gop seq2 means ∃x1∈seq1 ∃x2∈seq2 (x1 vop x2), where vop is the value comparison corresponding to the general comparison gop.
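The existential definition can be sketched in Python with a hypothetical helper, general_compare. It assumes already-atomized operands (tuples). It also assumes the simplified XPath 2.0 mode rule that strings and numbers never compare (xs:untypedAtomic is not modelled). The helper stops at the first matching pair, so it reproduces both order-dependent examples above.

```python
import operator
from itertools import product


def _check_comparable(a, b) -> None:
    # Simplified XPath 2.0 mode rule: a string never compares with a number.
    if isinstance(a, str) != isinstance(b, str):
        raise TypeError(f"cannot compare {a!r} with {b!r}")


def general_compare(op, left: tuple, right: tuple) -> bool:
    """seq1 gop seq2: true iff some pair (x1, x2) satisfies x1 vop x2.
    Pairs are tried left to right and the search stops at the first match."""
    for a, b in product(left, right):
        _check_comparable(a, b)
        if op(a, b):
            return True
    return False


print(general_compare(operator.eq, (), ()))            # False
print(general_compare(operator.eq, (1, 2), (2, 4)))    # True
print(general_compare(operator.eq, (1, 2), (1, "2")))  # True: 1 = 1 is found first
```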

Be Careful About Comparisons

((//rcp:ingredient)[40]/@name, (//rcp:ingredient)[40]/@amount) eq ((//rcp:ingredient)[53]/@name, (//rcp:ingredient)[53]/@amount)

type error, since eq can only compare singletons of compatible types

((//rcp:ingredient)[40]/@name, (//rcp:ingredient)[40]/@amount) = ((//rcp:ingredient)[53]/@name, (//rcp:ingredient)[53]/@amount)

true, since the two names are found to be equal

((//rcp:ingredient)[40]/@name, (//rcp:ingredient)[40]/@amount) is ((//rcp:ingredient)[53]/@name, (//rcp:ingredient)[53]/@amount)

runtime error, since only single-node sequences can be compared

Algebraic Axioms for Comparisons

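•Reflexivity: $x = x$

•Symmetry: $x = y \Rightarrow y = x$

•Transitivity: $x = y \land y = z \Rightarrow x = z$, and $x < y \land y < z \Rightarrow x < z$

•Anti-symmetry: $x \le y \land y \le x \Rightarrow x = y$

•Negation: $x \ne y \Leftrightarrow \lnot(x = y)$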

General comparisons violate most axioms

Reflexivity? ()=() yields false

Transitivity? (1,2)=(2,3) and (2,3)=(3,4), but not (1,2)=(3,4)

Anti-symmetry? (1,4)<=(2,3) and (2,3)<=(1,4), but not (1,4)=(2,3)

Negation? (1)!=() yields false, and (1)=() also yields false

Logical Expressions

Operators: and, or. The constants are expressed with the functions

true() and false()

Negation uses the function not(…).

Precedence: or < and < not(…). Arguments are coerced to their effective boolean value, which is false if the value is:
  the boolean false()
  the empty sequence ()
  the empty string ""
  the number zero 0

E.g. 0 or "0" is true; not("0") is false; 0 or () is false.

Functions

XPath has an extensive function library. Default namespace for functions: http://www.w3.org/2005/xpath-functions

http://www.w3.org/2006/xpath-functions 106 functions are required

More functions with the namespace:

http://www.w3.org/2001/XMLSchema for constructors

Function Invocation

Calling a function with 4 arguments:

fn:avg(1,2,3,4) fails, since fn:avg takes a single sequence argument

Calling a function with 1 argument:

fn:avg((1,2,3,4))

Numeric operators and functions

Arithmetic operators: +, -, *, div, idiv, mod

ex: 2 + 3, + 3, 5.0 - 4, -+4.0, 30.2 div 4.2, 30 idiv 4, 20 mod 3

Value comparisons: eq (=), ne (!=), le (<=), lt (<), ge (>=), gt (>); e.g. 2.3 > 5; 4 != 3; 4 ge 6

Functions:
fn:abs(-23.4) = 23.4
fn:ceiling(23.4) = 24
fn:floor(23.4) = 23
fn:round(23.4) = 23; fn:round(-23.5) = -23 (halves are rounded toward positive infinity)
fn:round-half-to-even(-23.5) = -24
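The two rounding functions differ only at halves. Below is a minimal Python sketch using the standard decimal module; fn_round and fn_round_half_to_even are hypothetical names, and only precision 0 is handled. fn:round sends halves toward positive infinity, while fn:round-half-to-even sends them to the even neighbour.

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_HALF_EVEN


def fn_round(x: Decimal) -> Decimal:
    """Like fn:round: nearest integer, halves rounded toward positive infinity."""
    return (x + Decimal("0.5")).to_integral_value(rounding=ROUND_FLOOR)


def fn_round_half_to_even(x: Decimal) -> Decimal:
    """Like fn:round-half-to-even with precision 0: halves go to the even neighbour."""
    return x.to_integral_value(rounding=ROUND_HALF_EVEN)


print(fn_round(Decimal("23.4")), fn_round(Decimal("-23.5")),
      fn_round_half_to_even(Decimal("-23.5")))  # 23 -23 -24
```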

Boolean Functions

Note: there are no constants for true/false; use the functions true() and false() instead.

Boolean operators: and, or; a and b or c means (a and b) or c

Functions: not(…), true(), false()
fn:not(0) = fn:true() = fn:not((0))
fn:not(fn:true()) = fn:false()
fn:not("") = fn:true()
fn:not((1)) = fn:false() = fn:not(2)

Notes: 0, "" and () have effective boolean value false; (1) has effective boolean value true.

Effective boolean value (= fn:boolean(s))

The following values are interpreted as true:
  boolean true
  a non-empty string
  a non-zero number (other than NaN)
  a sequence whose first item is a node

The following values are interpreted as false:
  boolean false
  the empty string
  0, 0.0 or NaN
  the empty sequence ()

All other cases are type errors.

Usage:
  Used in: and, or, not(.), E1[E2] (unless E2 is numeric), if, some, every, and (>, <, =, …) in XPath 1.0.
  Not used in: xs:boolean(.), . cast as xs:boolean, or when passing a value to an argument of type xs:boolean.

Examples:
  (2,3) or (4,5): runtime error
  (/, 2): true
  (2, //e): error
  2 and "": false
  (2) and (3): true (why?)
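The effective boolean value rules can be sketched in Python. Node and XPathTypeError are hypothetical placeholders, and sequences are modelled as tuples. Only booleans, strings and Python numbers are handled as atomic values.

```python
import math


class Node:
    """Hypothetical placeholder for an XDM node."""


class XPathTypeError(Exception):
    """Stand-in for the error raised when a value has no EBV (err:FORG0006)."""


def effective_boolean_value(seq: tuple) -> bool:
    if not seq:
        return False                              # () is false
    first = seq[0]
    if isinstance(first, Node):
        return True                               # first item is a node
    if len(seq) > 1:
        raise XPathTypeError("no EBV for a sequence of several atomic values")
    if isinstance(first, bool):                   # check bool before int
        return first
    if isinstance(first, str):
        return first != ""                        # only the empty string is false
    if isinstance(first, (int, float)):
        return not (first == 0 or math.isnan(first))  # 0 and NaN are false
    raise XPathTypeError(f"no EBV for {first!r}")


# 2 and "": the second operand's EBV is false, so the result is false
print(effective_boolean_value((2,)) and effective_boolean_value(("",)))  # False
```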

String Functions

fn:concat("X","ML") = "XML"
fn:concat("X","ML"," ","book") = "XML book"
fn:string-join(("XML","book")," ") = "XML book"
fn:string-join(("1","2","3"),"+") = "1+2+3"
fn:substring("XML book",5) = "book"
fn:substring("XML book",2,4) = "ML b"
fn:string-length("XML book") = 8
fn:upper-case("XML book") = "XML BOOK"
fn:lower-case("XML book") = "xml book"

fn:translate("bar","abc","ABC") = "BAr"
fn:translate("--aaa--","abc-","ABC") = "AAA"
fn:translate("abcdabc", "abc", "AB") = "ABdAB"
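fn:translate applies three rules:
  each character of the second argument maps to the character at the same position in the third
  characters with no counterpart are deleted
  only the first occurrence of a repeated character counts
Below is a minimal Python sketch (fn_translate is a hypothetical name) that reproduces the three examples above.

```python
def fn_translate(arg: str, map_string: str, trans_string: str) -> str:
    """Sketch of fn:translate: map, delete, or keep each character of arg."""
    table = {}
    for i, ch in enumerate(map_string):
        if ch in table:
            continue                     # only the first occurrence in map_string counts
        table[ch] = trans_string[i] if i < len(trans_string) else None  # None: delete
    out = []
    for ch in arg:
        if ch not in table:
            out.append(ch)               # unmapped characters are kept
        elif table[ch] is not None:
            out.append(table[ch])        # mapped characters are replaced
    return "".join(out)


print(fn_translate("--aaa--", "abc-", "ABC"))  # AAA
```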

Regexp Functions

fn:contains("XML book","XML") = fn:true()
fn:matches("XML book","XM..[a-z]*") = fn:true()
fn:matches("XML book",".*Z.*") = fn:false()
fn:replace("XML book","XML","Web") = "Web book"
fn:replace("XML book","[a-z]","8") = "XML 8888"

Cardinality Functions on Sequences

fn:exists(()) = fn:false()
fn:exists((1,2,3,4)) = fn:true()
fn:empty(()) = fn:true()
fn:empty((1,2,3,4)) = fn:false()
fn:count((1,2,3,4)) = 4
fn:count(//rcp:recipe) = 5

Sequence Functions

fn:distinct-values((1, 2, 3, 4, 3, 2)) = (1, 2, 3, 4) (the order of the result is implementation-dependent)

fn:insert-before((2, 4, 6, 8), 2, (3, 5)) = (2, 3, 5, 4, 6, 8) (: 2 is the position :)
fn:remove((2, 4, 6, 8), 3) = (2, 4, 8)
fn:reverse((2, 4, 6, 8)) = (8, 6, 4, 2)
fn:subsequence((2, 4, 6, 8, 10), 2) = (4, 6, 8, 10)

fn:subsequence((2, 4, 6, 8, 10), 2, 3) = (4, 6, 8)
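The 1-based positions used by these functions are easy to get wrong when porting them. Below is a minimal Python sketch over tuples. The function names are hypothetical. fn:subsequence also accepts xs:double positions, which it rounds, but only integers are handled here.

```python
from typing import Optional


def insert_before(target: tuple, position: int, inserts: tuple) -> tuple:
    """Like fn:insert-before: 1-based position; below 1 inserts at the front,
    past the end appends."""
    i = min(max(position, 1), len(target) + 1) - 1
    return target[:i] + inserts + target[i:]


def remove(target: tuple, position: int) -> tuple:
    """Like fn:remove: drop the item at a 1-based position; out of range changes nothing."""
    if 1 <= position <= len(target):
        return target[:position - 1] + target[position:]
    return target


def subsequence(source: tuple, start: int, length: Optional[int] = None) -> tuple:
    """Like fn:subsequence for integers: keep positions p with start <= p < start + length."""
    end = len(source) + 1 if length is None else start + length
    return tuple(item for p, item in enumerate(source, start=1) if start <= p < end)


print(insert_before((2, 4, 6, 8), 2, (3, 5)))  # (2, 3, 5, 4, 6, 8)
```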

Aggregate Functions

fn:avg((2, 3, 4, 5, 6, 7)) = 4.5

fn:max((2, 3, 4, 5, 6, 7)) = 7

fn:min((2, 3, 4, 5, 6, 7)) = 2

fn:sum((2, 3, 4, 5, 6, 7)) = 27

fn:count((2, 3, 4, 5, 6, 7)) = 6

Node Functions

fn:doc("http://www.brics.dk/ixwt/examples/recipes.xml")

fn:position()

fn:last()

Coercion Functions

xs:integer("5") = 5, or "5" cast as xs:integer
xs:integer(7.0) = 7, or 7.0 cast as xs:integer
xs:decimal(5) = 5.0
xs:decimal("4.3") = 4.3
xs:decimal("4") = 4.0
xs:double(2) = 2.0E0
xs:double(14.3) = 1.43E1
xs:boolean(0) = fn:false()
xs:boolean("true") = fn:true()
xs:string(17) = "17"
xs:string(1.43E1) = "14.3"
xs:string(fn:true()) = "true"

castable:
if (12345678901 castable as xs:int) then 12345678901 cast as xs:int else 12345678901 cast as xs:long
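The castable test in the last example is a range check: xs:int is a 32-bit signed integer and xs:long a 64-bit one. Below is a minimal Python sketch for integer inputs only. castable_as_int and cast_int_or_long are hypothetical names, and casting from strings is not modelled.

```python
XS_INT_MIN, XS_INT_MAX = -2**31, 2**31 - 1     # xs:int: 32-bit signed
XS_LONG_MIN, XS_LONG_MAX = -2**63, 2**63 - 1   # xs:long: 64-bit signed


def castable_as_int(value: int) -> bool:
    return XS_INT_MIN <= value <= XS_INT_MAX


def cast_int_or_long(value: int) -> tuple:
    """Mirrors: if (v castable as xs:int) then v cast as xs:int else v cast as xs:long."""
    if castable_as_int(value):
        return ("xs:int", value)
    if XS_LONG_MIN <= value <= XS_LONG_MAX:
        return ("xs:long", value)
    raise ValueError(f"{value} is out of range for xs:long")


print(cast_int_or_long(12345678901))  # ('xs:long', 12345678901)
```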

Transparency No. 68

For Expressions

The expression

for $r in //rcp:recipe
return fn:count($r//rcp:ingredient[fn:not(rcp:ingredient)])

returns the value (11, 12, 15, 8, 30): for each recipe, the number of simple ingredients (those with no nested rcp:ingredient children).

The expression

for $i in (1 to 5) for $j in (1 to $i) return $j

returns the value (1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5).
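XPath sequences never nest, so the results of the inner iterations are simply concatenated. A Python list comprehension with two `for` clauses flattens in the same order, which makes it a convenient sketch of the second expression (the range `1 to n` is inclusive, hence `range(1, n + 1)`):

```python
# Python analogue of: for $i in (1 to 5) for $j in (1 to $i) return $j
# The comprehension concatenates each inner result into one flat list,
# just as XPath concatenates the sequences returned by each iteration.
result = [j for i in range(1, 6) for j in range(1, i + 1)]
print(result)
```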

Transparency No. 69

Conditional Expressions (IfThenElse)

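The following expression computes the average ingredient volume in milliliters, using 237, 5 and 15 ml per cup, teaspoon and tablespoon. Ingredients in any other unit yield the empty sequence () and are therefore ignored by fn:avg:

fn:avg(
  for $r in //rcp:ingredient
  return
    if ($r/@unit = "cup") then xs:double($r/@amount) * 237
    else if ($r/@unit = "teaspoon") then xs:double($r/@amount) * 5
    else if ($r/@unit = "tablespoon") then xs:double($r/@amount) * 15
    else ()
)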
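The same computation can be sketched in Python with `xml.etree.ElementTree`. The tiny document below is hypothetical (invented data, no namespace); only the element and attribute names and the conversion factors come from the slide. An ingredient without a recognised unit plays the role of the final `else ()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-document; the data is invented for illustration.
DOC = """<collection>
  <ingredient name="milk" amount="0.5" unit="cup"/>
  <ingredient name="salt" amount="1" unit="teaspoon"/>
  <ingredient name="oil" amount="2" unit="tablespoon"/>
  <ingredient name="egg" amount="2"/>
</collection>"""

# millilitre factors taken from the XPath expression
ML_PER_UNIT = {"cup": 237, "teaspoon": 5, "tablespoon": 15}

def average_volume_ml(root):
    volumes = []
    for ing in root.iter("ingredient"):
        factor = ML_PER_UNIT.get(ing.get("unit"))
        # `else ()` contributes nothing: unknown or missing units are skipped
        if factor is not None:
            volumes.append(float(ing.get("amount")) * factor)
    # fn:avg(()) is (), modelled here as None
    return sum(volumes) / len(volumes) if volumes else None

print(average_volume_ml(ET.fromstring(DOC)))
```

Here the egg has no unit attribute, so it is left out of the average rather than counted as zero.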

Transparency No. 70

Quantified Expressions

Form: (some | every) $var1 in Expr1, …, $varn in Exprn satisfies Expr

The result is a boolean. Example:

some $r in //rcp:ingredient satisfies $r/@name eq "sugar"

which is equivalent to

fn:exists(for $r in //rcp:ingredient return if ($r/@name eq "sugar") then fn:true() else ())
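Python's `any` and `all` behave like `some` and `every`, including on an empty input (some is false, every is true). A minimal sketch over a hypothetical recipe fragment without namespaces:

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe fragment (no namespace), for illustration only.
DOC = """<recipe>
  <ingredient name="flour"/>
  <ingredient name="sugar"/>
  <ingredient name="butter"/>
</recipe>"""

root = ET.fromstring(DOC)
names = [ing.get("name") for ing in root.iter("ingredient")]

# some $r in //ingredient satisfies $r/@name eq "sugar"
some_sugar = any(n == "sugar" for n in names)
# every $r in //ingredient satisfies $r/@name eq "sugar"
every_sugar = all(n == "sugar" for n in names)
# fn:exists(for ... return if (...) then fn:true() else ())
exists_form = len([True for n in names if n == "sugar"]) > 0

print(some_sugar, every_sugar, exists_form)
```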

Transparency No. 71

Expressions on SequenceTypes

1. Instance of
2. Cast
3. Castable
4. Constructor functions
5. Treat

A sequence type describes the type of an XPath expression, whose value is always a sequence. Its syntax is given by the SequenceType production.

Transparency No. 72

sequence type syntax

SequenceType ::= empty-sequence() | ItemType ('?' | '+' | '*')?   // ? = zero or one, + = one or more, * = zero or more

ItemType ::= AtomicType | item() | KindTest

AtomicType ::= QName   // e.g. xs:int, my:type
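The occurrence indicator only constrains how many items a sequence may contain. A minimal Python sketch of that cardinality check; `cardinality_ok` is a hypothetical helper, and item types are deliberately ignored:

```python
# Hypothetical checker for the occurrence indicator in ItemType ('?' | '+' | '*')?.
# Only cardinality is modelled; whether each item matches ItemType is not.
ALLOWED = {
    "":  lambda n: n == 1,   # no indicator: exactly one item
    "?": lambda n: n <= 1,   # zero or one
    "+": lambda n: n >= 1,   # one or more
    "*": lambda n: True,     # any number, including none
}

def cardinality_ok(items, indicator=""):
    return ALLOWED[indicator](len(items))

print(cardinality_ok([], "?"), cardinality_ok([], "+"))  # True False
```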

Transparency No. 73

kind-test

Generic cases:

AnyKindTest: node()   // any node
DocumentTest: document-node(), …   // any document node
ElementTest: element(), …   // any element
AttributeTest: attribute(), …   // any attribute
PITest: processing-instruction()   // any processing instruction
CommentTest: comment()   // any comment
TextTest: text()   // any text node

Examples: //sale has the sequence type element()*; (//sale, 2) has the sequence type item()+.

Transparency No. 74

kind-test

Specialized cases:

DocumentTest: document-node(ElementTest | SchemaElementTest)
  document-node(element(book, bookType))   // the root element is a book of type bookType

ElementTest: element(ElementNameOrWildcard [, TypeName [?]])
  element(*, xs:int), element(p:e1), element(bk:book, bk:bookType?)
  element(bk:book, bk:bookType)   // the type annotation is bk:bookType or derived from it, and nilled(.) must be false
                                  // with bk:bookType? nilled elements also match

AttributeTest: attribute(AttribNameOrWildcard [, TypeName])
  attribute(*, my:type), attribute(my:attr1), attribute(age, xs:int)

SchemaElementTest: schema-element(QName)   // QName is the qualified name of a declared element

SchemaAttributeTest: schema-attribute(QName)   // QName is the qualified name of a declared attribute

PITest: processing-instruction([NCName | StringLiteral])

Transparency No. 75

Type conversion in XPath

In XPath 2.0 there are two operators for type conversion:

V cast as AT   // convert V to a value of atomic type AT
V treat as ST   // assume V has sequence type ST for static typing, and raise a dynamic error if it does not (like the downcast (T) obj in Java, which throws ClassCastException)

Ex: xs:int(2) cast as xs:double   // may require value conversion
    2 cast as xs:int   // ok!

2 treat as ?   // no value conversion

ok: xs:integer, xs:decimal, xs:integer+, xs:integer* (2 has type xs:integer, which is xs:decimal or derived from it, and a single item satisfies both + and *)

runtime error: xs:int, xs:string (xs:integer is not derived from xs:int or from xs:string)
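The `treat as` rule reduces to walking up the type-derivation chain. A minimal Python sketch over a hypothetical fragment of the built-in hierarchy (xs:int from xs:long from xs:integer from xs:decimal); `derives_from` and `treat_as` are illustrative names, and cardinality is not modelled:

```python
# Hypothetical fragment of the XML Schema built-in type hierarchy:
# each type maps to the type it is derived from.
BASE = {
    "xs:int": "xs:long",
    "xs:long": "xs:integer",
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:anyAtomicType",
    "xs:string": "xs:anyAtomicType",
}

def derives_from(t, target):
    # walk up the derivation chain; a type counts as derived from itself
    while t is not None:
        if t == target:
            return True
        t = BASE.get(t)
    return False

def treat_as(value, dynamic_type, target):
    # `treat as` never converts the value; it only checks the dynamic type
    if not derives_from(dynamic_type, target):
        raise TypeError(f"{dynamic_type} does not match {target}")  # XPath: dynamic error
    return value

print(treat_as(2, "xs:integer", "xs:decimal"))  # 2
```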

Transparency No. 76

SequenceType expressions

InstanceofExpr ::= TreatExpr ('instance' 'of' SequenceType)?
  5 instance of xs:integer, 5 instance of xs:decimal, (6, 5) instance of xs:integer+, . instance of element()

CastExpr ::= UnaryExpr ('cast' 'as' AtomicType '?'?)?
  2 cast as xs:float
  (2, 3) cast as xs:double+   // not allowed: the target must be an atomic type, optionally followed by ?

CastableExpr ::= CastExpr ('castable' 'as' AtomicType '?'?)?
  (2, 3) castable as xs:double+   // not allowed, as above
  2 castable as xs:double?   // true
  "abc" castable as xs:int   // false

TreatExpr ::= CastableExpr ('treat' 'as' SequenceType)?
  @addr treat as attribute(*, USAddress)   // changes the declared (static) type of @addr to attribute(*, USAddress); during evaluation, if the actual (dynamic) type does not match, a dynamic error is raised

Transparency No. 77

XPath 1.0 Restrictions

Many implementations only support XPath 1.0, which has:

a smaller function library

implicit casts of values

Some expressions change semantics between the versions:

"4" < "4.0" is false in XPath 1.0 but true in XPath 2.0

2 = "2" is true in XPath 1.0 but a type error in XPath 2.0
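The "4" < "4.0" difference comes from what each version compares. XPath 1.0 converts both operands of a relational operator to numbers, while XPath 2.0 compares two strings as strings. A minimal Python sketch, assuming the default codepoint collation (which Python's `str` comparison matches); the function names are hypothetical:

```python
# Hypothetical model of "4" < "4.0" under the two XPath versions.

def lt_xpath1(a: str, b: str) -> bool:
    # XPath 1.0: relational operators convert both operands to numbers
    return float(a) < float(b)

def lt_xpath2(a: str, b: str) -> bool:
    # XPath 2.0: two string operands are compared as strings (codepoint order)
    return a < b

print(lt_xpath1("4", "4.0"), lt_xpath2("4", "4.0"))  # False True
```

Numerically both sides are 4, so 1.0 says false; as strings, "4" is a proper prefix of "4.0" and sorts first, so 2.0 says true.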

Transparency No. 78

XPointer

A fragment identifier mechanism based on XPath. Different ways of pointing to the fourth recipe:

...#xpointer(//recipe[4])

...#xpointer(//rcp:recipe[./rcp:title ='Zuppa Inglese'])

...#element(/1/5)

...#r102
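Python's `xml.etree.ElementTree` supports a small XPath subset that is enough to imitate these pointers. The sketch below assumes a hypothetical document: no namespaces, a description element as the first child (so the fifth child element is the fourth recipe), and invented ids and titles except `r102` and 'Zuppa Inglese'. It is not an XPointer processor; in particular, the shorthand form really requires an attribute declared as an ID, which this sketch simply treats as an ordinary `id` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, for illustration only.
DOC = """<collection>
  <description>Some recipes</description>
  <recipe id="r99"><title>A</title></recipe>
  <recipe id="r100"><title>B</title></recipe>
  <recipe id="r101"><title>C</title></recipe>
  <recipe id="r102"><title>Zuppa Inglese</title></recipe>
</collection>"""

root = ET.fromstring(DOC)

# xpointer(//recipe[4]): the fourth recipe child of its parent
by_position = root.find(".//recipe[4]")
# xpointer(//recipe[./title='Zuppa Inglese']): predicate on child text
by_title = root.find(".//recipe[title='Zuppa Inglese']")
# element(/1/5): fifth child element of the document element
by_child_seq = list(root)[4]
# #r102: here just an id attribute lookup (real XPointer needs a declared ID)
by_id = root.find(".//recipe[@id='r102']")

print(by_position is by_title is by_child_seq is by_id)  # True
```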

Transparency No. 79

Expression Hierarchy (1.0)

PrimaryExpr: (Expr), FunctionCall, Number, Literal, VariableReference   e.g. (Expr), f(a,b,c), 2.3, "abc", $pre

FilterExpr: PrimaryExpr Predicate*   e.g. $ns[@name='abc']

PathExpr: LocationPath | FilterExpr / RelativeLocationPath | FilterExpr // RelativeLocationPath   e.g. $ns[@name='abc']//author[2]

UnionExpr: PathExpr '|' PathExpr
UnaryExpr: - UnionExpr
MultiplicativeExpr: *, div, mod
AdditiveExpr: +, -
RelationalExpr: <, <=, >, >=
EqualityExpr: =, !=
AndExpr: and
OrExpr: or
Expr: OrExpr

Transparency No. 80

Expression Hierarchy (2.0)

PrimaryExpr: (Expr?), FunctionCall, NumericLiteral | StringLiteral, VarRef, ContextItemExpr   e.g. (Expr), (), f(a,b,c), 2.3, "abc", $xyz, .

StepExpr ::= (PrimaryExpr | AxisStep) Predicate*   e.g. $x[@name eq 'abc'], pre:e1[@name][2]

RelativePathExpr ::= StepExpr (('/' | '//') StepExpr)*   e.g. $ns[@name='abc']//author[2]/@name

PathExpr ::= ('/' RelativePathExpr?) | ('//' RelativePathExpr) | RelativePathExpr

ValueExpr ::= PathExpr

UnaryExpr ::= ('+' | '-')* ValueExpr

CastExpr ::= UnaryExpr ('cast' 'as' AtomicType '?'?)?   e.g. /bk:books[2]/@name cast as xs:string, () cast as xs:int?

Transparency No. 81

CastableExpr ::= CastExpr ('castable' 'as' AtomicType '?'?)?   e.g. if ($x castable as my:type) then $x cast as my:type else $x cast as xs:string

TreatExpr ::= CastableExpr ('treat' 'as' SequenceType)?   e.g. $addr treat as element(*, USAddress): the static type of $addr may be element(*, Address), but it is required to be element(*, USAddress) at runtime; otherwise a dynamic error is raised.

InstanceofExpr ::= TreatExpr ('instance' 'of' SequenceType)?

IntersectExceptExpr ::= InstanceofExpr (('intersect' | 'except') InstanceofExpr)*

UnionExpr ::= IntersectExceptExpr (('union' | '|') IntersectExceptExpr)*

Transparency No. 82

MultiplicativeExpr: *, div, idiv, mod   e.g. 5 div 2 * 3

AdditiveExpr: +, -   e.g. 2 + 3 - 4

RangeExpr ::= AdditiveExpr ('to' AdditiveExpr)?   e.g. 3 to 100

ComparisonExpr ::= RangeExpr ((NodeCmp | ValueCmp | GeneralCmp) RangeExpr)?

AndExpr: and
OrExpr: or
ExprSingle ::= OrExpr | IfExpr | ForExpr | QuantifiedExpr
Expr ::= ExprSingle (',' ExprSingle)*
XPath ::= Expr