Introduction to XQuery
Bun Yue, Professor, CS/CIS, UHCL
W3C Recommendations
  http://www.w3.org/TR/xquery/: W3C XQuery.
  http://www.w3.org/TR/xmlquery-use-cases: XQuery use cases.
  http://www.w3.org/TR/xquery-operators/: XQuery and XPath functions.
  http://www.w3.org/TR/xpath-datamodel/: XQuery 1.0 and XPath 2.0 Data Model.
  http://www.w3.org/TR/xpath20/: XPath 2.0.
  http://www.w3.org/TR/xmlschema-1/: XML Schema Part 1: Structures.
  http://www.w3.org/TR/xmlschema-2/: XML Schema Part 2: Datatypes.
Introduction
XQuery is designed to query and retrieve information effectively from diverse XML sources.
The XML sources can be one or more XML documents.
XQuery is derived from Quilt and has borrowed features from XPath, XQL, SQL, etc.
Introduction
XQuery is a functional language in which a query is an expression.
There are three faces of the XQuery language:
  A "surface" syntax that programmers will most likely use.
  An XML-based syntax that machines will most likely use (XQueryX).
  A formal semantics that XQuery engine implementers use.
Introduction
XQuery 1.0 extends XPath 2.0.
The type system of XQuery is based on XML Schema.
A limitation of XQuery 1.0: no update or insert operations.
The basic building blocks of XQuery are expressions. (In this sense, like SQL, XQuery is not a full programming language.)
Comparing to SQL
                 Relational DB: SQL                   XML DB: XQuery
  Basic units    relations                            collections
  Records        tuples or rows of the same schema    documents of the same schema
  Schema         relational schema                    DTD, XML Schema
  Query results  relations: unordered list of rows    ordered sequences of nodes
Review of XPath 2.0
The value of an expression is a sequence, which is an ordered list of items.
An item is either a node or an atomic value.
There are 7 node types:
  Document
  Element
  Attribute
  Comment
  Text
  Processing Instruction
  Namespace
XQueryX
For
  doc("census.xml")//person[@job="Athlete"]
the corresponding XQueryX can be:
<?xml version="1.0"?>
<q:query xmlns:q="http://www.w3.org/2001/06/xqueryx">
  <q:step q:axis="descendant-or-self">
    <q:function q:name="document">
      <q:constant q:datatype="xs:string">census.xml</q:constant>
    </q:function>
    <q:predicatedExpr>
      <q:identifier>person</q:identifier>
      <q:predicate>
        <q:function q:name="equals">
          <q:step q:axis="attribute">
            <q:identifier>job</q:identifier>
          </q:step>
          <q:constant q:datatype="xs:string">Athlete</q:constant>
        </q:function>
      </q:predicate>
    </q:predicatedExpr>
  </q:step>
</q:query>
Data Types
XQuery is strongly typed.
XQuery types are based on:
  XML Schema: namespace prefix xs, URI http://www.w3.org/2001/XMLSchema.
  XPath datatypes (used by the XPath functions and operators): namespace prefix xdt, URI http://www.w3.org/2004/07/xpath-datatypes.
Types
xdt:untyped denotes element nodes that have not yet been validated.
xdt:untypedAtomic denotes atomic values that have not been assigned a more specific type.
Query
A query in XQuery is an expression that reads XML documents or fragments and returns a sequence of well-formed XML fragments.
Everything in XQuery is an expression that evaluates to a value.
Query expressions
Some common forms of XQuery expressions (these appear in most tutorials):
  path expressions
  element constructors
  FLWR or FLWOR (pronounced "flower") expressions
  list expressions
  conditional expressions
  quantified expressions
  datatype expressions
More Queries
Examples of other expressions include:
  primary expressions
  sequence expressions
  arithmetic expressions
  logical expressions
  comparison expressions
  sorting expressions
  validate expressions
Comments
XQuery comments are embedded within (: and :).
Functions
XQuery supports a collection of about 200 built-in operators and functions that can be used within expressions.
Input functions in XQuery include doc() and collection(). They identify the sources of the XML documents.
Prolog
An XQuery query may have a prolog for declarations. Examples:
  Variable declarations
  Function declarations
  Base-URI declarations
  Version declarations
  Module imports
  …
Variable Declarations
Format: declare variable $name := expression;
E.g.
declare variable $a := doc("census.xml")//person ;
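A minimal sketch of a prolog followed by a query body. The inline <census> data is a made-up stand-in for census.xml so that the query can run without any external file.

```xquery
xquery version "1.0";

(: Prolog: declarations come first, each terminated by a semicolon. :)
(: The inline <census> element is hypothetical sample data.         :)
declare variable $people :=
  (<census>
     <person job="Athlete">Boris</person>
     <person job="Teacher">Valerie</person>
   </census>)//person;

(: Query body: uses the variable declared in the prolog. :)
count($people[@job = "Athlete"])
```

With this sample data the query evaluates to 1, since only one <person> has job="Athlete".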
Path Expressions
XQuery 1.0 is a superset of XPath 2.0: every XPath 2.0 expression is also an XQuery expression.
Editix
Use “View > Windows > XQuery Builder”.
For XQ files, use “XSLT/XQuery > Transform using an XQuery Request…”.
Specify the source xq file, the xml file and the output file.
Use the .xml extension for the output file. If you use the .txt extension, only text node contents are output.
Examples
declare base-uri "whatever-path";
doc("bib.xml")/*
Returns the root element of bib.xml, that is, essentially the whole document.
Example
doc("bib.xml")//*
Returns many nodes (in a sequence).
The result is therefore not a single well-formed XML document.
Examples
doc("bib.xml")//book[@year]
count(doc("census.xml")//person)
Element Constructors
Element constructors can be used to construct XML elements.
If the name, attributes, and content of the element are all constants, the element constructor is based on standard XML notation and is called a direct element constructor (W3C).
Example
The XQuery
  <authors><author>Bun Yue</author></authors>
returns
  <authors><author>Bun Yue</author></authors>
Element Constructors
XQuery expressions can be embedded in the direct element constructors within a pair of curly braces, {}.
For the characters '{' and '}', use '{{' and '}}' respectively.
XQuery expressions may be separated by commas.
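The three rules above can be sketched in one small direct constructor. The element names (report, literal, sum, items) are made up for illustration.

```xquery
(: Hypothetical element names; the point is how braces are treated. :)
<report>
  <literal>{{ not evaluated }}</literal>  (: doubled braces give literal { and } :)
  <sum>{ 1 + 2 }</sum>                    (: the enclosed expression is evaluated :)
  <items>{ "a", "b", 3 }</items>          (: comma-separated expressions form one sequence :)
</report>
```

Here <literal> contains the text "{ not evaluated }", <sum> contains 3, and <items> contains the three atomic values separated by single spaces ("a b 3"). Note that the (: … :) markers sit inside element content, where they are ordinary text rather than comments; remove them before running if that text is not wanted in the output.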
Example
<authors><author>Bun Yue</author>{ doc("bib.xml")//author }</authors>
Adds Bun Yue to the authors of bib.xml.
Computed Constructors
Computed constructors can also be used to construct nodes:
  Use the keywords element, attribute, document, text, processing-instruction, comment, or namespace to specify the kind of node.
  Specify the node name for those node kinds that have names (element, attribute, processing instruction, and namespace nodes).
  Use a pair of braces to enclose the content expression.
Note the use of commas to separate expressions in the content.
Example (from W3C)
element book {
  attribute isbn { "isbn-0060229357" },
  element title { "Harold and the Purple Crayon" },
  element author {
    element first { "Crockett" },
    element last { "Johnson" }
  }
}
Example (result)
<book isbn="isbn-0060229357">
  <title>Harold and the Purple Crayon</title>
  <author>
    <first>Crockett</first>
    <last>Johnson</last>
  </author>
</book>
Dynamic Element Names
Computed expressions can be used to create elements with dynamic names.
Example
<result>{
  for $author in doc("bib.xml")//author
  return element { $author/last/text() } { $author/first }
}</result>
Example Result
<?xml version="1.0" encoding="UTF-8"?>
<result>
  <Stevens> <first>W.</first> </Stevens>
  <Stevens> <first>W.</first> </Stevens>
  <Abiteboul> <first>Serge</first> </Abiteboul>
  <Buneman> <first>Peter</first> </Buneman>
  <Suciu> <first>Dan</first> </Suciu>
</result>
Example
Note that <first> is a child element in the result above. Compare with:
<result>{
  for $author in doc("bib.xml")//author
  return element { $author/last/text() } { $author/first/text() }
}</result>
Example
This example may also result in a runtime error, as the value of <last> may not be suitable for a QName.
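One possible way to guard against that runtime error is to test the candidate name before using it. This is only a sketch: the fallback element name author and its attribute last are assumptions made for illustration, and the two sample names are made up.

```xquery
(: Sample last names; the second is not a valid XML name because of the spaces. :)
for $name in ("Stevens", "van der Waals")
return
  if ($name castable as xs:NCName)
  then element { $name } { () }                      (: safe to use as an element name :)
  else element author { attribute last { $name } }   (: hypothetical fallback element :)
```

The first name becomes an empty <Stevens/> element, while the second falls back to an <author> element that carries the name in an attribute.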
FLWOR expressions
FLWOR expressions are one of the most important constructs in XQuery.
You may compare with the SELECT statement of SQL.
FLWOR (W3C)
[42] FLWORExpr ::= (ForClause | LetClause)+ WhereClause? OrderByClause? "return" ExprSingle
[43] ForClause ::= "for" "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle)*
[45] LetClause ::= "let" "$" VarName TypeDeclaration? ":=" ExprSingle ("," "$" VarName TypeDeclaration? ":=" ExprSingle)*
[123] TypeDeclaration ::= "as" SequenceType
[44] PositionalVar ::= "at" "$" VarName
[46] WhereClause ::= "where" Expr
[47] OrderByClause ::= ("order" "by" | "stable" "order" "by") OrderSpecList
[48] OrderSpecList ::= OrderSpec ("," OrderSpec)*
[49] OrderSpec ::= ExprSingle OrderModifier
[50] OrderModifier ::= ("ascending" | "descending")? (("empty" "greatest") | ("empty" "least"))? ("collation" StringLiteral)?
FLWOR
FLWOR expressions allow:
  For: iteration through the items of XPath 2.0 sequences. It creates a tuple stream in which each tuple binds each variable to one item.
  Let: variable binding.
  Where: applying a predicate to decide which tuples are kept in the iteration.
  Order by: ordering the tuples for the iteration.
  Return: constructing the result to return.
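The five clauses can be seen together in one short query. This is a sketch over inline sample data: the <bib> fragment below is an assumed, cut-down stand-in for bib.xml, and the pos and year attributes on the output are made up for illustration.

```xquery
(: Assumed sample data standing in for bib.xml. :)
let $books :=
  <bib>
    <book year="1994"><title>TCP/IP Illustrated</title></book>
    <book year="1992"><title>Advanced Programming in the Unix environment</title></book>
    <book year="2000"><title>Data on the Web</title></book>
  </bib>
for $b at $pos in $books/book        (: for: one tuple per book; $pos is its position :)
let $year := xs:integer($b/@year)    (: let: bind a derived value in each tuple :)
where $year < 2000                   (: where: keep only the tuples that satisfy the predicate :)
order by $year                       (: order by: sort the surviving tuples :)
return <title pos="{ $pos }" year="{ $year }">{ $b/title/text() }</title>
```

With this data the query returns two <title> elements: the 1992 book (pos="2") first, then the 1994 book (pos="1"). The positional variable records the position in the input sequence, not in the sorted output.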
For and Let
The for and let clauses produce a tuple stream.
A tuple consists of one or more bound variables.
A variable begins with the prefix $.
A bound variable is one that has been assigned a value.
Example
declare base-uri "whatever";
let $a := doc("bib.xml")//author
return
<authors> { $a }</authors>
Example Results
<?xml version="1.0" encoding="UTF-8"?>
<authors>
  <author> <last>Stevens</last> <first>W.</first> </author>
  <author> <last>Stevens</last> <first>W.</first> </author>
  …
</authors>
Example Note
In this example:
  The tuple stream is composed of only one tuple.
  The variable $a in this tuple is bound to the node sequence of 5 <author> nodes.
Example
for $a in doc("bib.xml")//author
return
<authors> { $a }</authors>
Example Result
<?xml version="1.0" encoding="UTF-8"?>
<authors> <author> <last>Stevens</last> <first>W.</first> </author></authors>
<authors> <author> <last>Stevens</last> <first>W.</first> </author></authors>
…</authors>
Example Notes
In this example:
  The tuple stream is composed of five tuples.
  The variable $a in each tuple is bound to one <author> node at a time.
Example
for $a in doc("bib.xml")//author, $b in doc("bib.xml")//author
return <count/>
Example Result
<?xml version="1.0" encoding="UTF-8"?>
<count/><count/><count/><count/>… (: 25 counts :)
Example Note
The tuple stream is composed of 25 tuples (every combination of the 5 authors for $a and $b). The 25 tuples are:
($a: <author><last>Stevens</last><first>W.</first></author>, $b: <author><last>Stevens</last><first>W.</first></author>)
($a: <author><last>Stevens</last><first>W.</first></author>, $b: <author><last>Stevens</last><first>W.</first></author>)
($a: <author><last>Stevens</last><first>W.</first></author>, $b: <author><last>Abiteboul</last><first>Serge</first></author>)
…
Example
for $a in doc("bib.xml")//author, $b in $a/last
return <count />
Example Result
<?xml version="1.0" encoding="UTF-8"?>
<count/><count/><count/><count/><count/>
Example Note
The tuple stream is composed of only 5 tuples, because $b ranges over the <last> children of the current $a. The 5 tuples are:
($a: <author><last>Stevens</last><first>W.</first></author>, $b: <last>Stevens</last>)
($a: <author><last>Stevens</last><first>W.</first></author>, $b: <last>Stevens</last>)
($a: <author><last>Abiteboul</last><first>Serge</first></author>, $b: <last>Abiteboul</last>)
…
Example
for $a in doc("bib.xml")//author, $b in doc("bib.xml")//author
where $a = $b
return <result><alast>{ $a/last/text() }</alast><blast>{ $b/last/text() }</blast></result>
Example Result
<?xml version="1.0" encoding="UTF-8"?>
<result> <alast>Stevens</alast> <blast>Stevens</blast></result>
… (: three more times. :)
<result> <alast>Abiteboul</alast> <blast>Abiteboul</blast></result>
<result> <alast>Buneman</alast> <blast>Buneman</blast></result>
<result> <alast>Suciu</alast> <blast>Suciu</blast></result>
Example Note
After the where clause, the tuple stream is composed of only 7 tuples. The 7 tuples are:
($a: <author><last>Stevens</last><first>W.</first></author>, $b: <author><last>Stevens</last><first>W.</first></author>) (: 4 times :)
($a: <author><last>Abiteboul</last><first>Serge</first></author>, $b: <author><last>Abiteboul</last><first>Serge</first></author>)
…
Example
<figlist> {
  for $f in doc("tree-data.xml")//figure
  return <diagram> { $f/@* } { $f/title } </diagram>
}</figlist>
Example Result
<?xml version="1.0" encoding="UTF-8"?>
<figlist>
  <diagram height="400" width="400"> <title>Traditional client/server architecture</title> </diagram>
  <diagram height="200" width="500"> <title>Graph representations of structures</title> </diagram>
  <diagram height="250" width="400"> <title>Examples of Relations</title> </diagram>
</figlist>
Example Note
There are three tuples in the tuple stream of the for clause. Each tuple has one variable, $f, which is bound in turn to each of the three <figure> elements in the input XML.
{ $f/@* } returns the attributes of the original <figure> element, which become attributes of the output <diagram> element.
Example
<authors> {
  fn:string-join(for $a in doc("tree-data.xml")//author return $a/text(), ", ")
}</authors>
Example Result
<?xml version="1.0" encoding="UTF-8"?>
<authors>Serge Abiteboul, Peter Buneman, Dan Suciu</authors>
Example Note
fn:string-join takes two arguments:
  A sequence of strings, and
  A separator string placed between the joined strings.
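A self-contained sketch of fn:string-join, using inline sample data instead of tree-data.xml (the <authors> fragment is an assumption made so the query runs on its own):

```xquery
(: Assumed sample data; the real tree-data.xml is not reproduced here. :)
let $authors :=
  <authors>
    <author>Serge Abiteboul</author>
    <author>Peter Buneman</author>
    <author>Dan Suciu</author>
  </authors>
return (
  (: Text nodes are atomized and converted to strings before joining. :)
  fn:string-join($authors/author/text(), ", "),
  (: Equivalent idea with an explicit conversion and another separator. :)
  fn:string-join(for $a in $authors/author return fn:string($a), " | ")
)
```

The query returns two strings: "Serge Abiteboul, Peter Buneman, Dan Suciu" and "Serge Abiteboul | Peter Buneman | Dan Suciu".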
Example
<book> {
  for $f in doc("tree-data.xml")//figure
  return <figure> { attribute size { $f/@width * $f/@height } } </figure>
}</book>
Example Result
<?xml version="1.0" encoding="UTF-8"?>
<book>
  <figure size="160000"/>
  <figure size="100000"/>
  <figure size="100000"/>
</book>
Example
<book> {
  for $f in doc("tree-data.xml")//figure
  let $size := $f/@width * $f/@height
  order by $size
  return <figure> { attribute size { $size } } </figure>
}</book>
Example Result
<?xml version="1.0" encoding="UTF-8"?>
<book>
  <figure size="100000"/>
  <figure size="100000"/>
  <figure size="160000"/>
</book>
Exercise #1
Use bib.xml. Show all books published by Addison-Wesley. Expected result:
<bib>
  <book>
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
  </book>
  <book>
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
  </book>
</bib>
Exercise #2
All books by Addison-Wesley, using a different format:
<bib>
  <book author="W. Stevens"> <name>TCP/IP Illustrated</name> </book>
  <book author="W. Stevens"> <name>Advanced Programming in the Unix environment</name> </book>
</bib>
Exercise #3
All books written by W. Stevens, ordered by year:
<result>
  <book-title>Advanced Programming in the Unix environment</book-title>
  <book-title>TCP/IP Illustrated</book-title>
</result>
Exercise #4
All books written by W. Stevens, ordered by year in descending order:
<result>
  <book-title>TCP/IP Illustrated</book-title>
  <book-title>Advanced Programming in the Unix environment</book-title>
</result>
Exercise #5
Use ft2.xml. Return every <person> with its <first> and <last> child elements. Add a child element <numEmail> that holds the number of email addresses.
<result>
  <person> <first>Boris</first> <last>Becker</last> <numEmail>2</numEmail> </person>
  …
</result>
Exercise #6
Return all <person> elements with all their attributes. The body of each <person> element should be the name of the person, first name followed by last name. For ft2.xml, it returns:
<result>
  <person ssn="s123456789" gender="M" luckynumber="7">Boris Becker</person>
  <person ssn="s111222333" gender="F" luckynumber="6">Valerie Becker</person>
  <person ssn="s123123123" gender="M" luckynumber="4">Chris Becker</person>
  <person ssn="s222333444" gender="F">Julie Becker</person>
  <person ssn="s555987323" gender="M">John Becker</person>
  <person ssn="s887667545" gender="F">Mary Becker</person>
</result>
Exercise #7
Return all pairs of <first> elements of persons with the same last name, not including pairing a person with oneself. Each pair is embedded in an element whose name is the last name of the persons. For ft2.xml, it returns:
<result>
  <Becker><first>Boris</first><first>Valerie</first></Becker>
  <Becker><first>Boris</first><first>Chris</first></Becker>
  <Becker><first>Boris</first><first>Julie</first></Becker>
  <Becker><first>Boris</first><first>John</first></Becker>
  <Becker><first>Boris</first><first>Mary</first></Becker>
  <Becker><first>Valerie</first><first>Boris</first></Becker>
  …
</result>
Exercise #8
Convert all text nodes to <text /> and all elements with name x to <element name="x" />. For ft2.xml, it returns:
<result>
  <element name="familytree"/>
  <text/>
  <text/>
  <element name="meta"/>
  …
</result>
Function Declarations
XQuery allows user-defined functions in the prolog.
[26] FunctionDecl ::= "declare" "function" QName "(" ParamList? ")" ("as" SequenceType)? (EnclosedExpr | "external")
[27] ParamList ::= Param ("," Param)*
[28] Param ::= "$" QName TypeDeclaration?
[118] TypeDeclaration ::= "as" SequenceType
Example: factorial($i)
declare function local:factorial($i as xs:integer) as xs:integer
{
  if ($i < 0) then 0
  else if ($i = 0) then 1
  else $i * local:factorial($i - 1)
};
local:factorial(6)
Functions
There is a ; after the function declaration.
The namespace prefix local is used for user-defined functions. XQuery predefines the namespace prefix local to the namespace http://www.w3.org/2004/07/xquery-local-functions, and reserves this namespace for use in defining local functions.
The types of the arguments and return values should be sequence types.
Types
A sequence type can be:
  empty(), or
  ItemType OccurrenceIndicator?
OccurrenceIndicator can be +, ? or *.
An item type can be:
  item(),
  an atomic type, or
  a kind test.
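A short sketch of how occurrence indicators appear in function signatures. The function names local:total, local:name-of and local:first are made up for illustration.

```xquery
(: xs:integer* accepts zero or more integers; the result is exactly one integer. :)
declare function local:total($values as xs:integer*) as xs:integer
{
  xs:integer(fn:sum($values))
};

(: element()? accepts an optional element; xs:string? is an optional string. :)
declare function local:name-of($e as element()?) as xs:string?
{
  if (fn:empty($e)) then () else fn:local-name($e)
};

(: item()+ accepts one or more items of any kind. :)
declare function local:first($items as item()+) as item()
{
  $items[1]
};

(
  local:total((1, 2, 3)),
  local:total(()),
  local:name-of(<person/>),
  local:first(("x", 2, <y/>))
)
```

The query returns the sequence 6, 0, "person", "x". Calling local:first(()) would instead raise a type error, because item()+ requires at least one item.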
Kind Tests
Important kind tests include:
  node()
  text()
  comment()
  processing-instruction(): with optional name argument.
  element test
  attribute test
Element Tests
Examples of element tests are:
  element(*)
  element(familytree)
  element(man, personType)
Functions
Writing XQuery functions:
  Functional programming.
  Many are recursive in nature.
  Beware of the types of parameters and return values.
Example from W3C
declare function local:depth($e as node()) as xs:integer
{
  (: A node with no children has depth 1 :)
  (: Otherwise, add 1 to max depth of children :)
  if (fn:empty($e/*)) then 1
  else fn:max(for $c in $e/* return local:depth($c)) + 1
};
<result>{ local:depth(doc("ft2.xml")) }</result>
Exercise #9
Write an XQuery function to count the number of elements in an element node (including itself). Try to use a recursive solution.
Exercise #10
For an XML document such as ft2.xml, write a function that returns all child person nodes whose parent has the social security number $ssn.
Questions