XML Data Management
8. XQuery
Werner Nutt
Requirements for an XML Query Language
David Maier, W3C XML Query Requirements:
• Closedness: output must be XML
• Composability: wherever a set of XML elements is required, a
subquery is allowed as well
• Support for key operations:
– selection
– extraction, projection
– restructuring
– combination, join
– fusion of elements
Requirements for an XML Query Language (cntd.)
• Can benefit from a schema,
but should also be applicable without
• Retains the order of nodes
• Formal semantics:
– structure of results should be derivable from query
– defines equivalence of queries
• Queries should be representable in XML
documents can have embedded queries
How Does One Design a Query Language?
• In most query languages, there are two aspects to a
query:
– Retrieving data (e.g., from … where … in SQL)
– Creating output (e.g., select … in SQL)
• Retrieval consists of
– Pattern matching (e.g., from … )
– Filtering (e.g., where … )
… although these cannot always be clearly distinguished
XQuery Principles
• Data Model identical with the XPath data model
– documents are ordered, labeled trees
– nodes have identity
– nodes can have simple or complex types
(defined in XML Schema)
• A query result is an ordered list/sequence of items
(nodes, values, attributes, etc., but not lists)
– special case: the empty list ()
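A minimal sketch of what "but not lists" means in practice, using only literal values: a sequence nested inside another sequence is flattened, and the empty list contributes nothing.

```xquery
(: A sequence never contains another sequence: nesting is flattened, :)
(: and the empty sequence () contributes no item.                    :)
let $s := (1, (2, 3), (), (4))
return count($s)   (: 4 :)
```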
XQuery Principles (cntd.)
• XQuery can be used without schemas,
but can be checked against DTDs and XML schemas
• XQuery is a functional language
– no statements
– evaluation of expressions
– function definitions
– modules
The Recipes DTD (Reminder)
<!ELEMENT recipes (recipe*)>
<!ELEMENT recipe (title, ingredient+, preparation, nutrition)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT ingredient (ingredient*, preparation?)>
<!ATTLIST ingredient
name CDATA #REQUIRED
amount CDATA #IMPLIED
unit CDATA #IMPLIED>
<!ELEMENT preparation (step+)>
<!ELEMENT step (#PCDATA)>
<!ELEMENT nutrition EMPTY>
<!ATTLIST nutrition
calories CDATA #REQUIRED
fat CDATA #REQUIRED>
A Query over the Recipes Document
<titles>
{for $r in doc("recipes.xml")//recipe
return
$r/title}
</titles>
returns
<titles>
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
<title>Ricotta Pie</title>
…
</titles>
Query Features
<titles>
{for $r in doc("recipes.xml")//recipe
return
$r/title}
</titles>
• <titles> … </titles>: part to be returned as it is given
• { … }: part to be evaluated
• doc(String): returns the input document
• //recipe and $r/title: XPath
• for: iteration
• $r: a variable ($var)
• return: sequence of results, one for each variable binding
An Equivalent Stylesheet Template
<xsl:template match="/">
<titles>
<xsl:for-each select="//recipe">
<xsl:copy-of select="title"/>
</xsl:for-each>
</titles>
</xsl:template>
Features: Summary
• The result is a new XML document
• A query consists of parts that are returned as is
• ... and others that are evaluated (everything in {...} )
• Calling the function doc(String) returns
an input document
• XPath is used to retrieve node sets and values
• Iteration over node sets:
for binds a variable to all nodes in a node set
• Variables can be used in XPath expressions
• return returns a sequence of results,
one for each binding of a variable
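The features summarized above can be tried without an input file. The sketch below replaces doc("recipes.xml") with a small inline element bound to a hypothetical variable $doc; the two titles are taken from the sample output, everything else about the data is an assumption.

```xquery
(: $doc stands in for the input document; it is an inline element, :)
(: so the recipes are reached with $doc/recipe.                     :)
let $doc :=
  <recipes>
    <recipe><title>Ricotta Pie</title></recipe>
    <recipe><title>Zuppa Inglese</title></recipe>
  </recipes>
return
  <titles>{ for $r in $doc/recipe return $r/title }</titles>
(: <titles><title>Ricotta Pie</title><title>Zuppa Inglese</title></titles> :)
```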
XPath is a Fragment of XQuery
• doc("recipes.xml")//recipe[1]/title
returns an element:
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
• doc("recipes.xml")//recipe[position()<=3]/title
returns a list of elements:
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>,
<title>Ricotta Pie</title>,
<title>Linguine alla Pescadora</title>
Beware: Attributes in XPath
• doc("recipes.xml")//recipe[1]/ingredient[1]/@name
→ attribute name {"beef cube steak"}
an attribute, represented as a constructor for an attribute node (not in Saxon)
• string(doc("recipes.xml")//recipe[1]/ingredient[1]/@name)
→ "beef cube steak"
a value of type string
Beware: Attributes in XPath (cntd.)
• <first-ingredient>
{string(doc("recipes.xml")//recipe[1]
/ingredient[1]/@name)}
</first-ingredient>
→ <first-ingredient>beef cube steak</first-ingredient>
an element with string content
Beware: Attributes in XPath (cntd.)
• <first-ingredient>
{doc("recipes.xml")//recipe[1]
/ingredient[1]/@name}
</first-ingredient>
→ <first-ingredient name="beef cube steak"/>
an element with an attribute
• Note: The XML that we write down is only the surface structure
of the data model underlying XQuery
Beware: Attributes in XPath (cntd.)
• <first-ingredient
oldName="{doc("recipes.xml")//recipe[1]
/ingredient[1]/@name}">
Beef
</first-ingredient>
→ <first-ingredient oldName="beef cube steak">
Beef
</first-ingredient>
An attribute is cast as a string
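The three cases from the preceding slides, side by side on an inline ingredient (a sketch; the wrapper element names a, b and c are made up for illustration):

```xquery
let $i := <ingredient name="olive oil"/>
return (
  <a>{$i/@name}</a>,            (: attribute node is attached:  <a name="olive oil"/> :)
  <b>{string($i/@name)}</b>,    (: string becomes text content: <b>olive oil</b>      :)
  <c old="{$i/@name}"/>         (: inside an attribute value, the attribute node is   :)
                                (: turned into a string:        <c old="olive oil"/>  :)
)
```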
Constructor Syntax
For all constituents of documents, there are constructors
element first-ingredient
{
attribute oldName
{string(doc("recipes.xml")//recipe[1]
/ingredient[1]/@name)},
"Beef"
}
This is equivalent to the notation on the previous slide:
"element" introduces an element constructor,
"attribute" an attribute constructor
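One way to check that the XML notation and the constructor notation build the same tree is fn:deep-equal. This is a sketch with the attribute value written as a literal instead of being read from recipes.xml; the comparison should evaluate to true.

```xquery
deep-equal(
  (: XML syntax :)
  <first-ingredient oldName="beef cube steak">Beef</first-ingredient>,
  (: constructor syntax :)
  element first-ingredient {
    attribute oldName {"beef cube steak"},
    "Beef"
  }
)
```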
Iteration with the For-Clause
Syntax: for $var in xpath-expr
Example: for $r in doc("recipes.xml")//recipe
return string($r)
• The expression creates a list of bindings for a variable $var
If $var occurs in an expression exp,
then exp is evaluated for each binding
• For-clauses can be nested:
for $r in doc("recipes.xml")//recipe
for $v in doc("vegetables.xml")//vegetable
return ...
What Does This Return?
for $i in (1,2,3)
for $j in (1,2,3)
return
element {concat("x",$i * $j)}
{$i * $j}
Nested For-clauses: Example
<my-recipes>
{for $r in doc("recipes.xml")//recipe
return
<my-recipe title="{$r/title}">
{for $i in $r//ingredient
return
<my-ingredient>
{string($i/@name)}
</my-ingredient>
}
</my-recipe>
}
</my-recipes>
Returns my-recipes with titles as attributes and my-ingredients with names as text content
The Equivalent Stylesheet Template
<xsl:template match="/">
<my-recipes>
<xsl:for-each select=".//recipe">
<my-recipe title="{title}">
<xsl:for-each select=".//ingredient">
<my-ingredient>
<xsl:value-of select="@name"/>
</my-ingredient>
</xsl:for-each>
</my-recipe>
</xsl:for-each>
</my-recipes>
</xsl:template>
The Let Clause
Syntax: let $var := xpath-expr
• binds variable $var to a list of nodes,
with the nodes in document order
• does not iterate over the list
• allows one to keep intermediate results for reuse
(not possible in SQL)
Example:
let $oorecps := doc("recipes.xml")//recipe
[.//ingredient/@name="olive oil"]
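The difference between for and let shows up on a literal sequence (a sketch, independent of the recipes document): for evaluates its return once per item, let evaluates it once for the whole list.

```xquery
(
  (for $x in (1, 2, 3) return count($x)),   (: 1, 1, 1 : one binding per item   :)
  (let $x := (1, 2, 3) return count($x))    (: 3       : one binding, the list  :)
)
```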
Let Clause: Example
<calory-content>
{let $oorecps := doc("recipes.xml")//recipe
[.//ingredient/@name="olive oil"]
for $r in $oorecps return
<calories>
{$r/title/text()}
{": "}
{string($r/nutrition/@calories)}
</calories>}
</calory-content>
Calories of recipes with olive oil
Note the implicit string concatenation
Let Clause: Example (cntd.)
The query returns:
<calory-content>
<calories>Beef Parmesan with Garlic Angel Hair Pasta: 1167</calories>
<calories>Linguine alla Pescadora: 532</calories>
</calory-content>
The Where Clause
Syntax: where <condition>
• occurs before return clause
• similar to predicates in XPath
• comparisons on nodes:
"is" for node identity ("=" compares the values of nodes)
"<<" and ">>" for document order
• Example:
for $r in doc("recipes.xml")//recipe
where $r//ingredient/@name="olive oil"
return ...
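The kinds of comparison available in a where clause can be told apart on a small inline document. This is a sketch; the element names r and a are made up. In XQuery 1.0, "=" compares values, "is" tests whether two expressions denote the same node, and "<<" tests document order.

```xquery
let $doc := <r><a>1</a><a>1</a></r>
let $first := $doc/a[1]
let $second := $doc/a[2]
return (
  $first = $second,     (: true:  both nodes have the value "1"   :)
  $first is $second,    (: false: they are two distinct nodes     :)
  $first << $second     (: true:  $first comes earlier in the doc :)
)
```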
Quantifiers
• Syntax:
some/every $var in <node-set>
satisfies <expr>
• $var is bound to all nodes in <node-set>
• Test succeeds if <expr> is true for some/every
binding
• Note: if <node-set> is empty, then
"some" is false and "every" is true
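The behaviour of the two quantifiers on an empty input can be checked directly (a sketch using the empty sequence as the node set):

```xquery
(
  (some $x in () satisfies $x = 1),    (: false: there is no binding that succeeds :)
  (every $x in () satisfies $x = 1)    (: true:  there is no binding that fails    :)
)
```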
Quantifiers (Example)
• Recipes that have some compound ingredient
for $r in doc("recipes.xml")//recipe
where some $i in $r/ingredient satisfies $i/ingredient
return $r/title
• Recipes where every top level ingredient is non-compound
for $r in doc("recipes.xml")//recipe
where every $i in $r/ingredient satisfies not($i/ingredient)
return $r/title
Element Fusion
“To every recipe, add the attribute calories!”
<result>
{let $rs := doc("recipes.xml")//recipe
for $r in $rs return
<recipe>
{$r/nutrition/@calories}
{$r/title}
</recipe>}
</result>
Here $r/title is an element and $r/nutrition/@calories is an attribute
Element Fusion (cntd.)
The query result:
<result>
<recipe calories="1167">
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
</recipe>
<recipe calories="349"><title>Ricotta Pie</title></recipe>
<recipe calories="532"><title>Linguine Pescadoro</title></recipe>
<recipe calories="612"><title>Zuppa Inglese</title></recipe>
<recipe calories="8892">
<title>Cailles en Sarcophages</title>
</recipe>
</result>
Fusion with Mixed Syntax
We mix constructor and XML–Syntax:
element result
{let $rs := doc("recipes.xml")//recipe
for $r in $rs return
<recipe>
{attribute calories {$r/nutrition/@calories}}
{$r/title}
</recipe>}
The Same with Constructor Syntax Only
element result
{let $rs := doc("recipes.xml")//recipe
for $r in $rs return
element recipe
{
attribute calories {$r/nutrition/@calories},
$r/title
}
}
Join
“Pair every ingredient with the recipes where it is used!”
let $rs := doc("recipes.xml")//recipe
for $i in $rs//ingredient
for $r in $rs
where $r//ingredient/@name=$i/@name
return
<usedin>
{$i/@name}
{$r/title}
</usedin>
The where clause is the join condition
Join (cntd.)
The query result:
<usedin name="beef cube steak">
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
</usedin>,
<usedin name="onion, sliced into thin rings">
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
</usedin>,
<usedin name="green bell pepper, sliced in rings">
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
</usedin>
Join Exercise
Return all pairs of ingredients such that
• the ingredients have the same name,
• but occur with different amounts
and return
• the recipes where each of them is used
• together with the amount being used in those recipes,
while returning every pair only once.
Could a query for these ingredients be expressed in XPath?
Document Inversion
“For every ingredient, return all the recipes where it is used!”
<result>
{let $rs := doc("recipes.xml")//recipe
for $i in $rs//ingredient
return
<ingredient>
{$i/@*}
{$rs[.//ingredient/@name=$i/@name]/title}
</ingredient>}
</result>
The predicate [.//ingredient/@name=$i/@name] is the join condition
Document Inversion (cntd.)
The query result:
<result>
<ingredient amount="1" name="Alchermes liquor" unit="cup">
<title>Zuppa Inglese</title>
</ingredient>
…
<ingredient amount="2" name="olive oil" unit="tablespoon">
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
<title>Linguine Pescadoro</title>
</ingredient>
…
Eliminating Duplicates
The function distinct-values(Node Set)
– extracts the values of a sequence of nodes
– creates a duplicate free list of values
Note the coercion: nodes are cast as values!
Example:
let $rs := doc("recipes.xml")//recipe
return distinct-values($rs//ingredient/@name)
yields (with the Galax engine)
xdt:untypedAtomic("beef cube steak"),
xdt:untypedAtomic("onion, sliced into thin rings"),
...
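A self-contained sketch of the same coercion on inline data (the element names is and i are made up). Since the order in which distinct-values returns its values may depend on the implementation, the example only counts them.

```xquery
let $is := <is><i name="salt"/><i name="sugar"/><i name="salt"/></is>/i
return count(distinct-values($is/@name))   (: 2 : "salt" is kept only once :)
```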
Avoiding Multiple Results in a Join
We want every ingredient to be listed only once:
Eliminate duplicates using distinct-values!
<result>
{let $rs := doc("recipes.xml")//recipe
for $in in distinct-values(
$rs//ingredient/@name)
return
<recipes with="{$in}">
{$rs[.//ingredient/@name=$in]/title}
</recipes> }
</result>
Avoiding Multiple Results (cntd.)
The query result:
<result>
<recipes with="beef cube steak">
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
</recipes>
<recipes with="onion, sliced into thin rings">
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
</recipes>
...
<recipes with="salt">
<title>Linguine Pescadoro</title>
<title>Cailles en Sarcophages</title>
</recipes>
...
The Order By Clause
Syntax: order by expr [ ascending | descending ]
for $iname in doc("recipes.xml")//@name
order by $iname descending
return string($iname)
yields
"whole peppercorns",
"whole baby clams",
"white sugar",
...
The Order By Clause (cntd.)
let $rs := doc("recipes.xml")//recipe
for $r in $rs
order by $r/nutrition/@calories
return $r/title
In which order will the titles come?
The Order By Clause (cntd.)
The interpreter must be told whether the values
should be regarded as numbers or as strings
(alphanumerical sorting is default)
for $r in $rs
order by number($r/nutrition/@calories)
return $r/title
Note:
– The query returns titles ...
– but the ordering is according to calories,
which do not appear in the output
Also possible in SQL! What if combined with distinct-values?
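The effect of number() on the ordering can be seen on three inline elements. This is a sketch: the element name r and the attribute name cal are made up, the three values are calorie figures from the recipes, and the string order shown assumes the usual codepoint collation.

```xquery
let $rs := (<r cal="532"/>, <r cal="1167"/>, <r cal="349"/>)
return (
  (: untyped attribute values are compared as strings :)
  (for $r in $rs order by $r/@cal return string($r/@cal)),
  (: "1167", "349", "532" :)

  (: number() forces a numeric comparison :)
  (for $r in $rs order by number($r/@cal) return string($r/@cal))
  (: "349", "532", "1167" :)
)
```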
FLWOR Expressions (pronounced “flower”)
We have now seen the main ingredients of XQuery:
• For and Let clauses, which can be mixed
• a Where clause imposing conditions
• an Order by clause, which determines the order of results
• a Return clause, which constructs the output.
Combining these yields FLWOR expressions.
Conditionals
if (expr) then expr else expr
Example
let $is := doc("recipes.xml")//ingredient
for $i in $is[not(ingredient)]
let $u := if (not($i/@unit))
then attribute unit {"pieces"}
else ()
creates an attribute unit="pieces" if none exists
and an empty item list otherwise
Conditionals (cntd.)
We use the conditional to construct variants of ingredients:
let $is := doc("recipes.xml")//ingredient
for $i in $is[not(ingredient)]
let $u := if (not($i/@unit))
then attribute {"unit"} {"pieces"}
else ()
return
<ingredient>
{$i/@* | $u}
</ingredient>
Collects all attributes in a list and adds a unit if needed
Conditionals (cntd.)
The query result:
<ingredient name="beef cube steak" amount="1.5"
unit="pound"/>,
...
<ingredient name="eggs" amount="12"
unit="pieces"/>,
…
Exercises
Write queries that produce
• A list, containing for every recipe the recipe's title element
and an element with the number of calories
• The same, ordered according to calories
• The same, alphabetically ordered according to title
• The same, ordered according to the fat content
• The same, with title as attribute and calories as content.
• A list, containing for every recipe the top level ingredients,
dropping the lower level ingredients
Sample Solution 1
A list, containing for every recipe the recipe's title element
and an element with the number of calories
<result>
{for $r in doc("recipes.xml")//recipe
return
($r/title,
<calories>
{number($r//@calories)}
</calories>)
}
</result>
The results returned are 2-element lists.
The list constructor is “( . , . )”
Sample Solution 6
<results>
{for $r in doc("recipes.xml")//recipe
return
<recipe>
{attribute title {$r/title},
for $i in $r/ingredient
return
if (not($i/ingredient))
then $i
else <ingredient> {$i/@*} </ingredient>
}
</recipe>
}
</results>
Aggregation
Aggregation functions count, sum, avg, min, max
Example: The number of recipes with olive oil
let $doc := doc("recipes.xml")
return
<number>
{count($doc//recipe
[.//ingredient/@name = "olive oil"])}
</number>
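A sketch of three of the aggregation functions on inline data (the element names rs and r and the attribute name cal are made up). The attribute values are untyped, so sum and max treat them as numbers of type xs:double.

```xquery
let $rs := <rs><r cal="532"/><r cal="1167"/></rs>/r
return (
  count($rs),       (: 2    :)
  sum($rs/@cal),    (: 1699 :)
  max($rs/@cal)     (: 1167 :)
)
```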
Grouping and Aggregation
For each recipe, the number of simple ingredients
for $r in doc("recipes.xml")//recipe
return
<number>
{attribute title {$r/title/text()}}
{count($r//ingredient[not(ingredient)])}
</number>
Grouping and Aggregation (cntd.)
The query result:
<number title="Beef Parmesan with Garlic Angel Hair Pasta">
11</number>,
<number title="Ricotta Pie">12</number>,
<number title="Linguine Pescadoro">15</number>,
<number title="Zuppa Inglese">8</number>,
<number title="Cailles en Sarcophages">30</number>
Grouping and Aggregation (cntd.)
A list, containing for every ingredient,
the number of occurrences of that ingredient
let $d := doc("recipes.xml")
let $is := distinct-values($d//ingredient/@name)
return
<result>
{for $i in $is
order by $i
return
<ingredient name="{$i}">
{count($d//ingredient[@name=$i])}
</ingredient>}
</result>
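The same grouping pattern, reduced to inline data so that it can be run without recipes.xml (a sketch; the wrapper element is and the item element i are made up): distinct-values supplies one group key per name, and count aggregates the members of each group.

```xquery
let $is := <is><i name="salt"/><i name="sugar"/><i name="salt"/></is>/i
for $n in distinct-values($is/@name)
order by $n
return
  <ingredient name="{$n}">{count($is[@name = $n])}</ingredient>
(: <ingredient name="salt">2</ingredient>,
   <ingredient name="sugar">1</ingredient> :)
```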
Nested Aggregation
“The recipe with the maximal number of calories!”
let $rs := doc("recipes.xml")//recipe
let $maxCal := max($rs//@calories)
for $r in $rs
where $r//@calories = $maxCal
return string($r/title)
returns
"Cailles en Sarcophages"
User-defined Functions
declare function local:fac($n as xs:integer)
as xs:integer
{
if ($n = 0)
then 1
else $n * local:fac($n - 1)
};
local:fac(10)
The first part is the function declaration,
the last line is a function call
Example: Nested Ingredients
declare function
local:nest($n as xs:integer, $content as xs:string)
as element()
{
if ($n = 0)
then element ingredient{$content}
else element ingredient{local:nest($n - 1,$content)}
};
local:nest(3,"Stuff")
What Does this Function Return?
declare function local:depth($n as node())
as xs:integer
{
if (fn:empty($n/*))
then 1
else let $cdepths
:= for $c in $n/* return local:depth($c)
return fn:max($cdepths) + 1
};
Exercise
Write a function
local:element-copy
that
• takes as input a node (= XML tree)
• produces as output a copy of the tree,
but without the attributes