Dr. Alexandra I. Cristea http://www.dcs.warwick.ac.uk/~acristea/ CS 253: Topics in Database Systems: C2

Page 1: CS 253: Topics in Database Systems: C2

Dr. Alexandra I. Cristea

http://www.dcs.warwick.ac.uk/~acristea/

CS 253: Topics in Database Systems: C2

Page 2: CS 253: Topics in Database Systems: C2

• Previously we looked at:
  – XML
  – XSL
  – XSLT

• Next:
  – XPath
  – XQuery

Page 3: CS 253: Topics in Database Systems: C2

XPath

Page 4: CS 253: Topics in Database Systems: C2

XPath

• XPath is a syntax for defining parts of an XML document
• XPath uses path expressions to navigate in XML documents
• XPath contains a library of standard functions
• XPath is a major element in XSLT
• XPath is a W3C recommendation, thus a standard (16 November 1999)

Page 5: CS 253: Topics in Database Systems: C2

XPath Path Expressions

• Uses path expressions to select nodes or node-sets in an XML document.
  – These path expressions look very much like the expressions you see when you work with a traditional computer file system.

Page 6: CS 253: Topics in Database Systems: C2

XPath Standard Functions

• Over 100 built-in functions for:
  – string values,
  – numeric values,
  – date and time comparison,
  – node and QName manipulation,
  – sequence manipulation,
  – Boolean values,
  – and more.

Page 7: CS 253: Topics in Database Systems: C2

XPath Terminology

• Nodes

• Atomic values

• Items (atomic values or nodes)

• Relationships of nodes
  – Parent
  – Children
  – Siblings
  – Ancestors
  – Descendants

Page 8: CS 253: Topics in Database Systems: C2

XPath Nodes

• 7 kinds of nodes:
  – element,
  – attribute,
  – text,
  – namespace,
  – processing-instruction,
  – comment, and
  – document (root) nodes.

• XML documents are treated as trees of nodes. The root of the tree is called the document node (or root node).

Page 9: CS 253: Topics in Database Systems: C2

Nodes Examples

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>

Document node

Element node

Attribute node
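The node kinds labelled on this slide can be inspected with a short Python sketch. It assumes the standard-library module xml.etree.ElementTree, which is not part of the slides. ElementTree exposes elements and attributes directly, while text is reached through the .text property rather than as a separate node object.

```python
import xml.etree.ElementTree as ET

# The bookstore document from the slide (XML declaration omitted).
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)        # root element <bookstore>
title = root.find("book/title")  # an element node
lang = title.get("lang")         # value of the attribute node lang="en"
text = title.text                # text content of the title element

print(root.tag, title.tag, lang, text)
```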

Page 10: CS 253: Topics in Database Systems: C2

Atomic values Examples*

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>

*nodes with no children or parent

Page 11: CS 253: Topics in Database Systems: C2

Selecting nodes

Expression   Description
nodename     Selects all child elements of the current node that are named "nodename"
/            Selects from the root node
//           Selects nodes in the document from the current node that match the selection, no matter where they are
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes

Page 12: CS 253: Topics in Database Systems: C2

Examples of selecting nodes

Path Expression   Result
bookstore         Selects all bookstore elements that are children of the current node
/bookstore        Selects the root element bookstore. Note: If the path starts with a slash ( / ) it always represents an absolute path to an element!
bookstore/book    Selects all book elements that are children of bookstore
//book            Selects all book elements no matter where they are in the document
bookstore//book   Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element
//@lang           Selects all attributes that are named lang
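The path expressions in the table can be tried out with a small Python sketch. It assumes the standard-library xml.etree.ElementTree module, which implements only a subset of XPath: paths are evaluated relative to an element, and attribute nodes cannot be returned directly. The second book and its price are illustrative data, not taken from the slides.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(XML)  # context node: the bookstore element

# bookstore/book: relative to the bookstore element this is just "book"
children = root.findall("book")

# //book: written ".//book", all book descendants of the context node
anywhere = root.findall(".//book")

# //@lang: ElementTree cannot return attribute nodes, so select the
# elements that carry a lang attribute and read the attribute values
langs = [e.get("lang") for e in root.findall(".//*[@lang]")]

print(len(children), len(anywhere), langs)
```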

Page 13: CS 253: Topics in Database Systems: C2

Predicates

• Predicates are used to find a specific node or a node that contains a specific value.

• Predicates are always embedded in square brackets.

Page 14: CS 253: Topics in Database Systems: C2

Example predicates

Path Expression                 Result
/bookstore/book[1]              Selects the first book element that is the child of the bookstore element
/bookstore/book[last()]         Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1]       Selects the last but one book element that is the child of the bookstore element
/bookstore/book[position()<3]   Selects the first two book elements that are children of the bookstore element
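A minimal Python sketch of the positional predicates above, assuming the standard-library xml.etree.ElementTree module. ElementTree understands [1], [last()] and [last()-1], but not position()<3, so that case is emulated with a list slice. The document order of the four books is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book><title>Everyday Italian</title></book>
  <book><title>Harry Potter</title></book>
  <book><title>XQuery Kick Start</title></book>
  <book><title>Learning XML</title></book>
</bookstore>"""
root = ET.fromstring(XML)


def titles(path):
    """Return the title text of every book selected by the path."""
    return [b.findtext("title") for b in root.findall(path)]


first = titles("book[1]")                # /bookstore/book[1]
last = titles("book[last()]")            # /bookstore/book[last()]
last_but_one = titles("book[last()-1]")  # /bookstore/book[last()-1]

# /bookstore/book[position()<3]: not supported by ElementTree, so slice
first_two = [b.findtext("title") for b in root.findall("book")[:2]]

print(first, last, last_but_one, first_two)
```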

Page 15: CS 253: Topics in Database Systems: C2

Example predicates – cont.

Path Expression                      Result
//title[@lang]                       Selects all the title elements that have an attribute named lang
//title[@lang='eng']                 Selects all the title elements that have an attribute named lang with a value of 'eng'
/bookstore/book[price>35.00]         Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title   Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
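The attribute and value predicates can be sketched in Python as follows, again assuming xml.etree.ElementTree. Attribute tests are supported directly, but numeric comparisons such as price>35.00 are not, so that filter is written in plain Python. All prices and lang values below are invented sample data.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title>XQuery Kick Start</title><price>49.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""
root = ET.fromstring(XML)

# //title[@lang]
with_lang = [t.text for t in root.findall(".//title[@lang]")]

# //title[@lang='eng']
eng = [t.text for t in root.findall(".//title[@lang='eng']")]

# /bookstore/book[price>35.00]/title, with the comparison done in Python
expensive = [
    b.findtext("title")
    for b in root.findall("book")
    if float(b.findtext("price")) > 35.00
]

print(with_lang, eng, expensive)
```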

Page 16: CS 253: Topics in Database Systems: C2

Selecting Unknown Nodes

Wildcard Description

* Matches any element node

@* Matches any attribute node

node() Matches any node of any kind

Page 17: CS 253: Topics in Database Systems: C2

Example: selecting several paths

Path Expression                   Result
//book/title | //book/price       Selects all the title AND price elements of all book elements
//title | //price                 Selects all the title AND price elements in the document
/bookstore/book/title | //price   Selects all the title elements of the book element of the bookstore element AND all the price elements in the document
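ElementTree in the Python standard library has no | operator, so the following sketch emulates the first union expression by walking the tree in document order and keeping the wanted children. The sample data is invented, and the emulation assumes book elements are not nested inside each other.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book><title>Harry Potter</title><year>2005</year><price>29.99</price></book>
  <book><title>Learning XML</title><year>2003</year><price>39.95</price></book>
</bookstore>"""
root = ET.fromstring(XML)

# //book/title | //book/price: children of any book that are title or price
wanted = {"title", "price"}
union = [child for book in root.iter("book") for child in book if child.tag in wanted]

print([(e.tag, e.text) for e in union])
```

Like an XPath union, the result contains each node once and in document order.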

Page 18: CS 253: Topics in Database Systems: C2

Location Path Expression

• A location path can be absolute or relative.

• An absolute location path: /step/step/...
• A relative location path: step/step/...

• Location step: axisname::nodetest[predicate]

Page 19: CS 253: Topics in Database Systems: C2

XPath Axes

self, child, parent, ancestor, descendant, ancestor-or-self, descendant-or-self, preceding-sibling, following-sibling, preceding, following, attribute, namespace

Page 20: CS 253: Topics in Database Systems: C2

AxisName             Result
ancestor             Selects all ancestors (parent, grandparent, etc.) of the current node
ancestor-or-self     Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself
attribute            Selects all attributes of the current node
child                Selects all children of the current node
descendant           Selects all descendants (children, grandchildren, etc.) of the current node
descendant-or-self   Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself
following            Selects everything in the document after the closing tag of the current node
following-sibling    Selects all siblings after the current node
namespace            Selects all namespace nodes of the current node
parent               Selects the parent of the current node
preceding            Selects everything in the document that is before the start tag of the current node
preceding-sibling    Selects all siblings before the current node
self                 Selects the current node

Page 21: CS 253: Topics in Database Systems: C2

axisname::nodetest[predicate]

• //DDD/parent::*

<AAA>
  <BBB>
    <DDD>
    </DDD>
  </BBB>
</AAA>

Page 22: CS 253: Topics in Database Systems: C2

axisname::nodetest[predicate]

• //BBB/child::*

<AAA>
  <BBB>
    <DDD>
    </DDD>
  </BBB>
</AAA>

Note: /AAA is equivalent to /child::AAA
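The two axis examples can be reproduced with a minimal Python sketch, assuming xml.etree.ElementTree. That module has no axisname:: syntax, so the abbreviated forms are used instead: ".." for parent::* and "*" for child::*.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<AAA><BBB><DDD></DDD></BBB></AAA>")

# //DDD/parent::*  written in abbreviated form as .//DDD/..
parents = root.findall(".//DDD/..")

# //BBB/child::*  written in abbreviated form as .//BBB/*
children = root.findall(".//BBB/*")

print([e.tag for e in parents], [e.tag for e in children])
```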

Page 23: CS 253: Topics in Database Systems: C2

More examples

• http://www.zvon.org/xxl/XPathTutorial/General/examples.html
  – Check basics, //, *, predicates, attributes, functions (new ones: count, name, normalize-space, starts-with, contains, string-length, floor, ceiling), axes, operators (mod)
  – Note: The ancestor, descendant, following, preceding and self axes partition a document (ignoring attribute and namespace nodes): they do not overlap and together they contain all the nodes in the document. (see example)
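The partition property in the note can be checked with a short Python sketch. It assumes xml.etree.ElementTree, considers element nodes only, and computes the five axes by hand from document order and a child-to-parent map, because ElementTree does not implement these axes itself. The element names are made up for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<AAA><BBB><CCC/><DDD/></BBB><EEE><FFF/></EEE></AAA>")
order = list(root.iter())                        # elements in document order
parent = {c: p for p in root.iter() for c in p}  # child -> parent map


def axes(node):
    """Return the five partitioning axes of node as lists of elements."""
    ancestors = []
    p = parent.get(node)
    while p is not None:
        ancestors.append(p)
        p = parent.get(p)
    descendants = list(node.iter())[1:]  # iter() yields node itself first
    i = order.index(node)
    # preceding: earlier in document order, excluding ancestors
    preceding = [n for n in order[:i] if n not in ancestors]
    # following: later in document order, excluding descendants
    following = [n for n in order[i + 1:] if n not in descendants]
    return {
        "ancestor": ancestors,
        "descendant": descendants,
        "preceding": preceding,
        "following": following,
        "self": [node],
    }


for name, nodes in axes(root.find("BBB")).items():
    print(name, [n.tag for n in nodes])
```

For every element, the five lists are disjoint and together cover all six elements of the document.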

Page 24: CS 253: Topics in Database Systems: C2

XPath Conclusion

• We have learned:
  – XPath definition
  – Path expressions
  – Standard functions
  – Terminology
  – Predicates
  – Location paths
  – Axes
  – Some operators

Page 25: CS 253: Topics in Database Systems: C2

• Before we go on, one more thing about XML:

• XML Namespaces

Page 26: CS 253: Topics in Database Systems: C2

Naming ambiguity

Page 27: CS 253: Topics in Database Systems: C2

The Idea to Solve it

• Assign a URI (~ URL) to every sub-language:
  – E.g., for XHTML 1.0: http://www.w3.org/1999/xhtml

• Qualify element names with URIs:
  – {http://www.w3.org/1999/xhtml}head

Page 28: CS 253: Topics in Database Systems: C2

The actual solution

• Namespace declarations bind URIs to prefixes:

• Default namespace (no prefix) declared with: xmlns="…"

• Lexical Scope

• Attribute names can also be prefixed
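The {URI}name notation from the previous slide is exactly how Python's xml.etree.ElementTree reports qualified names, so a small sketch can show default namespaces, prefixes and prefixed attributes together. The http://example.org/extra URI, the x prefix and the note attribute are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

XML = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:x="http://example.org/extra">
  <head x:note="demo"/>
</html>"""
root = ET.fromstring(XML)
head = root[0]

# The default namespace applies to html and, by lexical scope, to head.
print(root.tag)     # {http://www.w3.org/1999/xhtml}html
print(head.tag)     # {http://www.w3.org/1999/xhtml}head
print(head.attrib)  # the prefixed attribute is qualified, too

# A query may bind its own prefix to the same URI; the prefix is only local.
ns = {"h": "http://www.w3.org/1999/xhtml"}
found = root.find("h:head", ns)
```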

Page 29: CS 253: Topics in Database Systems: C2

Applying namespaces

Page 30: CS 253: Topics in Database Systems: C2

• Next we look at how to query XML

• This can be done, to some extent, as we have seen, within XSLT,

• but the main language developed for this purpose is …

Page 31: CS 253: Topics in Database Systems: C2

XQuery

Page 32: CS 253: Topics in Database Systems: C2

What is XQuery?

• XQuery is the language for querying XML data
• XQuery for XML is like SQL for databases
• XQuery is built on XPath expressions
• XQuery is defined by the W3C
• XQuery is supported by all the major database engines (IBM, Oracle, Microsoft, etc.)
• XQuery is a W3C recommendation (Jan 2007), thus a standard

Page 33: CS 253: Topics in Database Systems: C2

XQuery and XPath

• XQuery 1.0 and XPath 2.0 share the same data model and support the same functions and operators.

Page 34: CS 253: Topics in Database Systems: C2

XQuery - Examples of Use

• Extract information to use in a Web Service

• Generate summary reports

• Transform XML data to XHTML

• Search Web documents for relevant information

Page 35: CS 253: Topics in Database Systems: C2

Usage Scenario: Document-Oriented

• Queries could be used
  – To retrieve parts of documents
  – To provide dynamic indexes
  – To perform context-sensitive searching
  – To generate new documents as combinations of existing ones

Page 36: CS 253: Topics in Database Systems: C2

Usage Scenario: Programming

• Queries could be used to automatically generate documentation

Page 37: CS 253: Topics in Database Systems: C2

Usage Scenario: Hybrid

• Queries could be used to data mine hybrid data, such as patient records

Page 38: CS 253: Topics in Database Systems: C2

XQuery compared to XPath

• XQuery 1.0 is a strict superset of XPath 2.0: every XPath 2.0 expression is directly an XQuery 1.0 expression (a query)

• The extra expressive power is the ability to:
  – Join information from different sources and
  – Generate new XML fragments

Page 39: CS 253: Topics in Database Systems: C2

Relationship to XSLT

• XQuery, XSLT: both domain-specific languages for combining and transforming data from multiple sources

• different in design, for historical reasons
  – XQuery: designed from scratch
  – XSLT: intellectual descendant of CSS

• technically, they may emulate each other

Page 40: CS 253: Topics in Database Systems: C2

XQuery query

• Prolog
  – Like XPath, XQuery expressions are evaluated relative to a context
  – The context is explicitly provided by a prolog (~ header with definitions)

• Body
  – The actual query

Page 41: CS 253: Topics in Database Systems: C2

XQuery Prolog (i.e., header(s))

• Settings define various parameters for the XQuery processor language, such as:

xquery version "1.0";
module "http://www.w3.org/2003/05/xpath-functions"
default element namespace = "http://www.w3.org/1999/xhtml"
declare namespace xs = "http://www.w3.org/2001/XMLSchema"
import module "http://www.w3.org/2003/05/xpath-functions" at "logo.xq"
define function addLogo($root as node()) as node()* { }
(: etc :)

Page 42: CS 253: Topics in Database Systems: C2

XQuery capabilities

• Generate

• Join

• Select

Page 43: CS 253: Topics in Database Systems: C2

Generate: constructors

• XQuery expressions may compute new XML nodes

• Expressions may denote:
  – element, character data, comment and processing instruction nodes
  – each node is created with a unique node identity

• Constructors may be either
  – direct or
  – computed

Page 44: CS 253: Topics in Database Systems: C2

Direct constructors in XQuery

<XMLfragment>my fragment </XMLfragment>

• Evaluates to the given XML fragment

Page 45: CS 253: Topics in Database Systems: C2

Explicit, computed constructors

Page 46: CS 253: Topics in Database Systems: C2

Variable bindings

<employee empid="{$id}">
  <name>{$name}</name>
  {$job}
  <deptno>{$deptno}</deptno>
  <salary>{$SGMLspecialist+100000}</salary>
</employee>

Page 47: CS 253: Topics in Database Systems: C2

How to Select Nodes with XQuery?

• Functions
  – XQuery uses functions to extract data from XML documents.

• (X)Path Expressions
  – XQuery uses path expressions to navigate through elements in an XML document.

• Predicates
  – XQuery uses predicates to limit the extracted data from XML documents.

Page 48: CS 253: Topics in Database Systems: C2

Functions

• doc() – function to open a file

• Example:
  – doc("books.xml")

• Note: A call to a function can appear where an expression may appear.

Page 49: CS 253: Topics in Database Systems: C2

Path Expressions

• Example: select all the title elements in the "books.xml" file:

doc("books.xml")/bookstore/book/title

Page 50: CS 253: Topics in Database Systems: C2

Predicates

• Example: select all the book elements under the bookstore element that have a price element with a value that is less than 30:

doc("books.xml")/bookstore/book[price<30]

Page 51: CS 253: Topics in Database Systems: C2

At a glance: function, path, predicate

Page 52: CS 253: Topics in Database Systems: C2

FLWOR

• For, Let, Where, Order by, Return

= main engine

~ SQL syntax (SFHW)

~ programs and function calls

Page 53: CS 253: Topics in Database Systems: C2

FLWOR by comparison with Path expressions

• Select all the title elements under the book elements that are under the bookstore element that have a price element with a value that is higher than 30.

• Path expression:
  doc("books.xml")/bookstore/book[price>30]/title

• FLWOR expression:
  for $x in doc("books.xml")/bookstore/book
  where $x/price>30
  return $x/title

Page 54: CS 253: Topics in Database Systems: C2

Sorting in FLWOR

for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title
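For readers who think in a general-purpose language, the four clauses of this FLWOR expression map onto an ordinary loop, filter, sort and projection. The following Python sketch is only an analogy, assuming xml.etree.ElementTree and an invented books document; it is not how an XQuery processor evaluates the query.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for books.xml; prices are sample data.
XML = """<bookstore>
  <book><title>Everyday Italian</title><price>30.00</price></book>
  <book><title>Harry Potter</title><price>29.99</price></book>
  <book><title>XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Learning XML</title><price>39.95</price></book>
</bookstore>"""
root = ET.fromstring(XML)

# for $x in doc("books.xml")/bookstore/book
books = root.findall("book")
# where $x/price>30
selected = [x for x in books if float(x.findtext("price")) > 30]
# order by $x/title
selected.sort(key=lambda x: x.findtext("title"))
# return $x/title
result = [x.find("title") for x in selected]

print([t.text for t in result])
```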

Page 55: CS 253: Topics in Database Systems: C2

Present the Result In an HTML List

<ul>

{

for $x in doc("books.xml")/bookstore/book/title

order by $x

return <li>{$x}</li>

}

</ul>

Page 56: CS 253: Topics in Database Systems: C2

Result HTML List

<ul>
<li><title lang="en">Everyday Italian</title></li>
<li><title lang="en">Harry Potter</title></li>
<li><title lang="en">Learning XML</title></li>
<li><title lang="en">XQuery Kick Start</title></li>
</ul>

Page 57: CS 253: Topics in Database Systems: C2

Eliminate element (here: title)

<ul>

{

for $x in doc("books.xml")/bookstore/book/title

order by $x

return <li>{data($x)}</li> (: also text() :)

}

</ul>

Page 58: CS 253: Topics in Database Systems: C2

New result HTML List

<ul>

<li>Everyday Italian</li>

<li>Harry Potter</li>

<li>Learning XML</li>

<li>XQuery Kick Start</li>

</ul>
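The same list can be produced outside XQuery. As a rough Python counterpart, assuming xml.etree.ElementTree and an invented stand-in for books.xml, the sketch below sorts the title texts (the role of data($x)) and constructs the ul element programmatically.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for books.xml, in an assumed document order.
XML = """<bookstore>
  <book><title lang="en">Everyday Italian</title></book>
  <book><title lang="en">Harry Potter</title></book>
  <book><title lang="en">XQuery Kick Start</title></book>
  <book><title lang="en">Learning XML</title></book>
</bookstore>"""
root = ET.fromstring(XML)

# Take only the text of each title (like data($x)) and sort it.
titles = sorted(t.text for t in root.findall("book/title"))

# Construct the new <ul> element with one <li> per title.
ul = ET.Element("ul")
for text in titles:
    ET.SubElement(ul, "li").text = text

html = ET.tostring(ul, encoding="unicode")
print(html)
```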

Page 59: CS 253: Topics in Database Systems: C2

Another FLWOR Expression

Page 60: CS 253: Topics in Database Systems: C2

The Difference between for and let

Page 61: CS 253: Topics in Database Systems: C2

The Difference between for and let

Page 62: CS 253: Topics in Database Systems: C2

The Difference between for and let

Page 63: CS 253: Topics in Database Systems: C2

The Difference between for and let

Page 64: CS 253: Topics in Database Systems: C2

FLWOR Basic Building Blocks

Page 65: CS 253: Topics in Database Systems: C2

General rules

• for and let may be used many times in any order

• only one where is allowed

• many different sorting criteria can be specified (descending, ascending, etc.)

Page 66: CS 253: Topics in Database Systems: C2

Joining documents

for $p in doc("www.irs.gov/taxpayers.xml")//person

for $n in doc("neighbors.xml")//neighbor[ssn = $p/ssn]

return

<person>

<ssn> { $p/ssn } </ssn>

{ $n/name }

<income> { $p/income } </income>

</person>
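The join above is a nested loop over two documents with an equality test on ssn. A hedged Python sketch of the same idea follows, assuming xml.etree.ElementTree. Both input documents are invented sample data, and the sketch copies text values into the new elements rather than copying the original ssn and income elements.

```python
import xml.etree.ElementTree as ET

# Invented stand-ins for the two source documents.
taxpayers = ET.fromstring(
    "<taxpayers>"
    "<person><ssn>111</ssn><income>50000</income></person>"
    "<person><ssn>222</ssn><income>70000</income></person>"
    "</taxpayers>"
)
neighbors = ET.fromstring(
    "<neighbors>"
    "<neighbor><ssn>222</ssn><name>Ann</name></neighbor>"
    "<neighbor><ssn>333</ssn><name>Bob</name></neighbor>"
    "</neighbors>"
)

result = []
for p in taxpayers.iter("person"):          # for $p in ...//person
    for n in neighbors.iter("neighbor"):    # for $n in ...//neighbor
        if n.findtext("ssn") == p.findtext("ssn"):   # [ssn = $p/ssn]
            person = ET.Element("person")   # return <person>...</person>
            ET.SubElement(person, "ssn").text = p.findtext("ssn")
            ET.SubElement(person, "name").text = n.findtext("name")
            ET.SubElement(person, "income").text = p.findtext("income")
            result.append(person)

print([ET.tostring(r, encoding="unicode") for r in result])
```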

Page 71: CS 253: Topics in Database Systems: C2

Conditionals

for $b in doc("bib.xml")/book
return
  <short>
    {$b/title}
    <author>
      {if (count($b/author) < 3)
       then $b/author
       else ($b/author[1], <author>and others</author>)}
    </author>
  </short>

Page 72: CS 253: Topics in Database Systems: C2

Functions

• declare function local:depth($e) as xs:integer {
    if (empty($e/*)) then 1
    else max(for $c in $e/* return local:depth($c)) + 1
  };

• for $b in doc("bib.xml")/book
  return local:depth($b)
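The recursive depth function translates almost line by line into Python. This is a minimal sketch assuming xml.etree.ElementTree, with a small invented book element as input: an element without child elements has depth 1, otherwise its depth is one more than the deepest child.

```python
import xml.etree.ElementTree as ET


def depth(e):
    """Depth of element e: 1 for a leaf, else 1 + depth of the deepest child."""
    children = list(e)          # child elements, like $e/*
    if not children:            # empty($e/*)
        return 1
    return max(depth(c) for c in children) + 1


book = ET.fromstring(
    "<book><title>Learning XML</title><authors><author>A</author></authors></book>"
)
print(depth(book))
```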

Page 73: CS 253: Topics in Database Systems: C2

Existential and Universal Quantifiers

• Return books where at least one author is "Ullman":
  for $b in doc("bib.xml")/book
  where some $author in $b/author satisfies $author/text() = "Ullman"
  return $b

• Return books where all authors are "Ullman":
  for $b in doc("bib.xml")/book
  where every $author in $b/author satisfies $author/text() = "Ullman"
  return $b
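The two quantifiers correspond closely to Python's built-in any() and all(). The sketch below assumes xml.etree.ElementTree and an invented bibliography with a bib wrapper element. Titles A, B and C and the author names other than Ullman are sample data.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book><title>A</title><author>Ullman</author><author>Widom</author></book>
  <book><title>B</title><author>Ullman</author></book>
  <book><title>C</title><author>Date</author></book>
</bib>""")

# some $author in $b/author satisfies $author/text() = "Ullman"
some_ullman = [
    b.findtext("title")
    for b in bib.findall("book")
    if any(a.text == "Ullman" for a in b.findall("author"))
]

# every $author in $b/author satisfies $author/text() = "Ullman"
every_ullman = [
    b.findtext("title")
    for b in bib.findall("book")
    if all(a.text == "Ullman" for a in b.findall("author"))
]

print(some_ullman, every_ullman)
```

As with every in XQuery, all() is true for an empty sequence, so a book without any author element would also pass the universal test.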

Page 74: CS 253: Topics in Database Systems: C2

XQuery on Distributed Sources

Page 76: CS 253: Topics in Database Systems: C2

XQuery Syntax

• Declarative, functional language (~ SQL)
• Nested expressions
• Case sensitive
• White spaces:
  – Tabs, space, CR, LF
  – Ignored between language constructs
  – Significant in quoted strings
• No special EOL character

Page 77: CS 253: Topics in Database Systems: C2

Keywords and names

• Keywords and operators
  – Case-sensitive, generally lower case
  – May have several meanings depending on the context
    • E.g. "*" or "in"
  – No reserved words

• All names must be valid XML names
  – For variables, functions, elements, attributes
  – Can be associated with a namespace

Page 78: CS 253: Topics in Database Systems: C2

Comments

Page 79: CS 253: Topics in Database Systems: C2

Comparisons

• Value comparisons: eq, ne, lt, le, gt, ge
  – Used to compare individual values
  – Each operand must be a single atomic value (or a node containing a single atomic value)

• General comparisons: =, !=, <, <=, >, >=
  – Can be used with sequences of multiple items
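The difference between the two families can be modelled in a few lines of Python. This is a sketch of the semantics only, with sequences represented as Python lists and the function names general_eq and value_eq invented for the example: a general comparison is true if any pair of items matches, while a value comparison accepts at most one item per operand.

```python
def general_eq(left, right):
    """Model of the general comparison left = right over two sequences."""
    return any(a == b for a in left for b in right)


def value_eq(left, right):
    """Model of the value comparison left eq right.

    Raises TypeError for operands with more than one item and returns
    None (standing in for the empty sequence) if an operand is empty.
    """
    if len(left) > 1 or len(right) > 1:
        raise TypeError("value comparison needs single items")
    if not left or not right:
        return None
    return left[0] == right[0]


print(general_eq([1, 2], [2, 3]))  # one matching pair is enough
print(value_eq([1], [1]))
```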

Page 80: CS 253: Topics in Database Systems: C2

Example

Page 81: CS 253: Topics in Database Systems: C2

Query Prolog

Page 82: CS 253: Topics in Database Systems: C2

XQuery gives you a choice:

• Path Expressions:
  – If you just want to copy certain elements and attributes as is

• FLWOR Expressions:
  – Allow sorting
  – Allow adding elements/attributes
  – Verbose, but can be clearer

Page 83: CS 253: Topics in Database Systems: C2

XQuery tools

• Stylus Studio 2007 http://www.stylusstudio.com/xml_download.html (free trial version)
  – See also its short XQuery intro at: http://www.stylusstudio.com/xquery_primer.html

Page 84: CS 253: Topics in Database Systems: C2

XML and programming

• XSLT, XPath and XQuery provide tools for specialized tasks.

• But many applications are not covered:
  – domain-specific tools for concrete XML languages
  – general tools that nobody has thought of yet

Page 85: CS 253: Topics in Database Systems: C2

XML in general-purpose programming languages

• parse XML documents into XML trees

• navigate through XML trees

• construct XML trees

• output XML trees as XML documents

• DOM and SAX are corresponding APIs that are language independent and supported by numerous languages. JDOM is an API that is tailored to Java.
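The four tasks listed above can be sketched with the DOM and SAX bindings that ship with Python (xml.dom.minidom and xml.sax). The choice of Python, the tiny input document and the TagCollector class name are assumptions of this example. DOM builds a tree that can be navigated and modified, while SAX streams parsing events to a handler.

```python
import xml.sax
from xml.dom import minidom

XML = "<bookstore><book><title>Harry Potter</title></book></bookstore>"

# DOM: parse into a tree, navigate it, construct a new node, output it.
dom = minidom.parseString(XML)
title = dom.getElementsByTagName("title")[0]   # navigate
title_text = title.firstChild.data
year = dom.createElement("year")               # construct
year.appendChild(dom.createTextNode("2005"))
title.parentNode.appendChild(year)
output = dom.documentElement.toxml()           # output as XML text


# SAX: no tree is built; the parser calls the handler for each event.
class TagCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


handler = TagCollector()
xml.sax.parseString(XML.encode("utf-8"), handler)

print(title_text, output, handler.tags)
```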

Page 86: CS 253: Topics in Database Systems: C2

XQuery Conclusion

• We have learned:
  – XQuery definition
  – Usage scenarios
  – Comparison w. XSLT and XPath
  – Capabilities
  – Functions, path expressions and predicates
  – FLWOR
  – Extensions for generic programming with XML