Unit 04 : W3C and Xpath COMP 5323 Web Database Technologies and Applications 2014

Doctrine of Fair Use

• This PowerPoint was prepared for educational purposes and is used strictly for classroom lecturing.

• We have adopted the "Fair Use" doctrine in this PowerPoint, which allows limited copying of copyrighted works for educational and research purposes.

Learning Objectives

• Learn more about W3C

• Understand the XML query language

[Figure: query languages paired with their data stores: SQL with relational databases (RDB), XQL and XQuery with XML, Cypher with Neo4j]

Outline

1. W3C

2. XPath

3. XQuery

1 W3C

Overview

• The W3C – Who they are, their core beliefs, their long term goals, their members

• The W3C – Who they influence, their business processes and recommendations

• The relationship between W3C and open standards

World Wide Web Consortium (W3C)

• The World Wide Web Consortium (W3C) is an international consortium where members and staff work together to develop many different web standards.

• Their mission is to lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web.

W3C: Founder

• Tim Berners-Lee, W3C Director and inventor of the World Wide Web in 1989.

• Has served as W3C Director since 1994, when the organization was founded.

• “W3C members work together to design web technologies that build upon its versatility, giving the world the power to enhance communication and commerce for anyone, anywhere, anytime, and using any device.”

Tim Berners-Lee

Semantic Web

• The Semantic Web is a collaborative movement led by the international standards body, the World Wide Web Consortium (W3C).

• The standard promotes common data formats on the World Wide Web.

• By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web, dominated by unstructured and semi-structured documents into a "web of data".

• The Semantic Web stack builds on the W3C's Resource Description Framework (RDF).

http://en.wikipedia.org/wiki/Semantic_Web

W3C: Core Beliefs

• “W3C believes that in order for the Web to reach its full potential, the most fundamental Web technologies must be compatible with one another and allow any hardware and software used to access the Web to work together.” (www.w3.org)

• One of their main goals is to have “Web Interoperability”

W3C: Long Term Goals for the Web

• Web for Everyone
– Make the Web available regardless of hardware, software, language, culture, etc.

• Web on Everything
– Make Web access from any kind of device as simple and convenient as possible

• Knowledge Base
– Enable people to solve problems that would otherwise be too complex or tedious to solve

• Trust and Confidence
– Make accountability, security, confidence and confidentiality possible for all users

W3C: Members

• W3C comprises more than 400 members, including the world’s foremost technology companies such as Hewlett Packard, IBM, Nokia, Microsoft, AT&T, Intel, Oracle, and Xerox.

• W3C allows its members to lead the web to its full potential by allowing them to take leadership roles, promote their image as innovators, and gain early insight to market trends.

W3C’s Influence

• Most influential of all organizations in the development and maintenance of the World Wide Web

• W3C has no legal authority to enforce its recommendations because membership is voluntary

• Members often follow recommendations because it helps set standards for the Web which in turn benefits each member

W3C’s Influence

• The W3C has made over 90 recommendations since its start in 1994.

• W3C operations are administered by offices in Japan, France, and the United States.

• As more corporations join, W3C’s recommendations will become the standard for the WWW and thus make it easier for both corporations and the public.

Recommendations

• W3C published the Speech Synthesis Markup Language (SSML) 1.0
– Works on synthesized speech in Web interactions.
– For example, how would you pronounce “1/2”? It could be a date (January 2nd or February 1st), one half, or 1 divided by 2.

• XML-binary Optimized Packaging (XOP)
– “XML-binary Optimized Packaging (XOP) provides a standard method for applications to include binary data, as is, along with an XML document in a package. As a result, applications need less space to store the data and less bandwidth to transmit it.” (Business Wire)

Business Processes

• W3C’s work attempts to standardize the Web.

• Each member contributes to the process, with decisions being made through community consensus.

• Each member has the same decision power no matter what size they are.

• If a general consensus can’t be reached, decisions are made on a majority basis.

W3C and Open Standards

• “W3C seeks to avoid market fragmentation and thus Web fragmentation by publishing open standards for Web languages and protocols.” (www.w3.org)

• To achieve the goal of one Web, specifications for the Web's formats and protocols must be compatible with one another and allow any hardware and software used to access the Web to work together – thus W3C designs and promotes interoperable open formats and protocols to avoid market fragmentation.

Open Standards Guidelines

• Transparency
– A public process with public access to all information

• Relevance
– Start based on due analysis and market needs for all

• Openness
– Anybody can participate: users and developers; industry and research; governments and public

• Impartial and consensus based
– Guaranteed fairness and equal weight for each participant

• Availability
– Free access to standard documents

• Maintenance
– Testing, Revisions

2 XPATH

XPath

• It provides a way to refer to specific parts of an XML tree

• A ‘URL-like’ scheme for locating documents on local and remote computer systems.

• Primary purpose: address ‘parts’ of an XML document, and provide basic facilities for manipulation of strings, numbers and booleans.

• Used by other XML technologies
– XSLT
– XQuery

• http://www.w3.org/TR/xpath

Why XPath

• Does an XML tree look like the directory tree of the computer's file system?

Why XPath

• Unique identifiers are not sufficient
– Assigning a unique identifier to every element is a burden
– Identity of element may be unknown
– Identifiers cannot handle ranges of text
– May be inconvenient to identify a large number of objects by listing their identifiers

Data Model

• Treats an XML document as a logical tree

• This tree consists of 7 types of nodes:
– Root node: the root of the document
– Element nodes: one for each element in the document (an element node may have a unique ID)
– Attribute nodes
– Namespace nodes
– Processing instruction nodes (intended to carry instructions to the application)
– Comment nodes
– Text nodes

• The tree structure is ordered and reads from top to bottom and left to right

Data Model

Data Model Example 1

For this simple doc:

    <doc>
      <?encoding="UTF-8"?>
      <para>Some <em>emphasis</em> here. </para>
      <para>Some more stuff.</para>
    </doc>

Might be represented as:

    root
      <doc>
        <?pi?>
        <para>
          text
          <em>
            text
          text
        <para>
          text
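The tree above can be reproduced with a DOM parser. The following is a minimal sketch using Python's standard `xml.dom.minidom`; the processing instruction is replaced by a hypothetical `<?app hint?>`, because a parser needs a target name before it will accept a PI, and the helper name `outline` is an illustration only.

```python
from xml.dom import minidom

# Example 1 with a hypothetical, well-formed processing instruction (assumption).
XML = ("<doc><?app hint?><para>Some <em>emphasis</em> here. </para>"
       "<para>Some more stuff.</para></doc>")


def outline(node, depth=0):
    """Return (depth, kind, label) triples for node and its subtree, in document order."""
    kinds = {
        node.DOCUMENT_NODE: "root",
        node.ELEMENT_NODE: "element",
        node.TEXT_NODE: "text",
        node.PROCESSING_INSTRUCTION_NODE: "pi",
        node.COMMENT_NODE: "comment",
    }
    # Text nodes are labelled by their content, all other nodes by their name.
    label = node.data if node.nodeType == node.TEXT_NODE else node.nodeName
    rows = [(depth, kinds.get(node.nodeType, "other"), label)]
    for child in node.childNodes:
        rows.extend(outline(child, depth + 1))
    return rows


rows = outline(minidom.parseString(XML))
for depth, kind, label in rows:
    print("  " * depth + f"{kind}: {label!r}")
```

The printed indentation mirrors the tree: one root, the doc element, then the PI and the two para elements with their text and em children.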

Data Model

Example 2(a)

    <?xml version="1.0" encoding="UTF-8" ?>
    <bib>
      <book>
        <publisher>Addison-Wesley</publisher>
        <author>Serge Abiteboul</author>
        <author>
          <first-name>Rick</first-name>
          <last-name>Hull</last-name>
        </author>
        <author>Victor Vianu</author>
        <title>Foundations of Databases</title>
        <year>1995</year>
      </book>
      <book price="55">
        <publisher>Freeman</publisher>
        <author>Jeffrey D. Ullman</author>
        <title>Principles of Database and Knowledge Base Systems</title>
        <year>1998</year>
      </book>
    </bib>

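Example 2(b)

[Figure: tree view of Example 2(a). Labels in the figure: the root, the root element (bib), processing instruction, comment. The bib element has two book children; the children shown for the first book are publisher (Addison-Wesley), author (Serge Abiteboul), and so on.]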

Nodes and Atomic values

• Nodes
– <bib> (root element node) [note: not a root node]
– <author> Victor Vianu </author> (element node)
– price="55" (attribute node)

• Atomic Values
– Victor Vianu
– "55"

Relationship of Nodes

• the book element is the parent of the title, author and year.

• the title, author, year elements are all children of the book element

• the title, author, year elements are all siblings.

• the ancestors of the title element are the book element and the bib element

• descendants of the bib element include the book, title, author and year elements.
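These relationships can be checked programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` on a cut-down bib document (an assumption made for brevity). ElementTree elements carry no parent pointer, so the child-to-parent map is built by hand.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib><book><title>Foundations of Databases</title>"
    "<author>Victor Vianu</author><year>1995</year></book></bib>"
)
book = bib.find("book")
title = book.find("title")

# Children of book, in document order.
children = [child.tag for child in book]

# Map every element to its parent (ElementTree does not store this).
parent_of = {child: parent for parent in bib.iter() for child in parent}

# Siblings of title: the other children of its parent.
siblings = [c.tag for c in parent_of[title] if c is not title]

# Ancestors of title: follow parent links up to the root element.
ancestors = []
node = title
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node.tag)

# Descendants of bib: every element below it (iter() yields bib itself first).
descendants = [e.tag for e in bib.iter() if e is not bib]

print(children, siblings, ancestors, descendants)
```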

Element Context

• Meaning of element can depend upon its context
– <book><title>…</title></book>
– <person><title>…</title></person>

• Want to search for, e.g. title of book, not title of person
– XPath exploits sequential and hierarchical context of XML to specify elements by their context (i.e. location in hierarchy)
– book/title
– person/title

Relative Path

• A relative location path consists of a sequence of one or more location steps separated by /

• Each step selects a set of nodes relative to a context node, and each node in that set is used as a context node for the following step

• E.g. para will select children of the current node that are named 'para'

    <chapter>              //Current node
      <title>…</title>
      <para>…</para>       //Selected
      <note>
        <para>…</para>     //Not selected until note is the current node
      </note>
    </chapter>

• Verbose expression is child::para
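A minimal sketch of the same idea with Python's `xml.etree.ElementTree`, whose `findall` evaluates a path relative to the element it is called on. The element contents are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

chapter = ET.fromstring(
    "<chapter><title>Intro</title><para>top-level</para>"
    "<note><para>inside note</para></note></chapter>"
)

# Context node = chapter: the step 'para' selects only its direct para children.
from_chapter = [p.text for p in chapter.findall("para")]

# Two steps: 'note' runs first, then each note becomes the context node for 'para'.
via_note = [p.text for p in chapter.findall("note/para")]

print(from_chapter, via_note)
```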

Absolute Path

• For some cases, a relative path is not suitable.
– E.g. it may be necessary to select the title of a book regardless of the current context. The location relative to the document as a whole may be known, whereas the offset from the current location may not be. In that case, use an absolute path.

• An absolute path is similar to a relative path, except that it starts with “/”, the root of the document
– e.g. /book/title

• With the “//” expression, it is even possible to select all occurrences of a specific element type.
– e.g. //author
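The sketch below contrasts the three styles in Python's `xml.etree.ElementTree`, assuming a small bib document as the data. ElementTree does not accept a leading "/" on an element, so an absolute path such as /bib/book/title is written relative to the root element, and //author becomes `.//author`.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bib><book><title>Foundations of Databases</title>"
    "<author>Victor Vianu</author></book>"
    "<book><title>Principles of Database and Knowledge Base Systems</title>"
    "<author>Jeffrey D. Ullman</author></book></bib>"
)
current = root.find("book")  # pretend the first book is the current context

# Relative path: only the title of the current book is reachable.
relative = [t.text for t in current.findall("title")]

# XPath /bib/book/title: start at the root element <bib> and drop the 'bib' step.
absolute = [t.text for t in root.findall("book/title")]

# XPath //author: './/' searches all descendants of the root element.
everywhere = [a.text for a in root.findall(".//author")]

print(relative, absolute, everywhere)
```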

Partial Tree of faculty.xml

    d101  faculty.xml (document node)
      e101  faculty
        a101  name = "Science"
        e102  course
          a102  pid = "p13"
          a103  year = "2007"
          e103  name
            t101  "DB Sys"
          e104  student
            e105  sid
              t102  "s1"
            e106  grade
              t103  "A+"
          e107  student
            e108  sid
              t104  "s2"
            e109  grade
              t105  "B"
        e110  course  ……….
        e118  course  ……….

doc("faculty.xml")/descendant::course[attribute::pid="p13"]/child::student[child::sid="s2"]

What Does an XPath Expression Return?

• A sequence of result nodes with their contents, in the form of a (not necessarily well-formed) XML document

• The doc( ) function is used to open the "faculty.xml" file

• XPath:
    doc("faculty.xml")/descendant::course[attribute::pid="p13"]/child::student[child::sid="s2"]

• Result:
    <student>
      <sid>s2</sid>
      <grade>B</grade>
    </student>
    <student>
      <sid>s2</sid>
      <grade>D</grade>
    </student>
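As a rough illustration, the same query can be approximated with Python's `xml.etree.ElementTree`, which supports only the abbreviated syntax: `descendant::course` becomes `.//course`, `attribute::pid` becomes `@pid`, and `child::` is implicit. The sample data below is an assumption that covers only the visible part of the tree plus one hypothetical extra course (pid "p14"), so it returns a single student rather than the two shown on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for faculty.xml (assumption, not the full course file).
FACULTY = """<faculty name="Science">
  <course pid="p13" year="2007">
    <name>DB Sys</name>
    <student><sid>s1</sid><grade>A+</grade></student>
    <student><sid>s2</sid><grade>B</grade></student>
  </course>
  <course pid="p14" year="2007">
    <name>Networks</name>
    <student><sid>s2</sid><grade>C</grade></student>
  </course>
</faculty>"""

faculty = ET.fromstring(FACULTY)

# Abbreviated form of the slide's expression, relative to the root element.
hits = faculty.findall(".//course[@pid='p13']/student[sid='s2']")

# Serialise each result node; strip() drops the trailing whitespace kept with it.
result = [ET.tostring(s, encoding="unicode").strip() for s in hits]
print(result)
```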

Example for XPath Queries

    <bib>
      <book>
        <publisher>Addison-Wesley</publisher>
        <author age="30">Serge Abiteboul</author>
        <author>
          <first-name>Rick</first-name>
          <last-name>Hull</last-name>
        </author>
        <author>Victor Vianu</author>
        <title>Foundations of Databases</title>
        <year>1995</year>
      </book>
      <book price="55">
        <publisher>Freeman</publisher>
        <author>Jeffrey D. Ullman</author>
        <title>Principles of Database and Knowledge Base Systems</title>
        <year>1998</year>
      </book>
    </bib>

Xpath: Simple Expressions

/bib/book/year
Result: <year> 1995 </year> <year> 1998 </year>

/bib/paper/year
Result: empty (there were no papers)

Xpath: //

//author
Result: <author age="30"> Serge Abiteboul </author>
        <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
        <author> Victor Vianu </author>
        <author> Jeffrey D. Ullman </author>

/bib//first-name
Result: <first-name> Rick </first-name>
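The simple and "//" expressions above can be tried out with Python's `xml.etree.ElementTree`. This is a sketch under one assumption: paths are written relative to the root element bib, because ElementTree does not accept a leading "/".

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author age="30">Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

years = [y.text for y in bib.findall("book/year")]            # /bib/book/year
papers = bib.findall("paper/year")                            # /bib/paper/year
authors = bib.findall(".//author")                            # //author
first_names = [f.text for f in bib.findall(".//first-name")]  # /bib//first-name

print(years, papers, len(authors), first_names)
```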

Xpath: Functions

/bib/book/author/text()
Result: Serge Abiteboul
        Victor Vianu
        Jeffrey D. Ullman

Note: Rick Hull doesn’t appear because he has first-name, last-name

Functions in XPath:
– text() = matches the text value
– node() = matches any node (a superset of *, which matches only elements)
– name() = returns the name of the current tag

• http://www.w3.org/TR/xquery-operators/
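ElementTree's path language has no text() step, so the sketch below approximates /bib/book/author/text() by reading each author's `.text` attribute, which holds the text directly inside the element before its first child. The `tag` attribute plays the role of name(). The compact sample document is an assumption based on the bib example.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib><book>"
    '<author age="30">Serge Abiteboul</author>'
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author>"
    "</book><book>"
    "<author>Jeffrey D. Ullman</author>"
    "</book></bib>"
)

authors = bib.findall("book/author")

# Keep only authors that contain text of their own; Rick Hull's author element
# holds child elements instead, so it is skipped.
texts = [a.text.strip() for a in authors if a.text and a.text.strip()]

# Rough counterpart of name(): the tag of each selected node.
names = [a.tag for a in authors]

print(texts, names)
```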

Xpath: Wildcard

//author/*

Result: <first-name> Rick </first-name> <last-name> Hull </last-name>

Note: * matches any element

Xpath: Attribute Nodes

/bib/book/@price

Result: “55”

@price means that price has to be an attribute
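A short sketch of the wildcard and attribute steps in Python's `xml.etree.ElementTree`. The wildcard works as in XPath. ElementTree cannot return attribute nodes from a path, so /bib/book/@price is approximated by selecting the books that carry the attribute and reading it with `get`. The compact sample document is an assumption based on the bib example.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib><book>"
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author>"
    '</book><book price="55">'
    "<author>Jeffrey D. Ullman</author>"
    "</book></bib>"
)

# //author/* : every element child of any author.
parts = [e.tag for e in bib.findall(".//author/*")]

# /bib/book/@price : books that have a price attribute, then its value.
prices = [b.get("price") for b in bib.findall("book[@price]")]

print(parts, prices)
```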

Xpath: Qualifiers

/bib/book/author[first-name]
/bib/book/author[last-name]
Result (for either expression): <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>

/bib/book/author[1]
Result: <author age="30">Serge Abiteboul</author>
        <author>Jeffrey D. Ullman</author>

Xpath: More Qualifiers

/bib/book/author[first-name]/last-name

Result: <last-name> Hull </last-name>
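The qualifier examples map directly onto the predicate forms that Python's `xml.etree.ElementTree` supports, `[tag]` and `[position]`. A minimal sketch, assuming a compact copy of the bib document:

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib><book>"
    '<author age="30">Serge Abiteboul</author>'
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author>"
    "</book><book>"
    "<author>Jeffrey D. Ullman</author>"
    "</book></bib>"
)

# /bib/book/author[first-name] : authors that have a first-name child.
with_first = bib.findall("book/author[first-name]")

# /bib/book/author[1] : the first author of each book.
first_authors = [a.text for a in bib.findall("book/author[1]")]

# /bib/book/author[first-name]/last-name : filter, then take one more step.
last_names = [n.text for n in bib.findall("book/author[first-name]/last-name")]

print(len(with_first), first_authors, last_names)
```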

Xpath: More Qualifiers

/bib/book[@price < "60"]
Result : <book price="55">…..</book>

/bib/book[author/@age < "35"]
Result: <book> <publisher>Addison-Wesle……

/bib/book[author/text()]
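ElementTree predicates cannot express "<" comparisons or text() tests, so the sketch below selects candidates with a path and applies the comparison in Python. In XPath 1.0 the "<" operator compares numbers, which is why the attribute values are converted with `float`. The compact sample document is an assumption based on the bib example.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib><book>"
    '<author age="30">Serge Abiteboul</author>'
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<title>Foundations of Databases</title>"
    '</book><book price="55">'
    "<author>Jeffrey D. Ullman</author>"
    "<title>Principles of Database and Knowledge Base Systems</title>"
    "</book></bib>"
)
books = bib.findall("book")

# /bib/book[@price < "60"] : books with a price attribute below 60.
cheap = [b.findtext("title") for b in bib.findall("book[@price]")
         if float(b.get("price")) < 60]

# /bib/book[author/@age < "35"] : books with at least one author younger than 35.
young = [b.findtext("title") for b in books
         if any(float(a.get("age")) < 35 for a in b.findall("author[@age]"))]

# /bib/book[author/text()] : books with at least one author holding direct text.
with_text = [b.findtext("title") for b in books
             if any(a.text and a.text.strip() for a in b.findall("author"))]

print(cheap, young, with_text)
```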

Selecting Several Paths

/bib/book/author | /bib/book/title

Result
<author age="30">Serge Abiteboul</author>
<author> <first-name>Rick</first-name> <last-name>Hull</last-name> </author>
<author>Victor Vianu</author>
<title>Foundations of Databases</title>
<author>Jeffrey D. Ullman</author>
<title>Principles of Database and Knowledge Base Systems</title>
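The "|" operator returns the union of both node-sets in document order. ElementTree has no union operator, so this sketch emulates it by walking each book's children once and keeping those whose tag is wanted. The compact sample document is an assumption based on the bib example.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib><book>"
    "<publisher>Addison-Wesley</publisher>"
    '<author age="30">Serge Abiteboul</author>'
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author>"
    "<title>Foundations of Databases</title>"
    "<year>1995</year>"
    '</book><book price="55">'
    "<publisher>Freeman</publisher>"
    "<author>Jeffrey D. Ullman</author>"
    "<title>Principles of Database and Knowledge Base Systems</title>"
    "<year>1998</year>"
    "</book></bib>"
)

wanted = {"author", "title"}

# One pass over the children of each book keeps document order automatically.
union = [child for book in bib.findall("book")
         for child in book if child.tag in wanted]
tags = [e.tag for e in union]

print(tags)
```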

Axes

• Axis defines a node-set relative to the current node.
– Indicates which nodes are included in search
   • Relative to context node
– Dictates node ordering in set
   • Forward axes select nodes that follow context node
   • Reverse axes select nodes that precede context node
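To illustrate forward and reverse ordering, the sketch below emulates the following-sibling and preceding-sibling axes in plain Python, since ElementTree's path language has no axis syntax. The small book element is an assumption made for the example. Note that the reverse axis is listed nearest-first, which is how position() counts along a reverse axis.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<book><publisher/><author>A</author><author>B</author>"
    "<title/><year/></book>"
)
children = list(book)
context = book.find("title")       # the context node
i = children.index(context)

# following-sibling::* is a forward axis: document order.
following = [e.tag for e in children[i + 1:]]

# preceding-sibling::* is a reverse axis: nearest sibling first.
preceding = [e.tag for e in reversed(children[:i])]

print(following, preceding)
```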

XPath axes

Node Tests

• Node tests
– Define a set of nodes selected by axis

• Rely upon axis’ principal node type
– Corresponds to type of node axis can select

Location

• The syntax for a location step is:

axisname::nodetest[predicate]

• Reference http://www.w3schools.com/xpath/xpath_axes.asp

Example

descendant::first-name

Result <first-name>Rick</first-name>

books.xml

    <?xml version = "1.0"?>
    <books>
      <book>
        <title>Java How to Program</title>
        <translation edition = "1">Spanish</translation>
        <translation edition = "1">Chinese</translation>
        <translation edition = "1">Japanese</translation>
        <translation edition = "2">French</translation>
        <translation edition = "2">Japanese</translation>
      </book>
      <book>
        <title>C++ How to Program</title>
        <translation edition = "1">Korean</translation>
        <translation edition = "2">French</translation>
        <translation edition = "2">Spanish</translation>
      </book>
    </books>

Location Paths Using Axes and Node Tests

• Which books have Japanese translations?
– Use root node of XPath tree as context node
– Use predicate
   • Boolean expression for filtering nodes from search
   • Compare string value of current node to string ‘Japanese’

    /books/book/translation[. = 'Japanese']/../title

• Result <title>Java How to Program</title>

More Examples

• /books/book/translation[.='Japanese']
  Result
    <translation edition="1">Japanese</translation>
    <translation edition="2">Japanese</translation>

• /books/book/translation[.='Japanese']/..
  Result
    <book>
      <title>Java How to Program</title>
      <translation edition="1">Spanish</translation>
      <translation edition="1">Chinese</translation>
      <translation edition="1">Japanese</translation>
      <translation edition="2">French</translation>
      <translation edition="2">Japanese</translation>
    </book>
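The Japanese-translation queries can be run almost unchanged with Python's `xml.etree.ElementTree`, assuming Python 3.7 or later, where the `[.='text']` predicate was added. Paths are written relative to the root element books. The parent step ".." returns each parent only once, so the first book appears a single time even though two of its translations match.

```python
import xml.etree.ElementTree as ET

BOOKS = """<books>
  <book>
    <title>Java How to Program</title>
    <translation edition="1">Spanish</translation>
    <translation edition="1">Chinese</translation>
    <translation edition="1">Japanese</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Japanese</translation>
  </book>
  <book>
    <title>C++ How to Program</title>
    <translation edition="1">Korean</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Spanish</translation>
  </book>
</books>"""

books = ET.fromstring(BOOKS)

# /books/book/translation[.='Japanese']
japanese = books.findall("book/translation[.='Japanese']")
editions = [t.get("edition") for t in japanese]

# /books/book/translation[.='Japanese']/..
parents = books.findall("book/translation[.='Japanese']/..")

# /books/book/translation[.='Japanese']/../title
titles = [t.text for t in books.findall("book/translation[.='Japanese']/../title")]

print(editions, len(parents), titles)
```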

Node-set Operators and Functions

• Node-set operators
– Manipulate node sets to form others

• Node-set functions
– Perform actions on node-sets returned by location paths

Some node-set functions

Example

• /books/book/count(translation)
• Result 5 3

• /books/book/translation/position()
• Result 1 2 3 4 5 6 7 8

• Note: a function call used as a path step, as in both expressions, requires XPath 2.0.
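ElementTree offers no count() or position() functions, so the sketch below reproduces both results in plain Python: `len` over the selected children stands in for count(), and `enumerate` starting at 1 stands in for position(). The compact copy of books.xml is an assumption made to keep the block short.

```python
import xml.etree.ElementTree as ET

books = ET.fromstring(
    "<books><book><title>Java How to Program</title>"
    "<translation>Spanish</translation><translation>Chinese</translation>"
    "<translation>Japanese</translation><translation>French</translation>"
    "<translation>Japanese</translation></book>"
    "<book><title>C++ How to Program</title>"
    "<translation>Korean</translation><translation>French</translation>"
    "<translation>Spanish</translation></book></books>"
)

# /books/book/count(translation) : one count per book.
counts = [len(book.findall("translation")) for book in books.findall("book")]

# /books/book/translation/position() : position of each node in the selected sequence.
positions = [pos for pos, _ in enumerate(books.findall("book/translation"), start=1)]

print(counts, positions)
```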

Node-set Operators and Functions

• Location-path expressions
– Combine node-set operators and functions

• Select all head and body children element nodes
    head | body

• Select last title element node in head element node
    head/title[ last() ]

• Select third book element
    book[ position() = 3 ]
– Or alternatively
    book[ 3 ]

• Return total number of element-node children
    count( * )

• Select all book element nodes in document
    //book
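Several of these expressions have direct counterparts in Python's `xml.etree.ElementTree`, which supports `[last()]` and numeric positions. This sketch assumes a compact copy of books.xml; because that document holds only two books, it selects `book[2]` where the slide selects `book[3]`.

```python
import xml.etree.ElementTree as ET

books = ET.fromstring(
    "<books><book><title>Java How to Program</title>"
    "<translation>Spanish</translation><translation>Chinese</translation>"
    "<translation>Japanese</translation><translation>French</translation>"
    "<translation>Japanese</translation></book>"
    "<book><title>C++ How to Program</title>"
    "<translation>Korean</translation><translation>French</translation>"
    "<translation>Spanish</translation></book></books>"
)

# translation[ last() ] : the last translation of each book.
last_per_book = [t.text for t in books.findall("book/translation[last()]")]

# book[ 2 ] : the second book element (same as book[ position() = 2 ]).
second_title = books.find("book[2]/title").text

# count( * ) with the first book as context node: number of element children.
child_count = len(list(books.find("book")))

# //book : all book elements in the document.
all_books = books.findall(".//book")

print(last_per_book, second_title, child_count, len(all_books))
```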

XPath 2.0

• Latest version:
– http://www.w3.org/TR/xpath20/
– Published as a W3C Working Draft on 22 August 2003; a W3C Recommendation since 23 January 2007

• XPath 2.0 is a much more powerful language than XPath 1.0 and operates on a much larger domain of data types

• A better way of describing XPath 2.0 is as an expression language for processing sequences, with built-in support for querying XML documents

• XPath 2.0 is a strict syntactic subset of XQuery 1.0. Any expression that is syntactically valid and executes successfully in both XPath 2.0 and XQuery 1.0 will return the same result in both languages

XPath 2.0

• XPath 2.0 introduces support for the XML Schema primitive types, which immediately gives the user access to 19 simple types, including dates, years, months, URIs, etc.

• In addition, a number of functions and operators are provided for processing and constructing these different data types

• Everything is a sequence and sequences are ordered

• In XPath 1.0, if you wanted to process a collection of nodes, you had to deal with node-sets.

• In XPath 2.0, the concept of the node-set has been generalized and extended.

• Sequences may contain simple-typed values as well as nodes

• “for” expression enables iteration over sequences

XPath 2.0

• For loop
– sum(for $x in /order/item return $x/price * $x/quantity)

• Conditional expression:
– if ($widget1/unit-cost < $widget2/unit-cost)
  then $widget1
  else $widget2

• Quantifiers:
– some $x in /students/student/name satisfies $x = "Fred"
– every $x in /students/student/name satisfies $x = "Fred"
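Python's standard library has no XPath 2.0 engine, so the sketch below only mirrors the semantics of these constructs in plain Python over ElementTree data: a generator inside `sum` for the for expression, a conditional expression for if/then/else, and `any` / `all` for some / every. All sample documents and values are hypothetical.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    "<order><item><price>2.50</price><quantity>4</quantity></item>"
    "<item><price>10</price><quantity>1</quantity></item></order>"
)
# sum(for $x in /order/item return $x/price * $x/quantity)
total = sum(float(x.findtext("price")) * float(x.findtext("quantity"))
            for x in order.findall("item"))

widget1 = ET.fromstring("<widget id='w1'><unit-cost>3</unit-cost></widget>")
widget2 = ET.fromstring("<widget id='w2'><unit-cost>5</unit-cost></widget>")
# if ($widget1/unit-cost < $widget2/unit-cost) then $widget1 else $widget2
cheaper = (widget1
           if float(widget1.findtext("unit-cost")) < float(widget2.findtext("unit-cost"))
           else widget2)

students = ET.fromstring(
    "<students><student><name>Fred</name></student>"
    "<student><name>Ann</name></student></students>"
)
names = [n.text for n in students.findall("student/name")]
some_fred = any(name == "Fred" for name in names)    # some ... satisfies
every_fred = all(name == "Fred" for name in names)   # every ... satisfies

print(total, cheaper.get("id"), some_fred, every_fred)
```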

XPath 2.0

• Intersections, differences, unions
– The except operator selects all of a given node-set, except for certain nodes
   • @* except @exc:foo
– The intersect operator
   • $x intersect /foo/bar

Xpath Conclusion

• XPath provides a concise and intuitive way to address into XML documents

• Standard part of the XSLT and XPointer specifications

• Implementing XPath basically requires learning the abbreviated syntax of location path expressions and the functions of the core library

Reference

Online Example

• http://www.w3schools.com/xpath/xpath_examples.asp

• www.w3.org

• Priscilla Walmsley, XQuery: Search Across a Variety of XML Data, O'Reilly Media, 2007
