Unit 04: W3C and XPath
COMP 5323 Web Database Technologies and Applications 2014
• This PowerPoint is prepared for educational purposes and is used strictly for classroom lecturing.
• We have adopted the "Fair Use" doctrine in this PowerPoint which allows limited copying of copyrighted works for educational and research purposes.
Doctrine of Fair Use
Learning Objectives
• Learn more about W3C
• Understand the XML query language
[Figure: query languages and the data stores they target: SQL for relational databases (RDB), XQL and XQuery for XML, Cypher for Neo4j]
Outline
1. W3C
2. XPath
3. XQuery
1 W3C
Overview
• The W3C – Who they are, their core beliefs, their long term goals, their members
• The W3C – Who they influence, their business processes and recommendations
• The relationship between W3C and open standards
World Wide Web Consortium (W3C)
• The World Wide Web Consortium (W3C) is an international consortium where members and staff work together to develop many different web standards.
• Their mission is to lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web.
W3C: Founder
• Tim Berners-Lee, W3C Director and inventor of the World Wide Web in 1989.
• Served as W3C Director since 1994, when the organization was founded.
• “W3C members work together to design web technologies that build upon its versatility, giving the world the power to enhance communication and commerce for anyone, anywhere, anytime, and using any device.”
Tim Berners-Lee
Semantic Web
• The Semantic Web is a collaborative movement led by the international standards body, the World Wide Web Consortium (W3C).
• The standard promotes common data formats on the World Wide Web.
• By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web, dominated by unstructured and semi-structured documents into a "web of data".
• The Semantic Web stack builds on the W3C's Resource Description Framework (RDF).
http://en.wikipedia.org/wiki/Semantic_Web
W3C: Core Beliefs
• “W3C believes that in order for the Web to reach its full potential, the most fundamental Web technologies must be compatible with one another and allow any hardware and software used to access the Web to work together.” (www.w3.org)
• One of their main goals is to have “Web Interoperability”
W3C: Long Term Goals for the Web
• Web for Everyone
– Make the Web available regardless of hardware, software, language, culture, etc.
• Web on Everything
– Make Web access from any kind of device as simple and convenient as possible
• Knowledge Base
– Enable people to solve problems that would be otherwise too complex or tedious to solve
• Trust and Confidence
– Make accountability, security, confidence and confidentiality possible for all users
W3C: Members
• W3C is comprised of more than 400 members including the world’s foremost technology companies such as Hewlett Packard, IBM, Nokia, Microsoft, AT&T, Intel, Oracle, and Xerox.
• W3C allows its members to lead the web to its full potential by allowing them to take leadership roles, promote their image as innovators, and gain early insight to market trends.
W3C’s Influence
• Most influential of all organizations in the development and maintenance of the World Wide Web
• W3C has no legal authority to enforce its recommendations because membership is voluntary
• Members often follow recommendations because it helps set standards for the Web which in turn benefits each member
W3C’s Influence
• The W3C has made over 90 recommendations since its start in 1994.
• W3C operations are administered by offices in Japan, France, and the United States.
• As more corporations join, W3C’s recommendations will become the standard for the WWW and thus make it easier for both corporations and the public.
Recommendations
• W3C published the Speech Synthesis Markup Language (SSML) 1.0
– Works on synthesized speech in Web interactions.
– For example, how would you pronounce “1/2”?
• It could be a date (January 2nd or February 1st), one half, or 1 divided by 2.
• XML-binary Optimized Packaging (XOP)
– “XML-binary Optimized Packaging (XOP) provides a standard method for applications to include binary data, as is, along with an XML document in a package. As a result, applications need less space to store the data and less bandwidth to transmit it.” (Business Wire)
Business Processes
• W3C’s work attempts to standardize the Web.
• Each member contributes to the process with decisions being made through community consensus.
• Each member has the same decision power no matter what size they are.
• If a general consensus can’t be reached, decisions are made on a majority basis.
W3C and Open Standards
• “W3C seeks to avoid market fragmentation and thus Web fragmentation by publishing open standards for Web languages and protocols.” (www.w3.org)
• To achieve the goal of one Web, specifications for the Web's formats and protocols must be compatible with one another and allow any hardware and software used to access the Web to work together – thus W3C designs and promotes interoperable open formats and protocols to avoid market fragmentation.
Open Standards Guidelines
• Transparency
– A public process with public access to all information
• Relevance
– Start based on due analysis and market needs for all
• Openness
– Anybody can participate: users and developers; industry and research; governments and public
• Impartial and consensus based
– Guaranteed fairness and equal weight for each participant
• Availability
– Free access to standard documents
• Maintenance
– Testing, Revisions
2 XPATH
XPath
• It provides a way to refer to specific parts of an XML tree
• A ‘URL-like’ scheme for locating documents on local and remote computer systems.
• Primary purpose: Address ‘parts’ of an XML document, and provide basic facilities for manipulation of strings, numbers and booleans.
• Used by other XML technologies
– XSLT
– XQuery
• http://www.w3.org/TR/xpath
Why XPath
• Does an XML tree look like the directory tree of the computer's file system?
Why XPath
• Unique identifiers are not sufficient
– Assigning a unique identifier to every element is a burden
– Identity of element may be unknown
– Identifiers cannot handle ranges of text
– May be inconvenient to identify a large number of objects by listing their identifiers
Data Model
• Treats an XML document as a logical tree
• This tree consists of 7 types of nodes:
– Root Node – the root of the document
– Element Nodes – one for each element in the document (elements may carry unique IDs)
– Attribute Nodes
– Namespace Nodes
– Processing Instruction Nodes (intended to carry instructions to the application)
– Comment Nodes
– Text Nodes
• The tree structure is ordered and reads from top to bottom and left to right
Data Model
Data Model Example 1
For this simple doc:
<doc><?encoding="UTF-8"?><para>Some <em>emphasis</em> here. </para><para>Some more stuff.</para></doc>
Might be represented as:
root
  <doc>
    <?pi?>
    <para>
      text
      <em>
        text
      text
    <para>
      text
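The tree above can be reproduced with a short Python sketch using the standard-library `xml.dom.minidom`, which keeps text and processing-instruction nodes. The slide's processing instruction has no target name, so the sketch substitutes a hypothetical one (`render`); the `KIND` table and `outline` helper are illustrative names, not part of any XPath API.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Assumption: <?render mode="draft"?> stands in for the slide's instruction,
# because a processing instruction needs a target name to be well formed.
SOURCE = (
    '<doc><?render mode="draft"?>'
    "<para>Some <em>emphasis</em> here. </para>"
    "<para>Some more stuff.</para></doc>"
)

# Map DOM node-type constants to the node names used in the XPath data model.
KIND = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
    Node.COMMENT_NODE: "comment",
}


def outline(node, depth=0):
    """Return one (depth, kind, label) tuple per node, in document order."""
    label = node.nodeValue if node.nodeType == Node.TEXT_NODE else node.nodeName
    rows = [(depth, KIND[node.nodeType], label)]
    for child in node.childNodes:
        rows.extend(outline(child, depth + 1))
    return rows


rows = outline(minidom.parseString(SOURCE))
for depth, kind, label in rows:
    print("  " * depth + f"{kind}: {label!r}")
```

The printed indentation mirrors the top-to-bottom, left-to-right ordering of the data model.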
Data Model
Example 2(a)
<?xml version="1.0" encoding="UTF-8" ?>
<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>
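Example 2(b)
[Figure: node tree for Example 2(a). The root has the bib root element as its child; bib has two book children, and the first book's children include publisher (Addison-Wesley) and author (Serge Abiteboul). Callouts mark the root, the root element, a processing instruction, and a comment.]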
Nodes and Atomic values
• Nodes
– <bib> (root element node) [note: not a root node]
– <author> Victor Vianu </author> (element node)
– price="55" (attribute node)
• Atomic Values
– Victor Vianu
– "55"
Relationship of Nodes
• the book element is the parent of the title, author and year.
• the title, author, year elements are all children of the book element
• the title, author, year elements are all siblings.
• the ancestors of the title element are the book element and the bib element
• descendants of the bib element are the book, title, author.
Element Context
• Meaning of element can depend upon its context
– <book><title>…</title></book>
– <person><title>…</title></person>
• Want to search for, e.g. title of book, not title of person
– XPath exploits sequential and hierarchical context of XML to specify elements by their context (i.e. location in hierarchy)
• book/title
• person/title
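A minimal Python sketch of this idea, using the standard-library `xml.etree.ElementTree` (which implements only a subset of XPath). The `library` wrapper element and the sample text values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <title> appears under both <book> and <person>.
root = ET.fromstring(
    "<library>"
    "<book><title>Foundations of Databases</title></book>"
    "<person><title>Professor</title></person>"
    "</library>"
)

# The path names the context, so the two kinds of title stay separate.
book_titles = [t.text for t in root.findall("book/title")]
person_titles = [t.text for t in root.findall("person/title")]

# Without a context, every title in the document matches.
all_titles = [t.text for t in root.findall(".//title")]

print(book_titles, person_titles, all_titles)
```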
Relative Path
• A relative location path consists of a sequence of one or more location steps separated by /
• Each step selects a set of nodes relative to a context node, and each node in that set is used as a context node for the following step
• E.g. para will select children of the current node that are named 'para'
• <chapter> //Current node
    <title>…</title>
    <para>…</para> //Selected
    <note>
      <para>…</para> //Not selected until note is the current node
    </note>
  </chapter>
• Verbose expression is child::para
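The chapter example can be tried with Python's standard-library `xml.etree.ElementTree`, whose `findall` evaluates a relative path against the element it is called on. This is a sketch with made-up text content; ElementTree accepts only the abbreviated form `para`, not `child::para`.

```python
import xml.etree.ElementTree as ET

# Same shape as the slide's chapter; the text values are placeholders.
chapter = ET.fromstring(
    "<chapter>"
    "<title>Intro</title>"
    "<para>first</para>"
    "<note><para>inside note</para></note>"
    "</chapter>"
)

# "para" is relative to chapter: only its direct para children are selected.
direct = [p.text for p in chapter.findall("para")]

# The nested para is reached once note becomes the context node.
note = chapter.find("note")
nested = [p.text for p in note.findall("para")]

print(direct, nested)
```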
Absolute Path
• For some cases, a relative path is not suitable.
– E.g. it may be necessary to select the title of a book regardless of the current context. The location relative to the document as a whole may be known even when the offset from the current location is not; in that case, use an absolute path.
• An absolute path is similar to a relative path, except that it starts with “/”, the root of the document
– e.g. /book/title
• With the “//” expression, it is even possible to select all occurrences of a specific element type.
– e.g. //author
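A hedged Python sketch of the difference, using the standard-library `xml.etree.ElementTree`. ElementTree does not accept a leading "/" on an element, so the sketch approximates an absolute path by always starting from the document's top element, and writes `.//author` where full XPath would use `//author`. The two-book document is a shortened, hypothetical version of the bibliography example.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><title>Foundations of Databases</title>"
    "<author>Victor Vianu</author></book>"
    "<book><title>Principles of Database and Knowledge Base Systems</title>"
    "<author>Jeffrey D. Ullman</author></book>"
    "</bib>"
)
first_book = bib.find("book")

# Relative to first_book, "book/title" finds nothing: a book has no book children.
from_book = first_book.findall("book/title")

# Anchored at the top element, the same steps find both titles.
from_top = [t.text for t in bib.findall("./book/title")]

# ".//author" from the top element plays the role of "//author".
authors = [a.text for a in bib.findall(".//author")]

print(from_book, from_top, authors)
```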
Partial Tree of faculty.xml
[Figure: partial node tree of faculty.xml. The document node (d101) contains the faculty element, named Science, with several course children. One course carries pid p13, year 2007 and name DB Sys; its student children each hold a sid and a grade (s1 with A+, s2 with B).]
doc("faculty.xml")/descendant::course[attribute::pid="p13"]/child::student[child::sid="s2"]
What Does an XPath Expression Return?
• A sequence of result nodes with their contents, in the form of a (not necessarily well-formed) XML document
• The doc( ) function is used to open the “faculty.xml" file
• XPath:
doc("faculty.xml")/descendant::course[attribute::pid="p13"]/child::student[child::sid="s2"]
• Result:
<student> <sid>s2</sid> <grade>B</grade> </student>
<student> <sid>s2</sid> <grade>D</grade> </student>
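The same query can be approximated in Python with the standard-library `xml.etree.ElementTree`. This is a sketch under stated assumptions: the inline `FACULTY` string is a hypothetical fragment (only pid, sid, and grade follow the slide, and it holds a single matching student), `ET.fromstring` stands in for `doc()`, and the verbose axes are written in the abbreviated syntax that ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of faculty.xml; the second course is invented
# to show that the pid predicate filters it out.
FACULTY = """
<faculty name="Science">
  <course pid="p13">
    <student><sid>s1</sid><grade>A+</grade></student>
    <student><sid>s2</sid><grade>B</grade></student>
  </course>
  <course pid="p14">
    <student><sid>s2</sid><grade>C</grade></student>
  </course>
</faculty>
"""
root = ET.fromstring(FACULTY)

# descendant::course[attribute::pid="p13"]/child::student[child::sid="s2"]
matches = root.findall(".//course[@pid='p13']/student[sid='s2']")
grades = [student.findtext("grade") for student in matches]

# Serialize the result nodes, as the slide's Result box does.
for student in matches:
    print(ET.tostring(student, encoding="unicode").strip())
```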
Example for XPath Queries
<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author age="30"> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>
XPath: Simple Expressions
/bib/book/year
Result: <year> 1995 </year> <year> 1998 </year>
/bib/paper/year
Result: empty (there were no papers)
XPath: //
//author
Result: <author age="30"> Serge Abiteboul </author>
        <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
        <author> Victor Vianu </author>
        <author> Jeffrey D. Ullman </author>
/bib//first-name
Result: <first-name> Rick </first-name>
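These simple and "//" expressions can be checked with Python's standard-library `xml.etree.ElementTree`. In this sketch the paths are written relative to the bib element (`./book/year` for /bib/book/year, `.//author` for //author) because ElementTree does not take absolute paths on an element; the document is the bibliography example with surrounding whitespace trimmed.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author age="30">Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""
bib = ET.fromstring(BIB)

years = [y.text for y in bib.findall("./book/year")]          # /bib/book/year
paper_years = bib.findall("./paper/year")                     # /bib/paper/year
author_count = len(bib.findall(".//author"))                  # //author
first_names = [f.text for f in bib.findall(".//first-name")]  # /bib//first-name

print(years, paper_years, author_count, first_names)
```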
XPath: Functions
/bib/book/author/text()
Result: Serge Abiteboul
        Victor Vianu
        Jeffrey D. Ullman
Note: Rick Hull doesn’t appear because his name is held in first-name and last-name child elements
Functions in XPath:
– text() = matches text nodes
– node() = matches any node (unlike *, which matches only element nodes)
– name() = returns the name of the current node
• http://www.w3.org/TR/xquery-operators/
XPath: Wildcard
//author/*
Result: <first-name> Rick </first-name> <last-name> Hull </last-name>
Note: * matches any element
XPath: Attribute Nodes

/bib/book/@price

Result: "55"

@price means that price has to be an attribute
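A small Python sketch of the wildcard and attribute examples, using the standard-library `xml.etree.ElementTree`. ElementTree cannot return attribute nodes from a path, so the sketch selects the elements that carry the attribute with `[@price]` and then reads the value with `get`; that two-step form is an adaptation, not the slide's exact expression.

```python
import xml.etree.ElementTree as ET

# Bibliography example with whitespace trimmed and unused elements left out.
bib = ET.fromstring(
    "<bib>"
    "<book>"
    '<author age="30">Serge Abiteboul</author>'
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author>"
    "</book>"
    '<book price="55"><author>Jeffrey D. Ullman</author></book>'
    "</bib>"
)

# //author/* : every element child of any author.
author_children = [child.tag for child in bib.findall(".//author/*")]

# /bib/book/@price : keep books that have the attribute, then read it.
prices = [book.get("price") for book in bib.findall("./book[@price]")]

print(author_children, prices)
```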
XPath: Qualifiers

/bib/book/author[first-name]
/bib/book/author[last-name]
Result (for either expression): <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
/bib/book/author[1]
Result: <author age="30">Serge Abiteboul</author> <author>Jeffrey D. Ullman</author>
XPath: More Qualifiers

/bib/book/author[first-name]/last-name

Result: <last-name> Hull </last-name>
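The qualifier examples can be sketched in Python with the standard-library `xml.etree.ElementTree`, which supports child-existence predicates such as `[first-name]` and positional predicates such as `[1]`. Paths are written relative to the bib element, and the document is a trimmed version of the bibliography example.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book>"
    '<author age="30">Serge Abiteboul</author>'
    "<author><first-name>Rick</first-name><last-name>Hull</last-name></author>"
    "<author>Victor Vianu</author>"
    "</book>"
    '<book price="55"><author>Jeffrey D. Ullman</author></book>'
    "</bib>"
)

# /bib/book/author[first-name] : authors that have a first-name child.
with_first_name = bib.findall("./book/author[first-name]")

# /bib/book/author[1] : the first author of each book.
first_authors = [a.text for a in bib.findall("./book/author[1]")]

# /bib/book/author[first-name]/last-name : continue the path after the qualifier.
last_names = [n.text for n in bib.findall("./book/author[first-name]/last-name")]

print(len(with_first_name), first_authors, last_names)
```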
XPath: More Qualifiers

/bib/book[@price < "60"]
Result: <book price="55">…..</book>
/bib/book[author/@age < "35"]
Result: <book> <publisher>Addison-Wesle……
/bib/book[author/text()]
Selecting Several Paths
/bib/book/author | /bib/book/title
Result
<author age="30">Serge Abiteboul</author>
<author> <first-name>Rick</first-name> <last-name>Hull</last-name> </author>
<author>Victor Vianu</author>
<title>Foundations of Databases</title>
<author>Jeffrey D. Ullman</author>
<title>Principles of Database and Knowledge Base Systems</title>
Axes
• Axis defines a node-set relative to the current node.
– Indicates which nodes are included in search
• Relative to context node
– Dictates node ordering in set
• Forward axes select nodes that follow context node
• Reverse axes select nodes that precede context node
XPath axes
Node Tests
• Node tests
– define a set of nodes selected by axis
• Rely upon axis’ principal node type
– Corresponds to type of node axis can select
Location
• The syntax for a location step is:
axisname::nodetest[predicate]
• Reference http://www.w3schools.com/xpath/xpath_axes.asp
Example
descendant::first-name
Result <first-name>Rick</first-name>
books.xml
<?xml version = "1.0"?>
<books>
  <book>
    <title>Java How to Program</title>
    <translation edition = "1">Spanish</translation>
    <translation edition = "1">Chinese</translation>
    <translation edition = "1">Japanese</translation>
    <translation edition = "2">French</translation>
    <translation edition = "2">Japanese</translation>
  </book>
  <book>
    <title>C++ How to Program</title>
    <translation edition = "1">Korean</translation>
    <translation edition = "2">French</translation>
    <translation edition = "2">Spanish</translation>
  </book>
</books>
Location Paths Using Axes and Node Tests
• Which books have Japanese translations?
– Use root node of XPath tree as context node
– Use predicate
• Boolean expression for filtering nodes from search
• Compare string value of current node to string 'Japanese'
/books/book/translation[. = 'Japanese']/../title
• Result <title>Java How to Program</title>
More Examples
• /books/book/translation[.='Japanese']
Result
<translation edition="1">Japanese</translation>
<translation edition="2">Japanese</translation>
• /books/book/translation[.='Japanese']/..
Result
<book>
  <title>Java How to Program</title>
  <translation edition="1">Spanish</translation>
  <translation edition="1">Chinese</translation>
  <translation edition="1">Japanese</translation>
  <translation edition="2">French</translation>
  <translation edition="2">Japanese</translation>
</book>
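The Japanese-translation queries can be run with Python's standard-library `xml.etree.ElementTree`. This sketch assumes Python 3.7 or later, where the `[.='text']` predicate is supported, and writes the paths relative to the books element; the XML declaration is omitted so the string can start with a newline.

```python
import xml.etree.ElementTree as ET

BOOKS = """
<books>
  <book>
    <title>Java How to Program</title>
    <translation edition="1">Spanish</translation>
    <translation edition="1">Chinese</translation>
    <translation edition="1">Japanese</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Japanese</translation>
  </book>
  <book>
    <title>C++ How to Program</title>
    <translation edition="1">Korean</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Spanish</translation>
  </book>
</books>
"""
books = ET.fromstring(BOOKS)

# /books/book/translation[.='Japanese']
japanese = books.findall("./book/translation[.='Japanese']")
editions = [t.get("edition") for t in japanese]

# /books/book/translation[.='Japanese']/../title
# ".." steps back up to the parent book; each parent is reported once.
titles = [t.text for t in books.findall("./book/translation[.='Japanese']/../title")]

print(editions, titles)
```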
Node-set Operators and Functions
• Node-set operators
– Manipulate node sets to form others
• Node-set functions
– Perform actions on node-sets returned by location paths
Some node-set functions
Example
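• /books/book/count(translation)
• Result 5 3
• /books/book/translation/position()
• Result 1 2 3 4 5 6 7 8
• Note: using a function call as a path step requires XPath 2.0; in XPath 1.0, count() and position() are applied to a whole node-set or used inside a predicate.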
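Python's standard-library `xml.etree.ElementTree` has no `count()` or `position()` functions, so the following sketch reproduces the two results with ordinary Python over the selected elements. It illustrates what the expressions compute against books.xml and is not an XPath 2.0 evaluation.

```python
import xml.etree.ElementTree as ET

BOOKS = """
<books>
  <book>
    <title>Java How to Program</title>
    <translation edition="1">Spanish</translation>
    <translation edition="1">Chinese</translation>
    <translation edition="1">Japanese</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Japanese</translation>
  </book>
  <book>
    <title>C++ How to Program</title>
    <translation edition="1">Korean</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Spanish</translation>
  </book>
</books>
"""
books = ET.fromstring(BOOKS)

# /books/book/count(translation): one count per book element.
counts = [len(book.findall("translation")) for book in books.findall("book")]

# /books/book/translation/position(): 1-based position of each node
# within the sequence selected by /books/book/translation.
selected = books.findall("./book/translation")
positions = [index for index, _ in enumerate(selected, start=1)]

print(counts, positions)
```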
Node-set Operators and Functions
• Location-path expressions
– Combine node-set operators and functions
• Select all head and body children element nodes
head | body
• Select last title element node in head element node
head/title[ last() ]
• Select third book element
book[ position() = 3 ]
– Or alternatively
book[ 3 ]
• Return total number of element-node children
count( * )
• Select all book element nodes in document
//book
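Some of these expressions map directly onto the XPath subset in Python's standard-library `xml.etree.ElementTree`: positional predicates `[n]` and `[last()]` are supported, while `|`, `position() = n`, and `count()` are not and are replaced here by plain Python. The sketch runs against a shortened, hypothetical version of books.xml.

```python
import xml.etree.ElementTree as ET

books = ET.fromstring(
    "<books>"
    "<book><title>Java How to Program</title>"
    '<translation edition="1">Spanish</translation>'
    '<translation edition="2">Japanese</translation></book>'
    "<book><title>C++ How to Program</title>"
    '<translation edition="1">Korean</translation>'
    '<translation edition="2">Spanish</translation></book>'
    "</books>"
)

# book[ 2 ] : the second book child (book[ 3 ] would be empty here).
second_title = books.findtext("./book[2]/title")
third_books = books.findall("./book[3]")

# translation[ last() ] : the last translation child of each book.
last_translations = [t.text for t in books.findall("./book/translation[last()]")]

# count( * ) : number of element children of the context node.
child_count = len(list(books))

# //book : every book element in the document.
all_books = books.findall(".//book")

print(second_title, third_books, last_translations, child_count, len(all_books))
```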
XPath 2.0
• Latest version:
– http://www.w3.org/TR/xpath20/
• W3C Recommendation 23 January 2007 (Second Edition 14 December 2010); earlier stages include the Working Draft of 22 August 2003
• XPath 2.0 is a much more powerful language than XPath 1.0 and operates on a much larger domain of data types
• A better way of describing XPath 2.0 is as an expression language for processing sequences, with built-in support for querying XML documents
• XPath 2.0 is a strict syntactic subset of XQuery 1.0. Any expression that is syntactically valid and executes successfully in both XPath 2.0 and XQuery 1.0 will return the same result in both languages
XPath 2.0
• XPath 2.0 introduces support for the XML Schema primitive types, which immediately gives the user access to 19 simple types, including dates, years, months, URIs, etc.
• In addition, a number of functions and operators are provided for processing and constructing these different data types
• Everything is a sequence and sequences are ordered
• In XPath 1.0, if you wanted to process a collection of nodes, you had to deal with node-sets.
• In XPath 2.0, the concept of the node-set has been generalized and extended.
• Sequences may contain simple-typed values as well as nodes
• “for” expression enables iteration over sequences
XPath 2.0
• For loop
– sum(for $x in /order/item return $x/price * $x/quantity)
• Conditional expression:
– if ($widget1/unit-cost < $widget2/unit-cost)
– then $widget1
– else $widget2
• Quantifiers:
– some $x in /students/student/name satisfies $x = "Fred"
– every $x in /students/student/name satisfies $x = "Fred"
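Python's standard library has no XPath 2.0 engine, so the sketch below only mirrors the semantics of the for expression and the two quantifiers with Python generator expressions over nodes selected by `xml.etree.ElementTree`. The order and students documents are hypothetical; their element names follow the slide's expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document.
order = ET.fromstring(
    "<order>"
    "<item><price>2.50</price><quantity>4</quantity></item>"
    "<item><price>10.00</price><quantity>1</quantity></item>"
    "</order>"
)

# sum(for $x in /order/item return $x/price * $x/quantity)
total = sum(
    float(x.findtext("price")) * int(x.findtext("quantity"))
    for x in order.findall("item")
)

# Hypothetical students document.
students = ET.fromstring(
    "<students>"
    "<student><name>Fred</name></student>"
    "<student><name>Wilma</name></student>"
    "</students>"
)
names = [n.text for n in students.findall("student/name")]

# some $x in /students/student/name satisfies $x = "Fred"
some_fred = any(name == "Fred" for name in names)

# every $x in /students/student/name satisfies $x = "Fred"
every_fred = all(name == "Fred" for name in names)

print(total, some_fred, every_fred)
```

The conditional expression corresponds to Python's `a if condition else b` form in the same way.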
XPath 2.0
• Intersections, differences, unions
– The except operator selects all of a given node-set, except for certain nodes
• @* except @exc:foo
– The intersect operator
• $x intersect /foo/bar
XPath Conclusion

• XPath provides a concise and intuitive way to address into XML documents
• Standard part of the XSLT and XPointer specifications
• Implementing XPath basically requires learning the abbreviated syntax of location path expressions and the functions of the core library
Reference
Online Example
• http://www.w3schools.com/xpath/xpath_examples.asp
• www.w3.org
• Priscilla Walmsley, XQuery: Search Across a Variety of XML Data, O'Reilly Media, 2007