Slide 1
Slide 2
The XML Standard
Slide 3
Overview of our XML Standards
Motivation: HTML vs XML
XML 101: syntax, elements, attributes, DTDs
XML 201: XML Schema, Namespaces
XSLT: Transforming and Rendering XML
XQuery: Search, Transform & Integrate
Slide 4
So what is XML (all about)?
Executive Summary:
XML = HTML - idiosyncrasies (simplified syntax) + user-definable ("semantic") tags
Separation of data and its presentation
=> simple, very flexible data exchange format: semistructured data model
=> new applications:
  Information exchange (B2B), sharing (diglib), integration ("mediation"), archival,...
  Web site management (XML+XSL stylesheets),...
Slide 5
What's Wrong with HTML?
Example entry: Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina. "Object Fusion in Mediator Systems". In VLDB 96.
HTML confuses presentation with content
Slide 6
...What's Wrong with HTML...
Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina. "Object Fusion in Mediator Systems". In VLDB 96.
No explicit structure, semantics, or object-orientation: nothing in the HTML marks which part is the author, the conference, or the title
Slide 7
... And Some Repercussions
Lack of schema/semantics when querying the Web (HTML):
  "find documents (books, papers,...) where author = Michael Jackson" (... and learn how software engineering meets the moon walker...)
  "create a list of M. Jackson's books and (if available) their prices"
=> HTML is inappropriate for
  data exchange
  automation of information management (retrieval, manipulation, integration)
Slide 8
XML is Based on Markup
Example content: Y.Papakonstantinou, S. Abiteboul, H. Garcia-Molina; Object Fusion in Mediator Systems; VLDB 96
Markup indicates structure and semantics
Decoupled from presentation
Slide 9
Elements and their Content
Callouts: element, element name, character content, element content, empty element
Example content: Y.Papakonstantinou, S. Abiteboul, H. Garcia-Molina; Object Fusion in Mediator Systems; VLDB 96
Slide 10
Element Attributes
Callouts: attribute name, attribute value
Example content: Y.Papakonstantinou, S. Abiteboul, H. Garcia-Molina; Object Fusion in Mediator Systems; VLDB 96
Slide 11
XML = Labeled Ordered Trees
[Figure: tree rooted at bibliography with paper children; a paper (attribute @id 23) has children authors (author Yannis, author Serge, ...), title (Object Fusion...) and fullpaper]
Semistructured data: labeled trees/graphs
Can also represent relational and object-oriented data
Slide 12
In Search of the Lost Structure & Semantics
How do I share structure and metadata/semantics with my community?
How to make all this automatable?
How do I learn and use the element structure of a document?
Slide 13
Adding Structure and Semantics
XML Document Type Definitions (DTDs): define the structure of "allowed" documents (i.e., valid wrt. a DTD)
  comparable to a database schema => improve query formulation, execution,...
XML Schema: defines structure and data types
XML Namespaces: identify your vocabulary
Resource Description Framework (RDF): simple metadata model
Slide 14
XML DTDs as Extended CFGs
XML DTD as a grammar:
  bibliography -> paper*
  paper -> authors fullPaper? title booktitle
  authors -> author+
lhs = element (name)
rhs = regular expression over elements + strings (PCDATA)
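The grammar view can be tried out directly: the sketch below writes each right-hand side as a Python regular expression over child element names and checks a small document against it. This only illustrates the "rhs = regular expression" idea under stated assumptions; the names CONTENT_MODELS and is_valid are hypothetical, and a real DTD validator does considerably more (attributes, PCDATA, entities).

```python
import re
import xml.etree.ElementTree as ET

# Content models from the slide, written as regular expressions over child
# element names. Each name is followed by a space so names cannot run together.
CONTENT_MODELS = {
    "bibliography": r"(paper )*",
    "paper": r"authors (fullPaper )?title booktitle ",
    "authors": r"(author )+",
}

def is_valid(element):
    """Check an element and all its descendants against CONTENT_MODELS.

    Elements without an entry (author, title, ...) are left unconstrained.
    """
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            return False
    return all(is_valid(child) for child in element)

good = ET.fromstring(
    "<bibliography><paper><authors><author>Yannis</author></authors>"
    "<title>Object Fusion in Mediator Systems</title>"
    "<booktitle>VLDB 96</booktitle></paper></bibliography>"
)
# A paper without authors violates "paper -> authors fullPaper? title booktitle".
bad = ET.fromstring("<bibliography><paper><title>No authors</title></paper></bibliography>")

print(is_valid(good), is_valid(bad))
```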
Slide 15
Document Type Definitions (DTDs) Define and Constrain Element
Names & Structure Element Type Declaration Attribute List
Declaration
Slide 16
Element Declarations
Character content
authors followed by optional fullPaper, followed by title, followed by booktitle
Sequence of 1 or more author
Sequence of 0 or more paper
Slide 17
Element Content Declarations
Slide 18
Attributes
Example content: Y.Papakonstantinou; Object Fusion in Mediator Systems; Yannis info
Callouts: object identity attribute; CDATA (character data); IDREF (intradocument reference); reference to external ENTITY
Slide 19
Attribute Types
Slide 20
More on Attribute Declarations
Attributes may be
  REQUIRED
  IMPLIED (optional)
Attributes can have default values
  a default value may be FIXED
Slide 21
Uses of XML Entities
Physical partition: size, reuse, "modularity" (both XML docs & DTDs)
Non-XML data: unparsed entities, binary data
Non-standard characters: character entities
Shorthand for phrases & markup
Slide 22
Types of Entities
Internal (to a doc) vs. External (use URI)
General (in XML doc) vs. Parameter (in DTD)
Parsed (XML) vs. Unparsed (non-XML)
Slide 23
Internal Text Entities
Internal text entity declaration: declares the entity WWW with the replacement text "World Wide Web"
Entity reference: We all use the &WWW;.
Logically equivalent to the replacement text actually appearing: We all use the World Wide Web.
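As a quick check of the &WWW; example, the sketch below declares the entity in an internal DTD subset and lets Python's standard parser expand it. The wrapping element name p is an assumption made for illustration; the slide does not show which element contains the sentence.

```python
import xml.etree.ElementTree as ET

# Internal text entity declared in the document's own DTD subset.
# The element name "p" is hypothetical.
doc = """<!DOCTYPE p [
  <!ENTITY WWW "World Wide Web">
]>
<p>We all use the &WWW;.</p>"""

root = ET.fromstring(doc)
# The parser substitutes the replacement text for the entity reference.
print(root.text)
```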
Slide 24
Unparsed (& "Binary") Entities
Declare external... and unparsed entity
NOTATION declaration (helper app)
Declare attribute type to be ENTITY
Element with ENTITY attribute
Slide 25
From Docs to Data: XML Schema
XML DTDs (part of the XML spec.)
  flexible, semistructured data model (nesting, ANY, ?, *, |,...)
  but document-oriented (SGML heritage)
XML Schema (W3C working draft)
  schema definition language in XML
  data-oriented: data types
  extends capabilities of DTD
Slide 26
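Sample Data for Introduction to XML Schema
book
  title: Being a Dog Is a Full-Time Job
  author: Charles M. Schulz
  character
    name: Snoopy
    friend-of: Peppermint Patty
    since: 1950-10-04
    qualification: extroverted beagle
  character
    name: Peppermint Patty
    since: 1966-08-22
    qualification: bold, brash and tomboyish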
Slide 27
The Simple Russian Doll Approach to XML Schema
Callouts:
  optional namespace definition
  sequence compositor
  simple type content for title and author
  complex type content for book
  character may appear any number of times
  basic type of XML Schema
Slide 28
The Catalog Approach to XML Schema: Stand-Alone Declarations
& References Simple Type Elements Attributes Complex Type
Element character Reference
Slide 29
Catalog Approach Contd
Slide 30
Named Types
Write stand-alone named complex type or simple type declarations
Primitive form of inheritance (called derivation) allows
  restriction
  extension
nameType is derived from xsd:string by having the xsd:maxLength facet restrict the string to a maximum of 32 characters
nameType is used in the declaration of characterType
Slide 31
Groups: Named containers of sets of Elements or Attributes
Slide 32
Compositors: Sequence, Choice, All
So far we have seen sequences
The group nameTypes consists of one of
  the element name
  the sequence containing firstName, middlename, lastName
Slide 33
Compositors (contd) The characterType consists of name, a list
of friend-of, since, and qualification particles in no particular
order. (Compare with the sequence compositor.)
Slide 34
Derivation of Simple Types: Unions and Lists
So far we have seen restrictions and facets
The simple type isbnType will be either
  a 10-digit string (notice the pattern)
  the token "TBD" or the token "NA"
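The union behind isbnType can be mimicked outside a schema processor. The sketch below is an approximation in Python, not XML Schema itself: the exact pattern is not shown on the slide, so "exactly ten decimal digits" is an assumption, and the names ISBN_PATTERN, PLACEHOLDER_TOKENS and is_isbn_type are hypothetical.

```python
import re

# Assumed pattern: exactly ten decimal digits (the slide only says "10-digit string").
ISBN_PATTERN = re.compile(r"[0-9]{10}")
# The two placeholder tokens named on the slide.
PLACEHOLDER_TOKENS = {"TBD", "NA"}

def is_isbn_type(value: str) -> bool:
    """A value is accepted if it belongs to either member of the union."""
    return ISBN_PATTERN.fullmatch(value) is not None or value in PLACEHOLDER_TOKENS

for candidate in ["0836217462", "TBD", "NA", "12345", "unknown"]:
    print(candidate, is_isbn_type(candidate))
```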
Slide 35
Constraints: Uniqueness
By inserting xsd:unique in the book element declaration we enforce that the character names in each book are unique
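What the uniqueness constraint rules out can be shown with a few lines of Python. This is only a sketch of the check a schema processor would perform; the function name duplicate_character_names is hypothetical, and the book/character/name layout is assumed from the sample data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_character_names(book):
    """Return the character names that occur more than once within one book."""
    names = [c.findtext("name") for c in book.findall("character")]
    counts = Counter(n for n in names if n is not None)
    return sorted(name for name, count in counts.items() if count > 1)

ok_book = ET.fromstring(
    "<book><character><name>Snoopy</name></character>"
    "<character><name>Peppermint Patty</name></character></book>"
)
bad_book = ET.fromstring(
    "<book><character><name>Snoopy</name></character>"
    "<character><name>Snoopy</name></character></book>"
)

print(duplicate_character_names(ok_book))   # no duplicates: the constraint holds
print(duplicate_character_names(bad_book))  # Snoopy appears twice: the constraint is violated
```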
Slide 36
Namespaces
Slide 37
Including Unknown Elements
Slide 38
Presenting XML: XSLT
Why stylesheets?
  separation of content (XML) from presentation (XSL)
Why not just CSS for XML?
  XSL is far more powerful:
    selecting elements
    transforming the XML tree
    content-based display (result may depend on data)
Slide 39
XSLT Overview
XSLT stylesheets are denoted in XML syntax
XSL components:
  1. a language for transforming XML documents (XSLT: integral part of the XSL specification)
  2. an XML formatting vocabulary (Formatting Objects: >90% of the formatting properties inherited from CSS)
Slide 40
XSLT Processing Model
[Figure: an XML source tree and an XSL stylesheet are the inputs of a transformation that produces a result tree (XML, HTML, ...)]
Slide 41
XSLT Processing Model
XSL stylesheet: collection of template rules
template rule: (pattern, template)
main steps:
  match pattern against source tree
  instantiate template (replace current node "." by the template in the result tree)
  select further nodes for processing
control can be
  program-driven ("pull":...)
  data/event-driven ("push":...)
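The match / instantiate / select cycle can be imitated in a few lines of Python. This is a toy model of the push style, not an XSLT engine: patterns are reduced to plain element names, and all names (apply_templates, default_rule, member_rule, name_rule, RULES) are hypothetical.

```python
import xml.etree.ElementTree as ET

def apply_templates(node, rules):
    """Find the rule whose pattern (here: just a tag name) matches, then instantiate it."""
    rule = rules.get(node.tag, default_rule)
    return rule(node, rules)

def default_rule(node, rules):
    # Fallback when no pattern matches: emit the text and keep processing the children.
    # Tail text after child elements is ignored to keep the sketch short.
    return (node.text or "") + "".join(apply_templates(child, rules) for child in node)

def member_rule(node, rules):
    # Template for Member: an HTML list; the children are selected for further processing.
    items = "".join(apply_templates(child, rules) for child in node)
    return "<ul>" + items + "</ul>"

def name_rule(node, rules):
    # Template for Name: one list item holding the element's text.
    return "<li>" + (node.text or "") + "</li>"

RULES = {"Member": member_rule, "Name": name_rule}

source = ET.fromstring("<FitnessCenter><Member><Name>Jeff</Name></Member></FitnessCenter>")
html = apply_templates(source, RULES)
print(html)  # <ul><li>Jeff</li></ul>
```

Control here is data-driven: the shape of the source tree decides which rule fires next, which is the "push" style named above.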
Slide 42
Template Rule: Example
(i) match pattern: process elements
(ii) instantiate template: replace each product with two HTML tables
(iii) select the grandchildren (sales/domestic, sales/foreign) for further processing
Callouts: pattern, template
Slide 43
Match/Select Patterns
match patterns are a subset of select patterns, which are the expressions defined in http://w3.org/TR/xpath
Examples:
  /mybook/chapter[2]/section/*
  chapter|appendix
  chapter//para
  div[@class="appendix" and position() mod 2 = 1]//para
  ../@lang
Slide 44
Creating the Result Tree...
Literal result elements: non-XSL elements (e.g., HTML) appear literally in the result tree
Constructing elements: (similar for xsl:attribute, xsl:text, xsl:comment, ...)
Generating text: attribute & children definition
Slide 45
Example of Turning XML into HTML
Source data (FitnessCenter/Member): Name Jeff; Phone 555-1234; Phone 555-4321; FavoriteColor lightgrey
Slide 46
HTML Document in an XSL Template Welcome Welcome!
Slide 47
Extracting the Member Name Welcome Welcome !
Slide 48
Extracting a Value from an XML Document, Navigating the XML Document
Extracting values: use the xsl:value-of XSL element
Navigating:
  The slash ("/") indicates a parent/child relationship
  A slash at the beginning of the path indicates that it is an absolute path, starting from the top of the XML document
  /FitnessCenter/Member/Name: "Start from the top of the XML document, go to the FitnessCenter element, from there go to the Member element, and from there go to the Name element."
Slide 49
[Figure: document tree]
Document /
  PI
  Element FitnessCenter
    Element Member
      Element Name: Text Jeff
      Element Phone: Text 555-1234
      Element Phone: Text 555-4321
      Element FavoriteColor: Text lightgrey
Slide 50
Extract the FavoriteColor and use it as the bgcolor Welcome
Welcome ! (see html-example03)
Slide 51
Note
Attribute values cannot contain "<". Consequently, an XSL element cannot be written inside an attribute value.
To extract the value of an XML element and use it as an attribute value you must use curly braces:
  Evaluate the expression within the curly braces.
  Assign the value to the attribute.
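The two steps (evaluate the expression in the braces, assign the value) can be emulated in Python as follows. This is a simplified stand-in for XSLT attribute value templates: the helper expand_avt is hypothetical, and because ElementTree does not accept absolute paths on an element, the paths are written relative to the FitnessCenter root.

```python
import re
import xml.etree.ElementTree as ET

def expand_avt(template, context):
    """Replace each {path} in template with the text found at path under context."""
    def evaluate(match):
        # Step 1: evaluate the expression within the curly braces.
        return context.findtext(match.group(1), default="")
    # Step 2: the evaluated text takes the place of the braces in the attribute value.
    return re.sub(r"\{([^{}]+)\}", evaluate, template)

fitness_center = ET.fromstring(
    "<FitnessCenter><Member><Name>Jeff</Name>"
    "<FavoriteColor>lightgrey</FavoriteColor></Member></FitnessCenter>"
)

bgcolor = expand_avt("{Member/FavoriteColor}", fitness_center)
print(bgcolor)
```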
Slide 52
Extract the Home Phone Number Welcome Welcome ! Your home phone
number is:
Slide 53
Creating the Result Tree...
Further XSL elements for...
  Numbering
  Conditions
  Repetition
  ...
Slide 54
Creating the Result Tree: Repetition customers...
Slide 55
Creating the Result Tree: Sorting
Slide 56
More on XSL
XSL(T):
  Conflict resolution for multiple applicable rules
  Modularization
XSL Formatting Objects: a la CSS
XPath (navigation syntax + functions): the language shared by XSLT and XPointer
...
Slide 57
XQuery: Querying XML Sources
Functional query language
Operates on the XPath/XQuery data model
  List of ordered trees
  A document is a list of size 1
XQuery expressions are composed of
  Path expressions
  Element constructors
  FLWR expressions
  and more
Slide 58
Path Expressions
doc("zoo.xml")//chapter[2]//figure[caption = "Tree Frogs"]
In the second chapter of the document zoo.xml find the figures with caption Tree Frogs
[Figure: tree of zoo.xml with book, part, chapter, appendix, section, paragraph, figure and caption nodes; one figure has the caption Tree Frogs, another the caption Just Frogs]
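The same path can be tried with the limited XPath subset in Python's standard library. The document below is a made-up stand-in for zoo.xml (its exact contents are not given), and the leading "." is needed because ElementTree paths are evaluated relative to an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical zoo.xml content, just large enough to exercise the path.
zoo = ET.fromstring("""
<book>
  <chapter>
    <figure><caption>Tree Frogs</caption></figure>
  </chapter>
  <chapter>
    <section>
      <figure><caption>Tree Frogs</caption></figure>
      <figure><caption>Just Frogs</caption></figure>
    </section>
  </chapter>
</book>
""")

# chapter[2]: the second chapter child of its parent;
# //figure: figures at any depth below it;
# [caption='Tree Frogs']: keep figures having a caption child with that text.
figures = zoo.findall(".//chapter[2]//figure[caption='Tree Frogs']")
captions = [figure.findtext("caption") for figure in figures]
print(len(figures), captions)
```

Only the figure inside the second chapter is returned; the Tree Frogs figure in the first chapter is excluded by the position predicate.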
Slide 59
More Path Expressions
doc("zoo.xml")/part/chapter[1]//figure[caption = "Tree Frogs"]
Find the first immediate chapter subelements of immediate part subelements of the document zoo.xml and retrieve figures that have the caption Tree Frogs
[Figure: tree of zoo.xml with book, part, chapter, appendix, section, paragraph, figure and caption nodes; one figure has the caption Tree Frogs, another the caption Just Frogs]
Slide 60
Element Construction
<result> { doc("zoo.xml")//chapter[2]//figure[caption = "Tree Frogs"] } </result>
In the second chapter of the document zoo.xml find the figures with caption Tree Frogs and place them into an element called result
[Figure: a result element containing the figure with caption Tree Frogs]
Slide 61
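Bibliography Example Data Set
book: author Aho, author Hopcroft, author Ullman; title Automata Theory; publisher Morgan Kaufmann; year 1998
book: author Ullman; title Database Systems; publisher Morgan Kaufmann; year 1998
book: author Abiteboul, author Buneman, author Suciu; title Automata Theory; publisher Prentice Hall; year 1998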
Slide 62
Reviews Example Data Set
review: title Automata Theory; comment "It's the best in automata theory"; comment "A definitive textbook"
Slide 63
For-Let-Where-Return (FLWR)
FOR $b IN doc("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann"
RETURN $b/title
List the titles of books published by Morgan Kaufmann
[Figure: tree of bib.xml with three book nodes, each with title, publisher and year children; the publishers are Morgan Kaufmann, Morgan Kaufmann and Prentice Hall, the year is 1998]
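For readers more at home in a general-purpose language, the FOR / WHERE / RETURN clauses line up closely with a list comprehension. The sketch below runs the query over the bibliography data set in Python; the root element name bib is taken from the figure labels, and the comparison is meant only as an analogy, not as how an XQuery processor works.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book><author>Aho</author><author>Hopcroft</author><author>Ullman</author>
    <title>Automata Theory</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><author>Ullman</author>
    <title>Database Systems</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
    <title>Automata Theory</title><publisher>Prentice Hall</publisher><year>1998</year></book>
</bib>
""")

titles = [
    b.findtext("title")                              # RETURN $b/title
    for b in bib.iter("book")                        # FOR $b IN ...//book
    if b.findtext("publisher") == "Morgan Kaufmann"  # WHERE $b/publisher = "Morgan Kaufmann"
]
print(titles)
```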
Slide 64
Think (tuples of) variable bindings
FOR/LET: ordered lists of tuples of variable bindings
WHERE: tuples that satisfy the conditions
RETURN: list of trees
[Figure: $b bound in turn to the book nodes of the bib.xml tree]
Slide 65
FOR $b IN doc("bib.xml")//book
WHERE $b/year > 1990
RETURN $b/author
Return the list of authors who published after 1990
Slide 66
Tuples
FOR $p IN distinct(doc("bib.xml")//publisher)
LET $b := document("bib.xml")//book[publisher = $p]
WHERE count($b) > 1
RETURN $p
List publishers who have published more than 1 book
Tuples ($p, $b) are formed
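The difference between FOR (one binding per item) and LET (one binding to the whole list) shows up clearly when the query is spelled out as a loop. The sketch below is a Python analogy over a compact version of the bibliography data; distinct() is approximated by keeping the first occurrence of each publisher name.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><title>Automata Theory</title><publisher>Morgan Kaufmann</publisher></book>"
    "<book><title>Database Systems</title><publisher>Morgan Kaufmann</publisher></book>"
    "<book><title>Automata Theory</title><publisher>Prentice Hall</publisher></book>"
    "</bib>"
)

# distinct(...//publisher): each publisher name once, in order of first appearance.
publishers = list(dict.fromkeys(p.text for p in bib.iter("publisher")))

result = []
for p in publishers:  # FOR $p: one tuple per publisher
    # LET $b: bound once per tuple to the whole list of that publisher's books
    books = [b for b in bib.iter("book") if b.findtext("publisher") == p]
    if len(books) > 1:  # WHERE count($b) > 1
        result.append(p)  # RETURN $p

print(result)
```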
Slide 67
Boolean Expressions in WHERE
FOR $b IN doc("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = 1998
RETURN $b/title
List the titles of books published by Morgan Kaufmann in 1998
Slide 68
Joins
FOR $b IN doc("bib.xml")/book, $r IN doc("review.xml")/review
WHERE $b/title = $r/title
RETURN <book_with_review> {$b/@*} {$b/*} {$r/comment} </book_with_review>
For every book with a matching review output a book_with_review that contains all the attributes and subelements of book and the comment subelements of review
Result: author Aho, author Hopcroft, author Ullman; title Automata Theory; publisher Morgan Kaufmann; year 1998; comment "It's the best in automata theory"; comment "A definitive textbook"
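The join is a nested loop over the two sources with a condition on the titles, which the following Python sketch makes explicit. It uses the two example data sets; the root element names bib and reviews are assumptions, and a real processor is free to choose a better join strategy.

```python
import copy
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><author>Aho</author><author>Hopcroft</author><author>Ullman</author>"
    "<title>Automata Theory</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>"
    "<book><author>Ullman</author>"
    "<title>Database Systems</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>"
    "<book><author>Abiteboul</author><author>Buneman</author><author>Suciu</author>"
    "<title>Automata Theory</title><publisher>Prentice Hall</publisher><year>1998</year></book>"
    "</bib>"
)
reviews = ET.fromstring(
    "<reviews><review><title>Automata Theory</title>"
    "<comment>It's the best in automata theory</comment>"
    "<comment>A definitive textbook</comment></review></reviews>"
)

joined = []
for b in bib.findall("book"):                 # FOR $b IN .../book
    for r in reviews.findall("review"):       # $r IN .../review
        if b.findtext("title") == r.findtext("title"):                # WHERE $b/title = $r/title
            out = ET.Element("book_with_review", dict(b.attrib))      # {$b/@*}
            out.extend([copy.deepcopy(child) for child in b])         # {$b/*}
            out.extend([copy.deepcopy(c) for c in r.findall("comment")])  # {$r/comment}
            joined.append(out)

for element in joined:
    print(ET.tostring(element, encoding="unicode"))
```

With this data both books titled Automata Theory match the single review, so two book_with_review elements are produced.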
Slide 69
Relax Order Conditions
FOR $b IN unordered(doc("bib.xml")//book)
WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = 1998
RETURN $b/title
List the titles of books published by Morgan Kaufmann in 1998
Very important feature in dealing with relational sources and other set-oriented sources.
SELECT title FROM bib WHERE publisher = 'Morgan Kaufmann' AND year = 1998
Depending on the indices and access methods used, the SQL query processor may deliver the tuples in different order
Slide 70
Nested queries
FOR $a IN distinct(document("bib.xml")//author/text())
RETURN <author> {$a} { FOR $b IN document("bib.xml")//book[author = $a] RETURN $b/title } </author>
Invert the structure of the input document so that there is a list of author elements containing the name of the author and the list of books they wrote
Slide 71
Conditionals
FOR $b IN doc("bib.xml")/book
RETURN {$b/title} {IF count($b/author) < 3 THEN {$b/author} ELSE {$b/author[1], and others
Slide 72
Existential and Universal Quantification
FOR $b IN doc("bib.xml")/book
WHERE $b/author = "Ullman"
RETURN $b
Return books where at least one of the authors is Ullman
FOR $b IN doc("bib.xml")/book
WHERE EVERY $author IN $b/author SATISFIES $author = "Ullman"
RETURN $b
Return books where all authors are Ullman
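The two readings correspond to Python's any() and all(). The sketch below applies both to a compact version of the bibliography data; the helper names some_author_is and every_author_is are hypothetical. As in XQuery, the universal form is trivially true for a book with no authors.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><author>Aho</author><author>Hopcroft</author><author>Ullman</author>"
    "<title>Automata Theory</title></book>"
    "<book><author>Ullman</author><title>Database Systems</title></book>"
    "<book><author>Abiteboul</author><author>Buneman</author><author>Suciu</author>"
    "<title>Data on the Web</title></book>"
    "</bib>"
)

def some_author_is(book, name):
    """Existential: $b/author = name holds if at least one author matches."""
    return any(a.text == name for a in book.findall("author"))

def every_author_is(book, name):
    """Universal: EVERY $author IN $b/author SATISFIES $author = name."""
    return all(a.text == name for a in book.findall("author"))

some = [b.findtext("title") for b in bib.findall("book") if some_author_is(b, "Ullman")]
every = [b.findtext("title") for b in bib.findall("book") if every_author_is(b, "Ullman")]
print(some, every)
```

The third title here is a placeholder chosen for this sketch so that the three books are easy to tell apart; it is not taken from the data set.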
Slide 73
Functions
DEFINE FUNCTION depth($e) RETURNS xsd:integer {
  IF (empty($e/*)) THEN 1
  ELSE max(depth($e/*)) + 1
}
FOR $b IN doc("bib.xml")/book
RETURN depth($b)
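The recursion in depth can be checked against a direct Python transcription: an element without child elements has depth 1, otherwise its depth is one more than that of its deepest child. The sample book element below is made up for the check.

```python
import xml.etree.ElementTree as ET

def depth(e):
    """Depth of an element tree: 1 for a leaf element, else 1 + the deepest child."""
    children = list(e)
    if not children:                               # IF (empty($e/*)) THEN 1
        return 1
    return max(depth(c) for c in children) + 1     # ELSE max(depth of children) + 1

# Hypothetical book: book -> authors -> author gives three levels.
book = ET.fromstring(
    "<book><title>Automata Theory</title>"
    "<authors><author>Aho</author><author>Ullman</author></authors></book>"
)
print(depth(book))  # 3
```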
Slide 74
Applicability of XML Query Languages (XQuery)
XQuery standard does NOT elaborate on the physical aspects of the XML sources
Custom functions can provide access and reference to the source(s): document(test.xml), source(view1)
Question: as we go down the list of uses of XQuery, compare with XSL
Slide 75
XQuery on files, DOM objects, event streams, messages
Usage scenarios: transformation and processing of messages
Significant (but not killer) advantages over XSL
  Minor performance optimization superiority
  Better streaming, pipelining
  Cleaner extensible language
Many academic and industrial prototypes of XQuery on files
[Figure: an XQuery processor evaluates an XQuery over an XML file, a DOM object, or a SAX stream]
Slide 76
Typical Scenario: XML Messaging
[Figure: components Application, Wrapper (RDBMS), Wrapper (SAP ERP), Message Transformer, SOAP service]
Requests in native language or special wrapper API:
  SELECT * FROM Customer, Order WHERE customer.name='Joe' AND order.name='Joe'
  Cdom = Sap(conn1, joe)
Slide 77
Summary of Steps
Developer's program issues SQL query
Wrapper returns SQL result wrapped as XML message
Developer's XQuery transforms XML message to XML format needed by app
Slide 78
Typical Scenario: XML Messaging
[Figure: components Application, Wrapper (RDBMS), Wrapper (SAP ERP), Message Transformer, SOAP service]
Input message msg(123): customer (Joe, 100M, fish); customer (Joe, 100M, meat)
Transformed message: Joe, 780M, fish, meat
FOR $cn IN distinct(msg(123)/customer/name)
RETURN $cn
       7.8 * msg(123)/customer[name=$cn]/balance
       FOR $c IN msg(123)/customer WHERE $c/name = $cn RETURN {$c/order}
Slide 79
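Direct XQuery on Databases
[Figure: an XQuery is sent to the XQuery processor, which works on an XML view of the relational DB, sends SQL (one or more queries) to the RDBMS, receives tuples, and returns an XML result. The XML view has root reldb with children customers and orders; customers contains tuple elements with name (Joe) and balance (100M)]
Let's write a Russian Doll schema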
Slide 80
XQuery on Relational Databases
FOR $c IN db(1)/customers/tuple
WHERE $c/name = "Joe"
RETURN $c/name
       7.8 * $c/balance
       FOR $o IN db(1)/orders/tuple WHERE $c/name = $o/name RETURN $o
The XQuery processor evaluates the query on the XML view of the relational DB by sending SQL to the RDBMS:
  SELECT * FROM customers WHERE name = 'Joe'
  For each customer #c: SELECT * FROM orders WHERE orders.name = #c.name
  Merge results
Result: Joe, 780M, fish, meat
Slide 81
Summary of Steps
Developer's program issues XQuery on XML view of SQL DB
XQuery processor automatically sends SQL queries to DB and structures XML result
Slide 82
XQuery on Relational Databases
Single language for accessing database and structuring XML result
Avoids deficiencies of SQL in dealing with nested structures, optional elements, etc.
Slide 83
XQuery on Distributed Sources
[Figure: an XQuery processor (mediator) receives an XQuery, evaluates it over an XML view of all sources (two RDBMSs and an XML file), and returns an XML result]
Slide 84
Example: Access to Two Relational Databases
[Figure: an XQuery processor (mediator) with an XML view of all relational DBs, over one RDBMS holding customers and one RDBMS holding orders; it receives an XQuery and returns an XML result]
FOR $c IN db(1)/customers/tuple
WHERE $c/name = "Joe"
RETURN $c/name
       7.8 * $c/balance
       FOR $o IN db(2)/orders/tuple WHERE $c/name = $o/name RETURN $o
Slide 85
XQuery on Integrated Views
[Figure: an XQuery processor (mediator) exposes a virtual integrated XML view over two RDBMSs and an XML file; it receives an XQuery and returns an XML result. The customer view contains customers, customer, name (Joe), balance (100M), orders and order nodes]
Let's write the Joe query again
Slide 86
... and using XQuery to build the view
[Figure: an XQuery processor (mediator) exposes a virtual integrated XML view over two RDBMSs and an XML file; it receives an XQuery and returns an XML result]
XQuery as View Definition
Slide 87
View = Query
FOR $c IN db(1)/customers/tuple
RETURN $c/name
       7.8 * $c/balance
       FOR $o IN db(2)/orders/tuple WHERE $c/name = $o/name RETURN $o