The XML Standard

  • Slide 1
  • Slide 2
  • The XML Standard
  • Slide 3
  • Overview of our XML Standards. Motivation: HTML vs XML. XML 101: syntax, elements, attributes, DTDs, ... XML 201: XML Schema, Namespaces. XSLT: Transforming and Rendering XML. XQuery: Search, Transform & Integrate.
  • Slide 4
  • So what is XML (all about)? Executive summary: XML = HTML - idiosyncrasies (simplified syntax) + user-definable ("semantic") tags. Separation of data and its presentation => a simple, very flexible data exchange format with a semistructured data model => new applications: information exchange (B2B), sharing (diglib), integration ("mediation"), archival, ...; Web site management (XML + XSL stylesheets), ...
  • Slide 5
  • What's Wrong with HTML? Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina. "Object Fusion in Mediator Systems". In VLDB 96. HTML confuses presentation with content.
  • Slide 6
  • ... What's Wrong with HTML ... Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina. "Object Fusion in Mediator Systems". In VLDB 96. No explicit structure, semantics, or object-orientation: nothing marks which part is the author, the conference, or the title.
  • Slide 7
  • ... And Some Repercussions. Lack of schema/semantics when querying the Web (HTML): "find documents (books, papers, ...) where author = Michael Jackson" (... and learn how software engineering meets the moonwalker ...); "create a list of M. Jackson's books and (if available) their prices". => HTML is inappropriate for data exchange and for automation of information management (retrieval, manipulation, integration).
  • Slide 8
  • XML is Based on Markup. Example content: Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina; Object Fusion in Mediator Systems; VLDB 96. Markup indicates structure and semantics and is decoupled from presentation.
  • Slide 9
  • Elements and their Content. Callouts: element; element name; character content; element content; empty element. Example content: Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina; Object Fusion in Mediator Systems; VLDB 96.
  • Slide 10
  • Element Attributes. Callouts: attribute name; attribute value. Example content: Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina; Object Fusion in Mediator Systems; VLDB 96.
  • Slide 11
  • XML = Labeled Ordered Trees. Tree figure with nodes bibliography, paper, authors, author (Yannis, Serge, ...), title (Object Fusion...), fullpaper, and attribute @id = 23. Semistructured data: labeled trees/graphs, which can also represent relational and object-oriented data.
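The labeled-ordered-tree view can be sketched with Python's standard `xml.etree.ElementTree`. The document below is a hypothetical reconstruction that only reuses the node labels from the figure; in particular, placing the `id` attribute on `paper` is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical document built from the node labels in the tree figure.
DOC = """<bibliography>
  <paper id="23">
    <authors>
      <author>Yannis</author>
      <author>Serge</author>
    </authors>
    <title>Object Fusion</title>
    <fullpaper/>
  </paper>
</bibliography>"""

def labeled_tree(node, depth=0):
    """Return (depth, label) pairs in document order."""
    rows = [(depth, node.tag)]
    # Attributes are listed as @name entries below their element.
    rows += [(depth + 1, "@" + name) for name in node.attrib]
    for child in node:  # children keep their document order
        rows += labeled_tree(child, depth + 1)
    return rows

tree_rows = labeled_tree(ET.fromstring(DOC))
for depth, label in tree_rows:
    print("  " * depth + label)
```

Because child order is preserved, the same traversal always yields the same label sequence, which is what "ordered" means here.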
  • Slide 12
  • In Search of the Lost Structure & Semantics. How do I learn and use the element structure of a document? How do I share structure and metadata/semantics with my community? How to make all this automatable?
  • Slide 13
  • Adding Structure and Semantics. XML Document Type Definitions (DTDs): define the structure of "allowed" documents (i.e., valid w.r.t. a DTD); like a database schema => improve query formulation, execution, ... XML Schema: defines structure and data types. XML Namespaces: identify your vocabulary. Resource Description Framework (RDF): simple metadata model.
  • Slide 14
  • XML DTDs as Extended CFGs. bibliography -> paper*; paper -> authors fullPaper? title booktitle; authors -> author+. lhs = element (name); rhs = regular expression over elements + strings (PCDATA). Columns: XML DTD; Grammar.
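Since each right-hand side is a regular expression over element names, validity of an element's children can be checked with an ordinary regex. The following is a minimal sketch, not a DTD validator: the encoding of child names as space-terminated tokens and the function name `is_valid` are choices made for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the grammar on this slide, written as regexes
# over space-terminated child names (an encoding chosen for this sketch).
CONTENT_MODELS = {
    "bibliography": r"(paper )*",
    "paper": r"authors (fullPaper )?title booktitle ",
    "authors": r"(author )+",
}

def is_valid(elem):
    """Check elem and all its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in elem)
        if re.fullmatch(model, children) is None:
            return False
    return all(is_valid(child) for child in elem)

good = ET.fromstring(
    "<bibliography><paper><authors><author>Yannis</author></authors>"
    "<title>Object Fusion</title><booktitle>VLDB 96</booktitle>"
    "</paper></bibliography>"
)
bad = ET.fromstring(
    "<bibliography><paper><title>No authors</title>"
    "<booktitle>VLDB 96</booktitle></paper></bibliography>"
)
print(is_valid(good), is_valid(bad))
```

Elements without an entry in `CONTENT_MODELS` (author, title, booktitle) are treated as character content and accepted as-is.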
  • Slide 15
  • Document Type Definitions (DTDs): define and constrain element names & structure. Callouts: element type declaration; attribute list declaration.
  • Slide 16
  • Element Declarations. Callouts: character content; authors followed by optional fullpaper, followed by title, followed by booktitle; sequence of 1 or more author; sequence of 0 or more paper.
  • Slide 17
  • Element Content Declarations
  • Slide 18
  • Attributes. Example content: Y. Papakonstantinou; Object Fusion in Mediator Systems; Yannis info. Callouts: object identity attribute; CDATA (character data); IDREF (intradocument reference); reference to external ENTITY.
  • Slide 19
  • Attribute Types
  • Slide 20
  • More on Attribute Declarations. Attributes may be REQUIRED or IMPLIED (optional), can have default values, and a default value may be FIXED.
  • Slide 21
  • Uses of XML Entities. Physical partition: size, reuse, "modularity" (both XML docs & DTDs). Non-XML data: unparsed entities (binary data). Non-standard characters: character entities. Shorthand for phrases & markup.
  • Slide 22
  • Types of Entities. Internal (to a doc) vs. external (use URI). General (in XML doc) vs. parameter (in DTD). Parsed (XML) vs. unparsed (non-XML).
  • Slide 23
  • Internal Text Entities. Callouts: internal text entity declaration; entity reference. "We all use the &WWW;." is logically equivalent to "We all use the World Wide Web." actually appearing.
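A small sketch of the equivalence, using Python's standard XML parser, which expands internal text entities declared in the document's own DTD subset. The entity name `WWW` and both sentences come from the slide; the element name `doc` is a placeholder chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Internal text entity declaration, then an entity reference in content.
SOURCE = """<!DOCTYPE doc [
  <!ENTITY WWW "World Wide Web">
]>
<doc>We all use the &WWW;.</doc>"""

# After parsing, the reference has been replaced by the entity's text.
expanded_text = ET.fromstring(SOURCE).text
print(expanded_text)
```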
  • Slide 24
  • Unparsed (& "Binary") Entities. Callouts: NOTATION declaration (helper app); declare external ... and unparsed entity; declare attribute type to be ENTITY; element with ENTITY attribute.
  • Slide 25
  • From Docs to Data: XML Schema. XML DTDs (part of the XML spec.): flexible, semistructured data model (nesting, ANY, ?, *, |, ...) but document-oriented (SGML heritage). XML Schema (W3C working draft): schema definition language in XML; data-oriented: data types; extends the capabilities of DTDs.
  • Slide 26
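  • Sample Data for Introduction to XML Schema. Book: title "Being a Dog Is a Full-Time Job"; author Charles M. Schulz. Character: name Snoopy; friend-of Peppermint Patty; since 1950-10-04; qualification "extroverted beagle". Character: name Peppermint Patty; since 1966-08-22; qualification "bold, brash and tomboyish".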
  • Slide 27
  • The Simple Russian Doll Approach to XML Schema. Callouts: optional namespace definition; sequence compositor; simple type content for title and author; complex type content for book; character may appear any number of times; basic type of XML Schema.
  • Slide 28
  • The Catalog Approach to XML Schema: Stand-Alone Declarations & References. Callouts: simple type elements; attributes; complex type element character; reference.
  • Slide 29
  • Catalog Approach (contd)
  • Slide 30
  • Named Types. Write stand-alone named complex type or simple type declarations. A primitive form of inheritance (called derivation) allows restriction and extension. nameType is derived from xsd:string by having the xsd:maxLength facet restrict the string to a maximum of 32 characters. nameType is used in the declaration of characterType.
  • Slide 31
  • Groups: Named containers of sets of Elements or Attributes
  • Slide 32
  • Compositors: Sequence, Choice, All. So far we have seen sequences. The group nameTypes consists of one of: the element name, or the sequence containing firstName, middlename, lastName.
  • Slide 33
  • Compositors (contd). The characterType consists of name, a list of friend-of, since, and qualification particles in no particular order. (Compare with the sequence compositor.)
  • Slide 34
  • Derivation of Simple Types: Unions and Lists. So far we have seen restrictions and facets. The simple type isbnType will be either a 10-digit string (notice the pattern), the token "TBD", or the token "NA".
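The union described for isbnType can be mimicked in a few lines of Python. This is only an illustration of the accepted value space, not XML Schema validation; the function name `is_isbn_type` is made up for the sketch.

```python
import re

def is_isbn_type(value):
    """Accept a 10-digit string, or one of the tokens TBD and NA."""
    ten_digits = re.fullmatch(r"[0-9]{10}", value) is not None
    return ten_digits or value in {"TBD", "NA"}

for candidate in ["0836217462", "TBD", "NA", "12345"]:
    print(candidate, is_isbn_type(candidate))
```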
  • Slide 35
  • Constraints: Uniqueness. By inserting xsd:unique in the book element declaration, we enforce that the character names in each book are unique.
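What the uniqueness constraint demands can be sketched as a plain check over one book element. The element names `book`, `character`, and `name` follow the sample data; the helper `character_names_unique` is hypothetical and stands in for what a schema validator would enforce.

```python
import xml.etree.ElementTree as ET

def character_names_unique(book):
    """True if no two character children of book share a name."""
    names = [c.findtext("name") for c in book.findall("character")]
    return len(names) == len(set(names))

ok_book = ET.fromstring(
    "<book><character><name>Snoopy</name></character>"
    "<character><name>Peppermint Patty</name></character></book>"
)
dup_book = ET.fromstring(
    "<book><character><name>Snoopy</name></character>"
    "<character><name>Snoopy</name></character></book>"
)
print(character_names_unique(ok_book), character_names_unique(dup_book))
```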
  • Slide 36
  • Namespaces
  • Slide 37
  • Including Unknown Elements
  • Slide 38
  • Presenting XML: XSLT. Why stylesheets? Separation of content (XML) from presentation (XSL). Why not just CSS for XML? XSL is far more powerful: selecting elements; transforming the XML tree; content-based display (result may depend on data).
  • Slide 39
  • XSLT Overview. XSLT stylesheets are denoted in XML syntax. XSL components: 1. a language for transforming XML documents (XSLT: integral part of the XSL specification); 2. an XML formatting vocabulary (Formatting Objects: >90% of the formatting properties inherited from CSS).
  • Slide 40
  • XSLT Processing Model. Diagram: XML source tree and XSL stylesheet => Transformation => result tree (XML, HTML, ...).
  • Slide 41
  • XSLT Processing Model. XSL stylesheet: collection of template rules. Template rule: (pattern => template). Main steps: match pattern against source tree; instantiate template (replace current node "." by the template in the result tree); select further nodes for processing. Control can be program-driven ("pull": ...) or data/event-driven ("push": ...).
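The three main steps can be imitated in Python to make the control flow concrete. This is a toy "push"-style sketch, not an XSLT processor: patterns are reduced to tag names, and the rule functions and the `RULES` table are invented for the illustration.

```python
import xml.etree.ElementTree as ET

def rule_center(node, apply_templates):
    """Template for FitnessCenter: a list wrapping the processed children."""
    ul = ET.Element("ul")
    ul.extend(apply_templates(list(node)))  # select further nodes
    return [ul]

def rule_member(node, apply_templates):
    """Template for Member: one list item holding the member's name."""
    li = ET.Element("li")
    li.text = node.findtext("Name")
    return [li]

# Hypothetical stylesheet: match pattern (a tag name) -> template.
RULES = {"FitnessCenter": rule_center, "Member": rule_member}

def apply_templates(nodes):
    """Match each node against RULES and instantiate its template."""
    result = []
    for node in nodes:
        rule = RULES.get(node.tag)
        if rule is not None:
            result.extend(rule(node, apply_templates))
    return result

source = ET.fromstring(
    "<FitnessCenter><Member><Name>Jeff</Name></Member></FitnessCenter>"
)
result_tree = apply_templates([source])[0]
print(ET.tostring(result_tree, encoding="unicode"))
```

The data drives the traversal here: which rule fires next depends on which nodes the previous template selected.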
  • Slide 42
  • Template Rule: Example. (i) match pattern: process product elements; (ii) instantiate template: replace each product with two HTML tables; (iii) select the grandchildren (sales/domestic, sales/foreign) for further processing. Callouts: pattern; template.
  • Slide 43
  • Match/Select Patterns. Match patterns are a subset of select patterns, which are defined in http://w3.org/TR/xpath. Examples: /mybook/chapter[2]/section/*; chapter|appendix; chapter//para; div[@class="appendix" and position() mod 2 = 1]//para; ../@lang
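Two of the example patterns can be tried with Python's `xml.etree.ElementTree`, which supports a limited subset of XPath (no `|`, no `position()` arithmetic). The sample document is hypothetical; since the search starts at the root element `mybook`, the absolute path is written relative to it.

```python
import xml.etree.ElementTree as ET

mybook = ET.fromstring(
    "<mybook>"
    "<chapter><section><para>a</para></section></chapter>"
    "<chapter><section><para>b</para><note>c</note></section>"
    "<para>d</para></chapter>"
    "</mybook>"
)

# /mybook/chapter[2]/section/* : all children of sections in chapter 2
second_chapter = [e.text for e in mybook.findall("./chapter[2]/section/*")]

# chapter//para : para descendants of chapter, at any depth
all_paras = [e.text for e in mybook.findall("./chapter//para")]

print(second_chapter, all_paras)
```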
  • Slide 44
  • Creating the Result Tree ... Literal result elements: non-XSL elements (e.g., HTML) appear literally in the result tree. Constructing elements: attribute & children definition (similar for xsl:attribute, xsl:text, xsl:comment, ...). Generating text.
  • Slide 45
  • Example of Turning XML into HTML. Source data: Name Jeff; Phone 555-1234; Phone 555-4321; FavoriteColor lightgrey.
  • Slide 46
  • HTML Document in an XSL Template Welcome Welcome!
  • Slide 47
  • Extracting the Member Name Welcome Welcome !
  • Slide 48
  • Extracting a Value from an XML Document, Navigating the XML Document. Extracting values: use the xsl:value-of XSL element. Navigating: the slash ("/") indicates a parent/child relationship. A slash at the beginning of the path indicates that it is an absolute path, starting from the top of the XML document. /FitnessCenter/Member/Name: "Start from the top of the XML document, go to the FitnessCenter element, from there go to the Member element, and from there go to the Name element."
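The same navigation can be tried outside XSLT with Python's `xml.etree.ElementTree`. The document is reconstructed from the element names and values on the slides; note that ElementTree paths on an element are relative, so the absolute path /FitnessCenter/Member/Name becomes Member/Name when starting at the root element.

```python
import xml.etree.ElementTree as ET

fitness_center = ET.fromstring(
    "<FitnessCenter><Member>"
    "<Name>Jeff</Name>"
    "<Phone>555-1234</Phone><Phone>555-4321</Phone>"
    "<FavoriteColor>lightgrey</FavoriteColor>"
    "</Member></FitnessCenter>"
)

# Root element FitnessCenter is the context node; go to Member, then Name.
member_name = fitness_center.findtext("Member/Name")
# A path may match several nodes; findall returns them in document order.
phones = [p.text for p in fitness_center.findall("Member/Phone")]

print(member_name, phones)
```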
  • Slide 49
  • Tree figure: Document "/" with a PI and Element FitnessCenter; below it Element Member; below that Element Name (Text "Jeff"), Element Phone (Text "555-1234"), Element Phone (Text "555-4321"), Element FavoriteColor (Text "lightgrey").
  • Slide 50
  • Extract the FavoriteColor and use it as the bgcolor Welcome Welcome ! (see html-example03)
  • Slide 51
  • Note: attribute values cannot contain "<". Consequently, placing an XSL element directly inside an attribute value is NOT valid. To extract the value of an XML element and use it as an attribute value you must use curly braces: evaluate the expression within the curly braces, then assign the value to the attribute.
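The curly-brace rule (evaluate, then assign) can be imitated with a regex substitution. This is only a sketch of the idea: `expand_avt` is a made-up helper, it handles simple child paths only, and real XSLT attribute value templates accept arbitrary XPath expressions.

```python
import re
import xml.etree.ElementTree as ET

def expand_avt(template, context):
    """Replace each {path} by the text of the node found at that path."""
    return re.sub(
        r"\{([^{}]+)\}",
        lambda m: context.findtext(m.group(1), default=""),
        template,
    )

member = ET.fromstring(
    "<Member><Name>Jeff</Name><FavoriteColor>lightgrey</FavoriteColor></Member>"
)
bgcolor = expand_avt("{FavoriteColor}", member)
greeting = expand_avt("Welcome {Name}!", member)
print(bgcolor, greeting)
```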
  • Slide 52
  • Extract the Home Phone Number Welcome Welcome ! Your home phone number is:
  • Slide 53
  • Creating the Result Tree ... Further XSL elements for: numbering, conditions, repetition, ...
  • Slide 54
  • Creating the Result Tree: Repetition customers...
  • Slide 55
  • Creating the Result Tree: Sorting
  • Slide 56
  • More on XSL. XSL(T): conflict resolution for multiple applicable rules; modularization. XSL Formatting Objects: a la CSS. XPath (navigation syntax + functions): the part shared by XSLT and XPointer. ...
  • Slide 57
  • XQuery: Querying XML Sources. Functional query language. Operates on the XPath/XQuery data model: a list of ordered trees; a document is a list of size 1. XQuery expressions are composed of path expressions, element constructors, FLWR expressions, and more.
  • Slide 58
  • Path Expressions. doc("zoo.xml")//chapter[2]//figure[caption = "Tree Frogs"]: in the second chapter of the document zoo.xml, find the figures with caption "Tree Frogs". Tree figure with nodes book, part, chapter, appendix, section, paragraph, figure, and caption ("Tree Frogs", "Just Frogs").
  • Slide 59
  • More Path Expressions. doc("zoo.xml")/part/chapter[1]//figure[caption = "Tree Frogs"]: find the first immediate chapter subelements of immediate part subelements of the document zoo.xml and retrieve figures that have the caption "Tree Frogs". Tree figure with nodes book, part, chapter, appendix, section, paragraph, figure, and caption ("Tree Frogs", "Just Frogs").
  • Slide 60
  • Element Construction. <result> { doc("zoo.xml")//chapter[2]//figure[caption = "Tree Frogs"] } </result>: in the second chapter of the document zoo.xml, find the figures with caption "Tree Frogs" and place them into an element called result. Result figure: result containing figure with caption "Tree Frogs".
  • Slide 61
  • Bibliography Example Data Set. Book 1: authors Aho, Hopcroft, Ullman; title Automata Theory; publisher Morgan Kaufmann; year 1998. Book 2: author Ullman; title Database Systems; publisher Morgan Kaufmann; year 1998. Book 3: authors Abiteboul, Buneman, Suciu; title Automata Theory; publisher Prentice Hall; year 1998.
  • Slide 62
  • Reviews Example Data Set. Review: title Automata Theory; comments "It's the best in automata theory" and "A definitive textbook".
  • Slide 63
  • For-Let-Where-Return (FLWR). FOR $b IN doc("bib.xml")//book WHERE $b/publisher = "Morgan Kaufmann" RETURN $b/title. List the titles of books published by Morgan Kaufmann. Tree figure: bib with book nodes, each with title, publisher (Morgan Kaufmann, Morgan Kaufmann, Prentice Hall), and year (1998).
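For readers who know Python better than XQuery, the FOR/WHERE/RETURN clauses line up with the parts of a list comprehension. This is an analogy over a hypothetical bib.xml reconstructed from the example data set, not an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml content, reconstructed from the example data set.
BIB = ET.fromstring(
    "<bib>"
    "<book><author>Aho</author><author>Hopcroft</author><author>Ullman</author>"
    "<title>Automata Theory</title><publisher>Morgan Kaufmann</publisher>"
    "<year>1998</year></book>"
    "<book><author>Ullman</author>"
    "<title>Database Systems</title><publisher>Morgan Kaufmann</publisher>"
    "<year>1998</year></book>"
    "<book><author>Abiteboul</author><author>Buneman</author><author>Suciu</author>"
    "<title>Automata Theory</title><publisher>Prentice Hall</publisher>"
    "<year>1998</year></book>"
    "</bib>"
)

titles = [
    b.findtext("title")                              # RETURN $b/title
    for b in BIB.iter("book")                        # FOR $b IN ...//book
    if b.findtext("publisher") == "Morgan Kaufmann"  # WHERE ...
]
print(titles)
```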
  • Slide 64
  • Think (tuples of) variable bindings. FOR/LET: ordered lists of tuples of variable bindings. WHERE: the tuples that satisfy the conditions. RETURN: list of trees. Figure: $b bound in turn to each book node of the bib tree (bib, book, title, publisher, year).
  • Slide 65
  • FOR $b IN doc("bib.xml")//book WHERE $b/year > 1990 RETURN $b/author. Return the list of authors who published after 1990.
  • Slide 66
  • Tuples. FOR $p IN distinct(doc("bib.xml")//publisher) LET $b := document("bib.xml")//book[publisher = $p] WHERE count($b) > 1 RETURN $p. List publishers who have published more than 1 book. Tuples ($p, $b) are formed.
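The ($p, $b) tuples can be made explicit in Python: each tuple pairs one distinct publisher with the list of its books, and the WHERE clause filters on the size of that list. The data is a trimmed, hypothetical bib.xml keeping only titles and publishers from the example data set.

```python
import xml.etree.ElementTree as ET

# Trimmed hypothetical bib.xml: only title and publisher are kept.
BIB = ET.fromstring(
    "<bib>"
    "<book><title>Automata Theory</title><publisher>Morgan Kaufmann</publisher></book>"
    "<book><title>Database Systems</title><publisher>Morgan Kaufmann</publisher></book>"
    "<book><title>Automata Theory</title><publisher>Prentice Hall</publisher></book>"
    "</bib>"
)

# FOR $p IN distinct(...//publisher): distinct values, first-seen order.
publishers = list(dict.fromkeys(p.text for p in BIB.iter("publisher")))

# LET $b := ...//book[publisher = $p]: one ($p, $b) tuple per publisher.
tuples = [
    (p, [b for b in BIB.iter("book") if b.findtext("publisher") == p])
    for p in publishers
]

# WHERE count($b) > 1 RETURN $p
busy_publishers = [p for p, books in tuples if len(books) > 1]
print(busy_publishers)
```

FOR iterates and produces one tuple per item, while LET binds the whole sequence at once; that is why `$b` is a list here.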
  • Slide 67
  • Boolean Expressions in WHERE. FOR $b IN doc("bib.xml")//book WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998" RETURN $b/title. List the titles of books published by Morgan Kaufmann in 1998.
  • Slide 68
  • Joins. FOR $b IN doc("bib.xml")/book, $r IN doc("review.xml")/review WHERE $b/title = $r/title RETURN <book_with_review> {$b/@*} {$b/*} {$r/comment} </book_with_review>. For every book with a matching review, output a book_with_review that contains all the attributes and subelements of book and the comment subelements of review. Example result: authors Aho, Hopcroft, Ullman; title Automata Theory; publisher Morgan Kaufmann; year 1998; comments "It's the best in automata theory" and "A definitive textbook".
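The join amounts to a nested loop over books and reviews with an equality test on title. The sketch below uses trimmed, hypothetical versions of bib.xml and review.xml based on the example data sets; the wrapper elements `bib` and `reviews` are assumptions.

```python
import xml.etree.ElementTree as ET

BIB = ET.fromstring(
    "<bib>"
    "<book><author>Aho</author><title>Automata Theory</title></book>"
    "<book><author>Ullman</author><title>Database Systems</title></book>"
    "</bib>"
)
REVIEWS = ET.fromstring(
    "<reviews><review><title>Automata Theory</title>"
    "<comment>It's the best in automata theory</comment>"
    "<comment>A definitive textbook</comment></review></reviews>"
)

joined = []
for b in BIB.findall("book"):                # FOR $b IN .../book,
    for r in REVIEWS.findall("review"):      #     $r IN .../review
        if b.findtext("title") == r.findtext("title"):   # WHERE
            out = ET.Element("book_with_review", b.attrib)  # {$b/@*}
            out.extend(list(b))                             # {$b/*}
            out.extend(r.findall("comment"))                # {$r/comment}
            joined.append(out)

print([ET.tostring(e, encoding="unicode") for e in joined])
```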
  • Slide 69
  • Relax Order Conditions. FOR $b IN unordered(doc("bib.xml")//book) WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998" RETURN $b/title. List the titles of books published by Morgan Kaufmann in 1998. A very important feature in dealing with relational sources and other set-oriented sources. SELECT title FROM bib WHERE publisher = 'Morgan Kaufmann' AND year = 1998: depending on the indices and access methods used, the SQL query processor may deliver the tuples in a different order.
  • Slide 70
  • Nested queries. FOR $a IN distinct(document("bib.xml")//author/text()) RETURN <author> {$a} { FOR $b IN document("bib.xml")//book[author = $a] RETURN $b/title } </author>. Invert the structure of the input document so that there is a list of author elements, each containing the name of the author and the list of books he wrote.
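The inversion can be sketched in Python as an outer loop over distinct author names with an inner loop over books. The input is a trimmed, hypothetical bib.xml; the wrapper element `authors` in the output is an assumption added so the result is a single tree.

```python
import xml.etree.ElementTree as ET

BIB = ET.fromstring(
    "<bib>"
    "<book><author>Aho</author><author>Ullman</author>"
    "<title>Automata Theory</title></book>"
    "<book><author>Ullman</author><title>Database Systems</title></book>"
    "</bib>"
)

# Outer FOR: distinct author names in first-seen order.
names = list(dict.fromkeys(a.text for a in BIB.iter("author")))

inverted = ET.Element("authors")  # hypothetical wrapper element
for name in names:
    entry = ET.SubElement(inverted, "author")
    entry.text = name
    # Inner FOR: titles of the books that list this author.
    for b in BIB.iter("book"):
        if name in [a.text for a in b.findall("author")]:
            ET.SubElement(entry, "title").text = b.findtext("title")

print(ET.tostring(inverted, encoding="unicode"))
```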
  • Slide 71
  • Conditionals. FOR $b IN doc("bib.xml")/book RETURN {$b/title} {IF count($b/author) < 3 {$b/author} ELSE {$b/author[1], and others
  • Slide 72
  • Existential and Universal Quantification. FOR $b IN doc("bib.xml")/book WHERE $b/author = "Ullman" RETURN $b: return books where at least one of the authors is Ullman. FOR $b IN doc("bib.xml")/book WHERE EVERY $author IN $b/author SATISFIES $author = "Ullman" RETURN $b: return books where all authors are Ullman.
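The difference between the two queries corresponds to Python's `any` and `all`. The comparison `$b/author = "Ullman"` is existential because `$b/author` may be a sequence; the sketch makes that explicit over a trimmed, hypothetical bib.xml.

```python
import xml.etree.ElementTree as ET

BIB = ET.fromstring(
    "<bib>"
    "<book><author>Aho</author><author>Hopcroft</author><author>Ullman</author>"
    "<title>Automata Theory</title></book>"
    "<book><author>Ullman</author><title>Database Systems</title></book>"
    "</bib>"
)

def author_names(book):
    return [a.text for a in book.findall("author")]

# WHERE $b/author = "Ullman": true if SOME author equals Ullman.
some_ullman = [
    b.findtext("title") for b in BIB.findall("book")
    if any(name == "Ullman" for name in author_names(b))
]

# WHERE EVERY $author IN $b/author SATISFIES $author = "Ullman"
every_ullman = [
    b.findtext("title") for b in BIB.findall("book")
    if all(name == "Ullman" for name in author_names(b))
]

print(some_ullman, every_ullman)
```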
  • Slide 73
  • Functions. DEFINE FUNCTION depth($e) RETURNS xsd:integer { IF (empty($e/*)) THEN 1 ELSE max(depth($e/*)) + 1 } FOR $b IN doc("bib.xml")/book RETURN depth($b)
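The recursive depth function translates almost literally into Python. The sample element is hypothetical; the logic mirrors the slide: an element without child elements has depth 1, otherwise one more than its deepest child.

```python
import xml.etree.ElementTree as ET

def depth(e):
    """Depth of the element tree rooted at e, counting e itself."""
    children = list(e)
    if not children:                             # empty($e/*)
        return 1
    return max(depth(c) for c in children) + 1   # max(depth($e/*)) + 1

book = ET.fromstring(
    "<book><authors><author>Ullman</author></authors>"
    "<title>Database Systems</title></book>"
)
print(depth(book))
```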
  • Slide 74
  • Applicability of XML Query Languages (XQuery). The XQuery standard does NOT elaborate on the physical aspects of the XML sources. Custom functions can provide access and reference to the source(s): document("test.xml"), source("view1"). Question: as we go down the list of uses of XQuery, compare with XSL.
  • Slide 75
  • XQuery on files, DOM objects, event streams, messages. Usage scenarios: transformation and processing of messages. Significant (but not killer) advantages over XSL: minor performance optimization superiority; better streaming, pipelining; cleaner, extensible language. Many academic and industrial prototypes of XQuery on files. Diagram: an XQuery processor taking an XQuery over an XML file, DOM object, or SAX stream.
  • Slide 76
  • Typical Scenario: XML Messaging. Diagram: application, message transformer, SOAP service, and wrappers over an RDBMS and SAP ERP. Requests in native language or special wrapper API: SELECT * FROM Customer, Order WHERE customer.name = 'Joe' AND order.name = 'Joe'; Cdom = Sap(conn1, joe)
  • Slide 77
  • Summary of Steps. The developer's program issues an SQL query. The wrapper returns the SQL result wrapped as an XML message. The developer's XQuery transforms the XML message into the XML format needed by the app.
  • Slide 78
  • Typical Scenario: XML Messaging. Diagram: application, message transformer, SOAP service, and wrappers over an RDBMS and SAP ERP. Message data: Joe, 100M, fish; Joe, 100M, meat => Joe, 780M, fish, meat. FOR $cn IN distinct(msg(123)/customer/name) RETURN $cn 7.8 * msg(123)/customer[name=$cn]/balance FOR $c IN msg(123)/customer WHERE $c/name = $cn RETURN {$c/order}
  • Slide 79
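  • Direct XQuery on Databases. Diagram: an XQuery goes to the XQuery processor, which sends (one or more) SQL queries to the RDBMS, receives tuples, and produces the XML result. XML view of the relational DB: reldb with customers and orders, each made of tuple nodes (e.g., name Joe, balance 100M). Let's write a Russian Doll schema.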
  • Slide 80
  • XQuery on Relational Databases. FOR $c IN db(1)/customers/tuple WHERE $c/name = "Joe" RETURN $c/name 7.8 * $c/balance FOR $o IN db(1)/orders/tuple WHERE $c/name = $o/name RETURN $o. The XQuery processor, working on the XML view of the relational DB, sends to the RDBMS: SELECT * FROM customers WHERE name = 'Joe'; then, for each customer #c: SELECT * FROM orders WHERE orders.name = #c.name; and merges the results: Joe, 780M, fish, meat.
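The three steps (outer SQL query, one inner SQL query per customer, merge into XML) can be sketched with Python's built-in `sqlite3`. Everything about the schema is an assumption for illustration: the column `item`, the balance stored as a number of millions, and the output element names `result`, `customer`, and `order`.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, balance REAL);
    CREATE TABLE orders (name TEXT, item TEXT);
    INSERT INTO customers VALUES ('Joe', 100);
    INSERT INTO orders VALUES ('Joe', 'fish');
    INSERT INTO orders VALUES ('Joe', 'meat');
""")

result = ET.Element("result")  # hypothetical wrapper element
# Step 1: SELECT * FROM customers WHERE name = 'Joe'
customers = conn.execute(
    "SELECT name, balance FROM customers WHERE name = ?", ("Joe",)
).fetchall()
for name, balance in customers:
    cust = ET.SubElement(result, "customer")
    ET.SubElement(cust, "name").text = name
    ET.SubElement(cust, "balance").text = str(round(7.8 * balance))
    # Step 2: for each customer, SELECT * FROM orders WHERE orders.name = ...
    orders = conn.execute(
        "SELECT item FROM orders WHERE name = ? ORDER BY item", (name,)
    )
    # Step 3: merge the order tuples into the customer's XML element.
    for (item,) in orders:
        ET.SubElement(cust, "order").text = item

print(ET.tostring(result, encoding="unicode"))
```

A real processor would choose the SQL itself from the XQuery; here both queries are written by hand to show where the nesting comes from.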
  • Slide 81
  • Summary of Steps. The developer's program issues an XQuery on the XML view of the SQL DB. The XQuery processor automatically sends SQL queries to the DB and structures the XML result.
  • Slide 82
  • XQuery on Relational Databases. Single language for accessing the database and structuring the XML result. Avoids deficiencies of SQL in dealing with nested structures, optional elements, etc.
  • Slide 83
  • XQuery on Distributed Sources. Diagram: an XQuery processor (mediator) offers an XML view of all sources (two RDBMSs and an XML file), accepts an XQuery, and returns the XML result.
  • Slide 84
  • Example: Access to Two Relational Databases. Diagram: an XQuery processor (mediator) offers an XML view of all relational DBs (RDBMS with customers, RDBMS with orders), accepts an XQuery, and returns the XML result. FOR $c IN db(1)/customers/tuple WHERE $c/name = "Joe" RETURN $c/name 7.8 * $c/balance FOR $o IN db(2)/orders/tuple WHERE $c/name = $o/name RETURN $o
  • Slide 85
  • XQuery on Integrated Views. Diagram: an XQuery processor (mediator) offers a virtual integrated XML view over two RDBMSs and an XML file, accepts an XQuery, and returns the XML result. View figure: customers with customer nodes (name Joe, balance 100M) containing orders with order nodes. Let's write the Joe query again.
  • Slide 86
  • ... and using XQuery to build the view. Diagram: an XQuery processor (mediator) offers a virtual integrated XML view over two RDBMSs and an XML file, accepts an XQuery, and returns the XML result. XQuery as view definition.
  • Slide 87
  • View = Query. FOR $c IN db(1)/customers/tuple RETURN $c/name 7.8 * $c/balance FOR $o IN db(2)/orders/tuple WHERE $c/name = $o/name RETURN $o