The XML Standard

  • Slide 1
  • Slide 2
  • The XML Standard
  • Slide 3
  • Overview of our XML Standards. Motivation: HTML vs XML; XML 101: syntax, elements, attributes, DTDs; XML 201: XML Schema, Namespaces; XSLT: Transforming and Rendering XML; XQuery: Search, Transform & Integrate
  • Slide 4
  • So what is XML (all about)? Executive summary: XML = HTML - idiosyncrasies (simplified syntax) + user-definable ("semantic") tags. Separation of data and its presentation => simple, very flexible data exchange format: semistructured data model => new applications: information exchange (B2B), sharing (diglib), integration ("mediation"), archival, ...; Web site management (XML + XSL stylesheets), ...
  • Slide 5
  • What's Wrong with HTML? Example: Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina. "Object Fusion in Mediator Systems". In VLDB 96. HTML confuses presentation with content.
  • Slide 6
  • ... What's Wrong with HTML ... Example: Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina. "Object Fusion in Mediator Systems". In VLDB 96. No explicit structure, semantics, or object-orientation: nothing marks which part is the author, the conference, or the title.
  • Slide 7
  • ... And Some Repercussions. Lack of schema/semantics when querying the Web (HTML): "find documents (books, papers, ...) where author = Michael Jackson" (... and learn how software engineering meets the moon walker ...); "create a list of M. Jackson's books and (if available) their prices" => HTML is inappropriate for data exchange and for automation of information management (retrieval, manipulation, integration).
  • Slide 8
  • XML is Based on Markup. Example content: Y.Papakonstantinou; S. Abiteboul; H. Garcia-Molina; Object Fusion in Mediator Systems; VLDB 96. Markup indicates structure and semantics; it is decoupled from presentation.
  • Slide 9
  • Elements and their Content. Callouts: element; element name; character content; element content; empty element. Example content: Y.Papakonstantinou; S. Abiteboul; H. Garcia-Molina; Object Fusion in Mediator Systems; VLDB 96.
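The three kinds of content named on this slide can be sketched with Python's standard `xml.etree.ElementTree`. The slide's own markup did not survive in this transcript, so the sample document below is a reconstruction: the element names (`paper`, `authors`, `author`, `fullpaper`, `title`, `booktitle`) are assumptions based on names that appear on later slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are assumed, not copied from the slide.
DOC = """
<paper>
  <authors>
    <author>Y. Papakonstantinou</author>
    <author>S. Abiteboul</author>
    <author>H. Garcia-Molina</author>
  </authors>
  <fullpaper/>
  <title>Object Fusion in Mediator Systems</title>
  <booktitle>VLDB 96</booktitle>
</paper>
"""

def content_kind(elem):
    """Classify an element as having element content, character content, or no content."""
    if len(elem) > 0:                      # has child elements
        return "element content"
    if elem.text and elem.text.strip():    # has non-whitespace text
        return "character content"
    return "empty element"

paper = ET.fromstring(DOC)
kinds = {elem.tag: content_kind(elem) for elem in paper.iter()}
for tag, kind in kinds.items():
    print(f"{tag}: {kind}")
```

Here `paper` and `authors` have element content, `author`, `title` and `booktitle` have character content, and `fullpaper` is an empty element.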
  • Slide 10
  • Element Attributes. Callouts: attribute name; attribute value. Example content: Y.Papakonstantinou; S. Abiteboul; H. Garcia-Molina; Object Fusion in Mediator Systems; VLDB 96.
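As a small illustration of attribute names and attribute values, the sketch below reads attributes with `xml.etree.ElementTree`. The attribute names `id` and `role` and their values are assumptions made for the example; the slide's original markup is not available here.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the attribute names and values are illustrative only.
DOC = '<paper id="23"><author role="contact">Y. Papakonstantinou</author></paper>'

paper = ET.fromstring(DOC)
author = paper.find("author")

# An attribute is a name/value pair written inside an element's start tag.
paper_id = paper.get("id")              # attribute name "id", attribute value "23"
author_role = author.get("role")
missing = author.get("email", "n/a")    # fallback used when the attribute is absent
print(paper_id, author_role, missing)
```

Attribute values are always strings at this level; any typing (ID, IDREF, and so on) comes from a DTD or schema, which this sketch does not use.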
  • Slide 11
  • XML = Labeled Ordered Trees. Example tree: bibliography, with children paper (@id 23), paper, ...; a paper has children authors (author: Yannis, author: Serge, ...), title (Object Fusion...), and fullpaper. Semistructured data: labeled trees/graphs; can also represent relational and object-oriented data.
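The labeled ordered tree view can be made concrete by walking a parsed document in document order. This is a minimal sketch: the sample markup is reconstructed from the node labels on the slide, and the choice to show attributes as `@name` children simply mirrors the slide's `@id 23` notation.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the labels on the slide; the exact original markup is an assumption.
DOC = """
<bibliography>
  <paper id="23">
    <authors>
      <author>Yannis</author>
      <author>Serge</author>
    </authors>
    <title>Object Fusion...</title>
    <fullpaper/>
  </paper>
</bibliography>
"""

def labeled_tree(elem, depth=0):
    """Yield (depth, label) pairs: the element, its attributes, its text, then its children in order."""
    yield depth, elem.tag
    for name, value in elem.attrib.items():
        yield depth + 1, f"@{name} = {value}"
    if elem.text and elem.text.strip():
        yield depth + 1, repr(elem.text.strip())
    for child in elem:  # iteration follows document order
        yield from labeled_tree(child, depth + 1)

nodes = list(labeled_tree(ET.fromstring(DOC)))
for depth, label in nodes:
    print("  " * depth + label)
```

Because child order is preserved, the two `author` nodes come out in the order they were written, which is what distinguishes an ordered tree from a plain set of labeled nodes.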
  • Slide 12
  • In Search of the Lost Structure & Semantics. How do I learn and use the element structure of a document? How do I share structure and metadata/semantics with my community? How to make all this automatable?
  • Slide 13
  • Adding Structure and Semantics. XML Document Type Definitions (DTDs): define the structure of "allowed" documents (i.e., valid wrt. a DTD); roughly a database schema => improve query formulation, execution, ... XML Schema: defines structure and data types. XML Namespaces: identify your vocabulary. Resource Description Framework (RDF): simple metadata model.
  • Slide 14
  • XML DTDs as Extended CFGs. Grammar: bibliography -> paper*; paper -> authors fullPaper? title booktitle; authors -> author+. lhs = element (name); rhs = regular expression over elements + strings (PCDATA). (Columns on the slide: XML DTD, Grammar.)
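The idea that each right-hand side is a regular expression over element names can be sketched directly with Python's `re` module. This is not a DTD validator: it only checks the sequence of child element names, ignores attributes and mixed content, and assumes that any element not listed in the grammar contains text only.

```python
import re
import xml.etree.ElementTree as ET

# The three rules from the slide as regular expressions over child element names.
# Each name is followed by a space so that adjacent names cannot run together.
GRAMMAR = {
    "bibliography": r"(paper )*",
    "paper": r"authors (fullPaper )?title booktitle ",
    "authors": r"(author )+",
}

def conforms(elem):
    """Check elem and all of its descendants against GRAMMAR."""
    children = "".join(child.tag + " " for child in elem)
    model = GRAMMAR.get(elem.tag, r"")  # assumption: unlisted elements hold only text
    if not re.fullmatch(model, children):
        return False
    return all(conforms(child) for child in elem)

good = ET.fromstring(
    "<bibliography><paper><authors><author>Yannis</author></authors>"
    "<title>Object Fusion</title><booktitle>VLDB 96</booktitle></paper></bibliography>"
)
bad = ET.fromstring(
    "<bibliography><paper><title>Object Fusion</title><authors/></paper></bibliography>"
)
print(conforms(good), conforms(bad))
```

The second document fails because `title` precedes `authors` and `booktitle` is missing, so the child sequence of `paper` does not match its rule.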
  • Slide 15
  • Document Type Definitions (DTDs): define and constrain element names & structure. Callouts: element type declaration; attribute list declaration.
  • Slide 16
  • Element Declarations. Callouts: character content; authors followed by optional fullpaper, followed by title, followed by booktitle; sequence of 1 or more author; sequence of 0 or more paper.
  • Slide 17
  • Element Content Declarations
  • Slide 18
  • Attributes. Callouts: object identity attribute; CDATA (character data); IDREF (intradocument reference); reference to external ENTITY. Example content: Y.Papakonstantinou; Object Fusion in Mediator Systems; Yannis info.
  • Slide 19
  • Attribute Types
  • Slide 20
  • More on Attribute Declarations. Attributes may be REQUIRED or IMPLIED (optional); they can have default values; a default value may be FIXED.
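The four declaration options can be modeled in a few lines of plain Python to show how they differ. This is a sketch of the semantics only, under the assumption of a hypothetical `paper` element with the attributes listed below; it does not parse real `<!ATTLIST ...>` declarations.

```python
# Hypothetical attribute declarations, modeled as name -> (kind, value).
# The attribute names and values are illustrative, not taken from the slides.
ATTLIST = {
    "id": ("REQUIRED", None),       # must be given in the document
    "note": ("IMPLIED", None),      # optional, no default
    "status": ("DEFAULT", "draft"), # optional, default supplied when absent
    "lang": ("FIXED", "en"),        # default supplied; any other value is an error
}

def apply_attlist(attrs, attlist=ATTLIST):
    """Return attrs completed with declared defaults, or raise ValueError on a violation."""
    result = dict(attrs)
    for name, (kind, value) in attlist.items():
        if kind == "REQUIRED" and name not in result:
            raise ValueError(f"missing required attribute {name!r}")
        if kind in ("DEFAULT", "FIXED"):
            result.setdefault(name, value)
        if kind == "FIXED" and result[name] != value:
            raise ValueError(f"attribute {name!r} is fixed to {value!r}")
    return result

completed = apply_attlist({"id": "p23"})
print(completed)
```

Calling `apply_attlist({})` raises because `id` is required, and passing `lang="de"` raises because the fixed value is `en`.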
  • Slide 21
  • Uses of XML Entities. Physical partition: size, reuse, "modularity" (both XML docs & DTDs). Non-XML data: unparsed entities (binary data). Non-standard characters: character entities. Shorthand for phrases & markup.
  • Slide 22
  • Types of Entities. Internal (to a doc) vs. External (use URI); General (in XML doc) vs. Parameter (in DTD); Parsed (XML) vs. Unparsed (non-XML).
  • Slide 23
  • Internal Text Entities. Callouts: internal text entity declaration; entity reference. "We all use the &WWW;." is logically equivalent to "We all use the World Wide Web." actually appearing.
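The equivalence between the entity reference and its replacement text can be observed with the standard-library parser, which expands internal general entities declared in the document's internal DTD subset. The wrapping element name `p` is an assumption; only the entity name `WWW` and the two sentences come from the slide.

```python
import xml.etree.ElementTree as ET

# Internal text entity declared in the internal DTD subset, then referenced in content.
DOC = """<!DOCTYPE p [
  <!ENTITY WWW "World Wide Web">
]>
<p>We all use the &WWW;.</p>"""

# The parser replaces the reference with the declared text before handing it to us.
expanded = ET.fromstring(DOC).text
print(expanded)
```

The application never sees `&WWW;` itself, only the expanded text.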
  • Slide 24
  • Unparsed (& "Binary") Entities. Callouts: declare external... and unparsed entity; element with ENTITY attribute; declare attribute type to be ENTITY; NOTATION declaration (helper app).
  • Slide 25
  • From Docs to Data: XML Schema. XML DTDs (part of the XML spec.): flexible, semistructured data model (nesting, ANY, ?, *, |, ...), but document-oriented (SGML heritage). XML Schema (W3C working draft): schema definition language in XML; data-oriented: data types; extends capabilities of DTD.
  • Slide 26
  • Sample Data for Introduction to XML Schema. Book: Being a Dog Is a Full-Time Job, by Charles M. Schulz. Characters: Snoopy (friend of Peppermint Patty; since 1950-10-04; extroverted beagle); Peppermint Patty (since 1966-08-22; bold, brash and tomboyish).
  • Slide 27
  • The Simple Russian Doll Approach to XML Schema. Callouts: optional namespace definition; sequence compositor; simple type content for title and author; complex type content for book; character may appear any number of times; basic type of XML Schema.
  • Slide 28
  • The Catalog Approach to XML Schema: Stand-Alone Declarations & References. Callouts: simple type elements; attributes; complex type element; character reference.
  • Slide 29
  • Catalog Approach (contd)
  • Slide 30
  • Named Types. Write stand-alone named complex type or simple type declarations. A primitive form of inheritance (called derivation) allows restriction and extension. nameType is derived from xsd:string by having the xsd:maxLength facet restrict the string to a maximum of 32 characters. nameType is used in the declaration of characterType.
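To illustrate what the `xsd:maxLength` facet does, the sketch below parses a reconstructed `nameType` declaration with `xml.etree.ElementTree`, reads the facet, and applies it by hand. The schema markup is an assumption (the slide's own listing is not in this transcript), the namespace URI is the one from the final XML Schema Recommendation rather than the working draft the slides refer to, and the standard library does not validate against XML Schema, so this only mimics a single facet.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# A reconstruction of the named simple type described on the slide; exact markup is assumed.
SCHEMA = f"""
<xsd:schema xmlns:xsd="{XSD}">
  <xsd:simpleType name="nameType">
    <xsd:restriction base="xsd:string">
      <xsd:maxLength value="32"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
"""

def max_length(schema_text, type_name):
    """Read the xsd:maxLength facet of a named simple type (no other facets are handled)."""
    root = ET.fromstring(schema_text)
    for simple in root.iter(f"{{{XSD}}}simpleType"):
        if simple.get("name") == type_name:
            facet = simple.find(f"{{{XSD}}}restriction/{{{XSD}}}maxLength")
            return int(facet.get("value"))
    raise KeyError(type_name)

limit = max_length(SCHEMA, "nameType")

def is_valid_name(text):
    """True if text satisfies the restriction that nameType places on xsd:string."""
    return len(text) <= limit

print(limit, is_valid_name("Peppermint Patty"), is_valid_name("x" * 33))
```

Derivation by restriction narrows the base type: every valid `nameType` value is still a valid `xsd:string`, but not the other way around.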
  • Slide 31
  • Groups: Named containers of sets of Elements or Attributes
  • Slide 32
  • Compositors: Sequence, Choice, All. So far we have seen sequences. The group nameTypes consists of one of: the element name, or the sequence containing firstName, middlename, lastName.
  • Slide 33
  • Compositors (contd). The characterType consists of name, a list of friend-of, since, and qualification particles in no particular order. (Compare with the sequence compositor.)
  • Slide 34
  • Derivation of Simple Types: Unions and Lists. So far we have seen restrictions and facets. The simple type isbnType will be either a 10-digit string (notice the pattern), the token "TBD", or the token "NA".
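A union type accepts a value when any one of its member types accepts it. The sketch below mirrors the `isbnType` description with a regular expression for the 10-digit member and a set for the two tokens. The pattern `[0-9]{10}` is an assumption about what the slide's pattern looked like, and whitespace normalization of tokens is ignored.

```python
import re

ISBN_PATTERN = re.compile(r"[0-9]{10}")  # assumed form of the 10-digit pattern
PLACEHOLDER_TOKENS = {"TBD", "NA"}       # the enumerated tokens named on the slide

def is_isbn_type(value):
    """Accept the value if either member of the union accepts it."""
    return bool(ISBN_PATTERN.fullmatch(value)) or value in PLACEHOLDER_TOKENS

for candidate in ["0805033106", "TBD", "NA", "12345", "tbd"]:
    print(candidate, is_isbn_type(candidate))
```

The first three candidates are accepted; `12345` is too short for the pattern and `tbd` is not one of the listed tokens, since the comparison is case-sensitive.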
  • Slide 35
  • Constraints: Uniqueness. By inserting xsd:unique in the book element declaration we enforce that the character names in each book are unique.
  • Slide 36
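What the uniqueness constraint checks can be sketched by hand: within one `book`, collect every `character/name` value and report repeats. The document below is reconstructed from the sample data on an earlier slide, with element names taken from the names used in the text (`book`, `character`, `name`); the nesting is an assumption, and this is a manual check rather than schema validation.

```python
import xml.etree.ElementTree as ET

# Reconstructed sample; the nesting of elements is assumed.
BOOK = """
<book>
  <title>Being a Dog Is a Full-Time Job</title>
  <character><name>Snoopy</name></character>
  <character><name>Peppermint Patty</name></character>
</book>
"""

def duplicate_character_names(book):
    """Return the names that occur more than once among book/character/name."""
    seen, duplicates = set(), []
    for name in book.findall("character/name"):
        if name.text in seen:
            duplicates.append(name.text)
        seen.add(name.text)
    return duplicates

print(duplicate_character_names(ET.fromstring(BOOK)))
```

An empty result means the constraint holds for this book; the scope is a single `book` element, so two different books may still share a character name.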
  • Namespaces
  • Slide 37
  • Including Unknown Elements
  • Slide 38
  • Presenting XML: XSLT. Why stylesheets? Separation of content (XML) from presentation (XSL). Why not just CSS for XML? XSL is far more powerful: selecting elements, transforming