1 XML Semistructured Data Extensible Markup Language Document Type Definitions

  • Published on

  • View

  • Download

Embed Size (px)


<ul><li> Slide 1 </li> <li> 1 XML Semistructured Data Extensible Markup Language Document Type Definitions </li> <li> Slide 2 </li> <li> 2 Semistructured Data uAnother data model, based on trees. uMotivation: flexible representation of data. wOften, data comes from multiple sources with differences in notation, meaning, etc. uMotivation: sharing of documents among systems and databases. </li> <li> Slide 3 </li> <li> 3 The Information-Integration Problem uRelated data exists in many places and could, in principle, work together. uBut different databases differ in: 1.Model (relational, object-oriented?). 2.Schema (normalized/unnormalized?). 3.Terminology: are consultants employees? Retirees? Subcontractors? 4.Conventions (meters versus feet?). </li> <li> Slide 4 </li> <li> 4 Example uEvery bar has a database. wOne may use a relational DBMS; another keeps the menu in an MS-Word document. wOne stores the phones of distributors, another does not. wOne distinguishes ales from other beers, another doesnt. wOne counts beer inventory by bottles, another by cases. </li> <li> Slide 5 </li> <li> 5 Two Approaches to Integration 1.Warehousing : Make copies of the data sources at a central site and transform it to a common schema. wReconstruct data daily/weekly, but do not try to keep it more up-to-date than that. 2.Mediation : Create a view of all sources, as if they were integrated. wAnswer a view query by translating it to terminology of the sources and querying them. </li> <li> Slide 6 </li> <li> 6 Warehouse Diagram Warehouse Wrapper Source 1Source 2 </li> <li> Slide 7 </li> <li> 7 A Mediator Mediator Wrapper Source 1Source 2 User query Query Result </li> <li> Slide 8 </li> <li> 8 Graphs of Semistructured Data uNodes = objects. uLabels on arcs (attributes, relationships). uAtomic values at leaf nodes (nodes with no arcs out). uFlexibility: no restriction on: wLabels out of a node. </li> <li> Slide 9 </li> <li> 9 Example: Data Graph Bud A.B. Gold1995 MapleJoes Miller beer bar manf servedAt name addr prize yearaward root The bar object for Joes Bar The beer object for Bud Notice a new kind of data. </li> <li> Slide 10 </li> <li> 10 XML uXML = Extensible Markup Language. uWhile HTML uses tags for formatting (e.g., italic), XML uses tags for semantics (e.g., this is an address). uKey idea: create tag sets for a domain (e.g., genomics), and translate all data into properly tagged XML documents. </li> <li> Slide 11 </li> <li> 11 Well-Formed and Valid XML uWell-Formed XML allows you to invent your own tags. wSimilar to labels in semistructured data. uValid XML involves a DTD (Document Type Definition), a grammar for tags. </li> <li> Slide 12 </li> <li> 12 Well-Formed XML uStart the document with a declaration, surrounded by. uNormal declaration is: Standalone = no DTD provided. uBalance of document is a root tag surrounding nested tags. </li> <li> Slide 13 </li> <li> 13 Tags uTags, as in HTML, are normally matched pairs, as . uTags may be nested arbitrarily. uXML tags are case sensitive. </li> <li> Slide 14 </li> <li> 14 Example: Well-Formed XML Joes Bar Bud 2.50 Miller 3.00 A NAME subobject A BEER subobject </li> <li> Slide 15 </li> <li> 15 XML and Semistructured Data uWell-Formed XML with nested tags is exactly the same idea as trees of semistructured data. uWe shall see that XML also enables nontree structures, as does the semistructured data model. </li> <li> Slide 16 </li> <li> 16 Example uThe XML document is: Joes Bar Bud2.50Miller3.00 PRICE BAR BARS NAME... BAR PRICE NAME BEER NAME </li> <li> Slide 17 </li> <li> 17 DTD Structure [ ( )&gt;... more elements... ]&gt; </li> <li> Slide 18 </li> <li> 18 DTD Elements uThe description of an element consists of its name (tag), and a parenthesized description of any nested tags. wIncludes order of subtags and their multiplicity. uLeaves (text elements) have #PCDATA (Parsed Character DATA ) in place of nested tags. </li> <li> Slide 19 </li> <li> 19 Example: DTD A BARS object has zero or more BARs nested within. A BAR has one NAME and one or more BEER subobjects. A BEER has a NAME and a PRICE. NAME and PRICE are text. </li> <li> Slide 20 </li> <li> 20 Element Descriptions uSubtags must appear in order shown. uA tag may be followed by a symbol to indicate its multiplicity. w* = zero or more. w+ = one or more. w? = zero or one. uSymbol | can connect alternative sequences of tags. </li> <li> Slide 21 </li> <li> 21 Example: Element Description uA name is an optional title (e.g., Prof.), a first name, and a last name, in that order, or it is an IP address: </li> <li> Slide 22 </li> <li> 22 Use of DTDs 1.Set standalone = no. 2.Either: a)Include the DTD as a preamble of the XML document, or b)Follow DOCTYPE and the by SYSTEM and a path to the file where the DTD can be found. </li> <li> Slide 23 </li> <li> 23 Example (a) Joes Bar Bud 2.50 Miller 3.00 The DTD The document </li> <li> Slide 24 </li> <li> 24 Example (b) uAssume the BARS DTD is in file bar.dtd. Joes Bar Bud 2.50 Miller 3.00 Get the DTD from the file bar.dtd </li> <li> Slide 25 </li> <li> 25 Attributes uOpening tags in XML can have attributes. uIn a DTD, declares an attribute for element E, along with its datatype. </li> <li> Slide 26 </li> <li> 26 Example: Attributes Bars can have an attribute kind, a character string describing the bar. Character string type; no tags Attribute is optional opposite: #REQUIRED </li> <li> Slide 27 </li> <li> 27 Example: Attribute Use uIn a document that allows BAR tags, we might see: Akasaka Sapporo 5.00... Note attribute values are quoted </li> <li> Slide 28 </li> <li> 28 IDs and IDREFs uAttributes can be pointers from one object to another. wCompare to HTMLs NAME = foo and HREF = #foo. uAllows the structure of an XML document to be a general graph, rather than just a tree. </li> <li> Slide 29 </li> <li> 29 Creating IDs uGive an element E an attribute A of type ID. uWhen using tag in an XML document, give its attribute A a unique value. uExample: </li> <li> Slide 30 </li> <li> 30 Creating IDREFs uTo allow objects of type F to refer to another object with an ID attribute, give F an attribute of type IDREF. uOr, let the attribute have type IDREFS, so the F object can refer to any number of other objects. </li> <li> Slide 31 </li> <li> 31 Example: IDs and IDREFs uLets redesign our BARS DTD to include both BAR and BEER subelements. Both bars and beers will have ID attributes called name. Bars have SELLS subobjects, consisting of a number (the price of one beer) and an IDREF theBeer leading to that beer. Beers have attribute soldBy, which is an IDREFS leading to all the bars that sell it. </li> <li> Slide 32 </li> <li> 32 The DTD Beer elements have an ID attribute called name, and a soldBy attribute that is a set of Bar names. SELLS elements have a number (the price) and one reference to a beer. Bar elements have name as an ID attribute and have one or more SELLS subelements. Explained next </li> <li> Slide 33 </li> <li> 33 Example Document 2.50 3.00 </li> <li> Slide 34 </li> <li> 34 Empty Elements uWe can do all the work of an element in its attributes. wLike BEER in previous example. Another example: SELLS elements could have attribute price rather than a value that is a price. </li> <li> Slide 35 </li> <li> 35 Example: Empty Element uIn the DTD, declare: uExample use: Note exception to matching tags rule </li> </ul>