Digital Documents Gisle Hannemyr Autumn 2002 XML vocabularies and technologies XML DOM, Xschema Xlink, XML Base, Xpointer, Xpath XTM and RDF


Digital Documents

Gisle Hannemyr
Autumn 2002

XML vocabularies and technologies
XML DOM, XML Schema
XLink, XML Base, XPointer, XPath
XTM and RDF


10. april 2002 IN-DIW Side #2

Outline of lectures

• Act 1: The Resource Discovery Problem
    • Why the Internet is not a library

• Act 2: Introducing Semantic Markup: XML, CSS, XSL
    • XML: eXtensible Markup Language
    • CSS: Cascading StyleSheets
    • XSL: eXtensible Stylesheet Language

• Act 3: XML vocabularies and technologies
    • XLink, XML Base, XPointer, XPath
    • XML DOM, XML Schema
    • XTM: XML Topic Maps
    • RDF: Resource Description Framework


Document Object Model: Introduction

• The Document Object Model (DOM) is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents.

• The document can be further processed and the results of that processing can be incorporated back into the presented page.

• Scripts and methods that exploit the DOM, typically written in JavaScript, are often known as “dynamic HTML”.

• The DOM is also important as a programming interface for XML documents.


XML DOM: Introduction

• The XML Document Object Model (DOM) is a programming interface for XML documents. It defines the way an XML document can be accessed and manipulated.

• The objective for the XML DOM has been to provide a standard programming interface to a wide variety of applications. The XML DOM is designed to be used with any programming language and any operating system.

• With the XML DOM, a programmer can create an XML document, navigate its structure, and add, modify, or delete its elements.
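The create, navigate, modify and delete operations named above can be sketched with Python's standard-library DOM implementation. This is only an illustration, not the Microsoft parser the slides use later; the element names and text values are made up for the example.

```python
from xml.dom.minidom import Document

# Create a document with a <note> root element (hypothetical content).
doc = Document()
note = doc.createElement("note")
doc.appendChild(note)

# Add two child elements, each holding one text node.
for tag, text in (("to", "Tove"), ("body", "Remember the lecture")):
    child = doc.createElement(tag)
    child.appendChild(doc.createTextNode(text))
    note.appendChild(child)

# Modify: change the text inside <to>.
note.getElementsByTagName("to")[0].firstChild.data = "Jani"

# Delete: remove the <body> element again.
note.removeChild(note.getElementsByTagName("body")[0])

# Serialize the remaining tree back to XML text.
result = note.toxml()
print(result)
```

The same method names (createElement, appendChild, removeChild) come from the W3C DOM interfaces, which is why they look alike across languages.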


XML DOM: Parsing the DOM

• The XML DOM represents a tree view of the XML document. The documentElement is the top-level of the tree. This element has one or many childNodes that represent the branches of the tree.

• A Node Interface Model (NIM) is used to access the individual elements in the node tree. As an example, the childNodes property of the documentElement can be accessed with a for…each construct to enumerate each individual node.

• An XML parser supports all the necessary functions to traverse the node tree, access the nodes and their attribute values, insert and delete nodes, and convert the node tree back to XML.
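A minimal sketch of that traversal, again using Python's standard-library DOM rather than the parser from the slides. The XML string is a hypothetical stand-in for a file such as note.xml.

```python
from xml.dom.minidom import parseString

# Hypothetical stand-in for the contents of a small XML file.
xml_text = "<note><to>Tove</to><from>Jani</from><body>Hello</body></note>"
doc = parseString(xml_text)

root = doc.documentElement            # top level of the tree
names = []
for node in root.childNodes:          # enumerate the branches
    if node.nodeType == node.ELEMENT_NODE:
        names.append((node.nodeName, node.firstChild.data))

print(root.nodeName, names)

# Convert the node tree back to XML.
serialized = root.toxml()
```

In a document with indentation, childNodes would also contain whitespace text nodes, which is why the loop checks nodeType before reading the element.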


XML DOM: Parsing the DOM

• The DOM exposes the XML document as a tree:
    • The DocumentElement is the top or root of the tree.
    • This root element can have one or more child nodes which represent the branches of the tree.

• The main objects exposed by a DOM are:
    • DOMDocument
    • XMLDOMNode
    • XMLDOMNodeList
    • XMLDOMNamedNodeMap


XML DOM: Example

Simple JavaScript code that loads the XML document note.xml and displays the text of the first child node (index 0) of the document element:

<html>
<body>
<script type="text/javascript">
// ActiveXObject is Microsoft-specific, so this runs in Internet Explorer only.
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;  // load synchronously, so the tree is ready below
xmlDoc.load("note.xml");
document.write("The first XML element in the file contains: ");
document.write(xmlDoc.documentElement.childNodes.item(0).text);
</script>
</body>
</html>


XML Schema: Overview

• Like a DTD, the purpose of a schema is to define a class of XML documents.

• Unlike a DTD, a schema:
    • supports simple and complex types.
    • allows you to create, extend and reuse types.
    • supports namespaces and schema reuse.

• The term “instance document” is often used to describe an XML document that conforms to a particular schema.

• Both instances and schemas may exist as byte streams or database record fields, but to simplify we will say “document”.


XML Schema: Example: Instance document po.xml

This instance document is a simple purchase order:

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>


XML Schema: Complex and Simple Types

• An element that contains subelements or carries attributes has a complex type.

• An element that contains numbers (or strings, dates, etc.) but does not contain any subelements has a simple type.

• Attributes always have simple types.


XML Schema: The PO Schema po.xsd

• The complex types in the instance document, and some of the simple types, are defined in the PO schema.

• The other simple types are built-in XML Schema types.

• The instance document may or may not reference a schema explicitly. In our first example we chose not to mention the schema.

• The PO Schema is in a file called po.xsd:


XML Schema: The PO Schema po.xsd (full listing)

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>

  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

  <xsd:element name="comment" type="xsd:string"/>

  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items" type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>

  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>

  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice" type="xsd:decimal"/>
            <xsd:element ref="comment" minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
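To make the SKU pattern facet concrete, here is a small Python sketch that checks values against the same regular expression. It is not a schema validator; the function name is_valid_sku is invented for the example. An XML Schema pattern has to match the whole value, which is why fullmatch is used.

```python
import re

# Same expression as the xsd:pattern facet on the SKU type.
SKU_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")

def is_valid_sku(value: str) -> bool:
    """Return True if the whole value matches three digits, a hyphen, two capitals."""
    return SKU_PATTERN.fullmatch(value) is not None

for candidate in ("872-AA", "926-AA", "87-AA", "872-aa"):
    print(candidate, is_valid_sku(candidate))
```

Both partNum values in po.xml pass this check; a lower-case or too-short code does not.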


XML Schema: The xsd: Namespace Prefix

• Elements and built-in simple types (e.g. xsd:string) have a prefix (xsd:) which is associated with the XML Schema namespace through the declaration xmlns:xsd="http://www.w3.org/2001/XMLSchema" in the schema element.

• By convention xsd: is used, but any namespace prefix is legal.

• The purpose of the prefix is to identify the elements and simple types as belonging to the vocabulary of the XML Schema language rather than the vocabulary of the schema author.


XML Schema: Occurrence Constraints

• Elements: minOccurs (default is 1) and maxOccurs (default is 1)

• Attributes: use, with legal values “required”, “optional” and “prohibited”.

• The default attribute: “Default attribute values apply when attributes are missing, and default element values apply when elements are empty.”

• The fixed attribute ensures that elements or attributes are set to particular values.


XML Schema: Global Elements & Attributes

• Global elements, and global attributes, are created by declarations that appear as children of the xsd:schema element.

• Once declared, a global element or a global attribute can be referenced in one or more declarations using the ref attribute.

• The declaration of a global element also enables the element to appear at the top-level of an instance document.

• Global declarations cannot contain references.

• Cardinality constraints (minOccurs/maxOccurs/use) cannot be placed on global declarations, only on local declarations that reference global declarations.


XML Schema: 44 Built-In Simple Types

string, normalizedString, token, byte, unsignedByte, base64Binary, hexBinary, integer, positiveInteger, negativeInteger, nonNegativeInteger, nonPositiveInteger, int, unsignedInt, long, unsignedLong, short, unsignedShort, decimal, float, double, boolean, time, dateTime, duration, date, gMonth, gYear, gYearMonth, gDay, gMonthDay, Name, QName, NCName, anyURI, language, ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS


XML Schema: Defining New Simple Types

• New simple types are defined by deriving them from existing simple types.

• We use the restriction element and a “facet” to constrain the range of values.

<xsd:simpleType name="myInteger">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="10000"/>
    <xsd:maxInclusive value="99999"/>
  </xsd:restriction>
</xsd:simpleType>
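What the two facets mean for a concrete value can be sketched in Python. This only imitates the check a schema processor would perform for myInteger; the function name is_my_integer is an assumption made for the example.

```python
import re

def is_my_integer(text: str) -> bool:
    """Imitate the myInteger type: an integer between 10000 and 99999 inclusive."""
    text = text.strip()
    # Base type xsd:integer: optional sign followed by decimal digits.
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return False
    # Facets minInclusive and maxInclusive: both bounds are allowed values.
    return 10000 <= int(text) <= 99999

for candidate in ("10000", "99999", "9999", "100000", "12.5"):
    print(candidate, is_my_integer(candidate))
```

Using minExclusive or maxExclusive instead would turn the corresponding comparison into a strict one, as in the quantity element of the PO schema.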


XML vocabularies, standards and technologies

• XML Linking Language (XLink)

• XML Base

• XML Pointer Language (XPointer)

• XML Path Language (XPath) [1]

[1] XPath does not use XML syntax.


XLink: XML Linking Language

• Defines linking elements which can be inserted into XML documents in order to create and describe links between resources.

• Provides a framework for creating:
    • simple unidirectional (i.e. HTML-like) links
    • more complex (extended) linking structures.


XLink: Features

• XLink allows XML documents to:
    • Assert linking relationships among more than two resources.
    • Associate metadata with a link.
    • Express links that reside in a location separate from the linked resources.

• Structure:
    • Six linking element types
    • Ten global attributes


XLink: Attribute Usage

Rows are attributes, columns are element types.

          simple  extended  locator  arc  resource  title
type        R        R         R      R      R        R
href        O        -         R      -      -        -
role        O        O         O      -      O        -
arcrole     O        -         -      O      -        -
title       O        O         O      O      O        -
show        O        -         -      O      -        -
actuate     O        -         -      O      -        -
label       -        -         O      -      O        -
from        -        -         -      O      -        -
to          -        -         -      O      -        -

R: Required   O: Optional   -: No significance


XLink: Two kinds of links

• Simple link:
    • Outbound link with exactly two participating resources (i.e. like HTML links).
    • No special internal structure.

• Extended link:
    • Full XLink functionality (including inbound and third-party arcs, and arbitrary numbers of participating resources).
    • Complex structure.


XLink: Element type relationship

• Some XLink element types have special meanings when they appear as direct children of other XLink element types:
    • simple, resource, title: no special children
    • extended: locator, arc, resource, title
    • locator: title
    • arc: title


XLink: Example

Using global attributes to describe a link to a list of students:

<my:crossReference
    xmlns:my="http://private.org/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:type="simple"
    xlink:href="students.xml"
    xlink:role="http://private.org/studentlist"
    xlink:title="Student List"
    xlink:show="replace"
    xlink:actuate="onRequest">
  Current List of Students
</my:crossReference>
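As a rough illustration of how an application could pick up these global attributes, the following Python sketch parses the link element and collects everything in the XLink namespace. It uses only the standard library and does not act on the link; the variable names are chosen for the example.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

xml_text = """<my:crossReference
    xmlns:my="http://private.org/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:type="simple"
    xlink:href="students.xml"
    xlink:role="http://private.org/studentlist"
    xlink:title="Student List"
    xlink:show="replace"
    xlink:actuate="onRequest">Current List of Students</my:crossReference>"""

link = ET.fromstring(xml_text)

# ElementTree stores namespaced attributes as "{namespace-uri}local-name".
xlink_attrs = {
    name.split("}", 1)[1]: value
    for name, value in link.attrib.items()
    if name.startswith("{" + XLINK + "}")
}
print(xlink_attrs)
```

Because the attributes are recognised by namespace and not by element name, any element in any vocabulary can become a link, which is the point of calling them global attributes.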


XML Base

• The attribute “xml:base” can be inserted in XML documents to specify a base URI other than the base URI of the document.

• Allows authors to explicitly specify base URIs for the purpose of resolving relative URIs in links to external images, applets, form-processing programs, style sheets, etc.

• Together, Xlink and XML Base bring the functionality necessary for rich XML applications that spread across multiple documents.


XLink/XML Base: Example

<?xml version="1.0"?>
<doc xml:base="http://test.org/cold/"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <p>Please look at
    <link xlink:type="simple" xlink:href="lnk1.xml">this</link>
    hyperlink.</p>
  <list xml:base="/warm/">
    <item>
      <link xlink:type="simple" xlink:href="lnk2.xml">that</link>
    </item>
  </list>
</doc>

Link “this” resolves to the URI: http://test.org/cold/lnk1.xml
Link “that” resolves to the URI: http://test.org/warm/lnk2.xml
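The two resolutions can be reproduced with ordinary relative-URI resolution, here sketched with Python's urljoin. The sketch assumes the second link points at lnk2.xml, as the resolved URI on the slide states; the variable names are illustrative.

```python
from urllib.parse import urljoin

# xml:base on <doc> is absolute.
doc_base = "http://test.org/cold/"

# xml:base on <list> is itself relative, so it is resolved against the <doc> base.
list_base = urljoin(doc_base, "/warm/")

# Each link is resolved against the nearest enclosing base.
this_uri = urljoin(doc_base, "lnk1.xml")
that_uri = urljoin(list_base, "lnk2.xml")

print(this_uri)
print(that_uri)
```

The inner xml:base replaces the path but keeps the scheme and host, because "/warm/" is an absolute path without an authority part.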


XPointer: XML Pointer Language

• Supports pointing (i.e. addressing) into the internal structures of XML documents.

• Expresses fragment identifiers for any URI reference that locates resources whose media type is:
    • text/xml
    • application/xml
    • text/xml-external-parsed-entity
    • application/xml-external-parsed-entity

• XPointer gives 3 ways to specify fragment identifiers:
    • two shorthand notations (using XML)
    • one full notation (based on XPath)


XPointer: The problems with HTML fragments

• Pointing using HTML fragment ids: <A href="document.html#f_id">…</A>
    • HTML pointers must link to the entire target document. You cannot retrieve parts of a large HTML document.
    • HTML anchor points must be declared, but the target document might be read-only.

• XPointer addresses both these issues.


XPointer: XPointer in a URI

• XPointers are referenced in a similar way to HTML pointers:
    http://test.org/test.xml#xpointer(id("p"))
  where the target is the element in test.xml that has an attribute of type ID with the value “p”.

• If you only want the specified fragment to be rendered, use ‘|’ instead of ‘#’:
    http://test.org/test.xml|xpointer(id("p"))


XPointer: Specifying Fragment Identifiers

1. Bare Name
    Element identified by a unique id attribute.
    Full: http://…/t.xml#xpointer(id("p"))
    Alt.: http://…/t.xml#p

2. Child Sequence Fragment Identification
    Element identified by child path from the root node.
    E.g.: http://…/t.xml#1/14/2

3. XPointer extension to XPath notation
    (Covered under XPath)
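How a child sequence walks the tree can be sketched in a few lines of Python. This is an illustration of the idea only, not an XPointer processor: the function resolve_child_sequence and the sample document are invented, and the steps are passed as a list of numbers instead of being parsed from a URI.

```python
import xml.etree.ElementTree as ET

def resolve_child_sequence(root, steps):
    """Follow 1-based child positions; the first step selects the document element."""
    if not steps or steps[0] != 1:
        return None                    # a document has exactly one root element
    node = root
    for position in steps[1:]:
        children = list(node)          # element children, in document order
        if position < 1 or position > len(children):
            return None                # the path leads nowhere
        node = children[position - 1]
    return node

doc = ET.fromstring("<a><b/><c><d/><e>hit</e></c></a>")

# Steps 1, 2, 2: the root element, its second child, that child's second child.
target = resolve_child_sequence(doc, [1, 2, 2])
print(target.tag, target.text)
```

A child sequence needs no ID attributes in the target, but it breaks as soon as elements are inserted or removed ahead of the addressed one.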


XPath: XML Path Language

• Provides common syntax and semantics for functionality shared between XSL Transformations (XSLT) and XPointer:
    • Used to select nodes in the document that are going to be used in the transformation process.
    • Provides extended syntax for XPointer.

• The primary purpose of XPath is to address fragments of an XML document.
    • Uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values.


XPath: Location steps

• An XPath expression contains one or more location steps, separated by slashes.

• They provide a way to select nodes from an XML document.

• They operate with respect to the context node (current node in the XML document as the location step is being evaluated, or root node if not specified).

• A location step is constructed from an axis, a node test and zero or more predicates in square brackets, i.e.: axis::node-test[predicate]*


XPath: Ordering of elements

• XPath lets you order the components of a document tree along 13 different axes.

• The tree is traversed in the order in which elements nest inside each other.

• Attribute nodes and namespace nodes occur before child elements.

• An XPath address is a “path” because it specifies the node to start with, the direction (axis), and the predicates to match, which together compute the node-set that is selected.


XPath: Axis = child. Tree is traversed in document order

1. first
2. second
3. third
4. sixth
5. fourth
6. seventh
7. fifth
8. eighth
9. ninth

<first>
  <second>
    <third>
      <fourth>
        <fifth> </fifth>
      </fourth>
    </third>
    <sixth>
      <seventh>
        <eighth>
          <ninth> </ninth>
        </eighth>
      </seventh>
    </sixth>
  </second>
</first>

[Figure: tree diagram of the nested elements first to ninth]


XPath: Seven types of nodes

1. Root node: the document root. There can only be one root.

2. Element nodes: each element makes up an element node.

3. Attribute nodes (shorthand: @): associated with the context node (the current element).

4. Text nodes: as much adjacent character data as possible goes into a single text node.

5. Namespace nodes: every element has an associated set of namespace nodes, one for each namespace prefix in scope for the element (including xml:).

6. Processing instruction nodes: every PI generates a PI node.

7. Comment nodes: every comment generates a comment node.


XPath: Language elements

• Data types:
    • number (double-precision IEEE 754, including NaN)
    • string (any string, including the empty string)
    • boolean (true or false)
    • node-set (a set of nodes, selected through XPath)

• Pattern operators:
    • |   alternatives
    • /   indicates an absolute path (at the start), or a parent (between names)
    • //  any ancestor
    • *   selects all elements located by the preceding path
    • @   selects attributes


XPath: Using pattern operators, examples

• chapter|appendix
    any chapter element or any appendix element

• olist/item
    any item element that has an olist parent

• appendix//paragraph
    any paragraph element that has an appendix ancestor

• @class
    any class attribute

• div[@class="short"]//p
    any p element that has a div ancestor whose class attribute is “short”

• @*
    any attribute
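Some of these patterns can be tried out with Python's standard library, which implements a limited subset of XPath. The sample document is invented for the illustration. ElementTree has no "|" operator and cannot select attribute nodes, so the first pattern is imitated with a plain filter and the attribute patterns are left out.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<book>
  <chapter><olist><item>one</item><item>two</item></olist></chapter>
  <appendix><section><paragraph>deep</paragraph></section></appendix>
  <div class="short"><note><p>brief</p></note></div>
  <div class="long"><p>verbose</p></div>
</book>""")

# olist/item: item elements with an olist parent, anywhere in the tree.
items = [e.text for e in doc.findall(".//olist/item")]

# appendix//paragraph: paragraph elements with an appendix ancestor.
paragraphs = [e.text for e in doc.findall(".//appendix//paragraph")]

# div[@class="short"]//p: p elements below a div whose class is "short".
short_ps = [e.text for e in doc.findall(".//div[@class='short']//p")]

# chapter|appendix, spelled out because "|" is not supported here.
either = [e.tag for e in doc if e.tag in ("chapter", "appendix")]

print(items, paragraphs, short_ps, either)
```

The leading ".//" is needed because ElementTree evaluates paths relative to an element, whereas a pattern in XSLT matches nodes wherever they occur.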


XPath: Axes

• The axis partitions the document based on the context node.

• It defines a starting region in which to apply the node test and zero or more predicates when evaluating the expression.

• Most axes have familiar “data-tree” names.


XPath: Using axes

• ancestor
• ancestor-or-self
• attribute (@)
• child (default axis)
• descendant
• descendant-or-self (//)
• following
• following-sibling
• namespace
• parent (..)
• preceding
• preceding-sibling
• self (.)
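To show what an axis selects relative to a context node, here is a hand-written Python sketch of two of them on top of the standard-library DOM. It is not an XPath engine: the helper names are invented, only element nodes are reported, and the ancestor list therefore stops at the document element instead of including the root node.

```python
from xml.dom.minidom import parseString

doc = parseString("<a><b><c/><d/><e/></b></a>")
context = doc.getElementsByTagName("c")[0]   # the context node

def ancestors(node):
    """ancestor axis: parent, grandparent, and so on (elements only)."""
    result = []
    node = node.parentNode
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        result.append(node.nodeName)
        node = node.parentNode
    return result

def following_siblings(node):
    """following-sibling axis: later children of the same parent."""
    result = []
    node = node.nextSibling
    while node is not None:
        if node.nodeType == node.ELEMENT_NODE:
            result.append(node.nodeName)
        node = node.nextSibling
    return result

print(ancestors(context), following_siblings(context))
```

Each axis is just a different way of walking away from the same context node, which is why the same node test can give different results on different axes.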


XPath: Node tests using XPath

• child::para
    Selects the para children of the context node.

• child::Book[position()<=3]
    Selects the first three <Book> child elements of the context node.

• parent::*
    Selects the parent of the context node.

• ..
    Short for parent::node(); selects the same node as parent::* whenever the parent is an element.

• name(..)
    Returns the name of the parent of the context node.


XPath: XPath used in XPointer

• An XPointer fragment identifier may be written as a sequence of location steps separated by forward slashes:
    catalog.xml#/child::Book[position()=1]/child::RecSubjCategories/child::Category[position()=2]

• Or, more compactly (the child axis is the default):
    catalog.xml#/Book[1]/RecSubjCategories/Category[2]
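The compact form can be evaluated with Python's standard library, which supports positional predicates such as [1]. The catalogue below is invented, and the sketch assumes the path is evaluated with the element that contains the Book elements as context node, since ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; only the element names come from the slide.
catalog = ET.fromstring("""
<Catalog>
  <Book>
    <RecSubjCategories>
      <Category>Fiction</Category>
      <Category>Drama</Category>
    </RecSubjCategories>
  </Book>
  <Book>
    <RecSubjCategories><Category>Poetry</Category></RecSubjCategories>
  </Book>
</Catalog>""")

# First Book, then its RecSubjCategories child, then the second Category.
hit = catalog.find("Book[1]/RecSubjCategories/Category[2]")
print(hit.text)
```

Positions count from 1 and are relative to the parent, so Category[2] means the second Category inside that particular RecSubjCategories element.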


Back to metadata

• Metadata is “data about data”. Real life examples of metadata include such things as a library catalogue card (the “data” on the card describes the data contained in the books in the library) or a TV guide (the “data” in it describes the data in the programmes about to be broadcast).

• On the World Wide Web, webmasters may embed metadata in their web pages. This is usually done using a scheme in which the Dublin Core model for metadata (Weibel et al 1998) is expressed by means of HTML META tags (Kunze 1999).


More about metadata

• Metadata is data that describes and qualifies other data.

• Typical examples of metadata are:
    • important properties of the data (e.g. the name of the creator, and the year of publication),
    • data that is used to locate the data (e.g. the Dewey code for a library book, and the time and channel for a television program),
    • data that is helpful when searching for data (e.g. a free-text description or a summary of the data, or a list of searchable subject keywords appropriate for the data).


Metadata and resource discovery

• There is currently a lot of interest in using metadata to improve the quality of Internet resource discovery.
    • Most of this activity is rooted in the metadata activity carried out by the World Wide Web Consortium’s (W3C) Technology and Society Domain, and the Dublin Core Workshop (DC) series.
    • W3C’s work has resulted in the definition of a syntax for expressing metadata which is simple to process by machine, named the Resource Description Framework (RDF). RDF is an application of the Extensible Markup Language (XML). The DC activity has defined a core element set of bibliographic categories that is intended to be used to describe electronic resources.
    • Another metadata activity, also rooted in XML, is XML Topic Maps (XTM).


Binding data to metadata

[Figure: pairs of “Metadata” and “Data” boxes, contrasting how they are bound together in “The library” and on “The internet”]


Topic Maps: motivation

Up until now there has been no equivalent of the traditional back-of-book index in the world of electronic information. True enough, people have marked up keywords in their word processing documents and used these to generate indexes “automatically”, but the resulting indexes have remained firmly within the paradigm of single documents destined to be published on paper. The world of electronic information is quite different, as the World Wide Web has taught us. Here the distinction between individual documents vanishes and the requirement is for indexes to span multiple documents, and in some cases, to cover vast pools of information. In this situation, old-fashioned indexing techniques are pitifully inadequate.

Source: Steve Pepper, The TAO of Topic Maps
http://www.gca.org/papers/xmleurope2000/papers/s11-01.html


XML Topic Maps (XTM)

• Topic Maps (TM) are defined by ISO: [ISO13250]

• XTM is TM realised as an XML application

• Origins in the world of encyclopedias; a tool for creating ontologies:
    • topics
    • associations
    • occurrences

• http://www.topicmaps.org/


Topic Maps: overview

• Type hierarchies made simple
    • Super-subclassing of types, type-instance relations

• Creation of applications based upon inference rules
    • Deducing implicit knowledge

• Permits automated consistency checking
    • Rule-based constraints


Topic Maps: basic concepts

• A topic is a resource within the computer that stands in for (or “reifies”) some real-world subject.
    • Examples of such subjects might be the play Hamlet, the playwright William Shakespeare, or the “authorship” relationship.

• Topics can have names. They can also have occurrences. Finally, topics can participate in relationships, called associations, in which they play roles as members. Thus, topics have three kinds of characteristics:
    • names
    • occurrences
    • roles played as members of associations.


XML Topic Maps: a word on notation

Note: For brevity (particularly in examples), we sometimes specify URIs only as a fragment identifier starting with a hash sign (e.g. #play). In such cases, the fragment is assumed to refer to a <topic> element elsewhere in the same topic map whose id attribute value matches the fragment identifier.

Example:

<topic id="hamlet">...</topic>
...
<topicRef xlink:href="#hamlet"/>
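How such a fragment reference is resolved can be sketched in Python with the standard `xml.etree.ElementTree` module. The two-topic map below and the helper name `resolve` are assumptions for illustration; the sketch also omits the XTM default namespace and declares only the xlink prefix, so it is a simplification rather than a conforming XTM processor.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical two-topic map; the wrapper element declares the xlink prefix.
DOC = """
<topicMap xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="play"/>
  <topic id="hamlet">
    <instanceOf><topicRef xlink:href="#play"/></instanceOf>
  </topic>
</topicMap>
"""


def resolve(root, href):
    """Return the <topic> whose id matches a '#fragment' reference, or None."""
    if not href.startswith("#"):
        return None  # full URIs are outside the scope of this sketch
    wanted = href[1:]
    for topic in root.iter("topic"):
        if topic.get("id") == wanted:
            return topic
    return None


root = ET.fromstring(DOC)
ref = root.find("topic[@id='hamlet']/instanceOf/topicRef")
target = resolve(root, ref.get(XLINK_HREF))
```

Here `target` is the `<topic id="play">` element that the reference inside the hamlet topic points to.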


XML Topic Maps: names and occurrences

This identifies the topic with the base name “Hamlet, Prince of Denmark”. It is an instance of a play, and it has an occurrence on the Internet; this particular occurrence is in plain text format.

<?xml version="1.0"?>
<topic id="hamlet">
  <instanceOf><topicRef xlink:href="#play"/></instanceOf>
  <baseName>
    <baseNameString>Hamlet, Prince of Denmark</baseNameString>
  </baseName>
  <occurrence>
    <instanceOf><topicRef xlink:href="#plain-text-format"/></instanceOf>
    <resourceRef xlink:href="ftp://gutenberg.org/ws/hamlet.txt"/>
  </occurrence>
</topic>
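As a rough illustration of how a program might read these characteristics, the Python sketch below extracts the type, base names and occurrences from the topic. The surrounding `<topicMap>` element that declares the xlink prefix and the function name `describe` are assumptions added so that the fragment can be parsed on its own; the XTM default namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# The slide's topic, wrapped in an assumed <topicMap> that declares xlink.
DOC = """
<topicMap xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="hamlet">
    <instanceOf><topicRef xlink:href="#play"/></instanceOf>
    <baseName>
      <baseNameString>Hamlet, Prince of Denmark</baseNameString>
    </baseName>
    <occurrence>
      <instanceOf><topicRef xlink:href="#plain-text-format"/></instanceOf>
      <resourceRef xlink:href="ftp://gutenberg.org/ws/hamlet.txt"/>
    </occurrence>
  </topic>
</topicMap>
"""


def describe(topic):
    """Collect the type, base names and occurrences of one <topic> element."""
    type_ref = topic.find("instanceOf/topicRef")
    return {
        "id": topic.get("id"),
        "type": type_ref.get(XLINK_HREF) if type_ref is not None else None,
        "names": [n.text for n in topic.iter("baseNameString")],
        "occurrences": [
            (occ.find("instanceOf/topicRef").get(XLINK_HREF),
             occ.find("resourceRef").get(XLINK_HREF))
            for occ in topic.findall("occurrence")
        ],
    }


info = describe(ET.fromstring(DOC).find("topic"))
```

Each occurrence is returned as a pair of its type reference and the address of the resource.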


XML Topic Maps: roles played as members of associations

An association establishes a relationship between two or more topics.

Here we establish the specific association written-by between the topics “shakespeare” and “hamlet”, saying that the role played by the author named shakespeare is associated with the role played by the work named hamlet.

<association>
  <instanceOf><topicRef xlink:href="#written-by"/></instanceOf>
  <member>
    <roleSpec><topicRef xlink:href="#author"/></roleSpec>
    <topicRef xlink:href="#shakespeare"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="#work"/></roleSpec>
    <topicRef xlink:href="#hamlet"/>
  </member>
</association>
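One way to read such an association programmatically is sketched below in Python. The xlink namespace declaration on the root element and the function name `read_association` are assumptions for this sketch. It also maps each role to a single player, which is a simplification, since a member may hold more than one topic reference.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

DOC = """
<association xmlns:xlink="http://www.w3.org/1999/xlink">
  <instanceOf><topicRef xlink:href="#written-by"/></instanceOf>
  <member>
    <roleSpec><topicRef xlink:href="#author"/></roleSpec>
    <topicRef xlink:href="#shakespeare"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="#work"/></roleSpec>
    <topicRef xlink:href="#hamlet"/>
  </member>
</association>
"""


def read_association(assoc):
    """Return (association type, {role: player}) for one <association>."""
    assoc_type = assoc.find("instanceOf/topicRef").get(XLINK_HREF)
    players = {}
    for member in assoc.findall("member"):
        role = member.find("roleSpec/topicRef").get(XLINK_HREF)
        player = member.find("topicRef").get(XLINK_HREF)  # direct child only
        players[role] = player
    return assoc_type, players


assoc_type, players = read_association(ET.fromstring(DOC))
```

The result pairs the association type `#written-by` with the topics playing the `#author` and `#work` roles.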


Topic Maps: more about TMs

The central idea behind topic maps is that the world can be seen as a set of addressable information resources. A resource may stand alone, or it may be located inside some larger information resource.

Note that one body of material may be described by many different topic maps. Different topic maps may define topics for the same subject. Topic maps dealing with the same subject may be merged.

There is no limit to the kinds of roles played by occurrences, or the kinds of roles played as members of associations, that we can record. Commonly used roles include lived-in and example-of, but we may invent the roles we need.


Topic Maps: putting it all together

The term topic map is used to denote any collection of the following four objects:

1. a set of topics, each of which serves as an electronic surrogate for (reifies) some subject, and each of which has a name

2. a set of occurrences (i.e. addressable information resources)

3. a set of roles played by the occurrences of the subjects those topics reify (an occurrence role may simply be to be a published instance of the subject, or something more complex, e.g. synopsis-for, discussed-in, mentioned-in, depicted-in)

4. a set of associations between topics (e.g. example-of, written-by, lived-in)


The Dublin Core Element Set v1.1
15 explicit metadata elements

Field        Description
Title        The name given to the resource, usually by the Creator or Publisher.
Subject      A comma-separated list of keywords describing the subject.
Description  Free text description of the content of the resource.
Source       A resource identifier pointing to a resource from which the present resource is derived.
Relation     A resource identifier pointing to a second, related resource (e.g. URI).
Language     RFC 1766 tag identifying the language of the intellectual content of the resource.
Coverage     The spatial or temporal characteristics of the intellectual content of the resource.
Creator      Author (can be a person, an organisation or a service).
Publisher    The entity responsible for making the resource available in its present form.
Contributor  An entity not listed as Creator who has made intellectual contributions to the resource.
Rights       A rights management statement, or a resource identifier that links to such a statement.
Date         ISO 8601 date associated with the creation or availability of the resource.
Type         The nature or genre of the content of the resource.
Format       The digital or physical manifestation of the resource (e.g. MIME media types).
Identifier   A string or number used to uniquely identify the resource (e.g. URI or ISBN).


Dublin Core expressed as HTML 4.0 meta tags

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<html>
<head>
<title>Gisle Hannemyr's Home Page</title>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.Title" content="Gisle Hannemyr's Home Page">
<meta name="DC.Creator" content="Hannemyr, Gisle">
<meta name="DC.Subject" content="Gisle Hannemyr; home page">
<meta name="DC.Description" content="Gisle Hannemyr's private website. The site contains selected essays and open source software.">
<meta name="DC.Type" content="Text">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://hannemyr.com/index.html">
<meta name="DC.Language" content="en">
</head>
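A harvester that wants to use these meta tags has to pull them out of the page again. The following Python sketch does this with the standard `html.parser` module. The class name `DublinCoreParser` and the shortened sample page are assumptions for illustration, and repeated elements are collected in lists because Dublin Core elements may occur more than once.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # the parser lowercases attribute names
        name = attrs.get("name") or ""
        if name.startswith("DC."):
            self.elements.setdefault(name[3:], []).append(attrs.get("content"))


PAGE = """<html><head>
<title>Gisle Hannemyr's Home Page</title>
<meta name="DC.Title" content="Gisle Hannemyr's Home Page">
<meta name="DC.Creator" content="Hannemyr, Gisle">
<meta name="DC.Type" CONTENT="Text">
<meta name="DC.Language" content="en">
</head></html>"""

parser = DublinCoreParser()
parser.feed(PAGE)
```

After `feed`, `parser.elements` maps each element name (without the `DC.` prefix) to the list of values found on the page.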


RDF

• An XML application for encoding a very simple data model that describes resources.

• Closely tied to Dublin Core

• Originates in the library world

• http://www.w3.org/RDF/


RDF: Basic Data Model

• The basic premise in this model is that an identifiable, addressable “resource” (e.g. a web page) may be described by a set of “properties” (e.g. author, programming language), each of which has an associated value (e.g. Bob Smith). The three elements of the model (Resource, Property, Value) are illustrated below. Also shown is an instance, where the model is used to express the statement: “The resource named ‘http://a.com/b.html’ is authored by an entity named ‘Bob Smith’”.

Resource --Property--> Value

http://a.com/b.html --Author--> Bob Smith
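The model is small enough to be written down directly as a data structure. The Python sketch below represents each statement as a (resource, property, value) triple. The names `Statement` and `properties_of`, and the second statement about a programming language, are assumptions added for illustration.

```python
from collections import namedtuple

# One RDF statement: a resource, one of its properties, and that property's value.
Statement = namedtuple("Statement", ["resource", "property", "value"])

statements = [
    Statement("http://a.com/b.html", "Author", "Bob Smith"),
    # Hypothetical second property, added only to show a resource with several.
    Statement("http://a.com/b.html", "ProgrammingLanguage", "Java"),
]


def properties_of(resource, statements):
    """Return {property: value} for every statement about `resource`."""
    return {s.property: s.value for s in statements if s.resource == resource}


described = properties_of("http://a.com/b.html", statements)
```

Describing a resource then amounts to collecting all statements that have it in the resource position.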


Simple RDF example

In RDF, the resource being described is identified by the about attribute of a Description element. Each child element represents a property of the resource. Properties may (and usually do) come from namespaces outside the RDF namespace.

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="ftp://gutenberg.org/ws/hamlet.txt">
    <author>William Shakespeare</author>
    <play>Hamlet, Prince of Denmark</play>
  </rdf:Description>
</rdf:RDF>
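To make the mapping from markup to statements concrete, here is a minimal Python sketch that turns such a description into (resource, property, value) triples. It assumes the RDF namespace URI from the 1999 W3C Recommendation and a namespace-qualified `rdf:about` attribute, and it handles only plain literal child elements. It is not a general RDF parser, and the function name `triples` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="ftp://gutenberg.org/ws/hamlet.txt">
    <author>William Shakespeare</author>
    <play>Hamlet, Prince of Denmark</play>
  </rdf:Description>
</rdf:RDF>
"""


def triples(rdf_xml):
    """Yield (resource, property, value) for each child of each Description."""
    root = ET.fromstring(rdf_xml)
    for desc in root.iter(f"{{{RDF}}}Description"):
        resource = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            yield resource, prop.tag, (prop.text or "").strip()


result = list(triples(DOC))
```

Each child element of the Description contributes exactly one triple about the resource named in the about attribute.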


Hiding RDF data from browsers

To embed descriptions in web pages, RDF offers an abbreviated syntax in which the property values are given as attributes of the Description element instead of as character data. This stops the values from being rendered visibly by browsers.

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      author="William Shakespeare"
      play="Hamlet, Prince of Denmark" />
</rdf:RDF>
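The Python sketch below illustrates why this form stays invisible: the properties can still be read from the attributes, while the document contains no character data for a browser to display. It assumes the RDF namespace URI from the 1999 W3C Recommendation, and the function name `attribute_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      author="William Shakespeare"
      play="Hamlet, Prince of Denmark" />
</rdf:RDF>
"""


def attribute_properties(rdf_xml):
    """Return {property: value} from the attributes of the first Description."""
    desc = ET.fromstring(rdf_xml).find(f"{{{RDF}}}Description")
    return {
        name: value
        for name, value in desc.attrib.items()
        if not name.startswith(f"{{{RDF}}}")  # skip rdf:about and friends
    }


props = attribute_properties(DOC)
# True only if some element contains non-whitespace character data.
has_text = bool("".join(ET.fromstring(DOC).itertext()).strip())
```

Here `has_text` is false because all values live in attributes, which is what keeps them from being rendered.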


From HTML metatags to RDF

DC-RDF - Creator, …

“Bob Smith is the Creator of the resource identified by http://www.ifi.no/a.html (where creator is as defined within the ‘DC’ namespace)”

http://www.ifi.no/a.html --dc:Creator--> “Smith, Bob”

<META NAME="DC.Creator" CONTENT="Smith, Bob">


From HTML metatags to RDF

Language, Type, Title

<META NAME="DC.Language" CONTENT="no">
<META NAME="DC.Type" CONTENT="Text">
<META NAME="DC.Date" CONTENT="2002-04-21">
<META NAME="DC.Title" CONTENT="EFN Hjemmeside">

http://www.efn.no/ --dc:Language--> “no”
http://www.efn.no/ --dc:Type--> “Text”
http://www.efn.no/ --dc:Title--> “EFN Hjemmeside”
http://www.efn.no/ --dc:Date--> “2002-04-21”
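The step from meta tags to RDF can be sketched in Python as a small conversion that builds one Description for the page and one property element per DC meta tag. The function name `meta_to_rdf` is hypothetical, and the sketch assumes the RDF namespace URI from the 1999 W3C Recommendation. It keeps the capitalised element names used on these slides, although the Dublin Core namespace itself spells its element names in lower case (e.g. dc:title).

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"


def meta_to_rdf(resource, meta):
    """Build an rdf:RDF element describing `resource` from DC.* meta pairs."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": resource})
    for name, content in meta:
        if name.startswith("DC."):
            ET.SubElement(desc, f"{{{DC}}}{name[3:]}").text = content
    return root


meta = [
    ("DC.Language", "no"),
    ("DC.Type", "Text"),
    ("DC.Date", "2002-04-21"),
    ("DC.Title", "EFN Hjemmeside"),
]
rdf = meta_to_rdf("http://www.efn.no/", meta)
xml_text = ET.tostring(rdf, encoding="unicode")
```

The serialised `xml_text` holds one property element per meta tag inside a single Description for http://www.efn.no/.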


xmlns: Both standardized (here: rdf, dc) and private (here: fast) namespaces can be used.

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:fast="http://www.ifi.uio.no/~gisle/fast/meta.html">
  <rdf:Description xml:lang="en" rdf:about="ftp://ftp.server.edu/bobbas12.zip">
    <dc:Creator>Bob Smith</dc:Creator>
    <fast:proglanguage>Java</fast:proglanguage>
  </rdf:Description>
</rdf:RDF>
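A short Python sketch shows what the namespace declarations buy us: after parsing, every property is identified by its full namespace URI rather than by the prefix, so standardized and private properties cannot be confused. The helper name `split_name` is hypothetical, and the sketch assumes the RDF namespace URI from the 1999 W3C Recommendation.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:fast="http://www.ifi.uio.no/~gisle/fast/meta.html">
  <rdf:Description xml:lang="en" rdf:about="ftp://ftp.server.edu/bobbas12.zip">
    <dc:Creator>Bob Smith</dc:Creator>
    <fast:proglanguage>Java</fast:proglanguage>
  </rdf:Description>
</rdf:RDF>
"""


def split_name(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


desc = ET.fromstring(DOC)[0]  # the single rdf:Description element
by_namespace = {}
for prop in desc:
    namespace, local = split_name(prop.tag)
    by_namespace.setdefault(namespace, {})[local] = prop.text
```

The resulting dictionary groups the properties by the namespace they were defined in.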


XML/RDF: Object Oriented Extensions

• Inheritance (element)
Inheritance refers to the ability to create subclasses which “inherit” the properties of their superclasses, but extend or refine those properties by specifying their own extension properties with added semantics (e.g. we may want to refine the meaning of the “date” element to mean “modification date”).

• Polymorphism (value)
Polymorphism refers to the ability to override a property with one that has the same signature (i.e. name) but altered semantics. For example, the value of the property “identifier” may be processed differently depending upon genre (e.g. file, ISBN, URL or URN). Another typical use of polymorphism is to cater for different classification schemes (e.g. DDC, LCSH or MeSH) used for the Subject property. In the FAST profile, we make use of polymorphism to permit alternate currencies as values for the Price property. In that case, one of the standard ISO 4217 three-letter abbreviations for the world's currencies (e.g. USD, GBP, DEM, CHF) is used.
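The idea of processing the same property differently depending on its scheme can be sketched in Python as follows. How the FAST profile actually encodes the scheme is not shown here, so the `scheme` parameter, both function names, the sample values and the reduced currency list are all assumptions made for illustration.

```python
from decimal import Decimal

# Hypothetical subset of ISO 4217 codes; a real profile would use the full list.
CURRENCIES = {"USD", "GBP", "DEM", "CHF"}


def read_price(value, scheme):
    """Interpret a Price value according to its currency scheme."""
    if scheme not in CURRENCIES:
        raise ValueError(f"unknown currency code: {scheme}")
    return scheme, Decimal(value)


def read_identifier(value, scheme):
    """Interpret an Identifier value according to its genre."""
    if scheme == "ISBN":
        return "ISBN", value.replace("-", "")  # ignore hyphenation
    if scheme in ("URL", "URN"):
        return scheme, value.strip()
    return "file", value  # fall back to treating the value as a file name


price = read_price("9.95", "USD")
isbn = read_identifier("0-306-40615-2", "ISBN")
```

The property name stays the same in every case; only the scheme decides how the value is interpreted.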