The Same Example in XML HTML versus XML: Similarities The Same Example in XML ... XML Schema 4. Namespaces 5. ... Detailed Description of XML 3. Structuring a) DTDs b) XML Schema 4

  • View
    216

  • Download
    2

Embed Size (px)

Text of The Same Example in XML HTML versus XML: Similarities The Same Example in XML ... XML Schema 4....

  • Chapter 2 A Semantic Web Primer1

    Chapter 2

    Structured Web Documents in XML

    Grigoris Antoniou

    Frank van Harmelen

    Chapter 2 A Semantic Web Primer2

    An HTML Example

    Nonmonotonic Reasoning: Context-

    Dependent Reasoning

    by V. Marek and

    M. Truszczynski

    Springer 1993

    ISBN 0387976892

    Chapter 2 A Semantic Web Primer3

    The Same Example in XML

    Nonmonotonic Reasoning: Context-Dependent Reasoning

    V. Marek

    M. Truszczynski

    Springer

    1993

    0387976892

    Chapter 2 A Semantic Web Primer4

    HTML versus XML: Similarities

    z Both use tags (e.g. and )

    z Tags may be nested (tags within tags)

    z Human users can read and interpret both

    HTML and XML representations quite easily

    But how about machines?

  • Chapter 2 A Semantic Web Primer5

    Problems with Automated Interpretation of HTML Documents

    An intelligent agent trying to retrieve the namesof the authors of the book

    z Authors names could appear immediately

    after the title

    z or immediately after the word by

    z Are there two authors?

    z Or just one, called V. Marek and M.

    Truszczynski?

    Chapter 2 A Semantic Web Primer6

    HTML vs XML: Structural Information

    z HTML documents do not contain structural

    information: pieces of the document and their

    relationships.

    z XML more easily accessible to machines because

    Every piece of information is described.

    Relations are also defined through the nesting structure.

    E.g., the tags appear within the tags, so

    they describe properties of the particular book.

    Chapter 2 A Semantic Web Primer7

    HTML vs XML: Structural Information (2)

    z A machine processing the XML document

    would be able to deduce that

    the author element refers to the enclosing book

    element

    rather than by proximity considerations

    z XML allows the definition of constraints on

    values E.g. a year must be a number of four digits

    Chapter 2 A Semantic Web Primer8

    HTML vs XML: Formatting

    z The HTML representation provides more

    than the XML representation:

    The formatting of the document is also described

    z he main use of an HTML document is to

    display information: it must define formatting

    z XML: separation of content from display

    same information can be displayed in different

    ways

  • Chapter 2 A Semantic Web Primer9

    HTML vs XML: Another Example

    z In HTML

    Relationship matter-energy

    E = M c2

    z In XML

    Relationship matter

    energy

    E

    M c2

    Chapter 2 A Semantic Web Primer10

    HTML vs XML: Different Use of Tags

    z In both HTML docs same tags

    z In XML completely different

    z HTML tags define display: color, lists

    z XML tags not fixed: user definable tags

    z XML meta markup language: language for defining markup languages

    Chapter 2 A Semantic Web Primer11

    XML Vocabularies

    z Web applications must agree on common

    vocabularies to communicate and collaborate

    z Communities and business sectors are

    defining their specialized vocabularies

    mathematics (MathML)

    bioinformatics (BSML)

    human resources (HRML)

    Chapter 2 A Semantic Web Primer12

    Lecture Outline

    1. Introduction

    2. Detailed Description of XML

    3. Structuringa) DTDs

    b) XML Schema

    4. Namespaces

    5. Accessing, querying XML documents: XPath

    6. Transformations: XSLT

  • Chapter 2 A Semantic Web Primer13

    The XML Language

    An XML document consists of

    z a prolog

    z a number of elements

    z an optional epilog (not discussed)

    Chapter 2 A Semantic Web Primer14

    Prolog of an XML Document

    The prolog consists of

    z an XML declaration and

    z an optional reference to external structuring documents

    Chapter 2 A Semantic Web Primer15

    XML Elements

    z The things the XML document talks about

    E.g. books, authors, publishers

    z An element consists of:

    an opening tag

    the content

    a closing tag

    David Billington

    Chapter 2 A Semantic Web Primer16

    XML Elements (2)

    z Tag names can be chosen almost freely.

    z The first character must be a letter, an

    underscore, or a colon

    z No name may begin with the string xml in

    any combination of cases

    E.g. Xml, xML

  • Chapter 2 A Semantic Web Primer17

    Content of XML Elements

    z Content may be text, or other elements, or nothing

    David Billington

    +61 7 3875 507

    z If there is no content, then the element is called empty; it is abbreviated as follows:

    for

    Chapter 2 A Semantic Web Primer18

    XML Attributes

    z An empty element is not necessarily

    meaningless

    It may have some properties in terms of attributes

    z An attribute is a name-value pair inside the

    opening tag of an element

    Chapter 2 A Semantic Web Primer19

    XML Attributes: An Example

    Chapter 2 A Semantic Web Primer20

    The Same Example without Attributes

    23456

    John Smith

    October 15, 2002

    a528

    1

    c817

    3

  • Chapter 2 A Semantic Web Primer21

    XML Elements vs Attributes

    z Attributes can be replaced by elements

    z When to use elements and when attributes is

    a matter of taste

    z But attributes cannot be nested

    Chapter 2 A Semantic Web Primer22

    Further Components of XML Docs

    z Comments

    A piece of text that is to be ignored by parser

    z Processing Instructions (PIs)

    Define procedural attachments

    Chapter 2 A Semantic Web Primer23

    Well-Formed XML Documents

    z Syntactically correct documents

    z Some syntactic rules: Only one outermost element (called root element)

    Each element contains an opening and a corresponding closing tag

    Tags may not overlap. The following is forbidden:z Lee Hong

    Attributes within an element have unique names

    Element and tag names must be permissible

    Chapter 2 A Semantic Web Primer24

    The Tree Model of XML Documents: An Example

    Where is your draft?

    Grigoris, where is the draft of the paper you promised me

    last week?

  • Chapter 2 A Semantic Web Primer25

    The Tree Model of XML Documents: An Example (2)

    Chapter 2 A Semantic Web Primer26

    The Tree Model of XML Docs

    z The tree representation of an XML document

    is an ordered labeled tree:

    There is exactly one root

    There are no cycles

    Each non-root node has exactly one parent

    Each node has a label.

    The order of elements is important

    but the order of attributes is not important

    Chapter 2 A Semantic Web Primer27

    Lecture Outline

    1. Introduction

    2. Detailed Description of XML

    3. Structuringa) DTDs

    b) XML Schema

    4. Namespaces

    5. Accessing, querying XML documents: XPath

    6. Transformations: XSLT

    Chapter 2 A Semantic Web Primer28

    Structuring XML Documents

    z Define all the element and attribute names

    that may be used

    z Define the structure

    what values an attribute may take

    which elements may or must occur within other

    elements, etc.

    z If such structuring information exists, the

    document can be validated

  • Chapter 2 A Semantic Web Primer29

    Structuring XML Dcuments (2)

    z An XML document is valid if

    it is well-formed

    respects the structuring information it uses

    z There are two ways of defining the structure

    of XML documents:

    DTDs (the older and more restricted way)

    XML Schema (offers extended possibilities)

    Chapter 2 A Semantic Web Primer30

    DTD: Element Type Definition

    David Billington

    +61 7 3875 507

    DTD for above element (and all lecturer elements):

    Chapter 2 A Semantic Web Primer31

    The Meaning of the DTD

    z The element types lecturer, name, and phone may

    be used in the document

    z A lecturer element contains a name element and a

    phone element, in that order (sequence)

    z A name element and a phone element may have

    any content

    z In DTDs, #PCDATA is the only atomic type for

    elements

    Chapter 2 A Semantic Web Primer32

    DTD: Disjunction in Element Type Definitions

    z We express that a lecturer element contains

    either a name element or a phone element as

    follows:

    z A lecturer element contains a name element

    and a phone element in any order.

  • Chapter 2 A Semantic Web Primer33

    Example of an XML Element

    Chapter 2 A Semantic Web Primer34

    The Corresponding DTD

    customer CDATA #REQUIRED

    date CDATA #REQUIRED>

    quantity CDATA #REQUIRED

    comments CDATA #IMPLIED>

    Chapter 2 A Semantic Web Primer35

    Comments on the DTD

    z The item element type is defined to be empty

    z + (after item) is a cardinality operator:

    ?: appears zero times or once

    *: appears zero or more times

    +: appears one or more times

    No cardinality operator means exactly once

    Chapter 2 A Semantic Web Primer36

    Comments on the DTD (2)

    z In addition to defining elements, we define

    attributes

    z This is done in an attribute list containing:

    Name of the element type to which the list applies

    A list of triplets of attribute name, attribute type,

    and value type

    z Attribute name: A name that may be used in

    an XML docu