1 Spring 2000 Christophides Vassilis The eXtensible Markup Language: An Introduction to XML Documents & Databases



Spring 2000

Christophides Vassilis

The eXtensible Markup Language: An Introduction to XML

Documents & Databases


Preliminary Issues


What is a document?

Content: the components (words, images, etc.) that make up a document

Structure: the organization and inter-relationship of the components

Presentation: how a document looks and what processes are applied to it


Separating these things means...

The content can be re-used
• for printing
• for querying
• for exchanging

The structure can be formally validated

The presentation can be customized for
• different media
• different audiences

… in short, the information can be uncoupled from its processing


Documents vs Databases

Document world
• plenty of small documents
• usually static
• implicit structure: section, paragraph, toc, tagging
• human friendly
• content: form/layout, annotation
• paradigms: “Save as”, WYSIWYG
• metadata: author name, date, subject

Database world
• a few large databases
• usually dynamic
• explicit structure: types, records
• machine friendly
• content: data, methods
• paradigms: Data Independence, Transaction Management, Query Languages
• metadata: schema description


DBMS ANSI/SPARC Architecture

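[Figure: the three-level ANSI/SPARC architecture. External level: View 1, View 2, View 3. Conceptual level: logical schema. Internal level: physical schema. The diagram also carries the label INTEGRATION.]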


What to do with them

Documents
• editing
• spell-checking
• counting words
• retrieving (IR)
• printing

Databases
• updating
• cleaning
• querying
• composing/transforming


Query Languages

Document Retrieval

Claude Monet and San Diego Museum of Art

Database Querying

select p
from Artists a, a.artwork p
where a.first = "Claude"
  and a.last = "Monet"
  and p.located = "San Diego Museum of Art"
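The difference between the two query styles can be sketched in Python. This is only an illustration: the sample records, the second (made-up) artist, and the helper names `keyword_search` and `structured_query` are assumptions and do not come from the slides.

```python
# Hypothetical in-memory data; the field names mirror the query on the slide.
ARTISTS = [
    {"first": "Claude", "last": "Monet",
     "artwork": [{"title": "Haystacks at Chailly at Sunrise",
                  "located": "San Diego Museum of Art"}]},
    # Invented record, present only to show a keyword false positive.
    {"first": "Example", "last": "Artist",
     "artwork": [{"title": "Homage to Claude Monet",
                  "located": "San Diego Museum of Art"}]},
]


def keyword_search(artists, *keywords):
    """Document-retrieval style: match keywords anywhere in the flattened text."""
    hits = []
    for artist in artists:
        for piece in artist["artwork"]:
            text = " ".join([artist["first"], artist["last"],
                             piece["title"], piece["located"]]).lower()
            if all(keyword.lower() in text for keyword in keywords):
                hits.append(piece["title"])
    return hits


def structured_query(artists, first, last, located):
    """Database style: each condition is tested against a named field."""
    return [piece["title"]
            for artist in artists
            if artist["first"] == first and artist["last"] == last
            for piece in artist["artwork"]
            if piece["located"] == located]
```

With this data the keyword search returns both titles, because the words also occur in the second record's title, while the structured query returns only the work whose artist fields match.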


The Long Road of Document Standards

Rick Jelliffe 1999


What’s Wrong with HTML

<B>MONET, Claude</B><BR>
Haystacks at Chailly at Sunrise<BR>
1865<BR>
Oil on canvas<BR>
30 x 60 cm (11 7/8 x 23 3/4 in.)<BR>
San Diego Museum of Art<BR>
<P><IMG SRC="http://192.41.13.240/artchive/m/monet/hayricks.jpg">

Even when written properly, plain HTML can reflect a document's presentation, but it cannot adequately represent the semantics and structure of the data

Callouts on the lines above: Artist Name, Artifact Title, Date, Material, Dimensions, Museum, Image Reference


HTML Document Presentation vs. …


… XML Data Representation

A possible XML markup of the same information will retain the structure (and the semantics) of the various data objects

<ARTIST>
  <NAME><FIRST>Claude</FIRST><LAST>Monet</LAST></NAME>
  <ARTWORK>
    <ARTIFACT>
      <TITLE>Haystacks at Chailly at Sunrise</TITLE>
      <DATE>1865</DATE>
      <MATERIAL>Oil on canvas</MATERIAL>
      <DIM Metric='cm'><HEIGHT>30</HEIGHT><WIDTH>60</WIDTH></DIM>
      <DIM Metric='in'><HEIGHT>11 7/8</HEIGHT><WIDTH>23 3/4</WIDTH></DIM>
      <LOCATION>San Diego Museum of Art</LOCATION>
      <IMAGE File='http://192.41.13.240/artchive/m/monet/hayricks.jpg'/>
    </ARTIFACT>
  </ARTWORK>
</ARTIST>
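Because the markup names each piece of data, a program can pick fields out directly. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `describe` and the shape of the returned dictionary are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

ARTIST_XML = """\
<ARTIST>
  <NAME><FIRST>Claude</FIRST><LAST>Monet</LAST></NAME>
  <ARTWORK>
    <ARTIFACT>
      <TITLE>Haystacks at Chailly at Sunrise</TITLE>
      <DATE>1865</DATE>
      <MATERIAL>Oil on canvas</MATERIAL>
      <DIM Metric='cm'><HEIGHT>30</HEIGHT><WIDTH>60</WIDTH></DIM>
      <DIM Metric='in'><HEIGHT>11 7/8</HEIGHT><WIDTH>23 3/4</WIDTH></DIM>
      <LOCATION>San Diego Museum of Art</LOCATION>
      <IMAGE File='http://192.41.13.240/artchive/m/monet/hayricks.jpg'/>
    </ARTIFACT>
  </ARTWORK>
</ARTIST>
"""


def describe(xml_text):
    """Extract a few named fields from the ARTIST document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    artifact = root.find("ARTWORK/ARTIFACT")
    # Select the DIM element whose Metric attribute is 'cm'.
    cm = artifact.find("DIM[@Metric='cm']")
    return {
        "artist": f"{root.findtext('NAME/FIRST')} {root.findtext('NAME/LAST')}",
        "title": artifact.findtext("TITLE"),
        "date": artifact.findtext("DATE"),
        "size_cm": (int(cm.findtext("HEIGHT")), int(cm.findtext("WIDTH"))),
        "location": artifact.findtext("LOCATION"),
        "image": artifact.find("IMAGE").get("File"),
    }
```

The same lookups against the HTML version would require guessing which `<BR>`-separated line holds which field.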


XML can be Published as normal Web Data


What is XML?

• Markup meta-language for domain- or application-specific structured documentation: mathematical, chemical, musical, publishing, etc.
• Developed by the SGML Editorial Board, formed under the auspices of the World Wide Web Consortium (W3C)
• Founded in 1996 by Jon Bosak (Sun) and various Web/SGML vendors: Textuality, Netscape, Microsoft, INSO, HP, Highland, NCSA, ArborText, GRIF, SoftQuad
• Subset of SGML optimized for use on the Internet and intranets
  • SGML is proving difficult to implement for Web/intranet applications
  • SGML has been hard to cost-justify to management
• Opens the way for a new generation of Web applications
  • improves precision during searching and retrieval
  • enables multiple uses of the same data
  • facilitates distributed processing with more versatile ways to manipulate data


Why XML?

XML provides key features for a new generation of Web applications:
• Structuring: unlike HTML, it preserves the structure of the data
• Extensibility: not a fixed format like HTML, but user-oriented tagging
• Validation: lets consuming applications check data for structural validity on import
• Presentation late binding: describes data, not visual presentation
• Human readable: similar to HTML
• Interchange: good for transmission of data from server to browser, from application to application, or from machine to machine
• Open standard: non-proprietary format

XML is becoming an integral part of the Web infrastructure
• Microsoft Internet Explorer (V5.0) already offers XML browsing
• Ongoing XML implementation by Netscape
• Various XML middleware and manipulation tools


The XML Language Family 

• XML (Extensible Markup Language): a subset of SGML (ISO 8879) designed for easy implementation
• XLink (Extensible Linking Language): a set of standard hypertext mechanisms based on HyTime (ISO/IEC 10744) and the Text Encoding Initiative (TEI)
• XSL (Extensible Stylesheet Language): a standard stylesheet language for structured information, derived from DSSSL (ISO/IEC 10179) and key CSS concepts

[Figure: bar chart of tag and attribute counts (axis from 0 to 350) for HTML, TEI Lite and IBM ID Doc.]


Interrelationships Among the Various W3C Efforts


XML Syntax and Semantics


An Example of XML Markup

<ARTIST>
  <NAME>
    <FIRST>Claude</FIRST>
    <LAST>Monet</LAST>
  </NAME>
  <ARTWORK>
    <ARTIFACT>
      <TITLE>Haystacks at Chailly at Sunrise</TITLE>
      <DATE>1865</DATE>
      <MATERIAL>Oil on canvas</MATERIAL>
      <DIM Metric='cm'><HEIGHT>30</HEIGHT><WIDTH>60</WIDTH></DIM>
      <DIM Metric='in'><HEIGHT>11 7/8</HEIGHT><WIDTH>23 3/4</WIDTH></DIM>
      <LOCATION>San Diego Museum of Art</LOCATION>
      <IMAGE File='http://192.41.13.240/artchive/m/monet/hayricks.jpg'/>
    </ARTIFACT>
  </ARTWORK>
</ARTIST>

Callouts on the markup above: Element Name, Element Content, Empty Element, Attribute Name, Attribute Value


The Logical Tree Structure of XML

ARTIST
  NAME
    FIRST: Claude
    LAST: MONET
  ARTWORK
    ARTIFACT
      TITLE: Haystacks
      DATE: 1865
      MATERIAL: Oil on canvas
      DIM
        H: 30
        W: 60
      DIM
        H: 11 7/8
        W: 23 3/4
      LOCATION: San Diego Mus.
      IMAGE: ...hayricks.jpg


XML Document Type Definitions

<!DOCTYPE artist [
<!ELEMENT artist (name, born, death, artwork, nationality?, influences)>
<!ATTLIST artist oid ID #REQUIRED
                 xml:lang NMTOKEN #IMPLIED>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
...
<!ELEMENT artwork (artifact+)>
<!ELEMENT artifact (title, date, material, dim*, location, image)>
<!ELEMENT title (#PCDATA)>
...
<!ELEMENT dim (height, width)>
<!ATTLIST dim metric (cm | in) 'cm'>
<!ELEMENT location (#PCDATA)>
<!ELEMENT image EMPTY>
<!ATTLIST image file ENTITY #REQUIRED>
<!ELEMENT influences (#PCDATA | aref)*>
<!ELEMENT aref EMPTY>
<!ATTLIST aref xml:link CDATA #FIXED 'simple'
               href CDATA #REQUIRED>
<!NOTATION jpeg PUBLIC "-//local//NOTATION Gpeg Images//EN">
<!ENTITY fig1 SYSTEM '… /monet/hayricks.jpg' NDATA jpeg>
]>


XML Core Markup Features

Elements: components of the logical tree structure defined by a DTD
• identified in a document instance by descriptive markup, usually a start-tag and an end-tag

Attributes: characteristics associated with elements (other than their content and type)
• may be applied to one specific instance of a given element

Entities: named fragments of information that can be stored separately from a document (or a DTD)
• can be included in the document (or the DTD) one or more times by reference to their names


Definition of Element’s Content

EMPTY
<!ELEMENT image EMPTY>
the element has no content and is written as an empty-element tag such as <image/>, with no separate end-tag; its content is implicit or generated automatically by an application

ANY
<!ELEMENT object ANY>
element content can consist of a free mixture of parsed character data and any of the elements in the DTD

MIXED TEXT
<!ELEMENT title (#PCDATA)>
element content can contain characters, entity references and tags allowed by the element model

GROUP
<!ELEMENT media (image|video)>
<!ELEMENT name (first,last)>
element content with connectors (',' '|') and occurrence indicators ('+' '?' '*')

Mixed models must be optional, repeatable OR-groups with #PCDATA first


What can XML express?

Sequence « , »

<!ELEMENT name (first,last)>

Choice « | »

<!ELEMENT media (image | video)>

Option (0 or 1) « ? »

<!ELEMENT artist (…, nationality?, …)>

Repetition (1 or more) « + »

<!ELEMENT artwork (artifact+)>

Option and Repetition (0, 1 or more) « * »

<!ELEMENT artifact (..., dim*, ...)>


XML Content Models and Regular Expressions

Each element content model is defined by a regular expression
Example: name, addr*, email

Each regular expression determines a corresponding finite state automaton
This suggests a simple parsing program

Content models should be defined by unambiguous regular expressions

[Figure: finite state automaton for the example, with transitions labelled name, addr and email.]
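The link between content models and regular expressions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a DTD validator: the helper names `model_to_regex` and `conforms` are hypothetical, `#PCDATA` and mixed content are not handled, and Python's `re` module is used as a convenient stand-in for the finite state automaton the slide refers to.

```python
import re


def model_to_regex(model):
    """Translate a content model such as 'name, addr*, email' into a regex.

    Each element name becomes a group matching that name followed by ';',
    so a child sequence can be encoded as the string 'name;addr;email;'.
    """
    pattern = re.sub(r"[A-Za-z_][\w.-]*",
                     lambda m: "(?:" + m.group(0) + ";)", model)
    # ',' means plain concatenation; '|', '?', '*', '+' and parentheses
    # already have the same meaning in regex syntax.
    return re.sub(r"[,\s]", "", pattern)


def conforms(model, children):
    """Check whether a list of child element names matches the content model."""
    encoded = "".join(name + ";" for name in children)
    return re.fullmatch(model_to_regex(model), encoded) is not None
```

For example, `conforms("name, addr*, email", ["name", "addr", "addr", "email"])` holds, while a sequence that starts with `email` does not match.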


XML Regular Expressions: Another Example

Adding optional and repeatable parts further complicates things
Example: name,address*,(tel | fax)*,email*

[Figure: the corresponding finite state automaton, with transitions labelled name, address, tel, fax and email.]


Definition of Attribute’s Content

(The declarations below show only the attribute type; default declarations are omitted.)

(value1 | value2 | ... | valuen)
value enumeration
<!ATTLIST dim metric (cm | in)>
<dim metric='in'>

CDATA
string of valid XML character data
<!ATTLIST ref href CDATA>
<ref href='http://192.41.13.240/artchive/m/monet.xml'>

ID
symbolic identifier of an element
<!ATTLIST artist oid ID>
<artist oid='Monet'>

IDREF(S)
reference (or list of references) to element identifiers
<!ATTLIST artifact creator IDREF>
<artifact creator='Monet'>

ENTITY(IES)
reference (or list of references) to entity names
<!ATTLIST image file ENTITY>
<image file='fig1'>

NMTOKEN(S)
name token: a string of XML name characters, which may begin with a digit, hyphen or period
<!ATTLIST artist xml:lang NMTOKEN>
<artist xml:lang='_english'>

More types (e.g., DATE) may soon be part of the standard


Attribute Default Values

'value': a given value from the enumeration of values, used as the default

#FIXED value: the value is the only possible one for the attribute

#REQUIRED: the value must be supplied

#IMPLIED: the value can optionally be supplied


XML Entities

Entities allow the definition of short strings that stand for more complex information, which can reside inside or outside the document or its DTD

Used for substitution of data or markup:
• DTD level, e.g., markup declarations (parameter entity)
• document level, e.g., data and markup instances (general entity)

Used for references to external data or markup sources:
• the content of the entity can be found using a system-specific storage location (specific entity)
• the content of the entity can be found by mapping a public identifier to a system-specific storage location (public entity)


XML Parameter Entities

Parameter entities are used for extensible declarations (e.g., macros) of complex content models or attributes in a DTD

<!ENTITY % style "impressionism | cubism | surrealism">

Parameter entities can be nested

<!ENTITY % bibelem2 "%bibelem; | expressionism | dada">

but we must avoid infinite loops

<!ENTITY % bibelem "%bibelem; | expressionism | dada">

Replacement entity text can be found outside the DTD

<!ENTITY % ISOlat2 PUBLIC "ISO 8879-1986//ENTITIES Added Latin 2//EN">


XML General Entities

General entities are used to substitute textual or non-textual objects (e.g., constants) that occur many times or are volatile in document instances

<!ENTITY xml "Extensible Markup Language">

Replacement text of general entities can contain tags, character references or other entities

<!ENTITY www "W3C Recommendation 10-February-1998">
<!ENTITY xml "<TITLE>Extensible Markup Language &www;</TITLE>">

but here too we must avoid infinite loops

The content of a general entity can be found outside the DTD and it may have a particular format
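Entity substitution can be observed with a small Python sketch. It relies on the standard `xml.etree.ElementTree` parser, which expands general entities declared in the internal DTD subset; the `doc` root element, the simplified replacement text (no `<TITLE>` markup) and the helper name `expand` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Two internal general entities; the second refers to the first.
DOC = """\
<!DOCTYPE doc [
  <!ENTITY www "W3C Recommendation 10-February-1998">
  <!ENTITY xml "Extensible Markup Language (&www;)">
]>
<doc>&xml;</doc>
"""


def expand(xml_text):
    """Parse the document and return the root's text after entity expansion."""
    return ET.fromstring(xml_text).text
```

Calling `expand(DOC)` yields the fully substituted string, with the nested `&www;` reference resolved as well.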


XML Specific Entities

Specific entities can be viewed as “abstract storage objects” (e.g., data streams) that are mapped onto real ones by using a system-specific storage location

Sub-documents encoded in XML with a different DTD

<!ENTITY biography SYSTEM "… /monet.xml">

Textual data encoded with a particular format

<!ENTITY bibliography SYSTEM "… /monet.bib">

Non-SGML data

<!ENTITY fig1 SYSTEM "… /monet/hayricks.jpg" NDATA jpeg>


The Main XML Components


Well-Formed XML

A textual object is a well-formed XML document if it meets all the well-formedness constraints (WFCs) of the XML syntax:
• tags (etc.) are syntactically correct
• every start-tag has a matching end-tag
• tags are properly nested
• there is a single root element

By definition, if a document is not well-formed, it is not XML. There is no such thing as an XML document that is not well-formed, and XML processors are not required to do anything with such documents


Valid XML

A well-formed document is valid only if it contains a proper DTD and obeys the constraints of that DTD, and therefore the XML Validity Constraints (VCs):
• only declared tags are used
• all tag occurrences conform to the specified content models

Examples:

The following XML document is well-formed but not valid (it carries no DTD, and the artist DTD shown earlier does not allow character data directly inside artist)

<artist> MONET, Claude </artist>

The following XML document is not even well-formed (it has two top-level elements instead of a single root)

<first>Claude</first><last>MONET</last>
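Well-formedness is easy to test mechanically. The sketch below uses Python's standard `xml.etree.ElementTree`, which checks well-formedness only and does not validate against a DTD; the helper name `is_well_formed` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

Applied to the two examples above, the first is accepted and the second is rejected; improperly nested tags such as `<a><b></a></b>` are rejected too.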


When do we need a DTD?

• At document preparation time (definitely): validation, checking, consistency
• At document processing time (probably): simplifies generic/specific processing; may clarify intended semantics
• At document delivery time (possibly): strictly unnecessary for well-formed docs, but reduces processing effort

[Figure: document life cycle with the stages Creation, Composition, Validation and Usage.]


Where is the behaviour of XML defined?

• In a stylesheet, using XSL or CSS
• Possibly embedded in a program (applet, script, or Java bean) defined for that particular DTD, set of tags, or tag
• By reference to pre-existing mutual agreement amongst user communities, aka “namespaces”
• By reference to a Document Object Model

[Figure: diagram labelled XML and XSL.]


Comparing XML and Programming Languages

Programming language    XML
type-checking           validation
constants               entity reference
macros                  parameter entity
void*                   ANY
void*                   IDREF
header file             DTD
#ifdef                  conditional section
standard library        key entities
namespace               namespace

But no type inference, polymorphism, modules, etc.


XML DTDs vs. Database Schemas

By database standards, DTDs are rather weak specifications
• Only one base type, i.e., PCDATA
• Only two element constructors, i.e., sequence and alternative
• No useful “abstractions”, e.g., bulk types, inheritance
• IDREFs are untyped: you point to something, but you don’t know what!
• No integrity constraints, e.g., child is inverse of parent
• No methods
• Tag definitions are global

Recent XML extensions (XML Schema) impose something like a schema or type on XML data


XML vs. ODMG ODL: Example

class Movie
  ( extent Movies, key title )
{
  attribute string title;
  attribute string director;
  relationship set<Actor> casts
    inverse Actor::acted_In;
  attribute int budget;
};

class Actor
  ( extent Actors, key name )
{
  attribute string name;
  relationship set<Movie> acted_In
    inverse Movie::casts;
  attribute int age;
  attribute set<string> directed;
};


XML vs. ODMG ODL: Example

<db>
  <movie id="m1">
    <title>Waking Ned Divine</title>
    <director>Kirk Jones III</director>
    <cast idrefs="a1 a3"/>
    <budget>100,000</budget>
  </movie>
  <movie id="m2">
    <title>Dragonheart</title>
    <director>Rob Cohen</director>
    <cast idrefs="a2 a9 a21"/>
    <budget>110,000</budget>
  </movie>
  <movie id="m3">
    <title>Moondance</title>
    <director>Dagmar Hirtz</director>
    <cast idrefs="a1 a8"/>
    <budget>90,000</budget>
  </movie>
  <actor id="a1">
    <name>David Kelly</name>
    <acted_In idrefs="m1 m3 m78"/>
  </actor>
  <actor id="a2">
    <name>Sean Connery</name>
    <acted_In idrefs="m2 m9 m11"/>
    <age>68</age>
  </actor>
  <actor id="a3">
    <name>Ian Bannen</name>
    <acted_In idrefs="m1 m35"/>
  </actor>
  :
</db>


XML vs. ODMG ODL: Example

<!DOCTYPE db [
<!ELEMENT db (movie+, actor+)>
<!ELEMENT movie (title, director, cast, budget)>
<!ATTLIST movie id ID #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT director (#PCDATA)>
<!ELEMENT cast EMPTY>
<!ATTLIST cast idrefs IDREFS #REQUIRED>
<!ELEMENT budget (#PCDATA)>
<!ELEMENT actor (name, acted_In, age?, directed*)>
<!ATTLIST actor id ID #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT acted_In EMPTY>
<!ATTLIST acted_In idrefs IDREFS #REQUIRED>
<!ELEMENT age (#PCDATA)>
<!ELEMENT directed (#PCDATA)>
]>


Mapping Between XML and Objects


XML vs Relational DBMS

projects(title, budget, managedBy)

employees(name, ssn, age)

<!DOCTYPE db [
<!ELEMENT db (projects, employees)>
<!ELEMENT projects (project*)>
<!ELEMENT employees (employee*)>
<!ELEMENT project (title, budget, managedBy)>
<!ELEMENT employee (name, ssn, age)>
...
]>

<!DOCTYPE db [
<!ELEMENT db (project | employee)*>
<!ELEMENT project (title, budget, managedBy)>
<!ELEMENT employee (name, ssn, age)>
...
]>
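As a rough illustration of the first of the two DTDs, the sketch below exports two relational tables as XML with Python's standard `xml.etree.ElementTree`. The sample rows and the helper names `rows_to_xml` and `tables_to_xml` are hypothetical; only the table and column names come from the slide.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(parent, wrapper_tag, row_tag, columns, rows):
    """Append <wrapper_tag> to parent, with one <row_tag> child per row."""
    wrapper = ET.SubElement(parent, wrapper_tag)
    for row in rows:
        element = ET.SubElement(wrapper, row_tag)
        for column in columns:
            # One sub-element per column, in the order the DTD prescribes.
            ET.SubElement(element, column).text = str(row[column])
    return wrapper


def tables_to_xml(projects, employees):
    """Build a <db> tree shaped like db (projects, employees)."""
    db = ET.Element("db")
    rows_to_xml(db, "projects", "project",
                ("title", "budget", "managedBy"), projects)
    rows_to_xml(db, "employees", "employee",
                ("name", "ssn", "age"), employees)
    return db


# Hypothetical sample rows, invented for this example.
PROJECTS = [{"title": "Pattern recognition", "budget": 10000, "managedBy": "Joe"}]
EMPLOYEES = [{"name": "Joe", "ssn": "344556", "age": 36}]
```

`ET.tostring(tables_to_xml(PROJECTS, EMPLOYEES), encoding="unicode")` then serializes the tree. The second DTD would instead place `project` and `employee` elements directly under `db`, in any order.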


Recursive DTDs

<!DOCTYPE genealogy [
<!ELEMENT genealogy (person*)>
<!ELEMENT person (name,
                  dateOfBirth,
                  person,     -- mother
                  person      -- father
                 )>
...
]>

What is the problem with this?


Recursive DTDs cont’d.

<!DOCTYPE genealogy [
<!ELEMENT genealogy (person*)>
<!ELEMENT person (name,
                  dateOfBirth,
                  person?,    -- mother
                  person?     -- father
                 )>
...
]>

What is the problem with this now?


Some Things are Hard to Specify

Each employee element is to contain name, age and ssn elements in some order

<!ELEMENT employee ( (name, age, ssn) | (age, ssn, name) | (ssn, name, age) | …)>

Suppose there were many more fields!
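The growth is factorial, as a short Python sketch makes concrete. The helper name `any_order_model` is an assumption; it simply spells out every ordering of the fields as one alternative of the content model.

```python
from itertools import permutations
from math import factorial


def any_order_model(element, fields):
    """Build the DTD declaration that allows the fields in any order."""
    alternatives = ["(" + ", ".join(order) + ")" for order in permutations(fields)]
    return f"<!ELEMENT {element} ({' | '.join(alternatives)})>"


def alternatives_needed(field_count):
    """Number of alternatives required for field_count distinct fields."""
    return factorial(field_count)
```

Three fields need 6 alternatives, six fields already need 720, and ten fields need 3,628,800.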


Specifying ID and IDREF Attributes

<!DOCTYPE family [
<!ELEMENT family (person)*>
<!ELEMENT person (name)>
<!ELEMENT name (#PCDATA)>
<!ATTLIST person
  id       ID     #REQUIRED
  mother   IDREF  #IMPLIED
  father   IDREF  #IMPLIED
  children IDREFS #IMPLIED>
]>


Some Conforming XML data

<family>
  <person id="jane" mother="mary" father="john">
    <name> Jane Doe </name>
  </person>
  <person id="john" children="jane jack">
    <name> John Doe </name>
  </person>
  <person id="mary" children="jane jack">
    <name> Mary Doe </name>
  </person>
  <person id="jack" mother="mary" father="john">
    <name> Jack Doe </name>
  </person>
</family>
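What a validating parser checks for ID and IDREF attributes can be approximated by hand. The sketch below uses Python's standard `xml.etree.ElementTree` (which does not validate) and a hypothetical helper `check_references`; it verifies that IDs are unique and that every reference resolves to some ID. In line with the earlier remark that IDREFs are untyped, it cannot check that a `mother` reference points to an element of the intended kind.

```python
import xml.etree.ElementTree as ET

FAMILY_XML = """\
<family>
  <person id="jane" mother="mary" father="john"><name> Jane Doe </name></person>
  <person id="john" children="jane jack"><name> John Doe </name></person>
  <person id="mary" children="jane jack"><name> Mary Doe </name></person>
  <person id="jack" mother="mary" father="john"><name> Jack Doe </name></person>
</family>
"""


def check_references(xml_text):
    """Return a list of ID/IDREF problems found in a family document."""
    root = ET.fromstring(xml_text)
    problems = []
    ids = set()
    # First pass: collect IDs and report duplicates.
    for person in root.iter("person"):
        pid = person.get("id")
        if pid in ids:
            problems.append(f"duplicate ID '{pid}'")
        ids.add(pid)
    # Second pass: every IDREF / IDREFS value must name an existing ID.
    for person in root.iter("person"):
        refs = [person.get("mother"), person.get("father")]
        refs += (person.get("children") or "").split()
        for ref in refs:
            if ref is not None and ref not in ids:
                problems.append(
                    f"person '{person.get('id')}' refers to unknown ID '{ref}'")
    return problems
```

For the document above the list of problems is empty; a `father="bob"` attribute with no matching `id="bob"` would be reported.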


An Alternative XML DTD Specification

<!DOCTYPE family [
<!ELEMENT family (person)*>
<!ELEMENT person (name, mother?, father?, children?)>
<!ATTLIST person id ID #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT mother EMPTY>
<!ATTLIST mother idref IDREF #REQUIRED>
<!ELEMENT father EMPTY>
<!ATTLIST father idref IDREF #REQUIRED>
<!ELEMENT children EMPTY>
<!ATTLIST children idrefs IDREFS #REQUIRED>
]>


The Revised XML Data

<family>
  <person id="jane">
    <name> Jane Doe </name>
    <mother idref="mary"></mother>
    <father idref="john"></father>
  </person>
  <person id="john">
    <name> John Doe </name>
    <children idrefs="jane jack"></children>
  </person>
  ...
</family>


Mapping between XML and Tables


Blurring the Frontiers between Data & Documents


Towards XML-enabled DBMS

XML-enabled database system
• Store XML data/documents into the database server
• Query and search valid and well-formed XML
• Generate XML data from the database server
• Add XML capabilities in supporting database facilities

XML has the potential to impact four important markets
• Web integration
• Web publishing
• Application integration
• Electronic commerce

[Figure: a DBMS with the labels Store XML, Generate XML and Integrate with other facilities.]


Storing XML Data

Enhance XML storage facilities in the database:
• Utilities to load XML data into the database
• Provide more efficient database storage (componentized storage, compression, indexing, …)
• XML export tools from the server
• Allow server-to-server replication of XML data

[Figure: diagram with the labels Database, HTML, XML and Database.]


Querying and Searching XML Data

• Fine-grained access to XML documents
• Search XML data efficiently
• Special SQL queries over valid + well-formed XML
• Content-based indexing (e.g., text indexes) for searching XML data efficiently
• Support for XML query languages (e.g., XQL) on XML data

[Figure: diagram with the labels Database, HTML, XML and Web.]


Generating and Manipulating XML

Generate XML from the database server
• Map ODMG, SQL92, SQL3 and PL/SQL datatypes to XML
• Provide mappings between Java, SQL and XML types

Script XML content from the database
• Allow SQL queries to return XML results
• Provide embedded XML in stored procedures
• Java scripting: support embedded XML in Java
• Common APIs to access any XML content in databases

[Figure: diagram with the labels Database, HTML, XML and Web.]


[Figure: Database X and Database Y each produce XML; further boxes are labelled XML Sorted and XML Total.]


Epilogue

• 1960’s: Data Centric
• 1970’s: Process Centric
• 1980’s: Object Oriented
• 1990’s: Component Based
• 2000’s: XML?


Data was our First Focus

60’s: Data

• Record Layouts
• Printer Layouts
• System Flow Charts
• Decision Tables

Batch Jobs were a Series of small Programs


Then we Focused on Logic

60’s: Data, 70’s: Logic

• GOTO-Less Programming
• Structured Programming
• Top-Down Design

Programs Became Very Large


Object Oriented Programming Focused on Runtime Behavior

60’s: Data, 70’s: Logic, 80’s: OO

• Common Terms for Analysis and Design
• Tightly Coupled Code

Code Reuse was the Holy Grail, Rarely Achieved


Component Programming Shifted the Focus to Interfaces

60’s: Data, 70’s: Logic, 80’s: OO, 90’s: Comp

Serialization Tied to Code

• Code Reuse
• IDE-Based Composition
• Limited Acceptance


XML Returns the Focus to Data

70’s: Logic, 80’s: OO, 90’s: Comp, 00’s: XML

• XML Wrappers for Incompatible Systems
• Industry-Specific Markup Languages
• XML for Persistent Data and Composition

XML Enables Middleware for Application-Specific Data
