More XML: semantics, DTDs, XPATH February 18, 2004


XML Document

<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
    <married/>
  </person>
</data>

Figure callouts: person elements, name elements, attributes

XML Terminology

• Elements
  – enclosed within tags:
    • <person> … </person>
  – nested within other elements:
    • <person> <address> … </address> </person>
  – can be empty
    • <married></married> abbreviated as <married/>
  – can have attributes
    • <person id="0005"> … </person>

• An XML document has a single root element

Buzzwords

• What is XML?
  – W3C data exchange format
  – Hierarchical data model
  – Self-describing
  – Semi-structured

XML as a Tree

<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>

Tree (figure):

data
  person
    id: o555              (attribute node)
    name: Mary
    address
      street: Maple
      no: 345
      city: Seattle
  person
    name: John
    address: Thai
    phone: 23456

Legend: element node, text node, attribute node

Minor detail: order matters!
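The three node kinds in the tree can be made concrete with a short traversal. This is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; the helper name `describe` is hypothetical and only illustrates the idea.

```python
import xml.etree.ElementTree as ET

DOC = """<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>"""


def describe(elem, depth=0):
    """Return one line per node: the element, then its attributes and text."""
    pad = "  " * depth
    lines = [f"{pad}element {elem.tag}"]
    for key, value in elem.attrib.items():
        lines.append(f"{pad}  attribute {key} = {value}")
    text = (elem.text or "").strip()
    if text:  # skip whitespace-only text between child elements
        lines.append(f"{pad}  text {text}")
    for child in elem:  # children are visited in document order
        lines.extend(describe(child, depth + 1))
    return lines


tree_lines = describe(ET.fromstring(DOC))
print("\n".join(tree_lines))
```

Because children are visited in document order, swapping two elements in the input changes the output, which is the sense in which order matters.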

XML is self-describing

• Schema elements become part of the data
  – In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times
  – Relational schema: persons(name, phone) is defined separately from the data and is fixed

• Consequence: XML is much more flexible

Relational Data as XML

Relation persons:

name    phone
John    3634
Sue     6343
Dick    6363

XML:

<persons>
  <person> <name>John</name> <phone> 3634</phone> </person>
  <person> <name>Sue</name> <phone> 6343</phone> </person>
  <person> <name>Dick</name> <phone> 6363</phone> </person>
</persons>

Tree (figure):

persons
  person
    name: "John"
    phone: 3634
  person
    name: "Sue"
    phone: 6343
  person
    name: "Dick"
    phone: 6363
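The mapping from a relation to XML is mechanical: one element per row and one child element per column. The following is a minimal sketch in Python using the standard `xml.etree.ElementTree` module; the function name `rows_to_xml` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# The persons(name, phone) relation from the slide, as a list of tuples.
rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]


def rows_to_xml(rows):
    """Build <persons> with one <person> per row and one child per column."""
    persons = ET.Element("persons")
    for name, phone in rows:
        person = ET.SubElement(persons, "person")
        ET.SubElement(person, "name").text = name
        ET.SubElement(person, "phone").text = phone
    return persons


persons = rows_to_xml(rows)
xml_text = ET.tostring(persons, encoding="unicode")
print(xml_text)
```

Note how the column names, which a relational schema states once, are repeated as tags in every row of the XML.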

XML is semi-structured

• Missing elements:

<person> <name> John</name> <phone>1234</phone> </person>
<person> <name>Joe</name> </person>        (no phone!)

• Could represent in a table with nulls

name    phone
John    1234
Joe     -

XML is semi-structured

• Repeated elements

<person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone> </person>        (two phones!)

• Impossible in tables:

name    phone
Mary    2345  3456  ???
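Code that reads such data has to handle both cases explicitly. Below is a minimal sketch in Python with `xml.etree.ElementTree`; the `<persons>` wrapper around the three `<person>` elements and the helper `to_rows` are assumptions added so the fragment parses as one document. A missing phone becomes `None` (a null), and a repeated phone becomes one row per value.

```python
import xml.etree.ElementTree as ET

# The <persons> wrapper is an assumption; the slides show bare <person> elements.
DOC = """<persons>
  <person> <name> John </name> <phone>1234</phone> </person>
  <person> <name> Joe </name> </person>
  <person> <name> Mary </name> <phone>2345</phone> <phone>3456</phone> </person>
</persons>"""


def to_rows(persons):
    """Flatten persons into (name, phone) rows, tolerating 0..n phones."""
    rows = []
    for person in persons.findall("person"):
        name = person.findtext("name", default="").strip()
        phones = [p.text.strip() for p in person.findall("phone")]
        if not phones:
            rows.append((name, None))  # missing element -> null
        for phone in phones:
            rows.append((name, phone))  # repeated element -> one row each
    return rows


rows = to_rows(ET.fromstring(DOC))
print(rows)
```

Splitting Mary into two rows is one possible relational encoding; it is a design choice of this sketch, not something the slides prescribe.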

XML is semi-structured

• Elements with different types in different objects

<person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone> </person>        (structured name!)

• Heterogeneous collections:
  – <persons> can contain both <person>s and <customer>s

Document Type Definition: DTD

• an XML document may have a DTD
  – rules about the contents of elements
  – like a schema for an XML document

• XML document:
  – well-formed = if tags are correctly closed
  – valid = if it has a DTD and conforms to it

• validation is useful in data exchange

• part of the original XML specification

Very Simple DTD

<!DOCTYPE company [
  <!ELEMENT company ((person|product)*)>
  <!ELEMENT person (ssn, name, office, phone?)>
  <!ELEMENT ssn (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT office (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT product (pid, name, description?)>
  <!ELEMENT pid (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
]>

Figure callouts: root element (company, after <!DOCTYPE); element (the name after <!ELEMENT); what's in the element? (the parenthesized content)

DTD: The Content Model

<!ELEMENT tag (CONTENT)>        (CONTENT is the content model)

• Content model:
  – Complex = a regular expression over other elements
  – Text-only = #PCDATA
  – Empty = EMPTY
  – Any = ANY
  – Mixed content = (#PCDATA | A | B | C)*

Very Simple DTD

Example of valid XML document:

<company>
  <person>
    <ssn> 123456789 </ssn>
    <name> John </name>
    <office> B432 </office>
    <phone> 1234 </phone>
  </person>
  <person>
    <ssn> 987654321 </ssn>
    <name> Jim </name>
    <office> B123 </office>
  </person>
  <product> ... </product>
  ...
</company>
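Since a complex content model is a regular expression over child elements, validity can be illustrated by matching each element's sequence of child tags against a regex. The sketch below is not a DTD validator: the `CONTENT_MODELS` table is a hand translation of the company DTD into Python regexes (an assumption for illustration), the helper `conforms` is hypothetical, and the elided `<product>` element is left out of the test documents.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models; each child tag is followed by one space.
CONTENT_MODELS = {
    "company": r"((person|product) )*",
    "person": r"ssn name office (phone )?",
    "product": r"pid name (description )?",
}


def conforms(elem):
    """Check elem and its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        # Treated as #PCDATA in this sketch: no child elements allowed.
        return len(elem) == 0
    child_seq = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, child_seq) is None:
        return False
    return all(conforms(child) for child in elem)


VALID = """<company>
  <person> <ssn> 123456789 </ssn> <name> John </name>
           <office> B432 </office> <phone> 1234 </phone> </person>
  <person> <ssn> 987654321 </ssn> <name> Jim </name>
           <office> B123 </office> </person>
</company>"""

# Same data, but name comes before ssn, which violates (ssn, name, office, phone?).
INVALID = """<company>
  <person> <name> Jim </name> <ssn> 987654321 </ssn>
           <office> B123 </office> </person>
</company>"""

valid_ok = conforms(ET.fromstring(VALID))
invalid_ok = conforms(ET.fromstring(INVALID))
print(valid_ok, invalid_ok)
```

A real validating parser would read the content models from the DTD itself and also check text content and attributes; this sketch only shows why the order and count of children matter.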

DTD: Regular Expressions

sequence
  DTD:  <!ELEMENT name (firstName, lastName)>
  XML:  <name>
          <firstName> . . . . . </firstName>
          <lastName> . . . . . </lastName>
        </name>

optional
  DTD:  <!ELEMENT name (firstName?, lastName)>

Kleene star
  DTD:  <!ELEMENT person (name, phone*)>
  XML:  <person>
          <name> . . . . . </name>
          <phone> . . . . . </phone>
          <phone> . . . . . </phone>
          <phone> . . . . . </phone>
          . . . . . .
        </person>

alternation
  DTD:  <!ELEMENT person (name, (phone|email))>

lots of other features

Querying XML Data

• XPath = simple navigation through the tree

• XQuery = the SQL of XML

• XSLT = recursive traversal
  – will not discuss in class

Sample Data for Queries

<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>

Data Model for XPath

Tree (figure):

/                                   (the root)
  bib                               (the root element)
    book
      publisher: Addison-Wesley
      author: Serge Abiteboul
      . . . .
    book

XPath: Simple Expressions

/bib/book/year

Result: <year> 1995 </year>
        <year> 1998 </year>

/bib/paper/year

Result: empty (there were no papers)
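These two queries can be tried with Python's standard `xml.etree.ElementTree`, which implements a subset of XPath. One caveat of this sketch: ElementTree evaluates paths relative to an element, so `/bib/book/year` is written as `book/year` starting from the `bib` root element.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)  # bib is the root element

# /bib/book/year, written relative to the bib element
years = [y.text.strip() for y in bib.findall("book/year")]

# /bib/paper/year: there are no paper elements, so the result is empty
paper_years = bib.findall("paper/year")

print(years, paper_years)
```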

XPath: Restricted Kleene Closure

//author

Result: <author> Serge Abiteboul </author>
        <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
        <author> Victor Vianu </author>
        <author> Jeffrey D. Ullman </author>

/bib//first-name

Result: <first-name> Rick </first-name>
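The descendant step `//` can be sketched the same way in Python's `xml.etree.ElementTree`, where it is written `.//` relative to the current element. This assumes the sample `bib` document; a simplification is that `.//author` from the `bib` element stands in for both `//author` and `/bib//author`.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

# //author: author elements at any depth, in document order
authors = bib.findall(".//author")

# /bib//first-name: first-name elements at any depth below bib
first_names = [e.text.strip() for e in bib.findall(".//first-name")]

print(len(authors), first_names)
```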

XPath: Text Nodes

/bib/book/author/text()

Result: Serge Abiteboul
        Victor Vianu
        Jeffrey D. Ullman

Rick Hull doesn't appear because his author element contains first-name and last-name elements rather than text.

Functions in XPath:
  – text() = matches the text value
  – node() = matches any node (= * or @* or text())
  – name() = returns the name of the current tag
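Python's `xml.etree.ElementTree` does not support the `text()` node test in its path language, so the sketch below emulates `/bib/book/author/text()` by reading each author's `.text` and dropping whitespace-only values. The whitespace filtering is a choice of this sketch, made so that the structured Rick Hull author does not contribute an empty string.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

# Emulates /bib/book/author/text(): keep only authors with real text content.
author_texts = [
    a.text.strip()
    for a in bib.findall("book/author")
    if a.text and a.text.strip()
]

print(author_texts)
```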

XPath: Wildcard

//author/*

Result: <first-name> Rick </first-name>
        <last-name> Hull </last-name>

* matches any element
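A minimal sketch of the wildcard step, again using Python's `xml.etree.ElementTree`, which supports `*` as a child step. Only the first book of the sample data is reproduced here to keep the example short.

```python
import xml.etree.ElementTree as ET

# First book of the sample data only (shortened for this sketch).
BIB = """<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

# //author/*: every element child of every author element
children = [(c.tag, c.text.strip()) for c in bib.findall(".//author/*")]

print(children)
```

Authors that contain only text have no element children, so they contribute nothing to the result.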

XPath: Attribute Nodes

/bib/book/@price

Result: "55"

@price means that price is an attribute
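In Python's `xml.etree.ElementTree`, a path cannot end in an attribute step, so this sketch emulates `/bib/book/@price` in two parts: the supported predicate `[@price]` selects books that carry the attribute, and `get` reads its value. The two-book document below is a trimmed version of the sample data.

```python
import xml.etree.ElementTree as ET

# Trimmed sample data: only the fields needed for this sketch.
BIB = """<bib>
  <book>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

# Emulates /bib/book/@price: books that have a price attribute, then its value.
prices = [book.get("price") for book in bib.findall("book[@price]")]

print(prices)
```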

XPath: Predicates

/bib/book/author[first-name]

Result: <author> <first-name> Rick </first-name>
                 <last-name> Hull </last-name>
        </author>
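An existence predicate such as "authors that have a first-name child" is within the XPath subset of Python's `xml.etree.ElementTree`. The sketch below uses the tag name `first-name` as it appears in the sample data, and reproduces only the first book.

```python
import xml.etree.ElementTree as ET

# First book of the sample data only (shortened for this sketch).
BIB = """<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

# author[first-name]: keep authors that have at least one first-name child
structured = bib.findall("book/author[first-name]")
last_names = [a.findtext("last-name").strip() for a in structured]

print(len(structured), last_names)
```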

XPath: More Predicates

/bib/book/author[firstname][address[//zip][city]]/lastname

Result: <lastname> … </lastname>
        <lastname> … </lastname>

XPath: More Predicates

/bib/book[@price < "60"]

/bib/book[author/@age < "25"]

/bib/book[author/text()]
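Comparison predicates such as `[@price < "60"]` are outside the XPath subset of Python's `xml.etree.ElementTree`, so this sketch emulates the first query by selecting books that have a price and comparing numerically in Python. The conversion with `float` is an assumption that prices are numeric, mirroring how XPath 1.0 converts both sides of `<` to numbers. The document is a trimmed version of the sample data.

```python
import xml.etree.ElementTree as ET

# Trimmed sample data: only the fields needed for this sketch.
BIB = """<bib>
  <book>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

# Emulates /bib/book[@price < "60"]; books without a price never match.
cheap_books = [
    book
    for book in bib.findall("book[@price]")
    if float(book.get("price")) < 60
]
cheap_titles = [book.findtext("title").strip() for book in cheap_books]

print(cheap_titles)
```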

XPath: Summary

bib                 matches a bib element
*                   matches any element
/                   matches the root (the node above the root element)
/bib                matches a bib element under root
bib/paper           matches a paper in bib
bib//paper          matches a paper in bib, at any depth
//paper             matches a paper at any depth
paper|book          matches a paper or a book
@price              matches a price attribute
bib/book/@price     matches price attribute in book, in bib
bib/book[@price<"55"]/author/lastname     matches…