Querying XML Sameer S. Pradhan

Page 1: Querying XML

Querying XML

Sameer S. Pradhan

Page 2: Querying XML

The Problem (DBMS Vs Docs)

3-level hierarchy: table, record and field
Order is not part of the information
Strings in separate fields are separate
Location of data is not generally significant
Linking is far more often part of the data, not part of the schema representing data

Page 3: Querying XML

Goals

Data Model
  Based on XML Infoset
Query Operators
Query Language

Page 4: Querying XML

Usage Scenarios

Human readable documents
Data-oriented documents
Mixed-model documents
Administrative data
Filtering streams
Multiple syntactic environments

Page 5: Querying XML

General Requirements

Syntax Binding
  MAY have more than one syntax binding
Declarativity
  MUST be declarative
Protocol Independence
  MUST be defined independently of any protocols
Error Conditions

Page 6: Querying XML

XML Query Functionality (1)

Quantifiers
  MUST include support for both Universal and Existential Quantifiers
Hierarchy and Sequence
  MUST support operations on hierarchy and sequence of document structures
Aggregation
  MUST allow computing summary information
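The quantifier and aggregation requirements can be illustrated outside any particular query language. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree; the inventory document and its element names are invented for illustration and do not come from the requirements document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this sketch.
DOC = """
<inventory>
  <part><name>bolt</name><price>10.50</price></part>
  <part><name>nut</name><price>2.25</price></part>
  <part><name>washer</name><price>0.75</price></part>
</inventory>
"""

root = ET.fromstring(DOC)
prices = [float(p.findtext("price")) for p in root.findall("part")]

# Existential quantifier: does SOME part cost more than 10?
some_expensive = any(price > 10 for price in prices)

# Universal quantifier: does EVERY part cost less than 20?
all_cheap = all(price < 20 for price in prices)

# Aggregation: summary information computed over the whole sequence.
part_count = len(prices)
total_price = sum(prices)
```

A declarative query language would express the same three ideas as operators inside the query rather than as host-language code.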

Page 7: Querying XML

XML Query Functionality (2)

Combination
  MUST be able to combine information from multiple documents or from different parts of the same document
Sorting
  MUST be able to sort query results
Structural Preservation
  MUST preserve structure of original document

Page 8: Querying XML

XML Query Functionality (3)

Structural Transformation
  MUST be able to transform and create new structures
References
  MUST be able to traverse intra- and inter-document references
Text and Element Boundaries
  MUST handle text across element boundaries

Page 9: Querying XML

XML Query Functionality (4)

Operation on Schemas
  MUST be able to access Schemas or DTDs
Extensibility
  SHOULD support the use of externally defined functions
Operation on Names
  MUST perform simple operations on names
  MAY perform more powerful operations

Page 10: Querying XML

XML Query Functionality (5)

Closure
  MUST be closed with respect to the XML Query data model

Page 11: Querying XML

XML Query Data Model (1)

Datatypes
  MUST represent XML 1.0 data as well as simple and complex types of XML Schema
References
  MUST include support for both internal and external references
Schema Availability
  MUST support querying even in the absence of a Schema

Page 12: Querying XML

XML Query Data Model (2)

Trees
  Node-labeled
  Edge-labeled
XML Query data model is a Node-labeled, tree-constructor representation
Node functions
  Constructors
  Accessors

Page 13: Querying XML

Node Accessors

A node has eight accessors
  isDocNode
  isElemNode
  isValueNode
  isAttrNode
  isNSNode
  isPINode
  isCommentNode
  isInfoItemNode

Page 14: Querying XML

Value Constructors

Fourteen primitive XML Schema datatypes
  stringValue, boolValue, floatValue, doubleValue, decimalValue, timeDurValue, recurDurValue,
  binaryValue, urirefValue, idValue, idrefValue, qnameValue, entityValue, notationValue

Note: ValueNode replaces XPath’s TextNode

Page 15: Querying XML

Example

<?xml version="1.0"?>
<p:part xmlns:p="http://www.mywebsite.com/PartSchema"
        xsi:schemaLocation="http://www.mywebsite.com/PartSchema
                            http://www.mywebsite.com/PartSchema"
        name="nutbolt">
  <mfg>Acme</mfg>
  <price>10.50</price>
</p:part>

Page 16: Querying XML

Data-Model (1)

children(D1) = [ Ref(E1) ]
root(D1) = Ref(E1)
name(E1) = QNameValue("http://www.mywebsite.com/PartSchema", "part", Ref(Def_QName))
children(E1) = [ Ref(E2), Ref(E3) ]
attributes(E1) = { Ref(A1) }
namespaces(E1) = { Ref(N1) }
type(E1) = Ref(Def_part_type)
parent(E1) = Ref(D1)
name(A1) = QNameValue(null, "name", Ref(Def_QName))
value(A1) = Ref(StringValue("nutbolt", Ref(Def_string)))

Page 17: Querying XML

Data-Model (2)

parent(A1) = Ref(E1)
prefix(N1) = Ref(StringValue("p", Ref(Def_string)))
uri(N1) = URIRefValue("http://www.mywebsite.com/PartSchema", Ref(Def_uriReference))
parent(N1) = Ref(E1)
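Several of the accessors listed for the part example have rough counterparts in a general-purpose XML library. The sketch below uses Python's xml.etree.ElementTree to recover name(E1), children(E1) and attributes(E1). It is an analogy only, not an implementation of the XML Query data model: ElementTree has no parent() accessor, no namespace nodes and no typed values, and the xsi:schemaLocation attribute is omitted here as a simplifying assumption.

```python
import xml.etree.ElementTree as ET

NS = "http://www.mywebsite.com/PartSchema"

# The xsi:schemaLocation attribute from the slide is left out here so
# that the attribute set matches attributes(E1) = { Ref(A1) }.
DOC = f"""<p:part xmlns:p="{NS}" name="nutbolt">
  <mfg>Acme</mfg>
  <price>10.50</price>
</p:part>"""

e1 = ET.fromstring(DOC)

# name(E1): ElementTree stores a qualified name as "{uri}local".
uri, local = e1.tag[1:].split("}")

# children(E1) = [ Ref(E2), Ref(E3) ]
children = [child.tag for child in e1]

# attributes(E1) = { Ref(A1) }, with value(A1) = "nutbolt"
attributes = dict(e1.attrib)
```

Here uri and local together play the role of the QNameValue, and the order of children mirrors the ordered list in the data model.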

Page 18: Querying XML

Constraints on Data Model

Node References
  Defined by the query system NOT by the query language
Node Identity
  The function ref is one-to-one and onto
  ref_equal(ref(n1), ref(n2)) if and only if equal(n1, n2)
Unique parent
Duplicate-free list of children

Page 19: Querying XML

XQL

XQL - XML Query Language
The name was an ad hoc selection, but it seems to have survived and is likely to persist for quite some time

Page 20: Querying XML

XQL Design (1)

Compact, easy to type and read
Simple for common cases
Embeddable in programs, scripts, URLs
Unique identification of each node
Declarative NOT procedural
Evaluation at any level in the document
Results in document order; no repeated nodes

Page 21: Querying XML

XQL Design (2)

Superset of XSL
Closure is guaranteed ONLY if the implementation returns well-formed XML documents

Page 22: Querying XML

XQL: Syntax (1)

Mimics the URI navigation syntax
Notation
  /   : Root context
  ./  : Current context
  //  : Recursive descent from root
  .// : Recursive descent from current node
  @   : Attribute
  *   : Any element

Page 23: Querying XML

Sample Document

<?xml version='1.0'?>
<!-- This file represents a fragment of a book store inventory database -->
<bookstore specialty='novel'>
  <book style='autobiography'>
    <title>Seven Years in Trenton</title>
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
      <award>Trenton Literary Review Honorable Mention</award>
    </author>
    <price>12</price>
  </book>
  <my:book style='leather' price='29.50' xmlns:my='http://www.placeholder-name-here.com/schema/'>
    <my:title>Who's Who in Trenton</my:title>
    <my:author>Robert Bob</my:author>
  </my:book>
</bookstore>

Page 24: Querying XML

XQL: Examples (1)

./author
author
/bookstore
//author
.//author
book[/bookstore/@specialty = @style]
author/first-name
author/*
bookstore//title
bookstore/*/title
*[@specialty]
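XQL itself is not widely implemented today, but several of these path expressions have close equivalents in the XPath subset supported by Python's xml.etree.ElementTree. The sketch below runs a few of them against the sample bookstore document. The mapping from XQL to ElementTree paths is an approximation chosen for illustration, not a definition of XQL semantics.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore specialty='novel'>
  <book style='autobiography'>
    <title>Seven Years in Trenton</title>
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
      <award>Trenton Literary Review Honorable Mention</award>
    </author>
    <price>12</price>
  </book>
  <my:book style='leather' price='29.50'
           xmlns:my='http://www.placeholder-name-here.com/schema/'>
    <my:title>Who's Who in Trenton</my:title>
    <my:author>Robert Bob</my:author>
  </my:book>
</bookstore>"""

bookstore = ET.fromstring(DOC)

# XQL //author (recursive descent): every un-prefixed author element.
authors = bookstore.findall(".//author")

# XQL author/first-name, evaluated with a book as the current context.
book = bookstore.find("book")
first_names = [e.text for e in book.findall("author/first-name")]

# XQL author/* : all element children of author.
author_children = [e.tag for e in book.findall("author/*")]

# XQL bookstore/*/title : titles exactly two levels below bookstore.
titles = [e.text for e in bookstore.findall("*/title")]
```

The my:author and my:title elements are not matched because they live in a namespace, which parallels the distinction XQL draws between author and my:author.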

Page 25: Querying XML

XQL: Examples (2)

book[@style]
book/@style
book[excerpt]/author[degree]
book[excerpt][title]
book[excerpt $and$ title]
author[name = …]
author[name $eq$ …]
author[. = 'Bob']
author[text() = 'Bob']
author[first-name!text() = 'Bob']
degree[index() $lt$ 3]
degree[index() < 3]
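The filter expressions in square brackets select nodes by the presence or value of attributes and children. A minimal sketch of the same idea with Python's xml.etree.ElementTree follows. The second book element is a hypothetical addition so that the filters have something to exclude, and book[author] stands in for the slide's book[excerpt] because the sample document has no excerpt element.

```python
import xml.etree.ElementTree as ET

# First book is from the sample document; the second is hypothetical.
DOC = """<bookstore specialty='novel'>
  <book style='autobiography'>
    <title>Seven Years in Trenton</title>
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
    </author>
    <price>12</price>
  </book>
  <book>
    <title>Untitled Draft</title>
  </book>
</bookstore>"""

bookstore = ET.fromstring(DOC)

# XQL book[@style] : books that HAVE a style attribute.
styled = bookstore.findall("book[@style]")

# XQL book/@style : the attribute values themselves. ElementTree paths
# cannot return attributes, so the value is read with get().
styles = [b.get("style") for b in styled]

# XQL book[author] : books that contain at least one author element.
with_author = bookstore.findall("book[author]")

# XQL author[last-name = 'Bob'] : compare a child's text to a string.
bobs = bookstore.findall(".//author[last-name='Bob']")
```

Note the difference between book[@style], which returns book elements, and book/@style, which returns the attributes.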

Page 26: Querying XML

XQL: Examples (3)

<x> <y/> <y/> </x>
<x> <y/> <y/> </x>

x/y[index() = 0]
x/y[0]
(x/y)[0]
x[0]/y[0]
book[end()]
author[first-name][2]
price[@intl!value() = 'canada']
my:*
*:book
book/@my:style
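The difference between x/y[0] and (x/y)[0] is easy to miss: the first indexes within each parent, the second indexes the combined result. The sketch below reproduces both with Python's xml.etree.ElementTree. It assumes a wrapping doc root and id attributes, both added here only so the selected nodes can be told apart, and it translates XQL's zero-based indices to the one-based positions that ElementTree uses.

```python
import xml.etree.ElementTree as ET

# The slide's two <x> elements, wrapped in a root; the id attributes are
# added here only so the selected nodes can be told apart.
DOC = """<doc>
  <x><y id='a'/><y id='b'/></x>
  <x><y id='c'/><y id='d'/></x>
</doc>"""

doc = ET.fromstring(DOC)

# XQL x/y[0] : the first y within EACH x (XQL indices start at 0,
# ElementTree positions start at 1).
first_in_each = [y.get("id") for y in doc.findall("x/y[1]")]

# XQL (x/y)[0] : the first node of the whole x/y result list.
first_overall = [y.get("id") for y in doc.findall("x/y")[:1]]

# XQL x[0]/y[0] : the first y of the first x.
first_of_first = [y.get("id") for y in doc.findall("x[1]/y[1]")]

# XQL y[end()] inside each x : the last y of each x.
last_in_each = [y.get("id") for y in doc.findall("x/y[last()]")]
```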

Page 27: Querying XML

XQL: Examples (4)

author[publications!count() > 10]
books[pub_date < date('1995-01-01')]
books[pub_date < date(@first)]
bookstore/(book | magazine)
//comment()[1]
ancestor(book/author)
author[0, 2 $to$ 4, -1]

Page 28: Querying XML

XML-QL

SQL-like
Features of query languages for semi-structured data
Supports joins and aggregates

Page 29: Querying XML

XML-QL: Sample Document

<bib>
  <book year="1995">
    <!-- A good introductory text -->
    <title> An Introduction to Database Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
  <book year="1998">
    <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
    <author> <lastname> Date </lastname> </author>
    <author> <lastname> Darwen </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
</bib>

Page 30: Querying XML

XML-QL: Flattening Query (1)

WHERE <book>
        <publisher><name>Addison-Wesley</name></publisher>
        <title> $t</title>
        <author> $a</author>
      </book> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <author> $a</author>
            <title> $t</title>
          </result>

Note: Flattening is not possible with XQL

Page 31: Querying XML

XML-QL: Result (1)

<result>
  <author> <lastname> Date </lastname> </author>
  <title> An Introduction to Database Systems </title>
</result>
<result>
  <author> <lastname> Date </lastname> </author>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
</result>
<result>
  <author> <lastname> Darwen </lastname> </author>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
</result>
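The flattened result has one entry per (author, title) binding, so a book with two authors appears twice. A rough Python equivalent using xml.etree.ElementTree is sketched below. It is an analogy to show the shape of the result, not an XML-QL implementation, and the whitespace inside the sample document's text nodes is trimmed as a simplifying assumption.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

# One (author, title) pair per binding of $a and $t, as in the flattening
# query: a book with two authors contributes two pairs.
pairs = [
    (author.findtext("lastname"), book.findtext("title"))
    for book in bib.findall("book")
    if book.findtext("publisher/name") == "Addison-Wesley"
    for author in book.findall("author")
]
```

The nested comprehension plays the role of the pattern match: the outer loop binds the book, the condition checks the publisher, and the inner loop binds each author.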

Page 32: Querying XML

XML-QL: Nested Queries (2)

WHERE <book> $p </> IN "www.a.b.c/bib.xml",
      <title> $t </>,
      <publisher><name>Addison-Wesley</></> IN $p
CONSTRUCT <result>
            <title> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <author> $a </>
          </>

Page 33: Querying XML

XML-QL: CONTENT_AS

WHERE <book>
        <title> $t </>
        <publisher><name>Addison-Wesley </> </>
      </> CONTENT_AS $p IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <title> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <author> $a </>
          </>

Page 34: Querying XML

XML-QL: Result (2)

<result>
  <title> An Introduction to Database Systems </title>
  <author> <lastname> Date </lastname> </author>
</result>
<result>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
  <author> <lastname> Date </lastname> </author>
  <author> <lastname> Darwen </lastname> </author>
</result>
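In contrast to the flattened form, the nested query keeps all authors of a book together under one result. The sketch below builds the same grouped structure with Python's xml.etree.ElementTree. It is an illustrative analogy under the assumption that text whitespace is trimmed, and the inner loop corresponds to the inner WHERE ... CONSTRUCT clause.

```python
import copy
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

results = []
for book in bib.findall("book"):
    if book.findtext("publisher/name") != "Addison-Wesley":
        continue
    # Outer CONSTRUCT: one <result> per matching book.
    result = ET.Element("result")
    ET.SubElement(result, "title").text = book.findtext("title")
    # Inner WHERE ... CONSTRUCT: copy every author of this book.
    for author in book.findall("author"):
        result.append(copy.deepcopy(author))
    results.append(result)

author_counts = [len(r.findall("author")) for r in results]
```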

Page 35: Querying XML

XML-QL: Query (3)

WHERE <article>
        <author>
          <firstname> $f </> // firstname $f
          <lastname> $l </>  // lastname $l
        </>
      </> CONTENT_AS $a IN "www.a.b.c/bib.xml",
      <book year=$y>
        <author>
          <firstname> $f </> // join on same firstname $f
          <lastname> $l </>  // join on same lastname $l
        </>
      </> IN "www.a.b.c/bib.xml",
      $y > 1995
CONSTRUCT <article> $a </>

Page 36: Querying XML

XML-QL: ELEMENT_AS

WHERE <article>
        <author>
          <firstname> $f </> // firstname $f
          <lastname> $l </>  // lastname $l
        </>
      </> ELEMENT_AS $e IN "www.a.b.c/bib.xml"
...
CONSTRUCT $e

Page 37: Querying XML

XML-QL: Tag Variables

WHERE <$p>
        <title> $t </title>
        <year>1995</>
        <$e> Smith </>
      </> IN "www.a.b.c/bib.xml",
      $e IN {author, editor}
CONSTRUCT <$p>
            <title> $t </title>
            <$e> Smith </>
          </>

Note: XQL does not support tag variables

Page 38: Querying XML

XML-QL: Regular Expressions

<!ELEMENT part (name brand part*)>
<!ELEMENT name CDATA>
<!ELEMENT brand CDATA>

WHERE <part*> <name>$r</> <brand>Ford</> </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>$r</>

WHERE <$*> <name>$r</> <brand>Ford</> </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>$r</>

WHERE <part+.(subpart|component.piece)>$r</> IN "www.a.b.c/parts.xml"
CONSTRUCT <result> $r</>

Note: XQL does not support regular expressions
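The part* pattern matches a chain of nested part elements of any length, which amounts to a recursive walk over the part hierarchy. The Python sketch below shows that walk with xml.etree.ElementTree. The catalogue is hypothetical, shaped after the slide's DTD, and the function is only a rough analogue of the first regular path query, not XML-QL's actual matching semantics.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts catalogue following the slide's DTD
# (a part has a name, a brand, and any number of nested parts).
PARTS = """<part>
  <name>engine</name><brand>Ford</brand>
  <part>
    <name>piston</name><brand>Acme</brand>
    <part><name>ring</name><brand>Ford</brand></part>
  </part>
</part>"""


def ford_part_names(part):
    """Yield names of Ford parts reachable through nested <part> elements."""
    if part.findtext("brand") == "Ford":
        yield part.findtext("name")
    for sub in part.findall("part"):
        yield from ford_part_names(sub)


root = ET.fromstring(PARTS)
names = list(ford_part_names(root))
```

The recursion only follows part children, which is what distinguishes part* from the unrestricted $* wildcard path.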

Page 39: Querying XML

XML-QL: Joins

WHERE <person>
        <name></> ELEMENT_AS $n
        <ssn> $ssn</>
      </> IN "www.a.b.c/data.xml",
      <taxpayer>
        <ssn> $ssn</>
        <income></> ELEMENT_AS $i
      </> IN "www.irs.gov/taxpayers.xml"
CONSTRUCT <result> $n $i </>
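Using the same variable $ssn in both patterns is what makes this a join: only pairs that agree on the value produce a result. A minimal Python sketch of that value join follows, using xml.etree.ElementTree. Both input documents are invented stand-ins for data.xml and taxpayers.xml, and the dictionary index assumes each ssn appears at most once among the taxpayers.

```python
import copy
import xml.etree.ElementTree as ET

# Both documents are invented stand-ins for data.xml and taxpayers.xml.
PEOPLE = """<people>
  <person><name>Ann</name><ssn>111</ssn></person>
  <person><name>Raj</name><ssn>222</ssn></person>
</people>"""
TAXPAYERS = """<taxpayers>
  <taxpayer><ssn>222</ssn><income>50000</income></taxpayer>
  <taxpayer><ssn>333</ssn><income>70000</income></taxpayer>
</taxpayers>"""

people = ET.fromstring(PEOPLE)
taxpayers = ET.fromstring(TAXPAYERS)

# Index one side by the join key so each person needs a single lookup.
income_by_ssn = {
    t.findtext("ssn"): t.find("income") for t in taxpayers.findall("taxpayer")
}

results = []
for person in people.findall("person"):
    income = income_by_ssn.get(person.findtext("ssn"))
    if income is None:
        continue  # no taxpayer shares this ssn, so there is no binding
    result = ET.Element("result")
    result.append(copy.deepcopy(person.find("name")))  # $n
    result.append(copy.deepcopy(income))               # $i
    results.append(result)
```

Persons without a matching taxpayer, and taxpayers without a matching person, are dropped, as in an inner join.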

Page 40: Querying XML

XML-QL: Ordering

WHERE <pub> $p </> IN "www.a.b.c/bib.xml",
      <title> $t </> IN $p,
      <year> $y </> IN $p,
      <month> $z </> IN $p
ORDER-BY $y,$z
CONSTRUCT $t

Note: XQL does not support ordering
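Ordering by two variables means sorting on the first and breaking ties with the second. The Python sketch below shows the effect of ORDER-BY $y,$z on a small invented list of publications, using xml.etree.ElementTree. Treating year and month as integers is an assumption made here so that the comparison is numeric rather than lexicographic.

```python
import xml.etree.ElementTree as ET

# Invented publications; titles are single letters to make the order obvious.
PUBS = """<bib>
  <pub><title>C</title><year>1999</year><month>3</month></pub>
  <pub><title>A</title><year>1998</year><month>11</month></pub>
  <pub><title>B</title><year>1999</year><month>1</month></pub>
</bib>"""

bib = ET.fromstring(PUBS)

# ORDER-BY $y,$z : sort on (year, month); values are converted to int so
# that month 11 sorts after month 3 rather than lexicographically.
ordered = sorted(
    bib.findall("pub"),
    key=lambda p: (int(p.findtext("year")), int(p.findtext("month"))),
)

# CONSTRUCT $t : return only the titles, in sorted order.
titles = [p.findtext("title") for p in ordered]
```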

Page 41: Querying XML

XML-QL: Grouping

CONSTRUCT <results> {
  WHERE <bib><book>
          <title>$t</title>
          <author><last>$l</last><first>$f</first></author>
        </book></bib> IN "www.bn.com/bib.xml"
  CONSTRUCT <result ID=author($l,$f)>
              <title>$t</title>
              <author><last>$l</last><first>$f</first></author>
            </result>
} </results>

Note: Explicit grouping is not possible with XQL

Page 42: Querying XML

XML-QL: Functions

FUNCTION findDeclaredIncomes($Taxpayers, $Employees)
  WHERE <taxpayer> <ssn> $s </> <income> $x </> </> IN $Taxpayers,
        <employee> <ssn> $s </> <name> $n </> </> IN $Employees
  CONSTRUCT <result> <name> $n </> <Income> $x </> </>
END

findDeclaredIncomes("www.irs.gov/taxpayers.xml", "www.a.b.c/employees.xml")

Page 43: Querying XML

XQuery

Builds directly on XPointer
Special type for the results
Ability to return ranges (spans)

Page 44: Querying XML

XQuery: Syntax

?  : Selects element with given id
^  : Selects among containers of current node
<  : Preceding sibling
>  : Following sibling
«  : All preceding nodes
»  : All following nodes
@  : Attribute
$  : Selects a range by matching a string

Page 45: Querying XML

XQuery: Queries

descendant(FOOTNOTE & TYPE='CITATION').(REF)
descendant(SEC & descendant(LEVEL = 'SECRET'))
descendant(FOOTNOTE & TYPE='CITATION').(REF){1-2}.link(role=AUTHOR)
descendant(FOOTNOTE & (child(AUTHOR).attr(TYPE) = *(ancestor(CHAPTER).attr(AUTHOR))))
union(id(foo), id(bar), descendant(SEC))
intersection(descendant(ITEM & string('dog')), descendant(ITEM & string('cat')))
difference(fsibling(div), ID(SECRET))
^TI P* [^UI OL DL] {1,3} SUMMARY $

Page 46: Querying XML

Other Query Languages

Lorel (Lightweight Object REpository Language)
YATL
Xtract
Xmlquery
XML Query Engine
And...

Page 47: Querying XML

QUILT

The problem with most query languages is that they are either document oriented or database oriented
QUILT is derived from both domains and promises substantial coverage of both areas
It has a FLWR (pronounced as ‘flower’) construct

Page 48: Querying XML

References

http://www.w3.org/TR/2000/WD-xmlquery-req-20000131
http://www.w3.org/TandS/QL/QL98/pp/xql.html
http://www.w3.org/TR/1998/NOTE-xml-ql-19980819/
http://www.w3.org/TandS/QL/QL98/pp/xquery.html
http://www.fatdog.com/
http://www.almaden.ibm.com/cs/people/chamberlin/quilt_lncs.pdf
http://www-db.research.bell-labs.com/user/simeon/xquery.html
http://www-db.stanford.edu/lore/
http://www.cs.washington.edu/homes/zives/research/xmlquery.pdf
http://www.oasis-open.org/cover/xmlQuery.html (main source)