Querying XML
Sameer S. Pradhan
The Problem (DBMS Vs Docs)
3-level hierarchy: table, record and field
Order is not part of the information
Strings in separate fields are separate
Location of data is not generally significant
Linking is far more often part of the data, not part of the schema representing data
Goals
Data Model Based on XML Infoset
Query Operators
Query Language
Usage Scenarios
Human readable documents
Data-oriented documents
Mixed-model documents
Administrative data
Filtering streams
Multiple syntactic environments
General Requirements
Syntax Binding MAY have more than one syntax binding
Declarativity MUST be declarative
Protocol Independence MUST be defined independently of any protocols
Error Conditions
XML Query Functionality (1)
Quantifiers MUST include support for both Universal and Existential Quantifiers
Hierarchy and Sequence MUST support operations on hierarchy and sequence of document structures
Aggregation MUST allow computing summary information
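A minimal sketch of what these three requirements ask for, written with Python's standard `xml.etree.ElementTree` rather than any XML query language. The inventory document, its element names, and the price thresholds are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory; element names and prices are invented for this sketch.
DOC = """
<inventory>
  <book><title>A</title><price>12</price></book>
  <book><title>B</title><price>30</price></book>
  <book><title>C</title><price>8</price></book>
</inventory>
"""

root = ET.fromstring(DOC)
prices = [float(p.text) for p in root.findall("book/price")]

# Existential quantifier: is there SOME book cheaper than 10?
some_cheap = any(p < 10 for p in prices)

# Universal quantifier: does EVERY book cost less than 50?
all_under_50 = all(p < 50 for p in prices)

# Hierarchy and sequence: children are returned in document order.
titles_in_order = [t.text for t in root.findall("book/title")]

# Aggregation: summary information over the selected nodes.
summary = {"count": len(prices), "total": sum(prices), "max": max(prices)}

print(some_cheap, all_under_50, titles_in_order, summary)
```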
XML Query Functionality (2)
Combination MUST be able to combine information from multiple documents or from different parts of the same document
Sorting MUST be able to sort query results
Structural Preservation MUST preserve structure of original document
XML Query Functionality (3)
Structural Transformation MUST be able to transform and create new structures
References MUST be able to traverse intra- and inter-document references
Text and Element Boundaries MUST handle text across element boundaries
XML Query Functionality (4)
Operation on Schemas MUST be able to access Schemas or DTDs
Extensibility SHOULD support the use of externally defined functions
Operation on Names MUST perform simple operations on names; MAY perform more powerful operations
XML Query Functionality (5)
Closure MUST be closed with respect to the XML Query data model
XML Query Data Model (1)
Datatypes MUST represent XML 1.0 data as well as simple and complex types of XML Schema
References MUST include support for references, both internal and external
Schema Availability MUST query even in the absence of Schema
XML Query Data Model (2)
Trees: Node-labeled or Edge-labeled
XML Query data model is a node-labeled, tree-constructor representation
Node functions: Constructors and Accessors
Node Accessors
A node has eight accessors: isDocNode, isElemNode, isValueNode, isAttrNode, isNSNode, isPINode, isCommentNode, isInfoItemNode
Value Constructors
Fourteen primitive XML Schema datatypes: stringValue, boolValue, floatValue, doubleValue, decimalValue, timeDurValue, recurDurValue, binaryValue, urirefValue, idValue, idrefValue, qnameValue, entityValue, notationValue
Note: ValueNode replaces XPath’s TextNode
Example
<?xml version="1.0"?>
<p:part xmlns:p="http://www.mywebsite.com/PartSchema"
        xsi:schemaLocation="http://www.mywebsite.com/PartSchema
                            http://www.mywebsite.com/PartSchema"
        name="nutbolt">
  <mfg>Acme</mfg>
  <price>10.50</price>
</p:part>
Data-Model (1)
children(D1) = [ Ref(E1) ]
root(D1) = Ref(E1)
name(E1) = QNameValue("http://www.mywebsite.com/PartSchema", "part", Ref(Def_QName))
children(E1) = [ Ref(E2), Ref(E3) ]
attributes(E1) = { Ref(A1) }
namespaces(E1) = { Ref(N1) }
type(E1) = Ref(Def_part_type)
parent(E1) = Ref(D1)
name(A1) = QNameValue(null, "name", Ref(Def_QName))
value(A1) = Ref(StringValue("nutbolt", Ref(Def_string)))
Data-Model (2)
parent(A1) = Ref(E1)
prefix(N1) = Ref(StringValue("p", Ref(Def_string)))
uri(N1) = URIRefValue("http://www.mywebsite.com/PartSchema", Ref(Def_uriReference))
parent(N1) = Ref(E1)
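The accessors listed on these two slides can be approximated with Python's standard `xml.etree.ElementTree`. This is only a sketch under assumptions: the `xsi:schemaLocation` attribute is left out because its prefix is not declared in the slide's example, the helper name `qname` is made up here, and ElementTree does not expose namespace nodes or schema types, so `namespaces(E1)` and `type(E1)` have no counterpart.

```python
import xml.etree.ElementTree as ET

NS = "http://www.mywebsite.com/PartSchema"

# Simplified copy of the slide's example (xsi:schemaLocation omitted, see above).
DOC = f'<p:part xmlns:p="{NS}" name="nutbolt"><mfg>Acme</mfg><price>10.50</price></p:part>'

root = ET.fromstring(DOC)  # plays the role of E1


def qname(tag):
    """Split ElementTree's '{uri}local' tag into a (uri, local) pair (hypothetical helper)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return (uri, local)
    return (None, tag)


# name(E1): a qualified name made of a namespace URI and a local part.
name_e1 = qname(root.tag)

# children(E1): an ordered list of child element nodes (E2, E3).
children_e1 = [qname(child.tag) for child in root]

# attributes(E1): an unordered collection of attribute nodes (A1).
attributes_e1 = {qname(k): v for k, v in root.attrib.items()}

# parent(...): ElementTree keeps no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

print(name_e1, children_e1, attributes_e1)
```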
Constraints on Data Model
Node References Defined by the query system, NOT by the query language
Node Identity The function ref is one-to-one and onto: ref_equal(ref(n1), ref(n2)) if and only if equal(n1, n2)
Unique parent
Duplicate-free list of children
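The last two constraints can be checked mechanically. The sketch below uses Python object identity (`id`) as a stand-in for a node reference, which is an assumption of this example and not something the data model prescribes. The function name `check_constraints` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_constraints(root):
    """Return violations of 'unique parent' and 'duplicate-free list of children'."""
    problems = []
    parent_of = {}  # id(node) -> parent element
    for parent in root.iter():
        seen = set()
        for child in parent:
            if id(child) in seen:
                problems.append(f"duplicate child <{child.tag}> under <{parent.tag}>")
                continue
            seen.add(id(child))
            if id(child) in parent_of and parent_of[id(child)] is not parent:
                problems.append(f"<{child.tag}> has more than one parent")
            parent_of[id(child)] = parent
    return problems


good = ET.fromstring("<part><mfg>Acme</mfg><price>10.50</price></part>")

# ElementTree allows appending the same element object twice, which breaks
# the duplicate-free constraint; the check reports it.
bad = ET.fromstring("<part><mfg>Acme</mfg></part>")
bad.append(bad[0])

print(check_constraints(good))
print(check_constraints(bad))
```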
XQL
XQL - XML Query Language
The name was an ad hoc selection, but it seems to have survived and will likely survive for quite some time
XQL Design (1)
Compact, easy to type and read
Simple for common cases
Embeddable in programs, scripts, URLs
Unique identification of each node
Declarative NOT procedural
Evaluation at any level in the document
Results in document order; no repeated nodes
XQL Design (2)
Superset of XSL
Closure is guaranteed ONLY if the implementation returns well-formed XML documents
XQL: Syntax (1)
Mimics the URI navigation syntax
Notation
/ : Root context
./ : Current context
// : Recursive descent from root
.// : Recursive descent from current node
@ : Attribute
* : Any element
Sample Document
<?xml version='1.0'?>
<!-- This file represents a fragment of a book store inventory database -->
<bookstore specialty='novel'>
  <book style='autobiography'>
    <title>Seven Years in Trenton</title>
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
      <award>Trenton Literary Review Honorable Mention</award>
    </author>
    <price>12</price>
  </book>
  <my:book style='leather' price='29.50' xmlns:my='http://www.placeholder-name-here.com/schema/'>
    <my:title>Who's Who in Trenton</my:title>
    <my:author>Robert Bob</my:author>
  </my:book>
</bookstore>
XQL: Examples (1)
./author
author
/bookstore
//author
.//author
book[/bookstore/@specialty = @style]
author/first-name
author/*
bookstore//title
bookstore/*/title
*[@specialty]
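No XQL engine is assumed here. As a rough analogue, several of these path expressions can be tried with the XPath subset in Python's standard `xml.etree.ElementTree`, run against the sample bookstore document. The mapping in the comments is approximate, and the variable names are made up for this sketch. Unprefixed names match only elements without a namespace, so the `my:` elements are not selected.

```python
import xml.etree.ElementTree as ET

# The sample document from the slides (XML declaration and comment omitted).
DOC = """<bookstore specialty='novel'>
  <book style='autobiography'>
    <title>Seven Years in Trenton</title>
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
      <award>Trenton Literary Review Honorable Mention</award>
    </author>
    <price>12</price>
  </book>
  <my:book style='leather' price='29.50'
           xmlns:my='http://www.placeholder-name-here.com/schema/'>
    <my:title>Who's Who in Trenton</my:title>
    <my:author>Robert Bob</my:author>
  </my:book>
</bookstore>"""

bookstore = ET.fromstring(DOC)
book = bookstore.find("book")  # context node for the relative queries

authors_of_book = book.findall("author")                    # author  (same as ./author)
all_authors = bookstore.findall(".//author")                # .//author
first_names = [e.text for e in book.findall("author/first-name")]  # author/first-name
author_parts = [e.tag for e in book.findall("author/*")]    # author/*
titles = [t.text for t in bookstore.findall(".//title")]    # bookstore//title
grandchild_titles = bookstore.findall("*/title")            # bookstore/*/title

print(first_names, author_parts, titles)
```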
XQL: Examples (2)
book[@style]
book/@style
book[excerpt]/author[degree]
book[excerpt][title]
book[excerpt $and$ title]
author[name = …]
author[name $eq$ …]
author[. = 'Bob']
author[text() = 'Bob']
author[first-name!text() = 'Bob']
degree[index() $lt$ 3]
degree[index() < 3]
XQL: Examples (3)
<x> <y/> <y/> </x>
<x> <y/> <y/> </x>
x/y[index() = 0]
x/y[0]
(x/y)[0]
x[0]/y[0]
book[end()]
author[first-name][2]
price[@intl!value() = 'canada']
my:*
*:book
book/@my:style
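The difference between x/y[0] and (x/y)[0] is easy to miss. The sketch below reproduces it with Python's `xml.etree.ElementTree`, under a few assumptions: the two `<x>` fragments are wrapped in a hypothetical `<doc>` root, each `<y>` gets an invented `id` attribute so results can be told apart, and XPath-style positions (which start at 1) stand in for XQL's zero-based indices.

```python
import xml.etree.ElementTree as ET

# The slide's two <x> fragments inside a hypothetical <doc> root;
# the id attributes are invented for this sketch.
doc = ET.fromstring(
    "<doc>"
    "<x><y id='a'/><y id='b'/></x>"
    "<x><y id='c'/><y id='d'/></x>"
    "</doc>"
)

# XQL x/y[0]: the first <y> within EACH <x> (XPath position 1).
first_y_per_x = [y.get("id") for y in doc.findall("x/y[1]")]

# XQL (x/y)[0]: the first node of the whole x/y result.
first_y_overall = doc.findall("x/y")[0].get("id")

# XQL x[0]/y[0]: the first <y> of the first <x>.
first_y_of_first_x = doc.find("x[1]/y[1]").get("id")

# XQL end() plays the role of XPath's last(): the last <y> within each <x>.
last_y_per_x = [y.get("id") for y in doc.findall("x/y[last()]")]

print(first_y_per_x, first_y_overall, first_y_of_first_x, last_y_per_x)
```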
XQL: Examples (4)
author[publications!count() > 10]
books[pub_date < date('1995-01-01')]
books[pub_date < date(@first)]
bookstore/(book | magazine)
//comment()[1]
ancestor(book/author)
author[0, 2 $to$ 4, -1]
XML-QL
SQL-like
Features of query languages for semi-structured data
Supports joins and aggregates
XML-QL: Sample Document
<bib>
  <book year="1995">
    <!-- A good introductory text -->
    <title> An Introduction to Database Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
  <book year="1998">
    <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
    <author> <lastname> Date </lastname> </author>
    <author> <lastname> Darwen </lastname> </author>
    <publisher> <name> Addison-Wesley </name> </publisher>
  </book>
</bib>
XML-QL: Flattening Query (1)
WHERE <book>
        <publisher><name>Addison-Wesley</name></publisher>
        <title> $t </title>
        <author> $a </author>
      </book> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <author> $a </author>
            <title> $t </title>
          </result>
Note: Flattening is not possible with XQL
XML-QL: Result (1)
<result>
  <author> <lastname> Date </lastname> </author>
  <title> An Introduction to Database Systems </title>
</result>
<result>
  <author> <lastname> Date </lastname> </author>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
</result>
<result>
  <author> <lastname> Darwen </lastname> </author>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
</result>
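The flattening behaviour (one result per author and title pair, so a book with two authors appears twice) can be imitated in plain Python. This is a sketch with `xml.etree.ElementTree`, not an XML-QL implementation. The function name `flatten` is hypothetical, and the sample document is condensed from the slide.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the sample bibliography.
BIB = """<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>"""


def flatten(bib):
    """Yield one (author, title) pair per binding of $a and $t."""
    for book in bib.findall("book"):
        # Pattern condition: the publisher name must be Addison-Wesley.
        if book.findtext("publisher/name", "").strip() != "Addison-Wesley":
            continue
        for title in book.findall("title"):
            for author in book.findall("author"):
                yield (author.findtext("lastname").strip(), title.text.strip())


pairs = list(flatten(ET.fromstring(BIB)))
for author, title in pairs:
    print(author, "|", title)
```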
XML-QL: Nested Queries (2)
WHERE <book> $p </> IN "www.a.b.c/bib.xml",
      <title> $t </>,
      <publisher><name>Addison-Wesley</></> IN $p
CONSTRUCT <result>
            <title> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <author> $a </>
          </>
XML-QL: CONTENT_AS
WHERE <book>
        <title> $t </>
        <publisher><name>Addison-Wesley</></>
      </> CONTENT_AS $p IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <title> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <author> $a </>
          </>
XML-QL: Result (2)
<result>
  <title> An Introduction to Database Systems </title>
  <author> <lastname> Date </lastname> </author>
</result>
<result>
  <title> Foundation for Object/Relational Databases: The Third Manifesto </title>
  <author> <lastname> Date </lastname> </author>
  <author> <lastname> Darwen </lastname> </author>
</result>
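In contrast to the flattened output, the nested query yields one result per book with all of its authors grouped underneath. A minimal Python sketch of that shape follows. The function name `results_per_book` and the `<results>` wrapper element are assumptions made so the output has a single root.

```python
import copy
import xml.etree.ElementTree as ET

# Condensed copy of the sample bibliography.
BIB = """<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>"""


def results_per_book(bib):
    out = ET.Element("results")  # hypothetical wrapper element
    for book in bib.findall("book"):  # outer query: one binding per book
        if book.findtext("publisher/name", "").strip() != "Addison-Wesley":
            continue
        result = ET.SubElement(out, "result")
        ET.SubElement(result, "title").text = book.findtext("title").strip()
        for author in book.findall("author"):  # inner query over the same book
            # Deep copy so that no node ends up with two parents.
            result.append(copy.deepcopy(author))
    return out


grouped = [
    (r.findtext("title"), [a.findtext("lastname") for a in r.findall("author")])
    for r in results_per_book(ET.fromstring(BIB))
]
print(grouped)
```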
XML-QL: Query (3)
WHERE <article>
        <author>
          <firstname> $f </>   // firstname $f
          <lastname> $l </>    // lastname $l
        </>
      </> CONTENT_AS $a IN "www.a.b.c/bib.xml",
      <book year=$y>
        <author>
          <firstname> $f </>   // join on same firstname $f
          <lastname> $l </>    // join on same lastname $l
        </>
      </> IN "www.a.b.c/bib.xml",
      $y > 1995
CONSTRUCT <article> $a </>
XML-QL: ELEMENT_AS
WHERE <article>
        <author>
          <firstname> $f </>   // firstname $f
          <lastname> $l </>    // lastname $l
        </>
      </> ELEMENT_AS $e IN "www.a.b.c/bib.xml"
...
CONSTRUCT $e
XML-QL: Tag Variables
WHERE <$p>
        <title> $t </title>
        <year>1995</>
        <$e> Smith </>
      </> IN "www.a.b.c/bib.xml",
      $e IN {author, editor}
CONSTRUCT <$p>
            <title> $t </title>
            <$e> Smith </>
          </>
Note: XQL does not support tag variables
XML-QL: Regular Expressions
<!ELEMENT part (name, brand, part*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT brand (#PCDATA)>

WHERE <part*> <name>$r</> <brand>Ford</> </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>$r</>

WHERE <$*> <name>$r</> <brand>Ford</> </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>$r</>

WHERE <part+.(subpart|component.piece)>$r</> IN "www.a.b.c/parts.xml"
CONSTRUCT <result> $r</>
Note: XQL does not support regular expressions
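The first two queries reach a matching element at any nesting depth. A rough Python analogue walks every descendant with `Element.iter`. The parts data below is invented, and `names_with_brand` is a hypothetical helper, not part of XML-QL.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts data: parts nest inside parts to arbitrary depth.
PARTS = """<part><name>car</name><brand>Ford</brand>
  <part><name>engine</name><brand>Ford</brand>
    <part><name>piston</name><brand>Bosch</brand></part>
  </part>
  <part><name>tire</name><brand>Acme</brand></part>
</part>"""

parts = ET.fromstring(PARTS)


def names_with_brand(root, brand, tag=None):
    """Names of elements at any depth whose <brand> child equals `brand`.

    tag="part" restricts the walk to <part> elements (like the part* path);
    tag=None accepts any element name (like the $* wildcard path).
    """
    return [e.findtext("name") for e in root.iter(tag) if e.findtext("brand") == brand]


ford_parts = names_with_brand(parts, "Ford", "part")
ford_any = names_with_brand(parts, "Ford")
print(ford_parts, ford_any)
```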
XML-QL: Joins
WHERE <person>
        <name></> ELEMENT_AS $n
        <ssn> $ssn </>
      </> IN "www.a.b.c/data.xml",
      <taxpayer>
        <ssn> $ssn </>
        <income></> ELEMENT_AS $i
      </> IN "www.irs.gov/taxpayers.xml"
CONSTRUCT <result> $n $i </>
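The shared variable $ssn is what turns the two patterns into a join across documents. A minimal sketch of the same idea in Python uses a dictionary lookup on the SSN. Both input documents are invented stand-ins for the two URLs in the query, and the variable names are made up.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two source documents.
PEOPLE = ("<people>"
          "<person><name>Ann</name><ssn>111</ssn></person>"
          "<person><name>Bo</name><ssn>222</ssn></person>"
          "</people>")
TAXPAYERS = ("<taxpayers>"
             "<taxpayer><ssn>222</ssn><income>50000</income></taxpayer>"
             "<taxpayer><ssn>333</ssn><income>70000</income></taxpayer>"
             "</taxpayers>")

people = ET.fromstring(PEOPLE)
taxpayers = ET.fromstring(TAXPAYERS)

# Index one side by the join key.
income_by_ssn = {t.findtext("ssn"): t.find("income") for t in taxpayers.findall("taxpayer")}

results = []
for person in people.findall("person"):
    income = income_by_ssn.get(person.findtext("ssn"))
    if income is None:
        continue  # no taxpayer with the same SSN, so no binding
    result = ET.Element("result")
    # ELEMENT_AS binds whole elements, so copy <name> and <income> as they are.
    result.append(copy.deepcopy(person.find("name")))
    result.append(copy.deepcopy(income))
    results.append(ET.tostring(result, encoding="unicode"))

print(results)
```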
XML-QL: Ordering
WHERE <pub> $p </> in "www.a.b.c/bib.xml",
      <title> $t </> in $p,
      <year> $y </> in $p,
      <month> $z </> in $p
ORDER-BY $y,$z
CONSTRUCT $t
Note: XQL does not support ordering
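Ordering on two keys amounts to sorting on a (year, month) tuple. The sketch below uses invented publication data and converts both keys to integers, which is an assumption of this example: compared as strings, month "11" would sort before month "2".

```python
import xml.etree.ElementTree as ET

# Hypothetical publications, deliberately out of order.
PUBS = ("<bib>"
        "<pub><title>C</title><year>1999</year><month>11</month></pub>"
        "<pub><title>A</title><year>1998</year><month>5</month></pub>"
        "<pub><title>B</title><year>1999</year><month>2</month></pub>"
        "</bib>")

bib = ET.fromstring(PUBS)


def sort_key(pub):
    # Numeric comparison on year first, then month.
    return (int(pub.findtext("year")), int(pub.findtext("month")))


ordered_titles = [p.findtext("title") for p in sorted(bib.findall("pub"), key=sort_key)]
print(ordered_titles)
```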
XML-QL: Grouping
CONSTRUCT <results> {
  WHERE <bib><book>
          <title>$t</title>
          <author><last>$l</last><first>$f</first></author>
        </book></bib> IN "www.bn.com/bib.xml"
  CONSTRUCT <result ID=author($l,$f)>
              <title>$t</title>
              <author><last>$l</last><first>$f</first></author>
            </result>
} </results>
Note: Explicit grouping is not possible with XQL
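One way to read ID=author($l,$f) is that every binding with the same last and first name lands in the same result element, so titles accumulate per author. Under that reading, the grouping behaves like a dictionary keyed by (last, first). The data and first names below are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography using the <last>/<first> structure of the query.
BIB = ("<bib>"
       "<book><title>T1</title>"
       "<author><last>Date</last><first>Chris</first></author></book>"
       "<book><title>T2</title>"
       "<author><last>Date</last><first>Chris</first></author>"
       "<author><last>Darwen</last><first>Hugh</first></author></book>"
       "</bib>")

bib = ET.fromstring(BIB)

# Key (last, first) plays the role of the author($l,$f) identifier.
titles_by_author = {}
for book in bib.findall("book"):
    title = book.findtext("title")
    for author in book.findall("author"):
        key = (author.findtext("last"), author.findtext("first"))
        titles_by_author.setdefault(key, []).append(title)

print(titles_by_author)
```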
XML-QL: Functions
FUNCTION findDeclaredIncomes($Taxpayers, $Employees)
  WHERE <taxpayer> <ssn> $s </> <income> $x </> </> IN $Taxpayers,
        <employee> <ssn> $s </> <name> $n </> </> IN $Employees
  CONSTRUCT <result> <name> $n </> <Income> $x </> </>
END

findDeclaredIncomes("www.irs.gov/taxpayers.xml", "www.a.b.c/employees.xml")
XQuery
Builds directly on XPointer
Special type for the results
Ability to return ranges (spans)
XQuery: Syntax
? : Selects element with given id
^ : Selects among containers of current node
< : Preceding sibling
> : Following sibling
« : All preceding nodes
» : All following nodes
@ : Attribute
$ : Selects a range by matching a string
XQuery: Queries
descendant(FOOTNOTE & TYPE='CITATION').(REF)
descendant(SEC & descendant(LEVEL = 'SECRET'))
descendant(FOOTNOTE & TYPE='CITATION').(REF){1-2}.link(role=AUTHOR)
descendant(FOOTNOTE & (child(AUTHOR).attr(TYPE) = *(ancestor(CHAPTER).attr(AUTHOR))))
union(id(foo), id(bar), descendant(SEC))
intersection(descendant(ITEM & string('dog')), descendant(ITEM & string('cat')))
difference(fsibling(div), ID(SECRET))
^TI P* [^UI OL DL] {1,3} SUMMARY $
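The union, intersection and difference operators combine node sets. The Python sketch below imitates them over element lists, assuming that string('dog') means "the item's text contains dog" and that results should stay in document order. The data and all function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of items.
doc = ET.fromstring(
    "<list>"
    "<ITEM>dog and cat</ITEM><ITEM>dog only</ITEM>"
    "<ITEM>cat only</ITEM><ITEM>bird</ITEM>"
    "</list>"
)


def items_containing(root, word):
    """Descendant ITEM elements whose text contains `word`."""
    return [e for e in root.iter("ITEM") if word in "".join(e.itertext())]


def union(root, *node_lists):
    wanted = {id(e) for nodes in node_lists for e in nodes}
    return [e for e in root.iter() if id(e) in wanted]  # document order, no repeats


def intersection(a, b):
    in_b = {id(e) for e in b}
    return [e for e in a if id(e) in in_b]


def difference(a, b):
    in_b = {id(e) for e in b}
    return [e for e in a if id(e) not in in_b]


dogs = items_containing(doc, "dog")
cats = items_containing(doc, "cat")

both = [e.text for e in intersection(dogs, cats)]
either = [e.text for e in union(doc, dogs, cats)]
dog_not_cat = [e.text for e in difference(dogs, cats)]
print(both, either, dog_not_cat)
```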
Other Query Languages
Lorel (Lightweight Object REpository Language)
YATL
Xtract
Xmlquery
XML Query Engine
And...
QUILT
The problem with most query languages is that they are either document oriented or database oriented
QUILT is derived from both domains and promises substantial coverage of both areas
It has a FLWR (pronounced as ‘flower’) construct
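FLWR stands for FOR, LET, WHERE, RETURN. The four clauses map naturally onto a loop, a local binding, a filter and a constructed result. The sketch below shows that shape in Python over a condensed bibliography. It illustrates the structure of the construct only and is not Quilt syntax.

```python
import xml.etree.ElementTree as ET

# Condensed bibliography used only for this illustration.
BIB = """<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

flwr_results = []
for book in bib.findall("book"):              # FOR: iterate over a sequence of nodes
    authors = book.findall("author")          # LET: bind a name to a whole sequence
    if int(book.get("year")) > 1995:          # WHERE: keep only bindings that pass
        flwr_results.append(                  # RETURN: construct one result per binding
            (book.findtext("title"), len(authors))
        )

print(flwr_results)
```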
References
http://www.w3.org/TR/2000/WD-xmlquery-req-20000131
http://www.w3.org/TandS/QL/QL98/pp/xql.html
http://www.w3.org/TR/1998/NOTE-xml-ql-19980819/
http://www.w3.org/TandS/QL/QL98/pp/xquery.html
http://www.fatdog.com/
http://www.almaden.ibm.com/cs/people/chamberlin/quilt_lncs.pdf
http://www-db.research.bell-labs.com/user/simeon/xquery.html
http://www-db.stanford.edu/lore/
http://www.cs.washington.edu/homes/zives/research/xmlquery.pdf
http://www.oasis-open.org/cover/xmlQuery.html (main source)