XML-QL A Query Language for XML

Charuta Nakhe

Querying XML documents

What is a query language? Why not adapt SQL or OQL to query XML data? What is an XML query?

• What is the database? XML documents
• What is the input to the query? An XML document
• What is the output of the query? An XML document

Requirements of an XML query language

Query operations:

• Selection: e.g. find books with "S. Sudarshan" as author
• Extraction: e.g. extract the publisher field of the above books
• Restructuring: restructuring of elements
• Combination: queries over more than one document

Other requirements:

• Must be able to transform and create XML structures
• Must be able to query even in the absence of a schema

The XML-QL language

The XML-QL language is designed with the following features:

• It is declarative, like SQL.
• It is relationally complete, e.g. it can express joins.
• It can be implemented with known database techniques.
• It can extract data from existing XML documents and construct new XML documents.

XML-QL is implemented as a prototype and is freely available in a Java version.

Example XML document

    <bib>
      <book year="1997">
        <title>Inside COM</title>
        <author>Dale Rogerson</author>
        <publisher><name>Microsoft</name></publisher>
      </book>
      <book year="1998">
        <title>Database system concepts</title>
        <author>S. Sudarshan</author>
        <author>H. Korth</author>
        <publisher><name>McGrawHill</name></publisher>
      </book>
    </bib>

Matching data using patterns

Find those authors who have published books with McGraw Hill:

    WHERE <bib><book>
            <publisher><name>McGrawHill</></>
            <title>$t</>
            <author>$a</>
          </book></bib> IN "bib.xml"
    CONSTRUCT <result><title>$t</><author>$a</></>

• $t and $a are variables that pick out element contents.
• The output is a collection of <result> elements, one for each matching title and author pair.

Result XML document

    <result>
      <title>Database system concepts</title>
      <author>S. Sudarshan</author>
    </result>
    <result>
      <title>Database system concepts</title>
      <author>H. Korth</author>
    </result>
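To make the binding semantics concrete, the pattern match can be emulated in a few lines of Python. This is only a sketch using the standard library's xml.etree.ElementTree, not the XML-QL prototype itself; the function names match_title_author and construct_results are made up for illustration, and the sample data mirrors the example document from the slides.

```python
import xml.etree.ElementTree as ET

# Sample data mirroring the example bib.xml from the slides.
BIB_XML = """
<bib>
  <book year="1997">
    <title>Inside COM</title>
    <author>Dale Rogerson</author>
    <publisher><name>Microsoft</name></publisher>
  </book>
  <book year="1998">
    <title>Database system concepts</title>
    <author>S. Sudarshan</author>
    <author>H. Korth</author>
    <publisher><name>McGrawHill</name></publisher>
  </book>
</bib>
"""

def match_title_author(xml_text, publisher):
    """Emulate the WHERE clause: one (title, author) binding per match."""
    bib = ET.fromstring(xml_text)
    bindings = []
    for book in bib.findall("book"):
        # The <publisher><name>...</></> pattern acts as a selection condition.
        if book.findtext("publisher/name") != publisher:
            continue
        # $t and $a range over every title/author combination in the book.
        for title in book.findall("title"):
            for author in book.findall("author"):
                bindings.append((title.text, author.text))
    return bindings

def construct_results(bindings):
    """Emulate the CONSTRUCT clause: one <result> element per binding."""
    out = []
    for title, author in bindings:
        result = ET.Element("result")
        ET.SubElement(result, "title").text = title
        ET.SubElement(result, "author").text = author
        out.append(ET.tostring(result, encoding="unicode"))
    return out

for line in construct_results(match_title_author(BIB_XML, "McGrawHill")):
    print(line)
```

The nested loops are the point of the sketch: a book with two authors yields two bindings, which is why the title appears twice in the result.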

Grouping with nested queries

Group results by book title:

    WHERE <bib.book>$p</> IN "bib.xml",
          <title>$t</>
          <publisher><name>McGrawHill</></> IN $p
    CONSTRUCT <result>
                <title>$t</>
                WHERE <author>$a</> IN $p
                CONSTRUCT <author>$a</>
              </>

• Produces one result for each title, containing a list of all its authors.

Result XML document

    <result>
      <title>Database system concepts</title>
      <author>S. Sudarshan</author>
      <author>H. Korth</author>
    </result>
    ...
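The nested query corresponds to an inner loop that runs once per outer binding. A rough Python emulation, assuming the same sample document as the slides and a hypothetical helper name group_authors_by_title (not part of XML-QL), might look like this:

```python
import xml.etree.ElementTree as ET

# Sample data mirroring the example bib.xml from the slides.
BIB_XML = """
<bib>
  <book year="1997">
    <title>Inside COM</title>
    <author>Dale Rogerson</author>
    <publisher><name>Microsoft</name></publisher>
  </book>
  <book year="1998">
    <title>Database system concepts</title>
    <author>S. Sudarshan</author>
    <author>H. Korth</author>
    <publisher><name>McGrawHill</name></publisher>
  </book>
</bib>
"""

def group_authors_by_title(xml_text, publisher):
    """Build one <result> per title, with all authors nested inside it."""
    bib = ET.fromstring(xml_text)
    results = []
    for book in bib.findall("book"):              # outer WHERE: bind $p
        if book.findtext("publisher/name") != publisher:
            continue
        for title in book.findall("title"):       # bind $t inside $p
            result = ET.Element("result")
            ET.SubElement(result, "title").text = title.text
            # Nested WHERE ... IN $p: runs once per outer binding.
            for author in book.findall("author"):
                ET.SubElement(result, "author").text = author.text
            results.append(result)
    return results

for r in group_authors_by_title(BIB_XML, "McGrawHill"):
    print(ET.tostring(r, encoding="unicode"))
```

Because the author loop sits inside the construction of a single result element, the title is emitted once instead of once per author.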

Constructing XML data

Results of a query can be wrapped in XML:

    WHERE <bib.book>
            <publisher><name>McGrawHill</></>
            <title>$t</>
            <author>$a</>
          </> IN "bib.xml"
    CONSTRUCT <result><author>$a</><title>$t</></>

• Each result is wrapped in a <result> element.
• The pattern matches once for each author, so a book with several authors appears in several results.

Joining elements by value

Find all articles that have at least one author who has also written a book since 1995:

    WHERE <bib.article>
            <author>$n</>
          </> CONTENT_AS $a IN "bib.xml",
          <book year=$y>
            <author>$n</>
          </> IN "bib.xml",
          $y > 1995
    CONSTRUCT <article>$a</>

• Using the same variable $n in both patterns joins articles and books on the author value.
• CONTENT_AS $a following a pattern binds the content of the matching element to the variable $a.
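The join on the shared variable can be pictured as a lookup of article authors in the set of recent book authors. The Python sketch below is an approximation under stated assumptions: the sample data is hypothetical (the slides show no article elements), the helper name articles_by_recent_book_authors is invented, and unlike XML-QL, which yields one result per binding, it returns each matching article only once.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the slides do not show any <article> elements.
BIB_XML = """
<bib>
  <book year="1998">
    <title>Database system concepts</title>
    <author>S. Sudarshan</author>
    <author>H. Korth</author>
  </book>
  <book year="1990">
    <title>An older book</title>
    <author>A. Writer</author>
  </book>
  <article>
    <title>Article one</title>
    <author>H. Korth</author>
  </article>
  <article>
    <title>Article two</title>
    <author>A. Writer</author>
  </article>
</bib>
"""

def articles_by_recent_book_authors(xml_text, after_year):
    """Return articles with at least one author of a book newer than after_year."""
    bib = ET.fromstring(xml_text)
    # Authors of books satisfying the year condition ($y > 1995 in the query).
    recent_authors = {
        author.text
        for book in bib.findall("book")
        if int(book.get("year", "0")) > after_year
        for author in book.findall("author")
    }
    # The join: keep an article if any of its authors is in that set.
    return [
        article
        for article in bib.findall("article")
        if any(a.text in recent_authors for a in article.findall("author"))
    ]

for article in articles_by_recent_book_authors(BIB_XML, 1995):
    print(article.findtext("title"))
```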

Tag variables

Find all publications in 1995 where Smith is either an author or editor:

    WHERE <bib.$p>
            <title>$t</>
            <year>1995</>
            <$e>Smith</>
          </> IN "bib.xml",
          $e IN {author, editor}
    CONSTRUCT <$p><title>$t</><$e>Smith</></>

• $p matches book and article.
• $e matches author and editor.
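Tag variables amount to iterating over child elements and inspecting their tag names rather than fixing them in advance. The following Python sketch illustrates the idea; the sample data and the function name smith_publications are assumptions made for the example, not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data with <year> as a child element, as in the query.
BIB_XML = """
<bib>
  <book><title>Book A</title><year>1995</year><author>Smith</author></book>
  <article><title>Article B</title><year>1995</year><editor>Smith</editor></article>
  <book><title>Book C</title><year>1996</year><author>Smith</author></book>
  <article><title>Article D</title><year>1995</year><author>Jones</author></article>
</bib>
"""

def smith_publications(xml_text, year="1995", person="Smith",
                       roles=("author", "editor")):
    """Emulate tag variables $p (publication kind) and $e (role)."""
    bib = ET.fromstring(xml_text)
    results = []
    for pub in bib:                      # $p ranges over the children of <bib>
        if pub.findtext("year") != year:
            continue
        for child in pub:                # $e ranges over the children of $p
            if child.tag in roles and child.text == person:
                for title in pub.findall("title"):
                    out = ET.Element(pub.tag)   # reuse the bound tag $p
                    ET.SubElement(out, "title").text = title.text
                    ET.SubElement(out, child.tag).text = person
                    results.append(ET.tostring(out, encoding="unicode"))
    return results

for line in smith_publications(BIB_XML):
    print(line)
```

The output elements reuse whatever tags were bound, so a book stays a book and an editor stays an editor.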

Regular-path expressions

Find the name of every part element that contains a brand element equal to "Ford", regardless of the nesting level at which the part occurs:

    WHERE <part*>
            <name>$r</>
            <brand>Ford</>
          </> IN "bib.xml"
    CONSTRUCT <result>$r</>

• Regular path expressions can specify element paths of arbitrary depth.
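The effect of matching at arbitrary depth can be approximated in Python with a tree walk. This sketch does not reproduce the exact semantics of the part* path expression (it finds part elements under any ancestor, not only under chains of part elements); the nested parts data and the name names_of_parts_with_brand are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested parts data; the slides do not show a parts document.
PARTS_XML = """
<parts>
  <part>
    <name>engine</name><brand>Ford</brand>
    <part>
      <name>piston</name><brand>Ford</brand>
      <part><name>ring</name><brand>Acme</brand></part>
    </part>
  </part>
</parts>
"""

def names_of_parts_with_brand(xml_text, brand):
    """Collect names of <part> elements with a matching <brand> child."""
    root = ET.fromstring(xml_text)
    names = []
    # iter("part") walks the whole tree, so nesting depth does not matter.
    for part in root.iter("part"):
        # Only direct children are checked, as in the query pattern.
        if any(b.text == brand for b in part.findall("brand")):
            names.extend(n.text for n in part.findall("name"))
    return names

print(names_of_parts_with_brand(PARTS_XML, "Ford"))
```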

Other interesting features

• Constructing an explicit root element
• Grouping of data
• Transforming XML data
• Integrating data from different XML sources

Links for more information

• www.w3.org/TR/NOTE-xml-ql : The XML-QL W3C Note
• www.research.att.com/~mff/xmlql/doc : The XML-QL home page
• www.w3.org/XML/Activity.html#query-wg : The XML Query Working Group
• www.w3.org/TR/xmlquery-req : XML Query Requirements (W3C Working Draft)
• www.oasis-open.org/cover/xmlQuery.html : Robin Cover's page on XML query languages

Example DTD

<!ELEMENT book (author+, title, publisher)>

<!ATTLIST book year CDATA>

<!ELEMENT article (author+, title, year?)>

<!ATTLIST article type CDATA>

<!ELEMENT publisher (name, address)>

<!ELEMENT author (firstname?, lastname)>

Creating an explicit root element

Every XML document must have a single root. XML-QL supplies an <XML> element as default, but others may be specified:

    CONSTRUCT <results> {
      WHERE <bib.book>
              <publisher><name>McGrawHill</></>
              <title>$t</>
              <author>$a</>
            </> IN "bib.xml"
      CONSTRUCT <result><author>$a</><title>$t</></>
    } </results>
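Wrapping the query in an explicit root corresponds to creating one parent element and attaching every constructed result to it. A small Python sketch of that step, assuming the bindings have already been computed and using an invented helper name wrap_results:

```python
import xml.etree.ElementTree as ET

def wrap_results(pairs, root_tag="results"):
    """Attach one <result> per (author, title) pair under a single root."""
    root = ET.Element(root_tag)          # the explicit root element
    for author, title in pairs:
        result = ET.SubElement(root, "result")
        ET.SubElement(result, "author").text = author
        ET.SubElement(result, "title").text = title
    return ET.tostring(root, encoding="unicode")

# Bindings as the inner query would produce them for the example document.
pairs = [
    ("S. Sudarshan", "Database system concepts"),
    ("H. Korth", "Database system concepts"),
]
print(wrap_results(pairs))
```

The returned string is a single well-formed document, whereas the individual result elements on their own would not have a common root.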
