54
XQuery/IR: Integrating XML Document and Data Retrieval Jan-Marco Bremer Michael Gertz Department of Computer Science University of California, Davis, CA {bremer|gertz}@cs.ucdavis.edu 1 Introduction One of the most important features of XML is to provide a unified view to all kinds of structured and semistructured data as well as loosely structured doc- uments. Here, document means a coherent unit of usu- ally textual information while data stands for uninter- preted, raw content without a fixed context. Most con- tent XML is applied to today is very text-rich. The structural aspects of the unified XML view are rigid enough to support data retrieval (DR) queries as known from database systems. Over the past few years, increasingly powerful query language, most no- tably the recent XQuery standard [2], have exploited this fact to provide expressive DR query capabilities for XML. On the other hand, XML’s structural aspects are transparent enough to treat arbitrary parts of the XML-represented data as documents. Document or in- formation retrieval (IR) providing one of the most im- portant capabilities for querying text-rich documents, however, is not supported by XQuery or any of the earlier XML query languages so far. DR provides means to formulate queries based on exact matches of data. IR is based on the notion of (relative) relevance of documents within a document collection. Thus, some major issues for the integration of IR into an XML query language arise: Today’s (toy) collections of small XML documents are eventually to be replaced by large, hierarchically structured XML databases. In such databases, the notions of static document collec- tion and document are lost and have to be replaced by some kind of view on documents and document collec- tions that is established dynamically by a query. This poses challenges for new, embedded IR algorithms and their application within the host language rather than only as the final operation as in standard IR. However, besides these challenges, significant rewards in form of an increased expressiveness of the resulting integrated query language might arise. IR in general is based on detecting fine differences in term distributions throughout a document collection leading to valuable information [12]. However, the value of information is highly context-dependent. Cur- rently, the context for IR is always an entire, static document collection. This changes when term statis- tics for dynamic document views are available and ex- ploited by an IR algorithm. A DR language is a natural means to specify a context for a relevance-based query. Moreover, when IR is truly embedded into the query language, derived information can be obtained after the IR-style ranking has identified the most relevant com- ponents. For example, the mean publication date for information related to a certain event reported in tex- tual news naturally leads to an apex of that event. In this paper, we present the results of our ongoing research to integrate IR capabilities into XQuery and through this, provide more expressive queries than DR or IR alone can answer. XQuery-related standards [2, 5] define document fragment sequences (DFSs) as in- and output for all intermediate and final query results. We identify document fragments (DFs) and DFSs as the equivalent of documents and document collections within XML queries. We introduce a single, new oper- ator called rank into XQuery. The operator orders DFs within arbitrary intermediate DFSs. It is used through a Rankby expressions that is very similar to XQuery’s Sortby expression. The operator is orthogonal to other XQuery operators and can be arbitrarily nested. The rank operator itself imposes no requirements on the implementation of the relevance-based ordering of DFs. Thereby, we establish a general framework for research into query language-embedded IR algorithms. However, we define a dynamic ranking principle that captures IR algorithms which allow the richest type of queries based on a local context. These algorithms are based on term statistics only within the current DFS as returned by a (sub-)query. We briefly study consequences of the rank operator for index structures and query optimization. Related work. Oracle’s contains operator imple- ments a standard, static IR approach based on table columns [1]. The operator is used within an SQL Where

XQuery/IR: Integrating XML Document and Data Retrievaldb.ucsd.edu/webdb2002/papers/master1.pdf · XQuery/IR: Integrating XML Document and Data Retrieval ... Oracle’s contains operator

  • Upload
    others

  • View
    50

  • Download
    0

Embed Size (px)

Citation preview

XQuery/IR: Integrating XML Document

and Data Retrieval

Jan-Marco Bremer Michael Gertz

Department of Computer ScienceUniversity of California, Davis, CA{bremer|gertz}@cs.ucdavis.edu

1 Introduction

One of the most important features of XML is toprovide a unified view to all kinds of structured andsemistructured data as well as loosely structured doc-uments. Here, document means a coherent unit of usu-ally textual information while data stands for uninter-preted, raw content without a fixed context. Most con-tent XML is applied to today is very text-rich.

The structural aspects of the unified XML view arerigid enough to support data retrieval (DR) queriesas known from database systems. Over the past fewyears, increasingly powerful query language, most no-tably the recent XQuery standard [2], have exploitedthis fact to provide expressive DR query capabilitiesfor XML. On the other hand, XML’s structural aspectsare transparent enough to treat arbitrary parts of theXML-represented data as documents. Document or in-formation retrieval (IR) providing one of the most im-portant capabilities for querying text-rich documents,however, is not supported by XQuery or any of theearlier XML query languages so far.

DR provides means to formulate queries based on exactmatches of data. IR is based on the notion of (relative)relevance of documents within a document collection.Thus, some major issues for the integration of IR intoan XML query language arise: Today’s (toy) collectionsof small XML documents are eventually to be replacedby large, hierarchically structured XML databases. Insuch databases, the notions of static document collec-tion and document are lost and have to be replaced bysome kind of view on documents and document collec-tions that is established dynamically by a query. Thisposes challenges for new, embedded IR algorithms andtheir application within the host language rather thanonly as the final operation as in standard IR. However,besides these challenges, significant rewards in form ofan increased expressiveness of the resulting integratedquery language might arise.

IR in general is based on detecting fine differences interm distributions throughout a document collection

leading to valuable information [12]. However, thevalue of information is highly context-dependent. Cur-rently, the context for IR is always an entire, staticdocument collection. This changes when term statis-tics for dynamic document views are available and ex-ploited by an IR algorithm. A DR language is a naturalmeans to specify a context for a relevance-based query.Moreover, when IR is truly embedded into the querylanguage, derived information can be obtained after theIR-style ranking has identified the most relevant com-ponents. For example, the mean publication date forinformation related to a certain event reported in tex-tual news naturally leads to an apex of that event.

In this paper, we present the results of our ongoingresearch to integrate IR capabilities into XQuery andthrough this, provide more expressive queries than DRor IR alone can answer. XQuery-related standards[2, 5] define document fragment sequences (DFSs) as in-and output for all intermediate and final query results.We identify document fragments (DFs) and DFSs asthe equivalent of documents and document collectionswithin XML queries. We introduce a single, new oper-ator called rank into XQuery. The operator orders DFswithin arbitrary intermediate DFSs. It is used througha Rankby expressions that is very similar to XQuery’sSortby expression. The operator is orthogonal to otherXQuery operators and can be arbitrarily nested.

The rank operator itself imposes no requirements onthe implementation of the relevance-based ordering ofDFs. Thereby, we establish a general framework forresearch into query language-embedded IR algorithms.However, we define a dynamic ranking principle thatcaptures IR algorithms which allow the richest typeof queries based on a local context. These algorithmsare based on term statistics only within the currentDFS as returned by a (sub-)query. We briefly studyconsequences of the rank operator for index structuresand query optimization.

Related work. Oracle’s contains operator imple-ments a standard, static IR approach based on tablecolumns [1]. The operator is used within an SQL Where

clause. In [6], only exact keyword search within cer-tain XML elements is introduced into XML-QL. Ear-lier related approaches for structured documents canbe found in [11]. The aim of [13] is an improved Websearch by means of an IR-enabled XML query language.The similarity operator of [3] is mainly a vehicle toexecute fuzzy similarity joins based on atomic valueswithin two DFs. Again, XML-QL is used as host lan-guage. Based on XQL, [8] proposes an IR-style opera-tor that computes similarity between DFs and queriesby means of XML leaf elements’ similarity weights thatare propagated upwards. However, XQL is less expres-sive than XQuery and lacks important data retrievalcapabilities.

Common to all of these approaches is the extension ofa Where clause which implies a selection and thus, apartitioning of the input into relevant and non-relevantobjects. The relevance-based sorting employed by ourscheme more naturally reflects IR’s ordering propertieswhile still allowing the partitioning by means of a rele-vance threshold. Moreover, unlike our approach, noneof the existing approaches is able to demonstrate mean-ingful and useful nested queries involving both DR andIR operations. We have not yet seen other work ap-plying IR to a local context established by a DR (sub-)query or a previous IR-style query.

Passage retrieval approaches [9] have used more fine-grained document components to improve the rankingbut without supporting IR on document parts as a lo-cal context. The feasibility and usefulness of IR indexstructures for semistructured data as required by ourapproach have been shown in [10].

Overview. In Section 2, we establish the backgroundfor this paper. In Section 3, we introduce the new op-erator and demonstrate its application. Section 4 dis-cusses implementation issues of the operator includingdynamic ranking, index structures and query optimiza-tion. In Section 5, we conclude this paper and brieflypoint out some future work.

2 Background

2.1 XML Data Model

We employ an XML model in which documents andfragments of documents are represented as ordered,node-labeled trees. We leave out details such as com-ments, processing instructions, references, and the dis-tinction between elements and attributes.

Assume a set E of element names and a set T of textstring values disjoint from E. Given a set X, let L(X)

denote the set of all lists that can be built over elementsfrom X. Then an XML document fragment F is a 4-tuple (V, r, label, elem) where

• V is a set of vertices with a distinguished elementr, called the root node,

• label is a mapping from vertices to element labels,i.e., label : V → E, and

• elem is a mapping from vertices to their children,i.e., elem : V → L(V ∪ T ).

Let an XML document D be an XML document frag-ment (DF). Furthermore, let F denote the set of alldocument fragments over E and T . Then an XML doc-ument fragment sequence (DFS) S is a sequence of ele-ments of F , i.e., S ∈ L(F). Let Firstk(S) (k ∈ N∪∞)denote the sub-sequence of S consisting of the first kor |S| elements, whichever is smaller.

2.2 XQuery

XQuery [2] combines features from several earlier XMLquery languages, in particular XPath [4]. ThroughXPath, DFs can be extracted from an XML document.Nested loops iterate over these fragments to furtherextract DFs and construct sequences of output DFs.Variable assignment supports complex computationsbased on content and structure of the input.

Details of XQuery’s formal semantics can be found in[7] and are not discussed here. For our approach, itis sufficient to note that in- and output to XQueryqueries are always DFSs, i.e., a query q is a mappingq : L(F)→ L(F). We illustrate the syntax and seman-tics of XQuery by means of a few simple examples. Wewill extend these examples in Section 3 to illustrate ournew operator.

Example 2.1 Select paragraphs of articles dating backto Feb. 15th, 2002 from the news document database.document(‘‘news.xml’’)

//article[./date=’’2002-02-15’’]//paragraph

The example consists of a single XPath expression,which alone is a valid XQuery query.

Example 2.2 List all articles that appeared before1996 with their first author and title, in sorted order.FOR $a IN document(”bib.xml”)//articleWHERE $a/year < 1996RETURN<early paper>

<fstAuth> {$a/authors/author[1]/text()} </fstAuth>{$a/title}

</early paper>SORTBY (author[1], title DESCENDING)

Example 2.2 consists of a single loop that goes througharticles and extracts the author and title. The completetitle DF builds a part of the result fragments. A newtag name fstAuth is introduced for the author. Thereturned fragments are ordered by author and title.

Example 2.3 Convert a list of news articles classifiedunder a certain category to a list of categories with theirrelated articles.<news by category>{FOR $c IN document(‘‘newsmeta.xml’’)//category

RETURN<category><name> {$c/name/text()} </name>{FOR $a IN document(‘‘news.xml’’)

//article[@cid = $c/@id]RETURN<title aid={$a/@id}> $a/title/text() </title>

</category> }</news by category>

Example 2.3 represents a join between category andarticle DFs based on a news category identifier. Thejoin is implemented as two nested loops.

2.3 Document Retrieval

Document retrieval or, more commonly used, informa-tion retrieval (IR) is concerned with ordering docu-ments from a document collection by relevance to aquery [12]. The query is usually simple and consistsof just a few terms. Relevance is based on term dis-tribution statistics. A numerical weight is assigned toeach term occurrence in each document. The weightrepresents the term’s significance within the documentcontent. The relevance ordering is obtained by sum-ming up query term weights for every document anddetermining the highest sum. A second flavor of IRis only concerned with partitioning the collection intorelevant and non-relevant documents. It can be im-plemented by means of the ordering approach and theapplication of a threshold below which documents areregarded as not relevant.

A standard approach for weighting terms is theterm frequency-inverse document frequency (tf-idf) ap-proach [12]. Tf-idf assigns higher weights to term oc-currences with high in-document and low overall docu-ment frequencies, i.e., few documents that contain theterm.

Term distributions may vary widely throughout a doc-ument collection. Hence, the significance of a term oc-currence and thus, its relevance, highly depends on thecontext. In most existing retrieval approaches, the con-text is always the complete document collection.

3 XQuery Rank Operator

In traditional IR, the result of a usually stand-alonequery is a total order or partitioning of documents thatare the single unit within a collection. The result isdirectly presented to the user. When integrating IRinto an XML query language, this raises two majorquestions:

1. What is the appropriate equivalent for documentsand document collections within a single database-like XML document source (or set of such)?

2. How and where can IR be of use within a query?In particular, can it make sense to consider IR notonly as a final operation, but one that does some-thing meaningful in an intermediate query step?

XML queries extract document fragments (DFs) fromXML sources. DFs are the only data unit suitable toreplace the notion of document from standard IR. InXQuery in particular, all results are sequences of DFs(DFSs). It is only natural to modify the order of DFswithin a DFS by means of an ordering IR approach,very similar to the sort operator in XQuery. This canbe accomplished by a single operator. DFSs replace thenotion of document collections. The order is transpar-ent to subsequent queries that do not rely on any order,but can be exploited by other sub-queries or shown asan end result. The relevance-based partitioning canthen be implemented on top of the ordering IR as dis-cussed in Section 2.3.

The above observations let us derive the following re-quirements for integrating an IR operator into an XMLquery language, in particular XQuery.

Total order. The operator should be able to order asequence of DFs based on relevance.

Local context. The partitioning flavor of IR shouldbe supported to establish a local context for sub-sequent queries.

Closure. The operator should be closed withinXQuery and thus, be applicable within arbitraryXQuery expressions.

Transparency. The operator should not affect queriesin ways other than changing the DFS-internal or-der or eliminating elements of the DFS.

Exchangeability. The IR weighting algorithm under-lying the operator should be exchangeable. A usershould have means to choose the weighting algo-rithm for a query.

There is an additional requirement we impose, becauseit appears to make the IR-style ranking within the hostlanguage even more useful:

Visibility. The operator should assign visible rankingweights to DFs, but without causing side-effects tothe embedding query.

The exchangeability property of the operator allows fordifferent IR algorithms to be plugged into the query en-gine. Although we have not introduced any limitationshere, we identify a certain class of algorithms calleddynamic ranking algorithms as required to establish areal local context for sub-queries. These algorithms willbe subject of the next section.

In the following, we introduce the syntax and semanticsof an operator that meets the above requirements. Theelegant simplicity of the XQuery extension should allowto easily understand the operator’s functionality, eventhough space limitations prevent us from going into allthe details.

3.1 Syntax

We propose to add a single new operator called rankto the XQuery language. The operator is used within aRankby expression that extends the set of base XQueryexpressions. A Rankby expression is very similar toXQuery’s Sortby expression [2, Section 2.4].

Definition 3.1 (Rankby expression)

RankBy::=Expr ”rankby” ”(”QuerySpecList”)”(”ascending” | ”descending”)?[”basedon” ”(” TargetSpecList ”)”][“limit” n [“%”]][”using” MethodFunctCall]

QuerySpecList ::=Expr (”,” QuerySpecList)?TargetSpecList::=PathExpr (”,” TargetSpecList)?

QuerySpecList is a list of strings (constant DFs) orexpressions that return DFs that can be interpretedas strings in XQuery as well. TargetSpecList is a listof context node-dependent path expressions. Method-FunctCall refers to an XQuery function. 2

The syntax and semantics of XQuery functions(FunctionCall in the specification) are not yet fully spec-ified by the W3C. MethodFuntCall is more of symbolicnature. It represents an IR algorithm that may or maynot be implemented as some kind of stored procedure.

3.2 Semantics

Assume the set L(F) of document fragments (Section2.1). Let weight be a special element name in E.

Definition 3.2 (Weighting algorithm)Assume a set of IR queries Q = L(F) which includesthe DFs consisting of only a string from T , in particularsingle terms. Let W be the set of operators

W : L(F)×Q → L(F),W (S,Q) = S′

such that S′ is equal to S except that each DF in S′

has an additional element named weight at the root.The weight element has a single number string as childrepresenting the relative relevance weight of the respec-tive DF within S. Then a W ∈ W is called a weightingalgorithm. 2

Within the Rankby expression, the MethodFunctCallpart represents a certain exchangeable weighting al-gorithm. Related to the QuerySpecList, queries areDFs that can be interpreted as query text. The ac-tual query interpretation and limitations to the al-lowed query types depend on the weighting algorithmW . In particular, typical IR term queries like (“sun”,“moon”) can be regarded as DFs with only a singletext string node. More advanced weighting algorithmsmay make use of the structure of query DFs. In the fol-lowing, we define the actual ranking operator in termsof a weighting algorithm and XQuery’s existing sortoperator.

Definition 3.3 (Rank operator)The rank operator is an operator:

rank : L(F)×N×Q×W −→ L(F)rank(S, k,Q,W ) = Firstk(sortweight(W (S,Q)),

where N is the set of natural numbers and Q definedas above. sortweight refers to the sorting of DFs withinS based on the weight element. Firstk eliminates allbut the first k elements from S (First∞(S) = S). 2

If there is a non-empty TargetSpecList in the Rankbyexpression, for each DF in S, the input to rank are the

fragments selected by the TargetSpecList’s path expres-sion, otherwise the complete DFs. The elimination ofall but the first k DFs relates to the limit clause of theRankby expression. As an extension, the limit couldalso be based on a certain percentage of the DFS’s to-tal weight to be kept after the elimination.

The weighting algorithm W , exchangeable withinrank, implements the total order as called for in therequirements. It adds a weight element to a DF mak-ing the ranking visible, a further requirement. Notethat in XML, we envision an attribute to be added.An attribute is much more transparent to the rest ofa query, but not an explicit part of our simplified datamodel. Full transparency of the ranking, however, canonly be achieved through introducing a reserved namefor weight in XQuery. XQuery’s closure properties arekept as DFSs remain the only type of in- and output.

We call the XQuery query language extended by therank operator XQuery/IR. Due to space limitationswe have to illustrate the expressiveness of XQuery/IRby means of the following examples in favor of a morecomplete discussion.

3.3 Examples

Example 2.1 (simplified) can be extended to retrieve amaximum of 100 paragraphs with relevant informationabout New York as follows:document(‘‘news.xml’’)//article//paragraphrankby (‘‘New York’’) limit 100

The result might look like:<paragraph weight=0.96>

The New Yorker fire fighters. . .</paragraph><paragraph weight=0.81>

Weekend weather in New York promisses to. . .</paragraph><paragraph weight=0.79>

. . .

In Example 2.2, instead of sorting the result by author,a relevance-based ranking only based on the articles’abstracts can be obtained through:FOR $a IN document(”bib.xml”)//articleWHERE $a/year < 1996RETURN<early paper><fstAuth> {$a/authors/author[1]/text()} </fstAuth>{$a/title}

</early paper>RANKBY (‘‘Albert’’, ‘‘Einstein’’) BASEDON (abstract)

Note that abstract as argument to BASEDON is anXPath expression relative to the context node $a. The

terms “Albert” and “Einstein” could be replaced by asubquery that selects text to be used as query fromanother document like:document(‘‘authors.xml’’)//author[./name=’’Einstein’’]

/accomplishments

A modification to the earlier Example 2.3 extracts onlythe first 10 articles in each category. Here, the ran-<news by category>{FOR $c IN document(‘‘newsmeta.xml’’)//category

RETURN<category><name> {$c/name/text()} </name>{FOR $a IN document(‘‘news.xml’’)

//article[@cid = $c/@id]RETURN<title aid={$a/@id}> $a/title/text() </title>

RANKBY ($c/keywords) LIMIT 10</category> }

</news by category>

king occurs in the inner loop based on the actual rele-vance for the category an article is classified under.

4 Implementation Aspects

Weighting algorithms underlying the new operatorhave some novel properties. Furthermore, rank hassome implications for a system implementation, in par-ticular index structures and query optimization. In thissection, we briefly address both issues.

4.1 Dynamic Ranking

In standard IR, queries are always evaluated in thecontext of the complete document collection, a notionthat we have replaced by the concept of DFS. DFSsare not static anymore but dynamically established bya (sub-)query. Nothing prevents us from still usingstatic, pre-computed term weights. However, most ofthe power of the integrated query approach with itslocal contexts is lost in that case.

For instance, it is unsatisfactory to always use thesame, fixed inverse document frequency for terms whenusing an adapted tf-idf scheme, because the sub-queryresult to which the weighting is applied might not evenonce contain the respective term. This will lead tounexpected and even useless ranking results. Conse-quently, more general term counting statistics have tobe kept. For each ranking operation, term statisticsfor the current intermediate result have to be derived.Then, a scheme like, e.g., tf-idf could be used as before,or an extension thereof that utilizes the additional in-formation encoded in the structure of DFs.

We call this general principle underlying IR embeddedinto a data retrieval query language dynamic rankingprinciple. An approach following this principle allowsus to detect the local significance of an otherwise rathercommon term, considered as irrelevant in a more globalcontext.

4.2 Indexes and Query Optimization

For an IR index to be useful for hierarchically struc-tured XML data, it needs to capture term occurrenceswithin single document fragments. This is obviouslymore expensive than a static index on a collection ofdocuments as single units. However, by definition onlyleaf nodes contain content. Thus, storing term statis-tics for these and relying on an index for the documentstructure as required for structure-based queries any-way can be sufficient to derive statistics for arbitraryDFs.

Furthermore, the query engine needs to be extendedto keep track of where parts of DFs within an inter-mediate result originate from. This is made even moredifficult as DFs from different contexts may be put to-gether through XQuery’s construction operations. Insome cases, e.g., in case of aggregated data, no directrelationships between DFs in a DFS and their origincan be maintained. In this case, either ranking is notpossible or term statistics have to be collected fromscratch, which might be acceptable for small results.

The expenses associated with a full text index on top ofXML data can be compensated by an IR index-awarequery engine, an advantage that is ignored when de-ploying IR outside of a database. For semistructureddata without a lot of schema information, optimizationvia the IR index is especially promising [10]. Not onlypoint term indices directly to the data, but detaileddata statistics, e.g., about the variance of certain ele-ment values are easily integrated into an IR index.

5 Conclusions and Future Work

We have introduced a new operator into XQuery thatnaturally extends XML queries by information retrievalcapabilities. XML document fragment sequences areintermediate and end results in XQuery. We have iden-tified arbitrary document fragments within such dy-namically selected sequences as suitable replacementfor the notions of document (collection) in today’s in-formation retrieval. The extended language, which wehave dubbed XQuery/IR, is conceptually simple yetmore expressive than the sum of its parts. We haveidentified dynamic ranking, the usage of term statis-

tics in the local context established by a query, as themost important property of an underlying IR algorithmto achieve this expressiveness.

Currently, we are implementing a system able todemonstrate important properties of the presented ap-proach. A hindrance is the non-existence of freelyavailable query engines and large, deeply nested XMLdatabases with diverse types of textual information.

Other aspects we are investigating include reformula-tion and thus, optimization rules for queries involvingthe rank operator. Questions are, for instance, to whatextend ranking and data extraction are commutative.

References

[1] O. Alonso. Oracle Text White Paper. Technical report,Oracle Corp., Redwood Shores, U.S.A, May 2001.

[2] D. Chamberlin, J. Clark, D. Florescu, J. Robie,J. Simeon, M. Stefanescu. XQuery 1.0: An XML QueryLanguage. W3C Working Draft, W3C, June 2001.

[3] T.T. Chinenyanga and N. Kushmerick. An Expres-sive and Efficient Language for XML Inf. Retrieval. InProc. of the 2001 SIGIR Conf., pp. 163-171, 2001.

[4] J. Clark and S. DeRose. XML Path Language (XPath).W3C Recommendation, W3C, Nov. 1999.

[5] M. Fernandez and J. Marsh. XQuery 1.0 and XPath2.0 Data Model. W3C Work. Draft, W3C, 2001.

[6] D. Florescu, D. Kossmann, and I. Manolescu. Inte-grating Keyword Search into XML Query Processing.In Proceedings of the 9th Int’l Word Wide Web Con-ference/Computer Networks, 33(1-6):119-135, 2000.

[7] P. Frankhauser, M. Fernandez, A. Malhotra, M. Rys,J. Simeon, and P. Wadler. XQuery 1.0 Formal Seman-tics. W3C Working Draft, W3C, June 2001.

[8] N. Fuhr and K. Grossjohann. XIRQL: A Query Lan-guage for Information Retrieval in XML Documents.In Proc. of the SIGIR 2001 Conf., pp. 172-180, 2001.

[9] M. Kaszkiel, J. Zobel, and R. Sacks-Davis. Effi-cient Passage Ranking for Document Databases. ACMTransactions on Inf. Systems, 17(4):406–439, 1999.

[10] J. McHugh, J. Widom, S. Abiteboul, Q. Luo, andA. Rajaraman. Indexing Semistructured Data. Tech-nical report, Stanford University, February 1998.

[11] G. Navarro and R. Baeza-Yates. Proximal Nodes: AModel to Query Document Databases by Content andStructure. ACM Transactions on Information Sys-tems, 15(4):400-435, 1997.

[12] G. Salton and M. J. McGill. Introduction to ModernInformation Retrieval. McGraw-Hill, 1983.

[13] A. Theobald and G. Weikum. Adding Relevance toXML. In Proc. of the 3rd Int’l Workshop on the Weband Datab., LNCS 1997, pp. 105-124, Springer, 2001.

Constraint Preserving XML Storage in Relations

Yi Chen, Susan B. Davidson and Yifeng ZhengDept. of Computer and Information Science

University of Pennsylvania

[email protected], [email protected], [email protected]

Abstract

As XML becomes a standard for data represen-tation on the internet, there is a growing interestin storing XML using relational database technol-ogy. To date, none of these techniques have con-sidered the semantics of XML as expressed bykeys and foreign keys. In this paper, we presenta storage mapping which preserves not only thecontent and structure of XML data, but also itssemantics.

1 IntroductionOver the past several years, there has been a tremen-dous interest in using relational databases to store XMLdocuments[1, 2, 3, 4, 5], thus leveraging a well-developedtechnology for data management and query processing.There have also recently been several proposals for cap-turing constraints beyond DTDs, in particular keys [6] andforeign keys. Aspects of these proposals are finding theirway into XMLSchema [7] where, following the older no-tions of ID and IDREF, foreign keys are termed “keyrefs”.The question naturally arises as to how to capture XMLkey and keyref constraints in the relational mapping.

For example, consider a community of biomedial re-searchers who are performing gene expression experi-ments and exchange data using the XML standard MAGE-ML [8]. This standard includes a specification of keys andkeyrefs, and each group is expected to produce data that iscorrect with respect to these keys. Upon the recommen-dation of their bioinformatic experts, they use relationaldatabase technology to store their experimental data (e.g.the Stanford Microarray Database [9], which uses Oracle).Since the data is already stored in a relational database,they wish to ensure that the data in relational form is cor-rect with respect to the XML keys using relational technol-ogy.

Key and foreign key constraints can always be ex-pressed as queries which produce an empty result whenevaluated against a correct instance and a non-empty resultotherwise. Thus XML constraints can always be checked

using triggered procedures in a relational database. How-ever, triggers are very inefficient compared to key and for-eign key constraints in relational databases [10]. To fullyleverage database technology for constraint checking, wetherefore wish to map XML key and keyref constraints torelational key and foreign key constraints.

In this paper, we address the problem of mapping anXML document together with its constraints into a rela-tional schema so as to check XML key and keyref con-straints using key and foreign key constraints. The map-ping should preserve three kind of information: 1) thecontentof the document, i.e. each node of the documentshould appear in the target database; 2) thestructureof thedocument, i.e. the parent-child relationship of the nodes;1

and 3) thesemanticsof the document as captured by XMLkey and keyref constraints. We will call such a mapping aninformation preserving mapping, and focus on mappingsin which the schema is available.

This work makes the following contributions:

1. We distinguish three categories of information whichneed to be preserved in an information lossless map-ping.

2. We propose a notion ofconstraint relationswhich ex-plicitly capture XML key and keyref constraints.

3. We present a mapping from an XML document to arelational instance which extends constraint relationsto capture the complete content and structure of thedocument.

4. XML key/keyref constraints can be checked using keyand foreign key constraints in the constraint relations.

The rest of the paper is organized as follows: A defini-tion of XML keys and keyrefs is given in Section 2. Afterintroducing constraint relations in Section 3, we give aninformation preserving storage mapping algorithm calledX2R. We close by reviewing related work and discussingfuture directions.

1Note that we are not considering other structure information such ascardinality constraints.

<root><city><name> Philadelphia </name><state> PA </state>

<restaurants><cuisine type=‘‘French’’>

<restaurant><name> Le Soir </name><entree>

<name>Braised Cod Loin and Squid</name><price>$24</price>

</entree><dessert>

<name>Apple French Toast</name><price>$10</price>

</dessert> ...</restaurant> ...

</cuisine> ...</restaurants><reviews>

<review restaurant="Le Soir">Excellent

<review> ...</reviews>

</city> ...</root>

Figure 1: Sample XML document

2 Constraints: Keys and keyrefs

Before giving a formal definition of keys for XML, con-sider the sample document “restaurants.xml” shown inFigure 1. The document contains information about restau-rants and reviews of restaurants by cities. Within a city,the restaurants are grouped by their cuisine. As for thesemantics of the document, one might wish to assert thata city is identified by its name and state, and that withinthe context of a city a restaurant is uniquely identified byits name. For example, since there is already a restaurantnamedLe Soir in Philadelphia PA, we could not add an-other with the same name in Philadelphia PA; however,we could add another restaurant namedLe Soir in SeattleWA. We might also wish to assert that each menu item in arestaurant (which can be either an appetizer, entree, salador dessert) is uniquely identified by its name. For example,since there is an dessert namedApple French Toastat therestaurantLe Soir, there can not also be an entree with thatname. Finally, we might wish to assert that each reviewrefers to a restaurant by name, i.e. the value of the restau-rant attribute of a review is the name of some restaurant.

To define a key for XML we specify three things: 1) thecontext in which the key must hold; 2) a set on which weare defining a key; and 3) the values which distinguish eachelement of the set. Since we are working with hierarchicaldata, specifying the context, set and values involve path

expressions. In what follows, we adopt the syntax of [6]for keys2 and use the following notation:n[[P ]] denotesthe set of node identifiers in the XML tree representationof the document that can be reached by following the pathexpressionP from the node with identifiern. We also use[[P ]] as an abbreviation forr[[P ]], wherer is the root node.

An XML keycan be written as

K : (CK ; (TK ; fPK1; : : : ; PK

p g))

whereK is the name of the key, path expressionsCK

is called the context path, TK the target path, andPK1; :::; PK

p thekey pathsfor the key. For acontext nodec 2 [[CK ]], relative to atarget nodet 2 c[[TK ]], key pathPKi (i = 1; :::; p) must identify a singlekey node(either

an element or attribute) whose value must be of a simpletype. That is, following XMLSchema we require that eachPKi exist and be unique (strong keys, [6]); furthermore,

the value at the end of a key path must be of a simple type(rather than an XML tree value as in [6]). The idea is thatwithin the scope of a context node, the key constraint musthold on the set of target nodes.

Definition 2.1: An XML treeT is said tosatisfya keyK :(CK ; (TK ; fPK

1; : : : ; PK

p g)) if and only if 8c 2 [[CK ]],8t1; t2 2 c[[TK ]], t1[[PK

i ]] = t2[[PKi ]](i = 1; : : : ; p) !

t1 = t2.

In relational databases, aforeign keyallows a list of at-tributes in one table to reference a list of attributes in an-other table (which must form the key for the referred table).Since XML is hierarchical, we must additionally specifythe context in which the references are made. Our notationfor a keyref is therefore similar to that of a key:

R : (CR; (TR; fPR1; : : : ; PR

p g)) KEYREFK

whereK is the name of the key being referenced,R isthe name of the keyref,CR is the context path,TR isthe target path, andPR

1; :::; PR

p are thekey pathsfor thekeyref. The concatenation of the context and target pathslocates the referencing node. As in the relational case,each key pathPR

j must match the key pathPKj of keyK,

j = 1; :::; p, i.e., although the key path expressions maybe different they must have compatible data types. Fol-lowing XMLSchema, the referencing node and referencednode must be within the same context node, thus the pathexpressionCR must be the same asCK .3

Definition 2.2: An XML tree T is said tosatisfya keyrefR : (C; (TR; fPR

1; : : : ; PR

p g)) KEYREFK,if and only if8c 2 [[C]], for 8tR 2 c[[TR]], 9tK 2 c[[TK ]], tR[[PR

i ]] =tK [[PK

i ]](i = 1; : : : ; p).

Adopting XPath notation for paths, let “/” denote theroot or be used to concatenate two path expressions, “.”

2We adopt this because its syntax is more concise than that ofXMLSchema.

3To be precise,CR should be equivalent toCK when this can beefficiently decided.

denote the current context, “//” match a sequence of labels,and@ be an attribute name. All paths must end in either asingle label or a disjunction of labels. The keys and keyrefsfor our sample XML document can be written as:

K0 : (=; (:==city; f:=name; :=stateg))A city is identified by its name and state.

K1 : (==city; (:==restaurant; f:=nameg))Within the context of a city, each restaurant isuniquely identified by its name.

K2 : (==restaurant;(:=appetizerjentreejsaladjdessert; f:=nameg))Within a restaurant, an appetizer, entree, salad ordessert is identified by its name.

R0 : ((==city; (:=reviews=review;f:=@restaurantg)) KEYREFK1

Within a city, each review refers to a restaurant in thatcity by its name.

Before moving on to storage mapping, we comment onthe restriction that the context node of the node being ref-erenced must be the same node as the context node of thereferencing node. Although this is done in XMLSchemato simplify the referencing scheme, it is not necessary. In[6], the authors defined a notion of “transitive keys”, whichguarantees that the context node of any key can itself beidentified by some key value. In this case, the target nodeof any key can be identified by some path of key valuesfrom the root. Using this, the notion of foreign key can begeneralized so that the context nodes may differ.

3 An Information Preserving MappingWe now present an algorithm which, given the schema ofan XML document, generates an information preservingrelational schema. The XML schema consists of informa-tion about the structure of the document (i.e. the DTD) andXML key and keyref information as described in the pre-vious section. The relational schema generated consists ofa collection of relations together with their key and foreignkey constraints.

3.1 Constraint preservation via constraint relations

At the heart of the schema-generation technique is a set ofrelations corresponding to the given XML key and keyrefconstraints. To capture the internal identifier for a nodenin the document, we use the notationnodeid(n). We usetext() to grab the value of an attribute or simple element(text) node.

For each XML keyK : (CK ; (TK ; fPK1 ; : : : ; PK

p g)),we create akey relationKR(tid; cid; P1; : : : ; Pp), wherefor everyc 2 [[CK ]] and t 2 c[[TK ]], a tuple (nodeid(t),nodeid(c), t=PK

1 :text(); :::; t=PKp :text()) is inserted into

KR. We also assert the following functional dependency(key) inKR: cid; P1; : : : ; Pp ! tid. Each target node isin a single context and hence occurs exactly once inKR.

There are therefore two keys forKR, (cid; P1; : : : ; Pp)and (tid).

Similarly, for each XMLkeyrefR : (CR; (TR; fPR

1; : : : ; PR

p g)) KEYREF K wecreate akeyref relationRR : (tid; cid; P1; : : : ; Pp), wherefor everyc 2 [[CR]] and t 2 c[[TR]], a tuple (nodeid(t),nodeid(c), t=PK

1:text(); :::; t=PK

p :text()) is inserted intoRR. We also assert the foreign key(cid; P1; : : : ; Pp) REF-ERENCESKR(cid; P1; : : : ; Pp). As with the key rela-tion, (tid) is a key forRR. Both the key and key referencemappings rely crucially on the fact that each key path ter-minates in an attribute or simple element (text). We willrefer to these key and keyref relations asconstraint rela-tions.

Note that since we insert a tuple inKR (RR) for everytarget node of every context node, the mapping from theXML document to these constraint relations is complete.

Proposition 3.1: An XML document satisfies the XMLkey K : (CK ; (TK ; fPK

1; : : : ; PK

p g)) if and only if thecorresponding key relationKR(tid; cid; P1; : : : ; Pp) sat-isfies its key(cid; P1; : : : ; Pp).Sketch of proof: By construction, there is a one to onecorrespondence between tuples inKR and matches for thecontext node, target node, and key path values ofK in theXML document. Since the functional dependency (key) inKR is equivalent to the following assertion:8t1; t2 2 KR;t1:cid = t2:cid ^ t1:P1 = t2:P1 ^ : : : ^ t1:Pp = t2:Ppa violation of the relational key implies a violation of theXML key and vice versa.

Proposition 3.2: An XML document satisfies the XMLkeyrefR : (CR; (TR; fPR

1; : : : ; PR

p g)) KEYREFKif and only if the corresponding foreign key rela-tion RR(tid; cid; P1; : : : ; Pp) satisfies its foreign key(cid; P1; : : : ; Pp) REFERENCESKR(cid; P1; : : : ; Pp).Sketch of proof: By construction, there is a one to onecorrespondence between tuples inRR and matches for thecontext node, target node, and key path values ofR inthe XML document. The foreign key constraint inRR isequivalent to the following assertion:8t1 2 RR; 9t2 2 KR: t1:cid = t2:cid ^ t1:P1 =t2:P1 ^ : : : ^ t1:Pp = t2:PpwhereKR is the key relation formed from the XML keyK. Since the referring and referred nodes are by definitionwithin the same context,t1:cid = t2:cid is always true,and a violation of the relational foreign key constraint im-plies a violation of the XML key reference constraint andvice versa.

It is therefore enough to check key and foreign key con-straints in the relational instance to guarantee that the keyand keyref constraints are satisfied in the source XML.

Returning to our example of the previous section, theconstraint relations to be created are:

1. city(cid, name, state)for K0, with (name,state) and(cid) as keys.

The attributecid stores the identifier of a node taggedwith city, and attributesnameandstatestore the valueof its nameand statechild elements, respectively.Since the context node of the key is the root of thedocument it is omitted in the relation.

2. restaurant(rid,cid,name)is created forK1, with (cid,name) and (rid) as keys.

3. menuitem(iid,rid, ajejsjd.name, type)is created forK2, with keys (rid, ajejsjd.name) and (iid). Attributetypecan have value “appetizer”, “entree”, “salad” or“dessert”.

4. review(rvid, cid, @restaurant)is created forR1, withkey (rvid) and foreign key (cid, @restaurant) REFER-ENCESrestaurant(cid,name).

Before we leave the issue of constraint preservation, itis important to consider the effect of updates to the sourceXML document. Following [12, 10], we assume two formsof updates,insert(content)anddelete(child).4 We also as-sume that the path from the root to the update point is eitherspecified or recoverable (e.g., using a parent relation).

Thus an insertion can be considered as an XML docu-ment with the same root as the original document in whichthe content to be inserted is marked as “new” and the ex-isting content (i.e. the path from the root to the updatepoint) is marked “old”. Using the mapping to constraintrelations described earlier, a tuple produced from a nodemarked “new” causes an insertion. Tuples produced fromnodes marked “old” are ignored.

Assuming that the content of the subtree rooted atchildis either specified or recoverable (for example, througha content and structure preserving mapping of the docu-ment), an analogous technique can be used for deletion: atuple produced from a node marked “new” causes a dele-tion, and tuples produced from nodes marked “old” are ig-nored.

In this way, every insertion (deletion) can be mappedto a set of inserts (deletes) to the constraint relations viathe mappings described earlier to populate the constraintrelations. Note that since a single XML update may corre-spond to several tuples in the target, transactions must beused to prevent anomalies.

Let Æ be the content to be inserted or the contents ofthe subtree to be deleted, “+” be the effect of inserting ordeleting content,� be the XML constraints,�0 be the keyand foreign key constraints on the constraint relations, andM be the mapping which populates the constraint relationsfrom the XML input.

Proposition 3.3: I + Æ satisfies� if and only if M(I) +M(Æ) satisfies�0.Sketch of proof: According to Propositions 3.1 and 3.2,I + Æ satisfies� if and only if M(I + Æ) satisfies�0. Bythe definition ofM , M(I + Æ) = M(I) +M(Æ).

4We do not consider the operators rename and replace since the au-thors acknowledge that they have problems on REF.

1. Create constraint relations. If any target path ends with adisjunction of tags, then add a new attribute in the relationto record the tag of the target node.

2. Create a DTD graph that represents the structure of a givenXML-Schema.

3. For each XML key or key reference constraint, mark theedges at the end of the target path in the schema graph.

4. If a target path ends with a single tag (as opposed to a dis-junction), then inline any attribute or non-empty text de-scendent connected by non-star path as well as the contentof the target node (if it exists).

5. If any target path ends with a disjunction tag, inline thecommon attributes or non-empty text descendents con-nected by a non-star path as well as the content of the targetnode (if it exists).

6. If all incoming edges of a node are marked, the non-staredge connecting this node and the inlined node(includingthat on the key path) are marked. Do this recursively untilno more edges need to be marked.

7. If there are unmarked non-star edges in the schema graph,find one whose source vertex connects with its parent bya star edge. We name this node amaster node. Create atable for the master node and inline any descendant nodesconnected by a non-star path. Mark all incoming edges ofthis master node.

8. Repeat the last two steps until there are no unmarked non-star edge in the schema graph.

9. Ensure that all parent-child relations are recorded by addinga parent id (and parent code) in each child relation to linkwith its parent.

Figure 2: TheX2R storage mapping algorithm

3.2 Content and Structure Preservation

Since only some of the nodes in the source are involved ina key or key reference relation, we need to consider howthe rest of the document can be captured to preserve con-tent, structure and semantic information. We can use ar-bitrary XML storage system together with the constraintrelations to preserve them, but this approach brings redun-dancy in the generated relational schema. In the remainderof this section, we present an algorithm calledX2R (seeFigure 2) which generates an information preserving map-ping without redundancy.

The algorithm extends an inlining strategy [4] as fol-lows: First, we create a DTD graph(as defined in [4]) aswell as the constraint relations defined in the previous sub-section. We then map the nodes not captured in the con-straint mapping, either by inlining them into the constraintrelations or by creating seperate relations. Finally, we addany missing parent-child information in the constructed re-lations.

The salient differences between theX2R algorithm andhybrid-inlining and its variants [4, 5] are as follows: 1) We

name price

@spicy

root

city

reviews

review

@restaurant

entree

@type

cuisine

restau

restaurant

appetisal

sert

state

zer ad

rants

2

3

45

6

87

911 13

12

14

18

17 1921

22

23

24

15

1

10

16

20

des

*

*

*

*

* * *

*

Figure 3: DTD graphstart from a set of key and reference relations which cap-ture semantic information; 2) Relations that are separatedin hybrid inlining may be coalesced by paths that end witha disjunction; and 3) The key and reference relations mayinline ancestors other than the parent, i.e. the context nodemay be a non-parent ancestor.

Proposition 3.4: The X2R mapping from the sourceXML schemaX toR is information lossless.Sketch of proof: Every node in the source XML docu-ment is mapped into some relation (content preservation),the structure information is captured either explicitly inpid attributes or via the DTD (inlining), and the seman-tics is captured in the constraint relations (proposition 3.1and 3.2).

3.3 An example

We now illustrate how theX2R mapping algorithm workson our example. The DTD graph assumed is presented inFigure 3 (see the Appendix for the XML schema specifi-cation) and the final relations created are as follows:

� city(cid, name, state)with keys (name, state) and(cid).

� cuisine(csid, type, pid)with key (csid) and foreignkey (pid) REFERENCEScity(cid).

� restaurant(rid, cid, name, pid)with keys (cid, name)and (rid), and foreign key (pid) REFERENCEScui-sine(csid).

� menuitem(iid,rid, ajejsjd.name, price, type)with keys(rid, ajejsjd.name) and (iid), and foreign key (rid)REFERENCESrestaurant(rid).

� spicy(eid, spicy)with key (eid) and foreign key (eid)REFERENCESmenuitem(iid).

� review(rvid, cid, @restaurant, TEXT, pid)with key(rvid), and foreign keys (cid, @restaurant) REF-ERENCESrestaurant(cid, name)and (pid) REFER-ENCEScity(cid).

In step 1, we build constraint relationscity(cid,name, state), restaurant(rid, cid, name), menuitem(iid,rid,ajejsjd.name,type), andreview(rvid, cid, @restaurant). Instep 4 we inline theTEXT information into thereviewre-lation, and in step 5 we inline theprice information intotablemenuitem. Relationcuisine (csid,type, pid)and re-lation spicy(eid, spicy)are created in step 7. All thepidinformation is inlined in step 9. Note that in some rela-tions (e.g. tablemenuitem) the parent node coincides withthe context node. Details of this construction are deferredto the Appendix.

4 Related works and conclusions

This paper proposes a novel XML storage strategy usingrelational databases which preserves the content, structureand semantic information as expressed in key and foreignkey constraints. In contrast, other strategies that have beenproposed [1, 2, 3, 4, 5] only preserve the content and struc-ture of XML documents. Our storage mapping is guidedby the XML key and keyref constraints, and is based on thenotion of constraint relations. Although the storage strat-egy in this paper is integrated with hybrid-inlining, con-straint relations can be combined with virtually any otherstrategy. A direct benefit of our storage mapping is theability to efficiently check XML constraints using rela-tional key and foreign key constraints; furthermore, thetechnique is incremental. In some ways, our approach isanalogous to the initial stage of database design based onfunctional dependencies. In future work, we plan to con-sider how the conceptual schema design can be refined ac-cording to the query workload.

LegoDB [5] considers the issue of finding an optimal re-lational schema in a space of storage mappings accordingto the query workload for some cost model. The procedureis analogous to the tuning procedure for physical databasedesign in the relational database[13]. Specifically, it tunesthe conceptual schema according to the query and updateworkload to achieve better performance. In this paper weproduce a relational mapping which is optimized for up-dates(enforcing the constraints efficiently), but do not con-sider queries. Considering queries would obviously affectthe design: For example, if finding the price and spicinessof an entree is a frequent query we may wish to extendthe schema ofmenuitemto include attributespicy; the at-tribute is null for appetizers, salads and dessert.

We classify nodes by considering both their tag andtheir keys (if any), while [1, 2, 3, 4, 5] only considerthe former. For example, in our sample XML file,we can define a parent typemenuitemfor types appe-tizer,entree,saladanddessertbased on keys and then storetypemenuiteminto one table. In other words, we use thesemantic information in keys to guide the schema design.

We do not address the issue of mapping queries on theXML document to the relational representation in this pa-per. Several general mapping algorithms from XML querylanguages to SQL [14, 15, 16] have been proposed andcould be used for the X2R mapping.

Another related issue not addressed in this paper is howto map constraints into a relational schema that has alreadybeen created. For example, in the Clio system [17] thetarget schema is fixed. In general, it is impossible for afixed target to preserve all the information in the source,especially the structure and constraints information. [18]begins to address the question of mapping constraints to afixed target schema by giving an algorithm to answer thefollowing question: Given a set of XML keys� and a map-ping to a fixed relational schemaR, is a functional depen-dencyf onR provable from�? [19, 20] focuses on themore general question of how to use the constraints in thetarget to optimize constraint checking defined in source.

There are several native XML Schema validators[21,10], to our knowledge no other work has been done onautomatically enforcing XML constraints using relationaldatabase technologies.

Acknowledgments.We are grateful to Val Tannen, Wang-Chiew Tan, Byron Choi, Wenfei Fan and Carmem Hara forhelpful discussions.

References[1] A. Deutsch, M. Fernandez, and D. Suciu. Stor-

ing semistructured data with STORED. InIn Pro-ceedings of the Workshop on Query Processing forSemistructured Data and Non-Standard Data For-mats, pages 431–442, 1999.

[2] D. Florescu and D. Kossmann. Storing and queryingXML data using an RDBMS. InBulletin of the Tech-nical Committee on Data Engineering, pages 27–34,September 1999.

[3] A. Schmidt, M. Kersten, M. Windhouwer, andF. Waas. Efficient relational storage and retrieval ofXML documents. InWebDB, pages 47–52, 2000.

[4] J. Shanmugasundaram, K. Tufte, C. Zhang, G. He,D. J. DeWitt, and J. F. Naughton. Relationaldatabases for querying XML documents: Limitationsand opportunities. InThe VLDB Journal, pages 302–314, 1999.

[5] P. Bohannon, J. Freire, P. Roy, and J. Simeon. FromXML-Schema to Relations: A Cost-Based Approachto XML Storage. InICDE, 2002.

[6] P. Buneman, S. Davidson, W. Fan, C. Hara, andW. Tan. Keys for XML. InWWW10, pages 201–210,2001.

[7] H. Thompson, D. Beech, M. Maloney, andN. Mendelsohn. XML Schema Part 0: Primer , May2001. http://www.w3.org/TR/xmlschema-0/.

[8] MAGE-ML.http://www.mged.org/Workgroups/MAGE/mage-ml.html.

[9] G. Sherlock and et al. T. Hernandez-Boussard.The Stanford Microarray Database. InNu-cleic Acids Research, 29(1):152–155, 2001., 2001.http://daisy.stanford.edu/MicroArray/SMD.

[10] Y.Chen, S. Davidson, and Y. Zheng. Validating Con-straints in XML. 2002.

[11] R. J. Miller, Y. E. Ioannidis, and R Ramakrishnan.The use of information capacity in schema integra-tion and translation. InProc. 19th InternationalVLDB Conference, pages 120–133, August 1993.

[12] A. Halevy I. Tatarinov, Z. Ives and D. Weld. Updat-ing XML. In Proceedings of ACM SIGMOD Confer-ence on Management of Data, 2001.

[13] J. Gehrke R. Ramakrishnan.Database ManagementSystems. McGraw-Hill, 2000.

[14] J. Shanmugasundaram, E. J. Shekita, J. Kiernan,R. Krishnamurthy, S. Viglas, J. F. Naughton, andI. Tatarinov. A general techniques for querying XMLdocuments using a relational database system.SIG-MOD Record, 30(3):20–26, 2001.

[15] M. F. Fernandez, W. C. Tan, and D. Suciu. Silkroute:trading between relations and XML.WWW9 / Com-puter Networks, 33(1-6):723–745, 2000.

[16] M. J. Carey, J. Kiernan, J. Shanmugasundaram, E. J.Shekita, and S. N. Subramanian. XPERANTO: Mid-dleware for publishing object-relational data as XMLdocuments. InThe VLDB Journal, pages 646–648,2000.

[17] R. J. Miller, M. A. Hernandez, L. M. Haas, L. Yan,C. T. H. Ho, R. Fagin, and L. Popa. The Clioproject: Managing heterogeneity.SIGMOD Record,30(1):78–83, 2001.

[18] S.B. Davidson C. Hara W. Fan. Propagating XMLkeys to relations. Technical Report MS-CIS-01-31,University of Pennsylvania, Computer and Informa-tion Science Department, 2001.

[19] L. Popa and V. Tannen. An Equational Chase forPath-Conjunctive Queries, Constraints, and Views.In International Conference on Database Theory(ICDT), Jerusalem, Israel, January 1999.

[20] L. Popa and A. Deutsch and A. Sahuguet and V. Tan-nen. A Chase Too Far? InProceedings of ACMSIGMOD International Conference on Managementof Data, Dallas, USA, May 2000.

[21] Microsoft XML Parser 4.0(MSXML). Available at:http://msdn.microsoft.com.

Types for Correctness of Queries over Semistructured Data

Dario Colazzo Giorgio Ghelli Paolo Manghi Carlo Sartiani

Dipartimento di Informatica - Universita di Pisa, Corso Italia 40, Pisa, Italy

e-mail: {colazzo, ghelli, manghi, sartiani}@di.unipi.it

Abstract

A type system for a query language should serve bothpurposes of verifying whether a query is coherent withwhat is known about the structure of the database (querycorrectness) and of giving information about the type ofthe query result (result analysis). Current proposals fortyped query languages for semistructured data are usuallyfocused on result analysis, but perform very few controls,or none at all, of query correctness.

This work presents a type system for a core of XQuerythat supports both query correctness and result analysis,and discusses some of the design issues and alternatives.

1 Introduction

In conventional query languages, a type system analysesquery correctness and the result type. Query correctnessis generally defined as a relation of compatibility betweenthe type of the query input and the query type. The querytype represents the type of the data targeted by the query,and is either inferred from the query structure, or directlyprovided by the programmer. Correctness ensures, at thevery minimum, that the query has some hope to find amatch in data that respects the input type.

Result analysis is the process of checking whether aquery effectively returns data of an expected output type.This check is performed by inferring the output type of aquery and by matching it against the expected type.

Both query correctness and result analysis are usefultools for the development of complex database applica-tions, where database queries and high-level program-ming language applications are usually combined forminga complex web of input and output type dependencies.

In the context of semi-structured data (SSD) and XMLonly a few query and manipulation languages exploit statictype information, given the possibly irregular and unstablenature of the data. Current proposals mainly focus onresult analysis [3, 6, 5], but perform very few controls, ornone at all, of query correctness. Actually, there is noclear agreement, and neither much discussion, about whatquery correctness means in this context.

This is due to the fact that SSD, especially XML doc-uments, are usually endowed with rather irregular typedescriptions, comprising union types, recursive types, andwildcards; coherently, the corresponding languages include

operators such as alternative paths, wildcard matching,collection of descendants.

One possible notion of query correctness, adopted by theXDuce language [4], is the full correspondence between thealternative paths in a query and all the possible cases ofthe union-type that describes the input. This approachseems, however, too restrictive for many SSD specific pro-gramming tasks.

At the other extreme, one may flag as non-correct onlythose queries that are statically deemed to always returnan empty collection. This approach has been suggestedby the authors of XQuery [1]. However, unless the systemis able to flag the specific parts of the query where theproblem arises, this policy becomes quite loose, and notinformative enough for programmers.

An intermediate notion of correctness is to deem aswrong all and only those paths in the query that haveno hope to match the input data.

In summary, each of these approaches is reasonable insome specific application, hence none of them can be re-garded as the general purpose solution.

Our contribution This work describes a type systemfor µXQuery, an abstract core of XQuery. µXQuery’stype system provides a formal framework where differentnotions of query correctness can be formalized and com-pared. Specifically, it supports a notion of conformance ofdata to a type, of result analysis, and is based on a three-levels definition of query correctness, according to whicha query can be classified as:

• incorrect, if the structural requirements of the querywill not find a match against any instance of thedatabase schema;

• weakly correct, if the structural requirements of thequery will find a match in at least one instance of thedatabase schema.

• strongly correct, if the structural requirements willfind a match against all possible instances of thedatabase schema.

We believe this characterisation to be particularly suit-able in the context of SSDBs, as it is based on a clearsemantic characterization but is flexible enough to be com-patible with the needs of different applications.

2 Query correctness in XML

query languages

In the absence of input type description, query correctnesscannot be checked. As a consequence, programmers mayinterpret an empty result as being due to a structural re-quirement failure (from clause) or a selection failure (nodata satisfied the where clause).

<!DOCTYPE people[

<!ELEMENT people person+>

<!ELEMENT person (name,phone)>

<!ELEMENT name (frsname, sndname | firstname, secondname)>

<!ELEMENT frname PCDATA>

<!ELEMENT snname PCDATA>

<!ELEMENT firstname PCDATA>

<!ELEMENT secondname PCDATA>

<!ELEMENT phone PCDATA>

]>

Figure 1: A sample DTD.

When schemas are available, instead, some static con-trols can in principle be performed. However, the irregularnature of SSD types and query languages makes this aimvery elusive. In fact, only the language XDuce defines astandard notion of query correctness, but XDuce is quitefar from the standard structure of query languages, beinga stricter relative to functional languages as ML.

Other approaches, such as XQuery’s and Suciu’s ap-proach [6], are not concerned with the automatic iden-tification of incorrect queries and concentrate on resultanalysis. Given a query and a schema for the databaseat hand, the problem is that of checking whether everyoutput of the query conforms to a given expected outputtype.

Such solutions target different kinds of application sce-narios and therefore differ for a number of design choices.However, if we focus on query correctness, XDuce’s ap-proach turns out to be quite restrictive for a query lan-guage, while the other approaches are instead poorly in-formative for the programmer.

XDuce XDuce is a typed, functional, Turing completeprogramming language. It is based on an ML-like patternlanguage that implements a one-match semantics, i.e. ev-ery pattern, instead of collecting every matched piece ofdata (as in standard query languages), only binds the firstmatch. XDuce is nearer to a programming language thanto a query language, but we consider it here since it is theonly example of typed language for XML that explicitlyprovides a notion of type correctness.

For example, consider the following XDuce’s function,which returns the list of phone numbers of all people in thedocument d, which conforms to the schema in Figure 1:

fun selNums: person* --> (sndname,phone)* =

person[name[frsname[String],sndname[n:String]],

phone[p:String]], rest:person*

--> sndname[n], phone[p], selNums(rest) |

person[name[firstname[String],secondname[n:String]],

phone[p:String]], rest:person*

--> sndname[n], phone[p], selNums(rest) |

() --> ()

(XD1)

selNums can be applied to the element people of d. Thisfunction is type correct, because XDuce supports a no-tion of query correctness according to which functions arecorrect if and only if they specify a matching pattern (afunction case) for all possible patterns described by theinput type: exhaustive patterns in function definitions arerequired to ensure the soundness of the type system ofXDuce, as stated in [4]. Indeed, the function,

fun selNums: person* --> (sndname,phone)* =

persons[name[frsname[String],sndname[n:String]],

phone[p:String]], rest:person*

--> sndname[n], phone[p], selNums(rest) |

persons[name[firstname[String],secondname[n:String]],

phone[p:String]], rest:person*

--> sndname[n], phone[p], selNums(rest) |

() --> ()

(XD2)

is statically judged incorrect and never executed, becausethe field persons is not defined in the schema. This no-tion of correctness, however, is too restrictive for XMLquerying purposes. For instance, the function

fun selNums: person* --> (sndname,phone)* =

person[name[frsname[String],sndname[n:String]],

phone[p:String]], rest:person*

--> sndname[n], phone[p], selNums(rest) |

() --> ()

(XD3)

is considered incorrect and never executed, although onewould expect the query to be run, as instances of thedatabase exist that are matched by the body of the func-tion.

XQuery XQuery’s type system infers the output typeof a query by matching the structural requirements of thequery (query type) with the type of the query input [3]. Indoing this, the type system does not identify and discardincorrect queries, but simply assigns an empty type tothose subparts of the query that cannot find a match inany instance of the data. Coherently, a query over aninstance of a union type is assigned an empty type onlywhen none of the members of the union type is relevant tothe query.1 Given the expected type of the query’s outputdata and the inferred output type of the query, the systemcan statically detect if the query’s output value has theexpected output type.

The function (XD1) can be encoded in the followingXQuery’s query,

1This policy is looser than XDuce’s, where a query is accepted

only if it searches for all the members of a union type, and more

suitable for navigating arbitrarily irregular SSDBs.

for $p in d/person,

$n in op:union($p/name/sndname,

$p/name/secondname),

$ph in $p/phone

return <sndname> data($n) </sndname>,

$ph

(XQ1)

for which the type system statically infers the followingoutput type,

(element sndname {xsd:string},

element phone {xsd:string})*

Function (XD3) corresponds to the query,

for $p in d/person,

$n in $p/name/sndname,

$ph in $p/phone

return <sndname> data($n) </sndname>,

$ph

(XQ3)

The type system infers the same output type of (XQ1).Result analysis states that both (XQ1) and (XQ3) are cor-rect as their output type matches (is a subtype of) the ex-pected type of the output. The function (XD2) becomesinstead,

for $p in d/persons,

$n in op:union($p/name/sndname,

$p/name/secondname),

$ph in $p/phone

return <sndname> data($n) </sndname>,

$ph

(XQ2)

The type inferred for this query is the empty type ().The system pinpoints the error, as the programmer wasexpecting a different type.

Essentially, XQuery provides programmers with power-ful result analysis tools, which are, in some situations, alsouseful for detecting errors before execution. In particular,the authors observe that, since queries are assigned anempty type if they cannot find a match in any instance ofthe input type, when the inferred type is empty the systemmay automatically report the query as incorrect. This isgenerally true, unless the programmer were expecting anempty type as output. This notion of incorrectness, how-ever, is rather incomplete, as many incorrect queries donot necessarily return an empty type. Consider for exam-ple the following query:

for $p in d/person,

$n in op:union($p/name/sndname,

$p/name/secondname),

$ph in $p/phone

return <sndname> data($n) </sndname>,

$ph,

$p/age

(XQ4)

Although the schema of d contains no age field, the typesystem infers exactly the same output type inferred for thequery (XQ1). The same happens with the following query,although the schema of d contains no seconname field.

for $p in d/person,

$n in op:union($p/name/sndname,

$p/name/seconname),

$ph in $p/phone

return <sndname> data($n) </sndname>,

$ph

(XQ5)

Suciu’s proposal Dan Suciu et al. focus on the devel-opment of a formal framework for the definition of resultanalysis tools [6]. They view queries as transformation

programs, i.e. applications transforming an original datasource into an XML database that conforms to a giventype.

They explore a backward type inference mechanism,which takes as inputs the query, the query input type,the expected output type, and checks that every databasethat is the result of the query applied to an instance ofthe input type, conforms to the output type. Again, themethodology fully addresses the issues of result analysis,but totally disregards a notion of query correctness.

3 µXQuery

µXQuery is an abstract version of the FLWR core ofXQuery. The main difference between µXQuery andXQuery is the lack of support for function definitions, andfor if − then− else and typeswitch expression. Moreover,µXQuery features only copy-semantics return clauses,hence discarding reference-semantics element construc-tion.

The novelty of µXQuery’s type system is that it hasbeen specifically designed for both result analysis andquery correctness checking. The type system infers theoutput type of a query, as in XQuery, but makes a distinc-tion between correct and incorrect queries, as in XDuce.In particular, it supports a three-levels definition of cor-rect queries, distinguishing between weakly correct andstrongly correct queries. We shall briefly discuss the ad-vantages of this approach.

3.1 Grammar

Queries are defined by the following grammar:

Q ::= ( ) | vB | 〈l〉Q 〈/l〉 | Q,Q | x | Q p

| let x := Q return Q | for x in Q return Q

Data are represented as ordered forests (f) of trees (t),as defined by the following grammar:

f ::= () | t | f, f t ::= vB | 〈l〉 f 〈/l〉

where vB is a leaf value of type B, ‘,’ is associative, and( ), f = f, ( ) = f .

Paths are defined by the following grammar:

p ::= nil | /ls | //ls | p p | p + p′

ls ::= l | ∗

3.2 Query semantics

A query Q is evaluated according to an environment ρwhich associates a forest with each free variable occur-ring in Q, and the result is denoted by [[Q]]ρ. Informally,[[Q]]ρ yields the pair 〈f, S〉, where f is the forest returnedby Q with respect to the substitution ρ, and S is a sta-tus variable that captures a notion of correct execution(formal details can be found in [2]). S ranges over theset {C, F}, respectively representing the correct or faulty

status of execution. Specifically, 〈f, C〉 states that f iscorrectly returned by a query, while 〈f, F〉 states that f isfaultily returned by a query.

In detail, a query Q correctly returns a forest f in anenvironment ρ ([[Q]]ρ = 〈f, C〉), if, for all path selectionsQ′ p in Q, the path p finds a match with the forest re-turned by Q′. In particular, for path selections of the formQ′ (p1+p2)p it is only required that either p1p or p2p findsa match in the forest returned by Q′. A query Q faultily

returns a forest f in an environment ρ ([[Q]]ρ = 〈f, F〉), ifthere exists a path selection Q′ p in Q for which either Q′

faultily returns a forest f ′ or p cannot find a match in f ′.Because of union types, two well-typed input values may

exist such that the same query may correctly return aresult on one and faultily return a result on the other one.For this reason, we adopt a three-levels classification of thesemantic correctness of a query with respect to an inputtype: strongly correct if it correctly returns a forest forany well-typed input, weakly correct if it correctly returnsa forest for some well-typed input, incorrect if it faultilyreturns a forest for any well-typed input (Definition 3.1).

Consider for example the databases d1 and d2 in Fig-ure 2, which conform to the schema in Figure 1. The query(XQ4) on d1 faultily returns the forest:

<sndname> Sartiani </sndname> <phone> 123456 </phone>

as the path selection ($p\age) does not match the data;for the same reason, (XQ4) execution would be faulty overd2 or any d that conforms to the same schema. The sameis true for the query (XQ2), which faultily returns theempty forest because the /persons path will never matchthe data. These queries are incorrect.

On the other side, a query d/person/name/secondname

would faultily return an empty forest when applied to d1,since it finds no match, but would correctly return a resultwhen applied to d2. Hence this query is weakly correct.

For path selections Q (p1 + p2)p, our notion of correctexecution is rather permissive, in the sense that, as wealready said, matching is required for at least one alterna-tive. The query (XQ1), for example, is correctly executed

d1 = <people><person>

<name>

<frsname> Carlo </frsname>

<sndname> Sartiani </sndname>

</name>

<phone> 123456 </phone>

</person></people>

d2 = <people><person>

<name>

<firstname> Dario </firstname>

<secondname> Colazzo </secondname>

</name>

<phone> 654321 </phone>

</person></people>

Figure 2: Two db’s conforming to the DTD in Figure 1

over d1, and returns the same forest f above with S = C,since one of its alternative paths finds a match in d. (XQ1)is actually strongly correct, since, for any well-typed con-tent of the database, at least one of its alternative pathsfinds a match. The query (XQ5), when applied to d1,returns the same forest as (XQ1), thanks to the disjunct/name/sndname, hence this execution is correct, despitethe presence of the wrong path /name/seconname. How-ever, (XQ5) is actually weakly correct since, over d2, itsexecution is faulty.

Hence, our semantics defines a three-level notion of thecorrectness of a query with respect to an input type. Now,our aim is to define a type system that is able to infer, foreach query, a reasonable approximation of its semanticcorrectness with respect to a given input type.

3.3 Type system

In this Section we introduce µXQuery’s type system. Wefirst introduce the syntax of types, then give an informalcharacterisation of the semantics of types, and give a char-acterisation of type correctness in terms of the semanticsof queries. Formal definitions, as well as type rules, canbe found in [2].

3.3.1 Type language

The type language we consider is essentially XDuce’s typelanguage, and is defined by the following grammar.

T ::= () | B | T, U | T + U | l[T ] | X

where B represents atomic types. The empty type ( ) onlycontains the empty forest ( ). The type constructor l[T ]represents the set of trees rooted as l and containing aforest of type T . Concatenation T, U represents the setof forests f, f ′, where f and f ′ are forests in T and Urespectively. The untagged union type constructor T + Urepresents the set of forests f which belong to either T orU .

Type variables are defined by an environment E, whichconsists of a set of potentially mutual recursive type defi-nitions of the following form:

E ::= ( ) empty environment

X = T, E type variable definition

x : T, E query variable declaration

Note that environments also contain query variable typedeclarations x : T . These are used in the typing rules givenin [2]. Moreover, observe that regular expressions types,such as repetition and optional types, can be defined bycombining recursive and union types as follows:

T∗ ≡ X with X = ( ) + (T, X) T ? ≡ ( ) + T

For instance, the DTD given in Figure 1 corresponds tothe following µXQuery’s type environment,

PEOPLE = people[PERSON +]

PERSON = person[NAME,PHONE]

NAME = name[(FRSNAME,SNDNAME) + (FIRSTNAME,SECONDNAME)]

FRSNAME = frsname[String]

SNDNAME = sndname[String]

FIRSTNAME = firstname[String]

SECONDNAME = secondname[String]

PHONE = phone[String]

While query correctness in XDuce and XQuery is basedon subtyping, in µXQuery it is based on a relation ofcoherence between query paths and query input types.2

3.3.2 Semantics of types

We interpret a type as the set of all forests that have thattype. In the style of [4], we define the semantics of typesby means of a set of deduction rules over judgements ofthe form E ` f : T , which state that f conforms to T withrespect to E. Informally, we write

[[T ]]E

= {f | E ` f : T}.

3.3.3 Query correctness

To define query correctness in µXQuery, we denote as[[Q]]E the set of all possible results, i.e. pairs 〈f, S〉, re-turned by Q, for each assignment to Q’s variables thatrespects the type definitions in E.

Definition 3.1 (Correctness) Given a query Q and an

environment E of type definitions for free variables in Q,

we say that Q is

strongly correct if ∀ 〈f, S〉 ∈ [[Q]]E . S = C

weakly correct if ∃ 〈f, S〉 ∈ [[Q]]E . S = C

incorrect if ∀ 〈f, S〉 ∈ [[Q]]E . S = F

In [2] we have defined a set of algorithmic rules whichreflect this characterisation, by returning the inferred cor-

rectness of a query (the relationship between the correct-ness as inferred by the type rules and the actual correct-ness is discussed below). Observe that query correctness

2As a consequence of this the system may infer context-free types

for some queries, even when they feature regular input types.

is strictly related with type inference, as in the presenceof nested queries, correctness of outer queries depends onthe type inferred for inner queries. As a consequence, therules also return the query type thereby enabling resultanalysis techniques.

Table 1 shows the inferred correctness returned by therules for path selections Q p, given the type inferred for Q(the symbol denotes any label) and the path p.

Path p Type of Q Strong Weak

1 /l + /m [l[T ] + m[U ]] Y es Y es

2 /l + /m + /n [l[T ] + m[U ]] Y es Y es

3 /l [l[T ] + m[U ]] No Y es

4 /l + /n [l[T ] + m[U ]] No Y es

5 /n [l[T ] + m[U ]] No No

6 /l + /o [l[T ] + m[U ]] + n[V ]] No Y es

Table 1: Inferred correctness.

As illustrated by the rows 1 and 2, Q p is strongly typecorrect if p finds a match in each instance of the type of Q.Accordingly, when checking correctness of a path p withan input union type, the rules state that Q p is stronglycorrect only if all members of the union type are matchedby p (data covering). However, we do not require here thepath covering property, i.e. that every disjunctive memberof p is matched (or may be matched) by a piece of data,hence row 2 is strongly correct as well. Data covering andpath covering are, in a sense, independent issues, both ofthem relevant, though we focus here on the first one only;we will come back to this in Section 4.

When not all members of the type, but at least one, arematched by p, the query is weakly correct, as exemplifiedin rows 3, 4, and 6. In this case, programmers may exploitthis information, deciding either to make their queriesstrongly correct, by adding the missing alternatives in thepath, or run them anyway when they are not interested inquerying the non-matching instances. Finally, row 5 ex-emplifies an incorrect query, whose path will never matchany data.

The rules, given a query Q and an environment E, infera pair (T ; A), where T is the output type of Q and A isa variable that ranges over the set {s, w, i}. The correctdefinition of the rules is far from trivial. Consider the fol-lowing query over a database of type root[ a[int]+ b[int] ],bound to the variable $y,

{ $y/a, $y/b }

The query is semantically incorrect, as $y either matches/a or /b, but cannot match both of them. However, astandard inductive type rule such as,

E ` Q : w, E ` Q′ : w ⇒ E ` Q,Q′ : w

does not suffice for the example above, as both $y/a and$y/b are weakly correct with respect to the type of $y.Similarly, but more subtly, the following query

{ $y/a/a, $y/a/b }

where $y is of type root[X ] with X = a[X ] + b[X ] + int,is semantically incorrect. Indeed, the product query$y/a/a, $y/a/b is incorrect as each level of a unary treeeither contains a label a or a label b.

In [2] we give a solution to these problems by means ofcomplex algorithmic type rules, which keep track of whichmembers of union types are matched by the paths in thequery, and return a correctness status which depends onthis information.

The following proposition claims soundness of the typerules with respect to the characterisation of query cor-rectness, although the rules are not complete (a stronglycorrect query may be flagged as w).

Proposition 3.2 (Soundness) For each query Q and

environment E, if the judgement E ` Q : (T ; A) holds,

then,

if A = s then Q is strongly correct;

if A = w then Q is weakly or strongly correct;

if A = i then Q is incorrect;

Observe that the type rules always infer an output type.By doing so, in the style of XQuery, the type system alsoprovides programmers with result analysis tools. For ex-ample, for queries (XQ1) to (XQ4), our type rules infer thesame output types as XQuery’s. However, we are also ableto identify (XQ1) as strongly correct, (XQ3) and (XQ5)as weakly correct, and (XQ2) and (XQ4) as incorrect.

4 Path covering

To simplify the discussion, imagine, for a moment, thatevery query is just a sum of paths, and every input typeis just a union type. Then, the system we described upto now is geared towards the prevention of problems thatarise (informally) because one branch of the input-dataunion-type is not covered by any path (lack of strong cor-rectness), or even no branch of the union type is coveredby any path (lack of weak correctness).

This is already complex enough, but only captures thoseerrors that show up as ‘too few paths in the query’. Errorsmay also show up as ‘too many paths’, as in rows 2, 4, and6, in Table 1, or in query (XQ5), where we have paths thatare not covered by any branch of the union type.

Typical programming errors, like path misspelling, tendto generate both path and data coverage problems, as in(XQ5) or in row 4, hence suggesting that path coveragemay be ignored. On the other side, consider row 6 inTable 1. Here the programmer misspelled an /m into an/o, was expecting a ‘weak correctness’ result, and getsit from the type-checker; hence, the misspelling problemdoes not show up in the type.

Moreover, the errors generated by a data-covering basedanalysis can only be reported in terms of type-branchesthat have not been considered by a subpart of the query,while a path-covering based analysis would pinpoint a

wrong path. Going back to line 4 if the table, the data-covering error is ‘branch m[U ] in the type is not covered’,while the path-covering one is ‘subpath /n is irrelevant’.The second message helps the programmer better.

For these reasons, path coverage should be consid-ered by a correctness-checking type system. How-ever, while we gave a precise semantic characteriza-tion of the correctness of a query with respect todata covering, this is not easy when path coveringis considered. As an example, the path expression(/a+/b+/c),(/a+/b+/c),(/a+/b+/c) should be equiva-lent to its expansion /a/a/a+/a/a/b+/a/a/c+/b/a/a....However, the first one should probably be consideredwrong only when one of the nine atoms /x is useless, whilethe second one is suspect as soon as any one of the twenty-seven addends is useless.

In [2] we discuss some concrete notions of path-coveringcorrectness and type rules; here we can only point out thatthe problem is relevant and difficult.

5 Conclusions and future issues

This work presents a type system in which both resultanalysis and query correctness analysis of XML queriescan be conveniently expressed. Specifically, queries can beclassified as strongly correct, weakly correct, or incorrect.We have seen that such classification describes an intuitivespectrum of query correctness characterisations.

Our type system, beside being the first to try correct-ness analysis for XQuery-like languages, also provides aframework in which different notions of correctness canbe formally identified and studied, as exemplified in thediscussion of Section 4. We are currently working on thedesign and comparison of such alternative notions.

Finally, we plan to augment the type language withother type operators, such as non-ordered sequences, IDand IDREFs types, so as to study our results in the con-text of a system closer to XML Schema.

References

[1] D. Chamberlin & J. Clark et al. XQuery 1.0: An XMLQuery Language. Technical report, W3C, 2001.

[2] D. Colazzo & G. Ghelli et al. Types for Cor-rectness of Queries over Semi-structured data.http://www.di.unipi.it/~colazzo/tcqssd.ps.

[3] P. Fankhauser & M. Fernandez et al. XQuery 1.0 formalsemantics. Technical report, W3C, 2001.

[4] H. Hosoya. Regular Expression Types for XML. PhD thesis,The University of Tokyo, Japan, 2000.

[5] D. Suciu. The XML Typechecking Problem. SIGMOD

Record Web Edition, Special Section on Data Management

Issues in Electronic Commerce, 2002.

[6] T. Milo & D. Suciu & V. Vianu. Typechecking forXML Transformers. In ACM Symposium on Principles of

Database Systems, 2000.

The Query Language TQL

Giovanni Conforti Giorgio Ghelli

Antonio Albano Dario Colazzo Paolo Manghi Carlo Sartiani

Dipartimento di Informatica, Universita di Pisa, Pisa, Italy

May 1, 2002

Abstract

This work presents the query language TQL, a query lan-guage for semistructured data, that can be used to queryXML files. TQL substitutes the standard path-basedpattern-matching mechanism with a logic-based mecha-nism, where the programmer specifies the properties of thepieces of data she is trying to extract. As a result, TQLqueries are more ‘declarative’, or less ‘operational’, thanqueries in comparable languages. This feature makes somequeries easier to express, and should allow the adoption ofbetter optimization techniques. Through a set of exam-ples, we show that the range of queries that can be declar-atively expressed in TQL is quite wide. The implementa-tion of TQL binding mechanism requires the adoption ofnon-standard techniques, and some of its aspects are stillopen. In this paper we implicitly report about the currentstatus of the implementation by writing all queries usingthe version of TQL that has been implemented, and thatcan be freely downloaded from //tql.di.unipi.it/tql.

1 Introduction

The Tree Query Language [1] (TQL) is a query languagefor tree-shaped semistructured data. The language isbased on the set comprehension (match-filter-construct)paradigm, in the tradition of SQL, StruQL, Lorel, Quilt,XQuery (among many others). However, the match-filteroperation is expressed in TQL using a variant of the am-bient logic [2], a logic defined to describe process structureand behaviour. TQL adopts a subset of that logic, for itsability to describe trees.

The TQL logics is used to express the binding (match-filter) part of a query. The same logic can be exploited todescribe those properties of the data that are usually ex-pressed through types and constraints. This implies that:

• TQL queries can be exploited in order to checkwhether a data source has a type, or satisfies a con-straint;

• whenever a type or a constraint C is known to hold fora data source, the binder B of TQL queries is equiv-alent to its refinement B ∧ C, which is a legitimate

The contact author is Giovanni Conforti [email protected]

TQL binder, as we will see. This refinement opensthe way for new optimizations, or even for the staticdeclaration of an empty result, if the unsatisfiabilityof B ∧ C can be detected.

We will exemplify later these two properties.The promise of combining the expression of types, con-

straints, and queries in just one language, and to use thissynergy for optimization and error-checking purposes, isthe kernel of the TQL project. But the language is alsoworth studying for its ability to express complex queries bydeclaring the properties of what one is looking for, insteadof describing a path to arrive there. While most interestingproperties, as we will show through examples, are heavilypath-based, others involve negation, implication, universalquantification, and are expressed in other languages, suchas XQuery, by resorting to external functions or by oper-ational means, which makes optimization and formal rea-soning on the queries quite harder. While many program-mers are perfectly comfortable with operational-orientedprogramming and reasoning, others find declarative ex-pression easier, and there is at least a pattern, exemplifiedin Section 5.3, where the TQL style clearly pays off. InTQL, whenever you are able to describe a property of aspecific tag (e.g., title is a key for each article), by substi-tuting the constant with a variable you obtain the querythat finds all tags with the same property (e.g., find allpairs x, y such that tag x is key for y).

This feature is reminiscent of prolog-like languages.However, TQL does not share datalog problems with nega-tion, partly because TQL is born with negation, andmostly because we restrict ourselves to a monotone formof recursion.

In the rest of the paper we present the expressive powerof TQL and some of the properties we discussed here,through a succession of examples, all tested on the cur-rent TQL implementation. In some cases we also presentan XQuery equivalent query, for the sake of comparison,and also to clarify our usage of the terms ‘declarative’ and‘operational’ expression of queries.

The current version of TQL data model is unordered.This makes TQL unusable in document-oriented applica-tions, but this lack of order is very important, in terms ofallowed optimization, in database-like applications. Deal-ing with order is left as a future extension.

2 Related work

Many query languages for semistructured data and XMLhave been designed in the past years: StruQL, Lorel, XQL,XML-QL, YATL, etc. Building on this research, W3C isdesigning XQuery [3], a standard query language for XMLdata, which subsumes many concepts coming from theselanguages. XQuery (still a work in progress) is a typed,Turing-complete query language that can be used in bothXML-enabled database systems and native XML systems.

While TQL and XQuery are based on the same bind-filter-reconstruct paradigm, they differ in many aspects.

First of all, TQL, by design, is based on a logic that canexpress types, constraints, and queries, and is tailored forformal, and automated, manipulation. On the other side,XQuery is designed as an industrial-strength language,aimed at both database-oriented and document-orientedapplications. As a consequence, TQL has a very sharpsemantic definition, that can be completely defined in onepage of formulae, while XQuery semantics is much morecomplex. On the other side, XQuery data model supportsorder and oid-like information, which are not dealt within the current TQL version.

Second, even though XQuery expressive power is greaterthan TQL’s (the former is Turing-complete), some queriescan be more easily expressed in TQL, thanks to the greaterexpressive power of the tree-logic with respect to a purematching mechanism.

Finally, XQuery features powerful vertical navigationalfacilities, while it lacks corresponding horizontal opera-tors; TQL, instead, makes no difference between horizon-tal and vertical navigation, hence allowing the user to eas-ily impose horizontal constraints on documents.

3 The Simplest Queries

3.1 The Input Data

We begin with some standard queries, borrowedfrom the W3C XMP Use Case [4]. These queriesoperate over the XML document available at//tql.di.unipi.it/bib.xml, which we assume tobe bound to the variable $Bib in the global environment(the TQL system allows any document on the webto be bound to a variable). The document containsbibliography entries, whose structure is described by thefollowing DTD:

<!ELEMENT bib (book* )>

<!ELEMENT book (title, (author+ | editor+ ),

publisher, price )>

<!ATTLIST book year CDATA #REQUIRED >

<!ELEMENT author (last, first )>

<!ELEMENT editor (last, first, affiliation )>

<!ELEMENT title (#PCDATA )>

<!ELEMENT last (#PCDATA )>

<!ELEMENT first (#PCDATA )>

<!ELEMENT affiliation (#PCDATA )>

<!ELEMENT publisher (#PCDATA )>

<!ELEMENT price (#PCDATA )>

The DTD specifies that a book element contains atitle, one or more author elements or one or moreeditor elements, one publisher element and one priceelement; it also has a year attribute. An author con-tains a last and a first name elements. An editor ele-ment also contains an affiliation. Finally, title, last,first, publisher, and price elements contain string val-ues.

In this paper we present the XML file using its morecompact TQL-syntax representation, which looks as fol-lows (the implemented system allows both XML and TQLvisual presentations):

bib[

book[year[1992]

| title[FoundationsDatabases]

| author[ first[Serge] | last[Abiteboul] ]

| author[ first[Richard] | last[Hull] ]

| author[ first[Victor] | last[Vianu] ]

| publisher[Addison]

| price[60]

]

| book[year[1990]

| title[SistemiOperativi]

| author[ first[Piero] | last[Maestrini] ]

| publisher[McGrawHill]

| price[38]

]

...

In this format, bib[C] stands for an element taggedbib whose content is C, while C1 | C2 is the concatena-tion of two elements, or, more generally, of two sets ofelements. We use this non-XML notation because TQL isborn as a language to query semistructured data in gen-eral, i.e. unordered forests with labeled nodes, and not justXML. XML is just one way to construct such forests, usingtagged elements (and attributes) to build labeled nodes.

3.2 The Formal Presentation of TQLData Model

More formally, TQL data model is defined by the fol-lowing syntax and equations. The syntax specifies thata forest is either a leaf, or an empty forest, or a nodelabeled by tag and leading to a subforest, or the unionof many forests. We choose to distinguish between aleaf ’tag and a leaf tag [0] in order to model the XMLdistinction between PCDATA and empty elements.

TQL forestsforest ::= ’tag | 0 | tag[forest] | forest | forest

The formal definition of the data model is completedby the equations that specify that | is commutative and

TQL node-labeled forests can be equivalently described as edge-labeled trees, as done in [1].

associative, and 0 is its neutral element:

t | t′ = t′ | t t | (t′ | t′′) = (t | t′) | t′′ t | 0 = t

Hereafter we will elide the leaf constructor ’, writingt[d] instead of t[’d], unless ambiguity arises; the sameabbreviation is supported in the implemented system.

3.3 Matching and Binding

The basic TQL query is from Q |= A select Q’, whereQ is the subject (or data source) to be matched against theformula A, and Q’ is the result expression. The matching ofQ and A returns a set of bindings for the variables that arefree in A. Q’ is evaluated once for each of these bindings,and the concatenation of the results of all these evaluationsis the query result.

For example, consider the following TQL query, thatreturns the titles of all books written in 1991, and is eval-uated in an environment where $Bib is bound as specifiedabove.

from $Bib |= .bib[.book[.year[1991]

And .title[$t]

]

]

select title[$t]

The formula:

.bib[ .book[ .year[1991] And .title[$t] ] ]

is an ambient logic formula, which should be read as:“there is a path .bib[.book[ ]] that reaches a placethat matches .year[1991] And .title[$t], i.e. a placewhere you find both a path .year[] leading to 1991 anda path .title[ ] leading to something, that you will call$t”.

The formula .tag[A], read “there exists an element tagwhose content satisfies A”, is the most useful operator, butis actually defined in terms of three more basic operators:truth T, vertical splitting A’ | A’’, and element matchingtag[A].

The element formula tag[A] only matches a one-element document: while .t[A] matches both forests t[D]and t[D] | t2[D2] | ... (provided that A matches D),the formula t[A] only matches the first one. The truthformula T matches every forest. Finally, the formula A1

| A2 matches D iff D is equal, modulo reordering, to D1 |D2, with Ai matching Di. For example, the following pairsmatch, provided that $a is bound to Date:

title[IDB]|author[Date]| author[$a]|title[IDB]|

year[1994] year[1994]

title[IDB]|year[1994] T

title[IDB]|author[Date]| author[$a]|T

year[1994]

author[Date] author[$a]|T

The third formula can be read as: there is an author $aand something else, hence is equivalent to .author[$a];the fourth pair matches as well, since the empty forestmatches T. Hence, m[A] | T is equivalent to .m[A]; this isactually the official definition of the semantics of .m[A].

While in this example we matched $t with a leaf, a TQLvariable can be matched against any forest, or against atag.

For example, the following query returns any tag insidea book whose content matches .first[Serge]; .a.b[A]abbreviates .a[.b[A]].

from $Bib |= .bib.book.$tag.first[Serge]

select SergeTag[$tag]

Finally, the following query matches the formulayear[1992] | $EveryThingElse against any book,hence it returns, for any book whose year is 1992,everything but the year:

from $Bib |= .bib.book[year[1992]

| $EveryThingElse

]

select BookOf1992[$EveryThingElse]

Since we have two books of 1992, there are two possiblebindings for $EveryThingElse, each corresponding to thewhole content of a 1992 book without its year subtree;hence the result is:

BookOf1992[

title[FoundationsDatabases]

| author[ first[Serge] | last[Abiteboul] ]

...

]

| BookOf1992[

| title[Interpreters]

| author[ first[Vincent] | last[Aho] ]

...

]

Hereafter, as a convention, we use lowercase initials forvariables that are bound to tags and uppercase initials forvariables that are bound to forests.

3.4 Matching and Logic

TQL logic allows the programmer to combine matchingand logical operators. For example, the condition in thefollowing query combines the request for the existence ofa title field, of a $x field containing Springer, and ofeither an author.last or an editor.last path leadingto Buneman.

from $Bib |=

.bib.book [.title[$t]

And Exists $x. .$x[Springer]

And (.author.last[Buneman] Or

.editor.last[Buneman])

]

select title[$t]

The pattern Exists $x. .$x[A] is common enough todeserve the abbreviation .%[A], that we will use hereafter(see [1] for the exact definition of this abbreviation).

Conjunction, disjunction, and universal quantificationare operators that can be found in many match-based lan-guages. TQL, however, has the full power of first-orderlogic, hence we can express universal quantification andnegation of arbitrary formulas. This will be exemplifiedlater.

4 Restructuring the Data Source

In TQL syntax, a subquery can appear wherever a forestexpression is expected, as expressed by the followingsyntax:

TQL QueriesQ ::= from Q|=A select Q | ’tag | 0 | tag[Q] | Q |Q

This freedom of nesting is a feature of most modernquery languages, and is typically exploited to use the nest-ing structure of the query in order to describe the nestingstructure of the result. For example, in our data sourcethere is an entry for each book, containing the list of itsauthor. We can restructure it to obtain an entry for eachauthor, containing the list of its books. The structure ofthe result can be visualized as follows, where (A)* indi-cates an arbitrary repetition of the A structure:

(author[ authorname[...] | (book[...])* ])*

Observe how this structure is reflected by the structure ofthe following query, with a from-select for each *.

from $Bib |= .bib.book.author[$A]

select author[authorname[$A]

| from $Bib |= .bib.book[author[$A]

| $OtherFields

]

select book[$OtherFields]

]

This query performs a nested loop. For each bind-ing of $A to a different author, it returns a for-est result[author[$A]] | book[...]|...|book[...]],where book[...]|...|book[...] is the result of the in-ner query, i.e. it contains one book element for each bookwhose author is $A. As in a previous example, we extract,from the input book, all the fields but the author.

5 Schema-less XML data

As XML documents are not necessarily to come witha DTD, query languages should provide mechanisms forquerying data regardless of the structure.

Alternatively, when schema information is fundamentalfor writing sensible queries, schema inference mechanismsare very useful. For example, one may be interested in

finding the exact structure of the data, or in finding themandatory elements in the data. Property checking toolsmay also be useful, so as to prove the validity of given as-sertions about the data. For instance, checking whether acertain set of tags is a primary key, or if a tag is mandatoryin a specified path.

TQL provides all these mechanisms by simply combin-ing tag variables (as in [6] and [7]) and ambient logic, asshown in the following sections.

5.1 Querying in absence of schema

We consider an XML document, bound to $Bib2 in theglobal environment, which is similar to the $Bib file, butfeatures some extra-elements with a title (i.e. article, phd,etc.), whose labels are not known a priori.

The following query selects the title of all elements,whatever the label, and wherever they are, that containan element whose value is Suciu; the * operator iteratesa path an arbitrary number of times (may be zero); .%*must be read as (.%)* and corresponds, roughly, to theXPath operator //.

bib[from $Bib2 |= .%*.$B[ $A[Suciu] | $Rest ]

select $B[ Suciu[$A] | $Rest ]

]

This query constructs a Suciu’s personal bibliography doc-ument, selecting all elements in $Bib2 where he appearsand inverting the tag with the content. The remaininginformation present in the elements involving Suciu areinserted in the result using the $Rest variable.

This query clearly reveals some of the differences be-tween TQL and XQuery, in which it would be expressedas follows,

<bib>

for $b in $Bib2//*,

let $xx := $b/*,

for $y in $xx

where $y/data() = "Suciu"

return <xf:name($b)>

<Suciu>

xf:local-name($y)

</Suciu>,

{ op:except($xx,$y)}

</xf:name($b)>

<bib>

Observe how TQL’s binding mechanism and horizontalnavigation are more declarative than XQuery’s, whichadopts instead operational techniques:

• the definition of each binding to a variable requires acorresponding nested loop (for or let), while in TQLall free variables are bound in one single from-selectclause;

• horizontal constraints are dealt with an external op-erator op:except, while in TQL these are expressedwith the logic horizontal navigation operator |.

5.2 Checking Properties

In this section we show how tree logic formulae can beused to express properties of XML data. When a formulaA expresses a property, we can check it by running thequery from Q |= A select success: this query returnsthe leaf success if A holds over Q, and an empty forestotherwise.

As a first example we consider a query that verifies ifthe tag title is mandatory for book elements in the $Bibdocument.

from $Bib |= bib[Not .book[Not .title[T]]]

select title_is_mandatory

The formula Not .book[Not .title[T]] means: it isnot the case that there exists a book whose content doesnot contain any title, i.e. each book contains a title.TQL actually features an operator !a[A] defined as Not.a[Not A] which we can directly use, as in the follow-ing query. Here !book.title[T] is an abbreviation for!book[.title[T]], hence means: for every book there isa title.

from $Bib |= bib[ !book.title[T] ]

select title_is_mandatory

The formula !a[A] is dual to .a[A] in the same senseas ∀x.A is dual to ∃x.A, or ∧ is dual to ∨. In TQL, everyprimitive operator has a derived dual; this implies thatnegation can always be pushed inside any operator, henceyou can write any query with no use of negation. Actu-ally, when negation appears in a query, in most cases theTQL optimizer pushes it down to the query leaves (vari-ables, expression of the content of a leaf, comparisons),since negation is quite expensive. This is the reason why,although we claim that unlimited negation is an impor-tant feature of TQL, you will see very little explicit use ofnegation in our examples.

The next query verifies that title never appears twicein a field.

from $Bib |= Not bib[.book[ .title[T] | .title[T] ] ]

select title_never_appears_twice

Another interesting property to verify is whether a giventag is a primary key. There are many possible generaliza-tions of the relational notion of key to the semistructuredcase. The statement below, for example, says that titleis a mandatory field, and that you cannot find two sepa-rate books with the same title (more precisely, with onetitle in common).

from $Bib |=

bib[!book[.title[T]]

And foreach $X. Not (.book.title[$X] |

.book.title[$X])

]

select each_title_is_key

Of course, if the system knows that $Bib satisfiesbib[!book[.title[T]]], this knowledge implies thatbib[!book[.title[T]] And foreach $X. Not(.book.title[$X] | .book.title[$X]) ]

is equivalent (over $Bib) tobib[foreach $X. Not (.book.title[$X] |.book.title[$X])].

We do not comment further on this point, since thiskind of optimization is out of the reach of the currentimplementation of TQL.

Our last query checks that the $Bib element containsonly elements labeled book, by asking that each tag insidethe outer bib is equal to book.

from $Bib |= bib[foreach $x .$x[T] implies $x=book]

select only_book_inside_bib

This query can be rewritten using path operators asfollows:

from $Bib |= bib[Not (.Not book[T])]

select only_book_inside_bib

Here Not book is a tag-expression that stands for anytag different from book. Hence, .Not book[T] means:there exists a subelement whose tag is different frombook. Hence, Not (.Not book[T]) means: there existsno subelement whose tag is different from book.

5.3 Extracting the Tags That Satisfy aProperty

Every query Q in the previous subsection checks a prop-erty P of a tag t. In all such cases, if we substitute, in Q,t with a tag variable, we obtain a query that finds the setof all tags that satisfy P .

For example, we can extract all keys of books by tak-ing the query that checks whether title is a key, andsubstituting title with $k, as follows:

from $Bib |=

bib[!book[.$k[T]]

And foreach $X. Not (.book.$k[$X] |

.book.$k[$X])

]

select key[$k]

It must be highlighted that this is possible because inTQL we can universally quantify even on a formula withother free variables ($k, in this case). The query evalua-tion algorithm we exploit to this aim is quite sophisticated,and is described in [5].

Generalisation by simple substitution is not possible inXQuery, where variables inserted to replace tags must atleast be bound by an outer for clause, thus requiring theredesign of the original query.

A similar generalisation can be performed for the queriesthat check whether a label is mandatory, or occurs only

once, inside another one. We present below a query thatalmost produces a DTD for any input XML file (moduloordering) by extracting all the tags in the file and listing,for each of them, all the labels that must or may appear,and distinguishing among them the ones which may berepeated and the ones which only appear once. While itmay look frightening, it has just been obtained by a trivialgeneralization of the simple queries we presented above.

from $parts |= .%*.$tag[.%[T]]

select $tag[ mandatory_subtags

[from $parts |=

Not (.%*.$tag[Not .$subtag[T]])

select $subtag[]

]

| optional_subtags

[from $parts |=

.%*.$tag[ .$subtag[T]]

And .%*.$tag[not .$subtag[T]]

select $subtag[]

]

| list_subtags

[from $parts |=

.%*.$tag[ .$subtag[T] |

.$subtag[T]

]

select $subtag[]

]

| non_list_subtags

[from $parts |=

.%*.$tag[ .$subtag[T]]

And not .%*.$tag[ .$subtag[T] |

.$subtag[T]

]

select $subtag[]

]

]

6 Recursion

TQL logic also includes two monotonic recursion operators(rec and maxrec), very similar to the µ and ν operators(minimal and maximal fix point) of modal logic. These canbe used to interpret the Kleene star operator path* andto express recursively definable forest properties. Considerfor example the following formula:

rec $Binary. (%[$Binary | $Binary]) or %[0] or ’%

The formula describes a binary tree, defined as either anode leading to two binary trees, or as a leaf; %[0] Or’% matches a leaf, which may be (in XML terminology)either an empty element, or a piece of PCDATA.

7 Conclusions

Although the language TQL originates from the study of alogic for mobile ambients, for the simplest queries it turnsout to be quite similar, in practice, to other XML querylanguages.

However, the expression of queries which involve recur-sion, negation, or universal quantification, has in TQL aclear declarative nature, while other languages are forcedto adopt a more operational approach.

All queries presented in this paper are executable inthe prototype version of the TQL evaluator, and can befound in the file demo.tql in the standard distribution.The current version of the prototype works by loading alldata in main memory, but is already based on a transla-tion into an intermediate TQL Algebra [5], with logicaloptimizations carried on both at the source and at the al-gebraic level. The intermediate algebra works on infinitetables of forests, represented in a finite way, and supportssuch operations as complement, to deal with negation, co-projection, to deal with universal quantification, severalkinds of iterators, to implement the | operator, and a re-cursion operator.

TQL is currently based on a unordered nested multi-sets data model. The extension of TQL’s data model withordering is an important open issue.

References

[1] L. Cardelli and G. Ghelli. A Query Language Based onthe Ambient Logic. In Proc. of European Symposiumon Programming (ESOP), Genova, Italy, 2001.

[2] L. Cardelli and A. D. Gordon. Anytime, anywhere:Modal logics for mobile ambients. In Proc. of Princi-ples of Programming Languages (POPL). ACM Press,January 2000.

[3] Don Chamberlin, James Clark, Daniela Florescu,Jonathan Robie, Jerome Simeon, and Mugur Ste-fanescu. XQuery 1.0: An XML Query Language. Tech-nical report, World Wide Web Consortium, jun 2001.W3C Working Draft.

[4] Don Chamberlin, Peter Fankhauser, Massimo Mar-chiori, and Jonathan Robie. XML Query Use Cases.Technical report, World Wide Web Consortium, De-cember 2001. W3C Working Draft.

[5] G. Conforti, O. Ferrara, and G. Ghelli. TQL Alge-bra and its Implementation. To appear in Proc. ofIFIP International Conference on Theoretical Com-puter Science (IFIP TCS), Montreal, Canada, August2002. Available at http://tql.di.unipi.it/tql.

[6] Alin Deutsch, Mary Fernandez, Daniela Florescu, AlonLevy, and Dan Suciu. XML-QL: A Query Languagefor XML. Technical report, World Wide Web Consor-tium, August 1998. Submission to the World WideWeb Consortium.

[7] M. Fernandez, D. Florescu, A. Levy, and D. Suciu. Aquery language and processor for a web-site manage-ment system. In Proc. of Workshop on Managementof Semistructured Data, Tucson, 1997.

�������� ����� � ��� ���������

������� �������

� ���� ������� ��� ���������� ����

�������� � ����� ���� ��� ����

�������������������

�������� �� � �� �

� ���� ������� ��� ���������� ����

�������� � ����� ���� ��� ����

��� � ������������

��������

�� ������� � ��� ��� ���� �� �� ����� ������� � ��� ���������� ��� ������ ��� ���� ����������� ����� � �������� � ��� ��� ����� ���� �� !"#$% ����� ��� ��& ��������� ���������% ������� ����� ����������% ������ ��� �����������& ��� �� ������ '������ ��������� �� (�)�*� ��� �������� �� ����� ��������� ��� �+�������� ����� ������ �� �� !"#$ �& ����������� ��� �, ���� � ��� ��� ���� � ���� �� � ��� ������� �� ��� ����� �� ����� ���� � ��� ���� ���� ���� ��������� ���� �&�� � ��� ����� ��� ����� ���� -�)�.% ����� ��������� ��� ���� ����� �� ����)��

�/����� ������� � �� � ��� ��������� ��� ���������� ��� '���& ��'����� �� � ��� ������ ����� �� ��� �)� �� ��������� ��� �� '������ ��� �������� �& ������� � � ����� ��� ��,������������ � �� ������ '������ � ��� ��� ��������� 0�����& �� ������� � ������� ��� ������� �� ��� �)��

� ����������

� ���������� ���� �� ��� �� ������� �� ��� ����� �� �� ���� ������ �� ����������� ������������� �������� �� !� ""��� �#��� $��%%&' (���� ���� ����� �� ������� ������' �� ��� �� ���� ��� ���������� ��� � ����� �) *+� ������ �������� �� ���� ��� ���� �� ��,��� ������ ������ ������ � ��� � ������� �� ��� ���� �� ��� *+� -�����' .���,�� ��� �� ���� ������ �������,������ ����� �� �����/� ������ �� $0� -����� �������� �� !&� ����� �� ��� ����� ��� �,����������� �� ��� *+� -�����' *#��� �� �)��� ��� ��������� �� ���� ������ �� *+� ������ �� ������ �� *0��) �12# !&� ����� �� ������ �� ������ ��� ������� ����������'

3� ������� *#��� ������ �� *+� ���� �� ������� *0��) -����� ��� �� ��� ������ �������45�1.656�56(75� �8��������' 6�� -��) �������� �� � �8��� 9�45�1.656 ������: �� �������� �� 956(75� �����:' (�� �8��� ��� ����� �� ����� �) *#���� �� �-��,���� �� ������������ ����� ����������� � ��/��� �� ��$; !&� ����� ������� ��/8 �� ��� ����� *+� ���' (�� ������ ����<�� � ������� � ������ ��� ������ ����� �� ���� �� ��� ��� �� ��� ���� ����� ��� ���������� ��$; !& �� �� � ���� ����� ���) �� ��� ����� *+� ���' (�� ����������� �� ��� ����� �� �� ���� ����� ��� �=�� �� >�8������) �� ������ ��� ������������ �� �����-���� -����� ��� ����������� �,������� ��8�� �������& � ���)�� �� � �� ����<� �� ���� ����'

(�� +3( �=�� ���������� �,����� �,� ��� ���� ����� ���� �'�'� �� ������ �������� ��������� ��<�' ����� ��� ��� +3( ����� ��� ������� �������� �������� �� ��$; !&� �'�'� ���� ���� ��������� ������ ��� ����������) �� � ���� -����� �������� � �� ?(3+6'

(��� ��� �� ��� ��������� ������������@

� � ��,�� ��������� �� �������� �� ���� ��� �� ���� ������ �� ���������� ����� �) *+�������'

� � ���/����� �� ��� ���� ����� ��� ��$; !& �� �������� �� �� ���� ��� ��� ���� �� �� ����<��� ���/�� ���� ����� ���' (�� �� ����� �� ��� ���� ����� ��� ��$; !& �� ��� �8��������������� �� ��� ��<�' 1� ���,��� ���� ����� �) ���/����� ��� ����� ��� ��� ���������� ����� -����� ���� ��������/�� ����� �� ��� � �������� �� $������ 2'

!

catalog

product

name color catseller

*

?

+name price rating

catalog

product

name colorcat=

"cars"

seller

name

$7.5K<price<$8K

rating>3

catalog

product

namecolor

cat="cars"seller

*

?

+

name$7.5K<price<$8K

rating>3

9: $��� ��� 9�: 0��) �! 9�: ���

����� !@ $��� �� 0!

catalog

product

name="Honda"

color="red"

cat="cars"

seller

name

price=$7.8K

rating=4

catalog

product

namecolor

cat="cars"seller

*

?

+

nameprice<$8K

rating>3

$7K

sellerspecialized

type

9: �� 9�: � � ����

����� A@ +3( ��� �!

� �� ������ �� �������� ��� ����� �� ?(3+6 ��� � ���� -������ ����� � ������� ��� ������,� �) �� ���) �� ��� ����'

� (�� +3( �� ����� �� �� � �)� ����� ������ ��� ��<�' � ����� ��� ������ �� �������� ���� �,�� ��� ���� ������) ���� ����� �� �� �� ��� ����������� ���������� �� ��� +3(� ���� ������ ����'

(�� ��� �� �������� � �������' 3� $������ A �� ������ ��� � ���� �� ��� ��������� ��*#���' $������ 2 �������� ��� ������ �� ������� +3( ���� -��) �,�� �� ��� �)��� ' $������ B�8����� ��� *#��� ����� ��� � ���� �� ��� �/�� ��� -����� �� -��)' �� �,�,��� �� �������� ��� ������ �� �������� �� $������ C' �����) �� $������ � �� �������� �� ������ ����� ������������'

� �� ������ �� ���������

(�� ��������� �� *#��� �� ����� �� ����� 2' *#��� ������ �� ��� �� � *+� ����� ��,���� �8���� ,��� ' �� ��/��� � ������ ��� ��$; !&� ����� ������� �� �������� ���� ���� ��� �' (�� ������ �� ��� ������� �� �� ��� ����� �� ���� �� �� ����������� ������� �������� �������� ����' 6�� ���� �� �� ��� �� �� �,� ���-�� ��' (�� ���� ��� �� ����� !9:� ��� ���� ������ � ������ �8 ���� ����� ��� ���� �� � ������ ������ ���� ��� ����� ������� ���������� �,� � �� � ������� ����� �����) ���) ������ ��� �� ���� �� ������' 6�� ����� �� � �� ���� ��D��� �� ������� ��� ������ ��� �� ���� ������� �� ��� ����� ��E �������'

*#��� ��� � � ��������� �� ��� ��������� ��,� �� ��� ��,� ������ �� ��� �� ��� ���������� ��,�' 6�� ��� ��� �������� �� ��� �� *0��) �12# !& -����� ��� � � � � �� ��� �� ��� ������ �)������ �45�1.656�56(75� �8��������' 6�� -��) � �������� �� � �8��� �� �� 9�45�1.656������:� ����� �� ��,�� �) ��� *+� ������ �� �������� �� �� 956(75� �����:� ����� �� �8������� ��� ����� �������' 1��� �� �� �8������ ����� ��� ����� �� �� � �� ��������� ����� �-��,���� �� �������� �� ��� ������ -��) �� ��� ��� ��9��9 :: F �9 :' �� �� ������������ ����� 9���-��): ��$; !&'� ���-��) ������ ��� ����� ��� ���� �� ����� ����� ������ �� ��� ���� �) ����� ����� ���������/�� ��� ��� � �� �� �������) ��������� ���������� �� �� ,����' (�� ���� �� ���-��) ��

A

��/8 �� ' �� �8 ��� ��� -��) �� ����� !9�: ����� ��� ������� ���� �����) G��H� ����� �,� ������ ��� ����� ��� ������ ������� IJ�C� �� I%� �� �� ���� ���� ��� 2'

*#��� �� ���������� �� ��� ������ �8������� ��

XMLDatabase

Server

XCacher

Browser

Application Server

qE

HTTP Request

RequestInterceptor

q

QueryDecomposer

qC

ReplacementController

QueryRewriter

qR

Web Server

ApplicationRequest

HTTP Response

ApplicationResponse

XHTML

QueryComposer

qF

XML

XHTML

Schema IncompleteTree

����� 2@ $)��� ����������

��� �8��� -����� �� ' (�� ������ �� *#��� ������� ����� 2 � ��� ���������@ (�� ����� ������������� � ����� � *0��) �8������� � �� ������� ��

�� �� ' (�� ����� ������� ����� ���� � ����� ����8��� -��) �� �� ���� ���� �� �� �� �� ������������) �� ��� �� ����� �� ��� ����� �'�'� �������� �������� ��� *+� �����' 3� �� ��� ��� -��)����� ������� �/�� ��� -��) �� � ����� �� ���� � � �� �� �� �8������ �� ��� �� ����� �� ����'4�������� �/�� ��� -��) �� �� ��� �� �� �� ���� -����� �� � ����� �� ���� �� ��� -��)�� ���� �� ��� *+� ����� �������,��)� � ��������� �� $������ B'

(�� ����� �� ��� � ���� -����� � ����� ����� -��) �� ���� �� �� ��� ����� ��� �������������� �� ��� ���) ����� ��� �� G����H ��,����� ����� ����' 3� ��� ��� ����� �� *+� �� ���� ��� /� ����� ����� ��� ����� ��� �������� �������� ���)������ ����� ��� ������ �������� �� $������ C� �����*+� �� �� ���9�: �� � �,� �� ��� ����'

(�� -��) �� ���� G ����H ��� ������ �� ��� �� ���� -����� �� �� � �8������ �� �� ��� ������ ������� ��� ������ �� ��� ��� ��,�� � �������� ��

$������ B'(�� ���� �� ����<�� � ������� � ������ ��� ����� �� ����� �������� �� ��� ��� �� ����� ��

��/8 �� �� � ���� �� ��� ��� � ����� �������� ��� ���� ���� ����� �� ��' 3� ������� ���������� ��� �)�� � �� � ������������ �� ��� ���� ��� �� �'�'� ����� �� ���������� �� ��� ���� ����� �� �� ���� �� � ���� ����� 9����� � ����� �������� �����: � ������ �� ������ ���K�������������� ���� ����������' (�� ��� ��� ���� ���� ����� �� �� �� ���� � �� � �� �������� �) ��� ������� �� �������� �� � �� � ' (�� ���������� ������ �� ��� ���� ����� �� � � �-��,���� �� ����� ����� ���-�����' ����� A ����� 9: ��� �� ��� �� �� 9�: ��� ���������� ��� �)�� � �� ��� +3( ����8������� �!' (�� �������� �� ��� ���������� ��� �)�� �� ���� �� �) ��� �������� � ������ ������) ���)��� ��� ��������� ������ �������� �� $������ 2' (�� ��<� �� � �� ����� �) ��� ,����� �� � �) �� ��,��� ��� ����� �� ��'

(�� +3( ��=�� �� ��� ���� ����� ��� ��/��� �� ��$; !& �� ��� ��������� �)�@ ����� ��� ������������� �)�� �� +3( �������� ��� ���� ���� �������� �� ��� �� ������ �� ��� ���� ����� ���� ���� ������������� ��� �)�� �������� ��� ������ ���� ����' (��� ����� �� �� ������ ��� � ����� ������������ ����� �� ����� 2� ����� ��� ���� ���� �� ��� �� ������) ����� �� ��� +3( �� ������ ����� �����' $������ ��� ���������� ����� �� � ��� ��� � �� � � �'�'� ��� ����������� �� ��� �� �� �)��� �������� �� ��� � ���K����' (�� ��� ���� ����� �� � ��� ��� �� �� ����� ��� �� ��� ��=�����������<�� �)���' (��� �����) �� ���/�� ��� ������ �� �������� ��� � ���� -����� � �� �� ����������' (���� ���� �� �� ����� ������� ��� ��� ���� �� ��� �� ��� �� +3( �� ��� �������<���)��� �� ��� ���������� ��� �)��� ������ ��� �������<�� �)��� � ���K���� �� ��� �� ��� �� ��������� G-��)���H �� ����� �� ��� �����)��� -��) �������' 3� ������� ��� ���� ����� ��� ������� ���-�� ���� �9�: �� ��� �������<�� �)�� �� �� �� ��� ���� �� �� ��� ����� �� � �,� ��� � � �����9�:' .���� ��� �,���� � ����� �� ����� ���� ����� �� ������ �)��� ���� ��-���� ����� ������ ,�����' �����) ��� �������<�� �)�� � �� � �� ������� ���� �������� ��� �� ��� � ��� �� ��� ������� -��) �� �,������� ���� �� �,�� �� ��� �)��� ' (�� �� ��� �� ����� � ���� �) ��� ����� ��������� � �������� �� $������ C'

���� ��������� � ��� ��� ������ ����� �� ������� �

2

catalog

product

name color cat="cars"seller

name

$7.5K<price<$8.8K

rating>4

catalog

product

name color cat="cars"seller

*

?

+

name

price<$9K

rating>4

$7K

catalog

product

name color cat="cars"seller

*

?

+

name rating>3

seller

+

rating>4nameprice<$9K$8Kprice<$8K$7K

9:0��) �A 9�: ���� 9�: 5�/��� � �

����� B@ 5�/�� ������ �� �A

(�� +3( �� �� ��� �� ���� �� ����� ��� ��� ���� ����� ���� ��� ��� ������� �� ���8��) �������� ��� ��� ����� ����' 3� ������� ��� �/�� ������ �� ��� � ���� -�����E �������� �� +3(� � �� ������ ������ ���) �� ��� ����� ��� �� ��� �� �� ��� ���) ��� ���������� ��� �)��'����� �� ��� ��<� �� ��� +3( �� ������� � �� �� ����� �����' ����� �� ���) �� ��� ���� ����� ������� ��������� ����� � � �� ?(3+6@ ���� ���� ��������� ������ ��� ����������) �� � ����-�����E ��������' � ����� �� ��� +3( �� ��� �� ���� ��� ���� ��� � ������� �� ���-����� ���� �������� ��� ���� ��� �� ��� �� ,��� ���� �8�������' (�� ���� ��� �,���� ��� �� � ����� �������� ��� ����� ����� -����� � �8������ ����� ��� �� ��� �� � ��� �� ���� �� �� � �) �� �������� ��� *+� ����� ��,� ��� �8���� ��� ,��� '

� ����� ���

(�� ��������� ������ ���� � ����� ��� ���� ��� �� ��� ���������� ��� �)�� � �� ��� +3( � �� ���-��) � �� ������� ��� �� ���������� ��� �)�� � � ��� �������� � �� ������� �� ��� -����� ���)�������� �� � '

��������� ������� �� ���� �����' (�� /�� ���� ��������� ��� ���������� ��� �)�� � �� �� �)���)��� ��� ���������� �� � �� ��� ����� ���� ��� �' ����� !9�: ����� ���' 3� ��� ������ ����� ������ ���������� �� � � �� ��<�� ������� �� ��� ����� ������ �� ����� � �������� �� ��� ���,��� ��� �� ��� ���� ,���� ���� ����� �� ���' �� �8 ���� �� ���� ��� ��� ��������� �� ��� ���� �������� �� 9IJ�C�� I%�: �� �IJ�� I%�:� ������ ��� �� �� ������� �� ���� �� �� �� ���� �� !����'

(�� �� ��<�� ���������� ��� �)�� �� �� � �� ��/��� � ���������� ��� �)��� ���� ��� ��������������/�� ���������� ��� �� �� �� ����� ����� ������� ��� ��� �����/�� �� �' �������� �� ������/������� ��� �� ��� ��������� �) ��� ������ ���������� ��� �)�� � �� ��)� ��/8 �� ��� �� ������������ �) ��� �� ��<�� ���������� ��� �)�� �� ' ����� A9�: ����� ����� ����� �� ��� � � � � ���� ����� �! �� ��� /�� -��) ��� �,�� �� ��� �)��� '

(�� ���� ���� �� ��������� �� ����� ��� ����� �� ��� ����� ���������� ��� �)�� � �� � �� ����� ��<�� ���������� ��� �)�� �� �� ��� -��) �� �� ������� ��� �/��� ���������� ��� �)�� � �@

� � F � � �� 9!:

3� ��� �� �� ����) �� �������) ��� ��� ��� ���������� ��� �)���� �� ��� ������ ����� ����� ��� ���� �� ����� �������<��� �'�'� ��� ��� �� � ���� �� ������ �� ������� ��� �,��� ������� ��� ������������ � �� �� ' $������<����� ���� �� �������� ������ ��� � ��/��� � ������ �� ��� ������� ������� �' �� �8 ���� �� ��� ���� ��� �� ����� !9:� ������ �� ������ � �������' � ���������� ��� ��� �������<���� �� ������ � �� ��� ���� ��� � �� � � ��� ��� � �������� �� � ���� ����������'(�� ��� �� � �� ������� ���� �� ��� ��,�� �� � � ����� ������� ����� � ��,�� �� �' �� �8 ��� ������ �������<�� �)�� �� ��� �� ����� A9�:'

(�� ���� ��� �� ��� ������ ��� ���� ��� ��� ���������� ��� �)���� ����� �� ��� �������� ������� ��� �� ��� �� ����� �� ��� ���������@ 1� �,��� � �� �� �������� ����� �� /�� ��� /�� �������

���� �� �� ���� �� �� �������<�� �)�� � �� � ��� ������� ��� ����������� �������<�� �)�� � �� �� � �������� ���� �� � �� ������ �� ������� ���� ��� ���������� �� � ' 3� � �� �������� �� �������<�� �)��

B

seller

rating>4name

pointer to V

price<$9K$8K

����� C@ ���� ���-��)

� �� � � �� �������� ���� ��� /�� ������� ��������� �� �' 3� � �,���� ���� � ���� ��� ����� ���� �� � �������� �� ������� ���� ��� ���������� �� � ���� ��� ��� ��� �������<�� �)�� ���� ��� �,��� ���� �'

�� �8 ���� ����� B9: ����� -��) �A ��� ����� ��� ��� -��) �! �� ����� !9:' 3� � �� ����� ��� ���� �� ������ A9�: �� B9�: �������,��)� ���� ��� ��������� ������ ������� ��� � � ����� ����� �� ����� B9�:' (,����� ���� �� � ��������� �� �� ���� ���� �������<����� �� ��=������� ����,�� ����� ��� ������ �������<�� �)��' (�� ������ �������<�� �)�� ��� �� � �,���� ���� ��� �������������<�� �)�� ����� �� �

���� �� ������ �� ������ �� ��� �������<�� �)�� ��������� ��� ��=����� �� ���

�� ����� �� ����� �� � �' ���� ��� ��� �������<�� �)��� �� � � � �����,�������'�����)� ��� ����� �� ��� � �� ���� �� ��� �� ��� � ����� �� �� �������<�� �)��� ��� �,��� ����

��� ����� -��) �' 1��� �������<�� �)�� � �� ������� ���� �� ��� ������ ������ ��� �� ��� �� ���� �� �'3� �� ��/�� /���� �� �� ������� �� ��� ���� �� ��� ���� ��� �� �� �� ���� ��� ��� ��<� �� ���

���������� ��� �)�� �� ������ �� ������ �� ��$; !&� ���� �� �� �� �������� � ����' ���� � ��� ��� � ���� ����� �� ��� ���� � �� ��������� ���� � �� �� ����' (��� �� ��� ���� ���� �������� �� �8���) ��� ������� ���� �� ���� ��� ��� �� �� �� ��� ���) ������� ���� �� �� ��� 8� � �� �� �� �������<�� �)��� ��

��

��� �� ��� 8� � ��<� �� �� �� �� ����� �� ��� ���������� ����)�� � �� � �

��

��� � ����� ��� �������<�� �)�� ���� �,� � ��� � �����'

� ��� ���� �� ��������� ������

"�,�� +3( �� ���� ���������� ��� �)�� � � �� ���-��) �� ��� � ���� -��) ��� ����� �� �8������� ��� �� �� ����� � ���� -������ �� �-��,���� �� ��� -��) �� � � ��� ��� �� ����,�� �� ��� ����� ����� �� �' (�� �,������ �� ��� *+� ,��� ���� �� ��� ����� ��� �������� �� ��� ���������� ����� �� ��� �� ��� �� �� � �� ��� �� ��� ��� �� � �� ,��� �,������ �� ����� ���) �� ��'1� ��� ���� -����� ���� ���������� ��$; !&� ����� � ���-����� ��� ����� �� ��� ������ ����� � ���� � ��=���� ��� ��� ��� �� ' ���� ���-����� � �������� ������ �� ��� ���� �� �� �� ���� ������ �� ��� ����������� ���� �� ' �� �8 ���� ������� ��� +3( �� ����� A �� ��� �� ��<��-��) �A� ����������� �� ��� ���������� ��� �)�� ���� �� ����� B9�:' (�� � ���� -��) �A� �� ������� ���-��) ����� �� ����� C'

(�� ���� � ���� -����� � ������� �� ?(3+6 �) ��� ��������� �������@ ���� ��� �� ��<������������ ��� �)�� �� �� ������� ���� ��� ��� �������<�� �)�� ���� �� ������ �� �� ����� ��� ���

�� �� ����� ���� ������� �������<�� �)���' (���� � ����� �� ������) �������� � ���K���� �� �������<��

�)�� �� � �� �' (���� ��� ��� � �������<�� �)�� �� �� �� ������� �� ������ ��� ����� � ' (��

�������<�� �)��� �� �� � ����� � ��� �������� �� � � �� ��� � �� �� ����� ��� �� � ���� -������� �' �����)� ���) � ���� ���� � ��� �� �� �� ���� ���-����� �� ���� �� ��� *+� �������,�'

��� �� ��� �� ��<���� ������ ���� �� �� ���� �� ��� ��������� � ��� �� �� �� ���� -����� ��� ��� �� ������ �� ��$; !&� �) � ��)��� �� �� �� ��� ��� ���� ��� 8� � ���������� ��� �)����<� �� $������ 2' (��� �����) ���������) � ��,�� ��� ���� ��� ������ �� ,���� ��� �,���� ���8������� �) G� ��H -����� �� ��� *+� -��) ������'

(�� ������ �� ��� � ���� -����� �,� � ��� -��) �� ����� ���� ���) � /����� �� � �,� �)�� ����,�� ��� �� ��� �� ��<���� �� ��� ��� ����������' �� ��� � � �� �� ��� �/�� ��� -��) ��

�8���� ��� �� �� ��� �� ��� �� ��� � �� ��� ���� �� �' 3� ��@

�� F �� � � 9A:

C

(�� /����� ������ ��� 9 : �� ��� � ���� -����� � ���� ���� ��� ������ �� �� � �� �� �� �8�������� ���� �����' 3� ��

�9 : F ��9�� 9��: � ��� 9 :: 92:

� ���� ����� ��������

(�� ����� ��� ������ ������� ��� �������<�� �)�� � �� ��� ���������� ��� �)�� � �� ��� +3( � ��� �,�� �� ���� ���� ��� �� �� ��� �� ��� �� �� � ��� ����� �� �' (�� �������� �� ���� �� ����� ��� �� ����� �� ��� �������<�� �)���' 1� ��� �57� ������� ���� �� ������������ ������� �5; &�� �� � ���)��' 4��) ��� ��� �������<�� �)��� � �������� �� � �,�� �� ���� ��� �������� �� �'

�� �8 ���� ��� ������� �������<�� �)��� �� ��� ���������� ��� �)�� �� ����� B9�: � ��� ��� ������������<�� �)���' L��� �� ��� �,� ��� ����� �� ��� � �� ���� �� ��� �� ������ ����� ���) �,��� �����A� �� ��� ����� ��� ������ ������� �����) ��� �� ��� ' 1������ ���� �� �������)� ������� ������ ������ �������<�� �)�� �� ��������' �� ����� �8������ *0��) �(3.1 !& �� �� ������� ��� ���������� �� ��� � ���,�� �� ��� �������� �������<�� �)��� �� �� �� �8������ ����� ��' 3� ������� ��

��@

��� I �� ��������9G������� ��H:!������! ������� I� �� I !��������� I !��� F G����H �� I�! ���� "F % �� I�! ���� # � �� I!����� " B� �� I � ��� I� �3� ��� ���� � ����� �� ��� ������ �� ����� ��� �� �� ��� ��� -��) �� �� ����� �������<�� �)�� ���������� �� � �,�� �� ��� � � �)'

!�������� �� ������ "���

1� �������� �)��� �� ��� ������ ������ �� ��� ���� �� *+� -����� �� *+� ������� ������� ���� �� ����<�� � +3(' 1� � ������ �� � ��� ������ ��� �)��� �� ���� ��� � ��,� ����� ��� ������� �� � �� -��) ������ �� �8��� ��� ���� ����� ��� �������� ���� ��� �57' ������� � ������ �� � ������ � ��� ���� �� �8������ ���������) �� ��� ���� ����� ���� �� �� ��� ������������� ��� �)��E� �����������' �����)� �� ��� �� ��� ���������� ������������ ��� ��� �)��� ������� �� ��� �� ������ ��� ���������'

# ���������������

1� ���� �� ���� M���� ?������������ �� ��� ���� �� ������� �� ;���� ;��� �� ��� ������������� �� ������ ��� �������� �� ��� ���� ����� ���'

��$������

�� !"#$ ���� ��������% ��� �����% ��� !���� !����� 1����������� ��� 2���&��� ��� ���� )�� �����)��� ����� )� ��������� � ����� �� �� �������� �������% 3""#�

�405�67$ ���� 4��% ������� 5� 0�������% 89�� �� 5����% 4���� �������% ��� ������� ���� � ����� ����������� ��� ������� ���� )� ��� ���� ����� % ����� ::";:<#% #667�

�==66$ (���� =����& ��� 5���� =�&�� ��������� '������ �& �� ����� ������� )� �������� �� ������ ���������� �������% ����� <>?;<6>% #666�

��/66$ 4���� ��� ��� �����& �� /��� � ����� ������� �� '���& ������� �� ��� ������� )� �� !% �����@@;>?% #666�

��A"#$ 2��� �� ��� 5�B��& 0� A������� 0� ,����� ��+& ������� �� ��������,������ ��� ������ )� ������� ����� % ����� #6#;3""% 3""#�

�1!""$ ����� 1��� ��� ����� !������� 1������ ��� ������� �� � ��+& ������ ����"��! ���������� �#��$��%�&% >-3.C#?>;#@"% 3"""�

� ��>>$ �� ������ )���������� ������� ��� ����+��� ������'��� �� ��������� �������� �&��� �% #6>>�

��)D�"#$ )�� �������% E�����& =� )��% ��� F� D���&% ��� 4����� � ����� G������� ���� )� ��'!(���������% 3""#�

��:/"#$ �:/� �2���&C � '���& �������� �� ���% 3""#� �:/ ������ 4���� �������� ������CHH�����:����H���H2���&�

����������� � �������������������������� � �"! �#%$&��� �(')# * +,!-$/."� 0 1

243/5�687:9<;=98><?�@BAC7�D�E�FHGI98>87:9IJLKM7N3B9I7:OPQ7:RS;COLKMTU7N9VKHW=X"5�WYTUR8GIKZ7:O\[^]:3_7:9I]:7=`

abW=OZ]:7NJLKZ7:O�c&WY@edVKM7:]f6I983B]�gh98JLKZ3_KZGIKM7C`SabWYOL]:7NJLKM7NO:`ji<DlkImon=kYpq @_3B]f6I7:9sr OZG898>I7:98JtK:uCvQ]:JNExwyR83zEx7:>8G

{H|I}L~t�f����~N�����������������C�N���������"�N���t�����Z�&���������t���������H��� ������x¡��(��¢�¡����L���y�����t£)����������¤¥� �N¦����o¡��Z�U�:���t�����Z�"¡������=�t�L����§�:���t��¤������)�t�L���z�����&���I�����j��¦§�C�N�z¡����:¡8�:���L��¤��N�o¡���¦§��¨Z�f�¡������©¡��L���������:���H¢e���ª�o���z¡�����«���¡��L�U�t�)£)��������¦§�L�N¡��(�������U���¡����H¬��t«V­¥®/�f¯j�L£��L�L°^� ±o���z¡������¥�:���L��¤:�²«������L�©�L�������H��¤o�³�¡��L¦´�t°M«��N�z�Z�&�N���N���t��¤&� �N�:¡�������¦§�t�:¡^�����s���L¯4����¡������µ¡��L���o������:���L�I�o�t£N�t�����=�L�s¢e�N�¶���t���f¡������C�����:���L�����L�L°f�����S����¡8�������������������¡��S¢e�N�^�������=����¡������4¡�����¦§�������=�f¯j�L�z¢e���N·s¸ ¹H�N���t�����Z�t­¬��y���t�C� �y�������=�N���§¡����´º����³¡»�������o¡����N�©¢e���(·"¸ ¹¼�:���t��¤�����o� �L���������s���������s�L�������Z�ª·&½&���L��¤�£)���t¯¾�L­^¿À�������z¡����t�������L°¯��"���L��� ����«=�"���\¡������4�C���=�t�4���Q·&½&���L��¤:�²«������L�y���t¦´���N¡�����t�N���������´�z¤o�z¡��t¦Á�t�������L�QÂ/��ÃS�²·&½»°:¡�����¡¾¯j�����M£N�&��¦§����� �¦§�t�:¡��Z�»¡��"���Z������¨L��¡����4�������=�N���L��� �N�:¡�������¦§�t�:¡S¦´��������������C���:���t��¤����t¯4����¡������´¡��Z���������:���L�L­&Ä����L����¦§��������¤�� ±o�=�t���x�¦§�t�:¡��8�t���oº���¦Å¡����j¢e�L�N�z��«�������¡³¤���¢C�����8���������:�����(���C�ª����������������z¡����f¡��(¡����»�C�L�z¢e�N��¦´���C� ���N�����C�¾���������t£f��«�����«)¤QÂ/��ÃS�·&½ �f£N�t��¡������������N�������Y·&½&���t��¤H�:���t��¤´�t����������­Æ ÇVȧÉIÊSËyÌ�ÍQÎ�ÉSÏ Ë»È

Ð ���y¡��Q¡����������f¯4�����Q�=��������������¡³¤¥��¢µ·"¸ ¹�°I¦´���)¤¥¬��t«�����������t��¡��������µ��� ¡������t£N�s���L�������L������¢e����¦§��¡������H¢e���N¦Ñ¦ª����¡���������Q·s¸ ¹%��������� �Z�ª«)¤U�����z�������©�:���t�����Z�§���N�����C�³¡§���L¦§��¡������¡��¥���������t�L�ª�N� ���N����¡�����¿À�:¡��t����� ¡Z­�®/�f¯j�L£��t�Z°S�o�t���M¤��Z�����¡���¡������C�z¦§���������N���µ�����\���N�z¡/���f¡���������Òf���N�L����¢Ó¡��t� �����o��o�L�ª�:���t��¤:�Ô�L¦ª«=�L���o�Z��¬��L«U�����������t�f¡����N���"¢e����¦Õ���t¡������L£N������Ö�������¼�o�L�������Z�,���L¦���¡������o¢e�N��¦´�f¡����N�×���¼���×� Øy� ���t�:¡¦´�������t�Z­®¾�L��� �N°\¡����<���o�Z�Ù��¢ÛÚ Ü�ÝtÞhßfà�á�âMãtÝ�äæå�â�å çNèÓé)êÒ)���f¯4�\¢e���&��¦§�����f£)�����´�=�t��¢e����¦´�����t��«)¤��N���o�L���µ��¢�¦´���������¡������H���U���L����¡����������j�o���z¡�����«���¡��L�Ö�z¤o�z¡��t¦´��ë ìo°^ífî4� �N������������s�����f£)���o�4�"£:����«����¾�������o¡�������¢e�N��� Øy� ���t�:¡�·"¸ ¹ �:���t��¤�����o� �L��������������¡�����¬��t«Q�t�)£)��������¦��L�:¡L­ïHð)ñ ��~ ð)òÑó�ô �MõS�öÂ/�y��¢e��������¦§�t�:¡����4¡��Z���������:��� ���o��o�L����¤)�����ø÷Zè_Ý ùSà�á�âMãtÝ�äúÚ Ü�ÝtÞhßbâ�é�ãhù�ÝtÞhèÓé:êúûe�N�Åã Ýtü´âféCýBè_åå�â�å çNèÓé)êZþh°ÛÚ Ü�ÝtÞhßæå�ÿfé=ýÔâ�èÓé�ü´Ý é=ýH�C���Û«C�L�t�Ñ�t±:¡��t������£��t��¤�z¡����o���L��¢e���"���t���f¡������C���^�t�����³�����h¡���£N���:���L�����L��ë ��°����)°����Mî²­Ã�£N�t�,¡��������N�,¡����¥¡��N�����¥��¢(���t¦´���:¡����¥�L����������� ��¢(¯j�L«

� ��������������������������� ���"!"#%$&��'(�����)!�*,+-!"��#/.�0,12. 3�4�5��6��'7!8 4:9�4<;�=�>@?7A�B�C�;�DFEG�GH���#I'J�K����L�$JL�����#�!"�M!"����'��N4:OKPRQS���F!"��#�4:OKPT ����� ���6��!"#UQV#IL�L��%�W�"�����GD

�:���L�����L� �C��� ���L� �L�:¡���¤ �N�������Z� ¡����Ñ�f¡�¡��L�N¡����N� ��¢,���t����L���������t���t°s�����z��¦§�o¡����N��� ���������z�C������¤<¦§�N�o��¡�����¡�¯j�L«�:���L�����L�§�����Q�z��¦§������¢e�N��¦��²«������L�Û�:���t�����Z�t­YXµ�����´¦§�L���C�¡����t¤ �L���Ñ«=�<�L�x¡����t� ¡��������z���f¡��L�ú���:¡��æ���t���L� ¡������[ZC½s¹�:���L�����L��¯4��¡��×����¦§�����Q�����Z�o���t�f¡��L�§�f£N�t�»¢e����¦l�f¡�¡�����«���¡��L�ëS�,\�î¶���µ���L�o���t�L��¡��y�»¡����N�o�x¡����N����� Ð �f¡��������§�:���t��¤H�L£M���������¡������»¢e���S����¦§�����4«C�)�N���Z���§�N���t�����Z��ë ]fî²­UZ)�����´���§�����z��¦����¡����N�����^���L�������C��«����S¯4���t��¯��t«»���f¡��¾�z�N����� �L�¶�������³¡������L�"���«C����Ò:�Ô�L�������L����¡��������������f¡���«��N�z�y��¤o�³¡��t¦´������¯4���t� ���������¯��t«"�������Z�V�����S«C�L�����¾¯4�������=�L�&¡��¾�t±o�C�:�z�S����¦§��¡��Z�(�:���t��¤:������´�t������«������x¡����Z�µ£)���§�»¢e����¦��Ô«��N�z�Z�y���:¡��t��¢_���t��­^ ¡����t��¯�����Òª¡�����Ò)���Z��¡����"�N���t��¤��t���:¡�������¦§�L�N¡µ������«����t¦¢e�N���z�L¦§�x�À�³¡����C�h¡������Z�<����¡��%ëS��°_�MîÔ­ ÄS������Ò����C�³¡����:¡���������t¡L­×���Ô­ ë`�fìLî¾���M£�� �N���o���L���z�Z� ¡����Q�:���L��¤Û� ���:¡�������¦§�t�:¡�����(���t¯4����¡������/������«����L¦ ¢e���I¡����JX^���L�NZo�C�Z� ��ºC�t��¡������§¹¶������N�����N�»û"X/Zo¹Iþh­aXµ���t¤H���t���t��������¨L�/¡����"���L����¡����������Y�t���:¡��������¦§�L�N¡V¦´�����������4¡��Z���������:���/ë �Mî�¡��¾�L�z¡���«��������s¦´�����������N� «=� �¡³¯��t�L� � ��¦§����� ±©��«b�³�L�h¡»���f¡�¡��t��� £f��������«����L�"��� ¡³¯j�cX/Z�¹�:���L�����L�L­edI�������L��� �Ö�t¡L­,���Ô­Û�����f¯��L�U��� ë f�î4¡�����¡´�:���L��¤�t���:¡�������¦��L�:¡4¢e���/�§�������N�o�B¢e���t�N°����L�N�f¡����N�o�Ô¢e���L�"����«���� ¡&��¢Z)¡����=½s¹Q�����o�L�t������«�����°N�������t£N�t�hg/Ä��²�t��¦§����� ¡��µ¢e�N��������������xºC�L���:¡S�z��«C�z�t¡8��¢�¡������8�������������������L�z¡������h¡������&¡����µ���t�N��������C�f¡��´�t±o�����Z�����������I¡��ª� �N�N¡������§������¤jis�N������«=�t�C�t�����z¡����:¡��L­®/�f¯j�L£��L�L°���� ¯j�N��Ò ���,£:���t¯µ�²«��N�z�Z�Û�:���L��¤Ö������¯j�L�������¢e�N�y�z�L¦§�x�À�³¡����C�h¡������Z�Ö����¡��������§���)£��Z�³¡����:�f¡��Z�U¡������§�����z���¢e�N���z¡��f¡�� �²��¢Ó�Ô¡����t�²���z¡S·"¸ ¹¥�:���L��¤»���������������Z�S�z�����´�N�S¡�������L�t�t�:¡¾¬k�:�-�������=�N����� ��¢S·&½&���t��¤�ëS��fMî²°=¯4�������Q���/�L�������«�������¢�� ±o�����L���z�������t��¦§����� ±¥�z¡������h¡��������8���f¡�¡��L����¦´�f¡�����������C°l�³�������s��¢�¦»���x¡��������§·s¸�¹ ���f¡��\��������� �Z�/���C�¥�N�o¡�����¡���L�z¡������ ¡�����������­ g/�s���t¦´���:¡����j�t�����������s�z¤o�z¡��L¦Ù�o�t£N�t�����=�L�¡��)����¢_���4���N�µ«C�L�t�\¡������N� ¡������§·&½&���L��¤N­·&½&���t��¤´���4���L�M£)����¤y«��N�z�Z�H�N�\¡���������¡���¡������\��¢¾Ýnm,oCÞ�Ýhãhàãhè_ÿ�é�¯4�������U¦´�M¤©«C�\���Z�³¡��L� ¯4��¡��Ö¢e�����4�N�t���L��������¡³¤�­pd��N��t±���¦§������° �H���L�z¡��L�¥·&½&���t��¤Q� ±o�����L���z�������t�������:¡��t�����Z�M£�����L������¡§�t�����z¡������ ¡������Ö¯4��¡�� ¢e����¡����t�´����¡z¡��L��� ¦´�f¡����������N�L°¯4�������"���V����¡V¡�����£)�����f¡��¾«=�����Z�o���t�L�s¡��&�4���f¡����������²����ÒN�I��������N�����N������ÒN�qX/Zo¹�ë`�fìLî²­IÂ/������°M¡����j���t�N�������¶�t±o�����Z����������¡³¤)�=�¢e�N��¦´��¡����s¢e�N��������¡�������¢e���µ·&½&���t��¤´�z�L¦´���:¡����t�L°)¯4���t���L�N�X/Z�¹y�o�)�L�^����¡8�z�����C�N�z¡^���L���������V�C�f¡��ª� ±o�����L���z�������L­WZ)��¦§�x������V¡��»ë fMî²°f¯j��������¤��o�L����¯4��¡���¡����j�t�����³���C�h¡���£��N°M���t�:�f¡��������

¢e���t�4�z��«��z�t¡S��¢C·&½&���t��¤�°f¯4���L����¡��������t�N�������8���f¡��»�t±o�����Z�³������N���������»���L�z¡������h¡��Z��¡�� �����������f£��L�/¡����Q����¦��Z���������L�x�¡����t� i��N� ��� ­ ^ �Ö¡�������¡����t�H�������V°j¯���� �N���z���o�L�§¡�������t¯4���x¡������¥��¢¾�����L¯ú·&½&���t��¤���� ¡��t��¦§�ª��¢4¡����H���L�z¡������h�¡������Z�-£)���t¯Õ�������L¦§� ��¢�� � ���:¡������������,�:���L��¤N°&¯4���t���L���ë ffî^�o�)�Z�µ����¡/�o�Z��� ¯4��¡��Q¡��������Z�³¡����C�h¡����������´�t������«������x¡³¤y��¢Z:¡����=½s¹�­ d�����¡����t��¦��N���N°�ë ffî�¢e�o� �C�z�Z�8���»¡����µ¢e���C����¦§�t�:¡�����o�Z� ������«������x¡³¤s��¢��:���L��¤"� �N�N¡�������¦§�t�:¡V¢e�N��Z:¡����=½s¹�­ ^ ���¶¢e���� �C�S���C�³¡��L���»���8¡��������f£)������¡����µº����³¡�������� ¡����t���)¢e����¦��L¯j�N��Ò¢e���&�y�t�N����� �À�M¯������&·&½&���t��¤�������¯j�L�������´�z¤o�z¡��L¦\°C���C� �����)������Ö·&½&���L��¤Û����� �²�����o�t�L���z����� «)¤Û������¦§������¨L��¡������V°µ¡³¤)�=� ��t���������t�L�Q·&½&���L��¤Q� �N�:¡�������¦§�t�:¡/¦´�����������H���C� ���L¯4����¡z�������°:�����§�t�N�����4¦´���������L¦��L�:¡L­^Â-�z�L¦§���:¡����4�t�����������(��¤o�³�¡��L¦ �t�������L�UÂ/��Ã��Ô·&½���ë �fî����N�ª«=�t�L�Ö��¦§�����L¦§�t�:¡��Z� ¡�����L������¨t�s¡����������N�C�:�z�Z�´¡��Z���������:���L�L­� � Ë�ÉSÏ��sÉ8ÏLÈ ��������������

Z)�����C�:�z�s¯j�&���M£��¾¡������t�N�����L�§·&½&���t��¤����s���C�§¡����"���L¯�:���t��¤������������f¯4�H��� dI���������(�N­ �j��¡����!�"�����"�»���)£��N��£N��$#:�³�N���&%���¢&¡³¯j��·s¸�¹Å���)�t��¦§�t�:¡��§¯4�������,� �N�o¢e����¦ö¡��¡����"¡³¯j� Ð X Ð �³¡����C�h¡������Z�4�z���f¯4�\��� dI���������-�o­1('�9*)�* 4:.+)�*���*�C�,�* �,����-.)�� 4:.+)�*/,021 L ���)!4365�77�98J#I�:59;E=<<*)n�?> 3?@V1A'�9+)�*.B�4:.C)���#4D���# � �6C�,�* �,���

E F <�9�< )�*GB ,n!"� !"L�#H3 )�*I,n!�� !"L�#9J<FLKW9W.NMPOQ9RTS�UWVYX�*GB.M[ZHOQHR\S�U]VYX:^

9J<FLKW9W.M_Qa`.V[O9b2XJ)�*I,cW+7#n���\-�)�*/,n!"� !�L�#-.)��2-MPOQad�e[Q\f RaXg 1('�9*)��_4:.h)n�6,��"#%�"��L !9J<FJK�9 .+)��I,���#4D���# �hiMjZHOQad�e[Q\f RaXMjZ9Qa`.V[O9b2X kml

1('�9*)�* 4:.+)�*���*�C�,�* �,����-.)�� 4:.+)�*/,�����!"�����E F <�9�<")���,�L ���)!4365�77�98J#I�:59J<FLKW9W.C)�*/,n!"� !"L�#�-/)��2-MjnIopdpqHO9osr4UWQ OQad�e[Q\ftQ\O�RHXg 1('�9C)�*GB�4:.+)���#4D���#I� ��C�,�* �,����1 !�� !"L�#a3 )�*/,n!�� !"L�#4;[-)���4:. *GB ,��"#4D���# �E"F <<9J<")��4,�����!�#vu�=

9�<<JK�9 .C)��4,��"#TD���# �K#I� iMjZnIopdpqHO9osr4UWQ OQad�e[Q\ftQ\O�RHX wdI���������j��xj���������L� ½&���t��¤zy �(����� g/�t¯|{/���t�&½&���L��¤Q½���¾��� ¡������t£N�L�8¡����~}(�=�.�Y° �����&�.�C°����=�G�&�.�s�������G���2�(�.�ª���o�¢e����¦´�f¡����N�¼��¢�¡�����«=�)��Òo�H¢e���N¦��!�2�����I�!�U�f¡\�Ö���L¦§��¡�����x¡��yû[�!�2�(�§���t����¡��L�4¡�����£f��������«����&«=��������¡��§¡��������)��¡4�L�x��t¦§�L�N¡&��¢ �!�2�v���/�!�)þ�¯4���N�����/�A�G�&�.�y�����G�����(���H�����¾¡�����G�&��� �C��¦§�´��¢�#H�N��¦§�L�H%�°V������¡����z�(�.���/�/����«C�N�o¡ª�z�C���«=�:�NÒo�"¢e���N¦Õ������¡����t�(���t¦§��¡��y�o�o� ��¦��L�:¡��(�����/�����t���/�!�ûÓ¡����(���)��¡4£f��������«����"�����(�.���/���&�oþ�­��§��� ¡������t£N�L��¡�����«C�)��Òo�¡��C�f¡&�����(���o¡����N���Z�\«:¤\�§�C�L�������Q¯4��¡�� ¡����(�����z¡¾����¦§�(��¢#H�N��¦§�Z�a%(�����y�����N����¤´���f¡��L��û_����¡��.�J\)þ8«)¤§¡����&���t£)���t¯��t���t­� '����q�)+��)!�#T8 �K��� ���"#TD��������"L + T ��L�L�#I$� WHK� T ��#-<*���! ����� * #I#%'

��#I'��98J#I$��"��' T #M!"����!�'��98J#�������* #I#I'-��#I5����"!"#I��#I$�� � !��-�J$�� ¡ #I��#I'7!���"��$�� T !���L���#%��$�+

bib

book*

@year title

author+ editor+

publisher price

last first affiliationlast first

reviews

book*

title review*

commentsrate reviewer?

dI���������-�AxS«���«V­ �)¡��Q���C�����t£)���t¯¾�L­ �o¡��

¿À�"¡������¶�t±o��¦§�����N°M¯j���L���ª���t��¡�����¡¶¡����j���C�z¯��t�V���L�:�������L���¢ ������¡���¡�������¤H����«�����¦§�L�\����¡��C�f¡¾��¢6����­8¿À�\¡������µ�����C�L�L°¯����zÒN� ¡������N�o¡4¡�����«C�������L�µ��¢8�����4·&½&���L��¤�� ���:¡�������¦§�t�:¡���������L¯4����¡������´�³¡�����¡��L�����L������������¡������¾���������������t±���¦§������­

¢ £¥¤ ��¦¨§©�«ª­¬¯® ¦¯���QÊSË°�Î�¤

Xµ���&«��N�z���¾���o�Z�ª��¢¶�����4���������N�����´���j¡���� ±o���������¾¡������ �����¡�������¦§�t�:¡Q¦§�����������N��«=� ¡³¯��t�t�-¡�����£f��������«����L�\�z�=�L�t�xº��Z�����¡����§¦´�f¡����������\�C�f¡z¡��t�����"��¢j¡³¯j�\·&½&���L�����L�&¢e���(�:���L��¤�t���:¡�������¦��L�:¡&�o�L�t���������¥����� ¢e�N�s�:���L��¤\���L¯4����¡������C­�Z)�����t����H·&½&���L��¤´�o�tº����Z�µ�x¡��µ£M��������«����Z�����������»���t�N��������� ±o�����L�z����������L°»�N���©� ���:¡�������¦§�t�:¡�¦´�����������Å�z¡����f¡��t�N¤Å�����t�����C������f¡��L�8�&¡³¤)�C�t�Ô���t���f¡��L���N���t��¤ª��������¤o�����L°����§������¡���� �������I¡³¤)�=����o¢e�L���L��� ������������«o¡³¤)�������¾���t���f¡������C�z�������L°t¢e�N�V¡������������=�N�����¢�¦´�f¡��������������t�N�������z�²� ±o�����L�������N�o�B¡³¤)�=� �²«������L��£f��������«����L�L­X^��¢_��� �����x¡��f¡�� �:���L��¤ ���L�N�z�N��������¢e���´�t���:¡�������¦§�L�N¡´¦´������������C°o¯j�sº����z¡µ�����N�C�:�z����������¦´������¨L�L�y¢e����¦ ¢e���4·&½&���L��¤�:���L�����L�L°M¯4�������(���t�����^¡��"���t��������¡��S¡����j�C�f¡z¡��t���ª¦´�f¡�������������t¦´���N¡����L�¶��¢C�/�:���L��¤s¢e����¦%��¡��I���L�z¡������h¡����������s�z�L¦§���:¡����t�L­

Query Parser

Query PatternRegister

Query Rewriter

View Results

V1 V2 Vn

Semantic Regionss1 s2

sn

XML queries

querydescriptors

probequery

remainder query

XML XMLXML …..

Replacement Manager

Query /Cache Matcher CacheManager

Query Interface

QueryDecomposer

Query Containment Mapper

Result Combiner

RegionCoalescer

S1 ∧Q 3

S1’=S1 - S1 ∧Q 1

S2 ∧Q 3

S2’=S2 - S2 ∧Q 2

srID timeStmp invlRatio

query/cacherelations

s1s2sn

±6²P³.´(µa¶�·&¸º¹v»(¶�¼�½¿¾ À�ÁÃÂ|¼Äµ9Å9»(²PÆa¶2ÅÆa´&µa¶¹v»&¶�ÇȵHÉ�Ê�¶pËmÌ�µHÍ�Ì/Ç ÆH»(¶�¼�½¿¾ À�ÁÃÂÏÎ\ÐAÎ\Æa¶pÊѲjÎ�ÒA¶sÓ(²jÅÆa¶2Ò

²_ÔC±6²_³�´(µH¶°·&ÕÖ¹v»&¶�×�ØJÙGÚ�ÛÝÜ�Ù=Þ.ß�àNáºß�â2Ù.Ú�ã&µ9Î\ÆÃÉ/Ó&Ó(äP²_¶sÎÔ&Ì�µHÊ�É�äP²_åsÉ/Æa²_Ì�Ô+µa´(ä_¶sÎ�ÆaÌ*É/ÔN²_Ô(Ó(´AÆ�ÁÃÂ~´(¶pµHÐ�ÆHÌ*ÒA¶pµH²Pæ.¶zÉÔ&Ì�µHÊ�É�äP²_åp¶2Ò�Ô(¶2ÎTÆH²PÔ&³�ÇÈÌ.µaÊçµa¶sæ�¶sÉ�äP²_Ô(³�Æa»(¶ÄèG´(¶sµaÐ�é ÎtÊzÉIÆHÅ9»(À²_Ô(³�Ó&É/Æ\ÆH¶pµHÔ&Î ÆH»(µaÌ.´(³�»�ÆH»(¶~ÎaÓ­¶2Å�²Pã&¶sÒ�æIÉ/µH²jÉ/ê(ä_¶sÎ É�Ô&Ò�Æa»(¶s²PµÒ(¶pÓ�¶pÔ&ÒA¶sÔ&Å�Ðë»(²P¶sµHÉ�µHÅ9»=Ð.ÕLì�ÆvÇÈ´(µaÆa»(¶sµ�µa¶s³�µHÌ�´(ӭκÆH»(¶�Å�Ì.Ô&ÒA²PÀÆH²PÌ.Ô&ÎJÉ�Ô&Ò�µH¶�ÆH´(µaÔ�¶�íAÓ(µH¶sÎHÎ\²_Ì�Ô&ÎJÎ\Ì~ÆaÌ�Åp¶pÔGÆa¶sµ�ÆH»(¶pÊÏÉ�µaÌ.´(Ô&Ò

î

¡����t���Ö��� ¢e�t��������� ����¡z¡��t���b£f��������«����L��¡��%¢e����¦�£f��������«���� ����C�Z� ��ºC� �z��«=�:���L�����L�L­ Xµ������� ð �����&�o~t~ ð �� ïHð� � }��~ ð �&���z�Z�I¡����4�C�f¡z¡��t���§£M��������«����Z�t°M¡����t�����o�L�C�L���o�t�C� ¤ª���t���f�¡������C�z�������¾�N�4¯��t���¶���4�N�����o� ���f¡��Z���t�������x¡����N���¾���C����� ¡������� ±o�����L�������N����� ±)¡������ ¡��L�æ¢e����¦�¡����-�o�Z� �N¦��=�N���L�b�:���t��¤¡��¼�t�����z¡������ ¡ ¡����Ö�:���L��¤Å�z�L¦´���:¡����t�����L��� �����o¡����Z­ Xµ����:���t��¤�� �V���Z�³¡������ ¡����������/���t¦´���:¡����t�¶�����������z�¾�L���o¡������L�����t�o������f¡��t��¤´«)¤H���Z�³¡��L�����L����¡���������� ���Z�³¡������ ¡����������»�N�C�L����¡��N���L­d��N�ª� �������ª��¢4���t¯ ����� �t�N�����L�©�:���t�����L�L°¶¡�������� ð ���

� ô �=~Z� � ��� ð �=~�������� ð �ª� ±o���������L�µ� �N�N¡�������¦§�t�:¡4¦´���o��������N�"«=� ¡³¯��t�L��¡����L��������¡z¡��t����£f��������«����L�L­§¿²¡(¦´��Ò��Z�&¡�����:���t��¤ � ���:¡�������¦§�t�:¡¥�o�L�t���������Ù�o�L�C�L���o�����-�N�%¯4��� ¡����t������ �Ô¡����²������� �N�N¡�������¦§�t�:¡8¦§�����������N�8�L���(«=�µ�L�z¡���«������z���L� ­Xj¤:�=�4���o¢e�t���t���t�¾�����§����«o¡³¤)��������¦§�L�����������z¦´���������=����¡��Z�«)¤Á� �z¡���¡���� ¡³¤)�C�������L��Ò��L�,��¢ ·&½&���t��¤ �����%�o¡�������¨t�L�¢e���§¡������´� ���:¡�������¦§�t�:¡´¦§�����������©«C�t¡³¯j�L�t�Û¡���� ���t�N���������� ±o�����L�������N�o�B¡³¤)�=� �²«������L��£f��������«����L���z�=�L�t�xºC�L�Û���Ö¡�����¡³¯j��:���t�����L�L­m�µ�����L�\�N��¡����(�Z�³¡���«����������Z� � ���:¡�������¦§�t�:¡¾¦´���o��������N�L°o¡�������� ð ��� ï�ð�� � � ~ ð �����t¯4���x¡��L��¡����(���t¯ �:���t��¤¯4��¡��b���Z�z�=�L� ¡�¡��Å¡����,�=�N���z��«���¤Ù�z�L£��t���t��¤%���L�z¡������h¡������Z�£)���L¯<�z¡������h¡������L�µ��¢I¡����(� ���:¡��������������L�������Z���:���L�����L�L­Xµ��� ïHð � ñ ��� ð � ð �=~��Ù� �^� =ð �Õ���C� �����=�����f¡��Z�Ù��� ��������t�t¦§�t�:¡ª�z¡����f¡��L�����L�&¢e���ª�������������Q«C��¡�� � �N¦������t¡��y���C������z¡����������t�N���N����¡��©¦§��Ò��\�z���N� ��¢e�������t¯ø�:���t�����Z�t­ d����«=� ¡z¡��t� �t�������¼�z���N� �×�o¡�������¨L�f¡����N�V°�¡���� ï�ð�!�eô �"� ô ���ñeð }Z� ð �¾¦§�t�����Z�¶���C�ª��������¡��I£)���t¯¾�^¡��"� �N�N¡����N�����t�N���N�(�������o����������x¡³¤H�f£��t��¡���¦§��­Ð ���§¡�� �������t�»����¦§��¡���¡��������L°V¯��»¢e�o�t���"����¡����´�o�Z���t�����o�¡������b��¢H¡����×�:���t��¤Ù�o�Z� ��¦§�=�N���x¡����N�æ�������t���:¡�������¦§�L�N¡�z¡����f¡��L�����L��«��N�z�Z�y����¡��������:¡����o�o���t�L�y¦§��¡���£M��¡������§� ±���¦��������­,¿À�:¡��L���Z�³¡��L�Û���Z���o�L���§�����\���t¢e�t�����Z� ¡��¼ë \fî¾¢e�N�§¦��N����o�t¡��������ª��«=����¡ª¡������ �N�:¡�������¦§�t�:¡ª¦´�����������¥���C� ���L¯4����¡z������´������������¡���¦´�µ�����L����� Â/��Ã��Ô·&½»­

# ¬¯®øÍ �¾Ê%$'&ë��� (È�ÉSÏ Î)(¨¦�È ���$*(IÏ+(, ��� ð ��� �"� ð �-�¶� ô � ð }Z} � � /.0� �/1 ô ��©� ñ2�43 �o~ �eô ���X^� ¢_��� �����x¡��f¡���¡���� ���L�N�z�N������� «=� ¡³¯��t�L��·&½&���t�����Z�t°¯�� � ±o��������¡ éYÿfÞhü´â65 è87Zâ�ý_è_ÿ�é �������Z��¢e����¡��������z¢e����¦»������ ���·&½&���t��¤ ����«��x¡�����������¤ �t��¦§�C�:�z�Z� ��¢:9<;>=<?@A9 B? C�;ED�F�C=<G�D�?ED C>?EDEF�H<?�I!JU�t±o�����Z�����������©���:¡��Ù�Å¢e����¦¯4�������Ù�z�L�������f¡��L�Q��¡��¥���f¡�¡��t���%¦´�f¡����������¼����� ���L�z¡������h�¡����������%�z�L¦´���:¡����t�U�����æ� ±o�������t�x¡���¤æ� ±o�C�:�z�Z��¡����×���:¡��L�z��o�L�C�L���o�t�C� ���L�4«=� ¡³¯��t�L�H¡����N���"¡³¯j�´���t¦´���:¡����L�t­Xµ����·&½&���t��¤/���N��¦´������¨L�f¡����N�&�C���V«=�t�L�(���������Z�����L�s«=��¡�����Û¡���� ¬k�:�Ñ�������=�N������¢e���´·&½&���t��¤©¢e����¦§���¾�z�L¦´���:¡����t�ëS�LíMî�������«:¤�¸ �����N���Z���t�¥�t¡L­´���Ô­ ë`���ZîS¡��Q��«�¡������©���©���o��������������f¡��\·&½&���L��¤©¢e�N��¦ö¢e�N�´¯4�������Û¡����Q·&½&���t��¤:� ZC½s¹¡������C�z���f¡����N�H�t���y«C�s�����x¡�����¡��Z� ­�Xµ���s������¦´������¨Z�f¡������´�������L��������=�N���L�¥«:¤�¬k�:�Ù�������C�z�Z��¢e�N�"¡��������z¢e����¦§����� d8¹�¬LK� ±o�����L�������N��������¡����s¢e�����V·&½&���L��¤y��¤)�N¡��f±´���N¡���¡����L���¾����¦�������xºC�L�<¢e����¦ ���Å¡����©·&½&���t��¤<�j����� �z¤)�:¡���±�ë`�ZíMîÔ­ ¾���

¡���������������¦§�����&�f¡^�o�8MY�t���t�:¡¶���:�����t°t¡����Z�z��¡³¯j�4������¦§������¨L���¡����N�(¡��L���������:���L�8�o�"����¡I���L� �Z������������¤"� ±��t���������Z�����»��¡����t�Z­{&�z�����»¡����"���N��¦´������¨L��¡������H�������L�������f£)���o�Z�y«)¤�ëS�2�Zî²°o¡����

;ED�FH«=�������Q£f��������«����Z�¾�t��� «C�»�t����¦§�����f¡��L� «)¤Q�z��«��³¡��x¡��o¡z������"¡����t���j�o�L� �������L��� �Z�I¢e���j� ±o�����L���z�����§�o� ºC���x¡����N���L­8Â/������°¡�����9E=�?Q�t±)�����Z�������N���/¡�����¡����������Z�³¡��L��¯4�x¡������©�N9�B>? �N�=<G�D�?�DQ�t�����������L����«C�´�������L�z¡��Z�����������o¡(�����z���o�§��¢�¡����?�D�F�H�?EIQ� �����C�z�N­�K4�L£��L�����t��¤�°V��?�D�F>H�?�I � ���������´� �N������«=��������L�z¡��Z�»�N�8¯j�L���Ô­�d����S�N���8���:¡��t���L�z¡8����¡����µ·&½&���L��¤»� �����¡�������¦§�t�:¡V������«����t¦Q°Z¯j�j�z�L���Z�h¡ ¡������������Z�³¡������¾�������S¢e�N� ¡����9 B>?y���C�*=<G�D�?�D§�t���������L�µ¯4������� �z��¦§������º��Z�µ¡����L¦Ñ¡��y� �����¡������Ö�N����¤ �C�f¡��Û� ±o�����L���z�����C�ª¯4�������\�z¡������¾�L���o¡�����������¡����9E=<?§�����t���������:¤§���C�Z� ��º��L������¡����&�N�������������Y�:���t��¤´«)¤´���L�z¡z������O9E=<? � ±o�����L���z�������&������¤�¯4��¡�������¡����P?ED�F�H�?�IQ�t���������L�L­Â/�P9E=�?ª� ±o�����L���z�����§�f¡j���)¤ª���Z�³¡����������t£N�t����¢V�z�����´�s¢e�N��¦�L���\«C�����t�����Z�z�L�N¡��L�����µ«=�t���f¯�x

9 B>?RQSRT I QU û�QV þW=<G�DE?�DYXYû QVEZ þ[?�DEF�H�?�I\? û QV�Z þh°¯4���t����¡����©������¡³¤,��¢]QS ����¡����©����¦§���N�y¡����f¡ ��¢ QU û�QV þ�­Xµ���(�o� º����x¡����N����¢^QS «)¤ QU û�QV þ4�L���Q«C�ª� ±o�����C�o�L� �N�)9 B?_ S � T IY`ED��:ûba+a�adcV�e�f-gbh þ�° _ S ijT IL`�D�kCûla�a�adcV�e�fdgbh ° _ S � þ�°�mnm+mx° _ S!oT IL`�DEpVû a�a adcV�e�f-gbh ° _ S � °qm+m+m�° _ S!o�r � þh°�¯4���L���O`�D��Q���t����¡��L�¡��������f¡��¥� ±o�����L�������N�������L� ¡����o�tº����§£f��������«���� _ S s °V�����a+a�adcV e�f-gbh ���L�����Z�z�L�:¡��§¡�����£f��������«����L�H�o� º����L�×���ס������N�o¡��L�9E=<?´� ±o�����L�������N����¯�­ �Z­ ¡Z­8¡����(�t�������L�N¡t9E=<?y� ±o�����L���z�����¶­d����"� ±���¦������N°^�o���������\¡����§���N��¦´������¨L�f¡����N�¥�����o� �Z���s¢e�N�

���N°�¡����u;�D�F»«C�N�����y£f��������«���� _�v ����º����³¡��t����¦§�����f¡��L�y�������¡����o�t�t�������L��� �y���Û¡����\���L�z¡��Z�R9 B>? � ���������H�����L�U¡��©�o� �ºC��� _6w ��������«��z¡���¡��o¡��L�¼¢e���\�x¡��*9E=<? ��� º�����¡������¶­ Xµ���L�¡����H�������L�z¡��������������H���������������L�©¡����o�L����£��´¡����H���L�:�������L�������¦§������¨t�Z�´¢e�N��¦Ñ��¢6���N°����4���t�����h¡��L�\���2dI����������\&x

1A'�9+)�*_4:.C)�*���*�C�,�* �����/-.)�� 4:.+)�*/,0s-�)n!W4:.h)�*I,n!�� !"L�#E"F <<9J<")���,�L ���"!�365�77�98J#I�:59�<<JKW9W.M_Qa`.V[O9b�XJ)�*/,cW+7#%���a-/)n!a-�)��2-MPOQad�e[Q\f RHXg 1('�9*)�*GB�4:.+)��"#4D���# � �6C�,�* �,���

E F <�9�<")�*.B ,n!�� !"L�#¿3c!9J<FLKW9W.C)�*GB ,��"#4D���# �hiMjZHOQad�e[Q\f RHXMjZ9Qa`.V[O9b�X

dI���N������\­x�Xµ���-g¾����¦´������¨L�L� d��N��¦ ¢e�N��y �

¬��µ�����z�¾�����f£)���o�j�N���o��¡����������N�������L�¶¡��/�L����¦§���C�f¡����������L�h��Z��������¤�9�B>?�� ±o�����L���z�����C����¢V¡����"�o�tº����L�H£f��������«����Z�j���ª����¡�o�L� ���(�L�����t¯4���t����°^������¡��������x9 B>?¥���C�]=�G�D�?�D¥� �����C�z�Z�¡��§Ò��t�L��£f��������«������o� ºC���x¡����N����¢e���t����¢^º���¡��t�/� ±o�����L���z�������sxynzA{ )9ou|b}Ä)�*~����2�-�+�A���<���4�-�b���+������� 3E�

ynzA{ )9ou|b}Ä)�*6~�������-{�� )9on~��-�+�A���<���4�-�b���+�����

ynzA{ )n�|b}Ä)�*~A�����n�n�d�l�+�-���2�-�+�A���<���4�-�b���n�����2�+~d���l�n�A� 3E�ynzA{ )���|b}~)�*~A�����n�n�d�l�+�-��-G)n�|b}Ä)��~d���b�n�A������-{�� )���~��-�+�A���<���4�-�b���+�����

� �o� � ��| ñeð��\ð � ð � òIð �¶�>���ö¬����C�z� _����������� � ��a�c _�� ¡���o�L����¡��/¡����f¡µ£f��������«���� _�� ���µ�o�t�=�t�����t�:¡��N� _������� £)���(¡�������M£)���N�f¡����N�y����¡����I�&�=���Y­�Z)��¦§����������¤�° _�� �a c _�� ° _�� �� s ��� ga c_�� ����� _! �" V ��"�v$#%� ��� � ��a c _�� Z ­ Xµ��� ooâ�ýBý²Ý ÞhéÅü´â�ý²åhç:èÓé:ê���t¦´���:¡����t�y��¢�� �N���t��¤Û����� �t���o¡������L� «)¤U¡����¥���C�Z� ��º��L�£f��������«����L�µ������¡����t���¾�o�t�=�t�C�o�t���t¤H���t���f¡����N���z�������L­¬��(�����z�§«����L��ÒH�o�f¯4��¡����u=<G�D�?EDy������?�DEF�H�?�I´� ���������L����C�©¦��f£N�y� �N���o��¡������Ö�����©���t¡������ � ±o�����L�������N����� ���N���´¡��¡����^9 B>?\�t�������z�Z�¾¯4���L���(¡����»���t¢e�t���������´£f��������«����L�/�����»�o� �º����Z� ­�¬��\���t�C� �H�N«o¡������Ö���L�z¡�����������«Y�:���t�����Z�t°S�Z�����U��¢¯4�������U� �����z¡��t���(�����S¡����H�t�����o��¡������C�(�����©��� ¡��������t±o�����Z�³������N����¯4�x¡��Û���o�t������¤U�o� º����L�Û£M��������«����Z�t­ÖÃj�����U£f��������«���� ����C�Z� ��ºC�×�z��«=�:���L��¤Ù���t���t� �L���t������������¡��Z��¡���� ãtÝ+5�Ý�åtýBè_ÿfé���C� o=Þ�ÿ�&tÝ�åLý_è_ÿ�éH�z�L¦´���:¡����t�µ��¢I���o�t��� £f��������«����L�L­¿À�©Â/��ÃS�²·&½»°C¯��»��¡�������¨t�»¡����´����¡��\�³¡����C�h¡������Z�/���pdI����������ë�H¡�� ���t�����L���t�:¡"¡������z�L¦§���:¡��������L��� �����o¡�������¢µ�Q���N�z�¦´������¨L�L�\�:���t��¤�­' x £f��������«����Z�4�o� ºC���L�\���\�§���o�t���V����«Y�:���t��¤' �( U*),+ # x £f��������«��������C�Z� ��ºC�����t¦´���:¡����t�' �( �-."�/ x �������t�:¡¾���C�����������o���L�Q£M��������«����Z�' �0 e x ���t¢e�t�����Z�H�Z���������t���À�o� º����L�y£f��������«����L�1(243�5687:9�;09�<!= 1

6:789�> 7�?@> 7�A�=

1 ?�B!C,D�EF 5 1�G7:9�;09�< 6:789�=7:9 6:7:?@> 7:A�=

1 ?�B%HJILKNMF G C,D:O@HLP!E(B IRQ:SUT�M KRDA�M789V7:9�;09�<%W:X@Y�Y8Z 6%= 687:9�W%[]\!^_:`�=7:? 789�W!a 687�?�W:bc_8d�e�fRg%h!_8ij^�d�g�= 687:?(=7:A 789�W�e�kle�bm^ 6:7:A�fL7:Aonp= 6:7:A�=

dI���N�������(xty���������«���� Ð �t�=�t�����t���t���Z�4��� y(�>� � #nwq��r �Xµ���µ���������������:���L�z¡������u9E=<?�� ±o�����L���z�������^��¦�����¤�¡����4£)���L¯�z¡������h¡������N°V¯4��������¦´�M¤ «=�´�6MY�L�h¡��L��«)¤Q¡����´¦§�f£N�t¦§�t�:¡����¢�¡�����=<GED�?�DQ����� ?�D�F�H�?EIQ� �����C�z�Z�¾���N¡��\��� MY�t���t�:¡"���L�z¡z������¥���t£N�t���t­cd������ ±���¦§������°S��������¦§�����C�!�H��� ¡�������� _�� ���¡������������t��9E=<?U�t±)�����Z�������N�Ö¯4���L��� _� «=������� ¯4�x¡��,¡�����(�.���/���&�.���N« �³�L� ¡��µ���µ¡��´«=����� ¡��������Z� ­�Xµ�����4�����o���t�f¡��L�¾�#z�����o�o��� ¡H%µ��¢:�N« �³�L� ¡��V«=�������s¡�� _�� ¯4��¡��"¡����N��� �" V ��"�vs"% ��«b�³�L�h¡��t­ ¬��,�o¡�������¨L� ����¦§� ���Z�³¡������ ¡����������¼���=�t���f¡������t°�������\�N��#z�����o�o��� ¡H%�°t#����L�z¡H%�° #z�������L�z¡H%»¢e���N¦ ¡����s���L�z¡��L����t���f¡������C���j�t���:¡��t±)¡L°8¡�����������¡���¡��H¡�������MY�L�h¡´�t�������L� «)¤¦§�f£)����� _�� ���y¡��(¡����&�N�o¡��L�W9E=<?»�t±o�����Z���������´���H�����o�L��¡��¢e����¦ ¡���� _�� �À�z�=�L�t�xºC�"����«Y�N���t��¤�­t &�Ívu�É�$�� ÏLÈ � � È�¤ (ÈQÎ��¾Ì�¬ ®øÍ �/Ê%$

§úË»È�Ét(ÏtÈ�� �/È�ɨ� ���� ÏLÈ �

Â/�µ�»���L����������� ±o�����L�������N�y���������C�����N°N·"¸ ¹������)�t�L���z�����»���� ���N���t��¤"���L����¡��Z��¡��&¡����t�µ����¡���¦´��¡��4¡����t�N��¤yëS��]�° � �hî²­WX^���L� �

���o¡��N¦´�f¡����Ô«C�����L�����L�����������t±)�����Z�������N��¡³¤:�=�L���t���©«C�����o�¢e�L�����L�»¢e�N��¡����¾£f��������«����Z�����C�Z� ��º��L�§���§¡����&¦´�f¡������������C�f¡z�¡��t���y��¢I���H·&½&���t��¤§�����´¢e�N��¡����"���t¡������´¡³¤)�=�"�t��¦§�=�N���L���¢�¡����L���j£f��������«����L�L­¶· Ð ���t��ë`� � î������f£:���o�Z�¶� �N���j¢e����� ¡��������¢e�N�y�z�C���,�z¡���¡����\¡³¤)�C�¥�����L��Ò)��������¢s·&½&���t��¤ ¡������������,�¢e�����h¡����N�����C�����N������¦§¦§�����s����������������°:�����§¯��4¡��:�C��¦´��ÒN��C�z�"��¢8· Ð ���t�����Q�����µÂ/��Ã��Ô·&½Ù�z¤o�z¡��L¦\­^ ���y� ���:¡�������¦§�t�:¡´¦§�����������U�z¡����f¡��L��¤ �����o� �L�L���§���×������N�����L���z��£��µ¢_�N�z�������H�������o�L�H«)¤»¡����"£f��������«����s�o�t�=�t�C�o�t�o��t���Z�ª��¢¾¡����\���t¯ �:���L��¤ ��­8¿²¡»º����z¡�¦´�f¡������L���Z�����U£f���������«����(���N�´¡����H£M��������«����(��¢8¡����§�t�N�����L� �:���L��¤"���ª«C�����L��N�y¡����t���4����¢e�t�����Z�y¡³¤)�=�L�j¢e���N¦ ¡����"��� º�����¡������C�4�����H¡����L�������«o¡³¤)�������U���t���f¡����N���������C�t­ ¿²¡H¡����t�-�L�z¡���«������z���L�H�Ö� �����¡�������¦§�t�:¡(¦´����������� «=� ¡³¯��t�L�U�Q¦´�f¡������L��£f��������«����´�C�����«)¤\�����L��Ò)�����§¡����L���ªãtÝ�5�Ý�åLý_è_ÿ�é������ oCÞ�ÿ�&tÝ�åLýBè_ÿféQ�z�L¦§���:¡����t�L°����¦���������¡��§¡��������L����¡����������Y£:���t¯<�o¡������x¡³¤\� ���C�o�x¡����N���L­¬��/���f¯Û¯µ����Òª¡����������N�§¡����s�z��«�¡³¤:���������Ô�L��������� �Z�´� �����¡�������¦§�t�:¡j¦´�����������ª�����o� �Z���j���������ª¡����s� ±���¦������s�N���t�����Z����Ö�����Ï�¼��� dI��������� ��­ ���U���C�ç�,«=��¡��æ�N���t��¤¼¡��������¦§�&·s¸ ¹��o�o� ��¦§�L�N¡��t°=�!�2�����I�!�����)��¡��L�y��¡ _ «���« � ������(�.���/�/��� ���/���¥���)��¡��Z�×�f¡ _ ���t£)���L¯ � ­Ù¬��¥���������z¡����f¡��¥���dI����������]ª¡����"�������������&�����L���t�:¡���¡������H��¢¶¡����"¡³¯j�§�:���t�����L�n�£f��������«����§�o�t�=�t�����t���t���Z�(���s¯��t�������s¡����y�����z�o� ���f¡��L��� ��������x¡����N���4�����\��� ¡������\� ±o�����L�������N���t­Zo�����t�/£f��������«���� _�� �����!��ûB�o�t����¡��Z�y«)¤ _��%w � þ������ _�� ���

�´û _���x þI�����4�o� º����L�»¡����¾����¦§�µ��������� `ED��:û _ �!�s�&�@�/�&�G����þ�°¯����z�t¡¾��� �§� ���:¡�������¦§�t�:¡4¦§�����������§¢e����¦ _�� w �/¡�� _��x ­¬��©¡����t�Ù�����Z��Ò¼¯4��� ¡����t�����)¤-� �N���o��¡������%���C�Z� ��º��L� ���¡���� _�� �À���C�Z� ��ºC������«Y�:���t��¤U��¢Ã���Q� �������L���C�N�����»¡�� ����¦§��Z�:��������¤Ö���´¦§�����Q���Z�³¡������h¡��L�,� �N���o��¡������,�C�:�z�Z� ��� _��x ­¿À�¥¡������"�t�N�z�N°C¡����t�������s���Q� �N���o��¡�������¢e�N�"�t��¡����L� _�� w ���N�_��x ­/¬������f¯%�����L��Ò\¯4��� ¡����t�&¡����»�o�o� ��¦��L�:¡&���o�o�Z�¾¡��«=����� ¡��������L��«)¤s¡����W?�D�F>H�?�I/�t±o�����Z������������� ¢e�t���������µ¡�� _�� x�����������z�����Z�:�������L�¥�������)¤ ?�D�F>H�?�I\� ±o�����L���z������¯4��¡���¡�����o�L� �������L��� �»��¢ _�� w �M­�¿À��¡��������L������° ?��:û _ �y�������(�G�oþj���?����4�������»���L�:�������Z�y¡��´«C����� ¡��������L����� ���N­

$b

$aR2($b/title)

C1($a/last = “James”)

PE1($bib0/book)

$bib0

R1($b/@year)

PE2($b/author|$b/editor)

$reviews0

$b

PE3($reviews0/book)

R5($b/review)R4($b/title)

join

$b

$aR1($b/title)

C1($a/last = “James”)

PE1($bib0/book)

$bib0

PE2($b/author)

$reviews0

$b

$r

C2($r/rate>4) R3($r/reviewer)

PE3($reviews0/book)

PE4($b/review)R2($b/title)

join

V1

Q

root variable pattern variable

PEi, Ci, Ri: variable, condition and return expressions respectively specified in FWR

containment mapping join condition

R3($a)

dI���������/](x��j���:¡�������¦§�t�:¡j¸ �����������N�����t¡³¯j�L�t��y �/�����H½\

Xµ���(�³�����Û�t�����o��¡������Û��� � � �^?��:û _ �y�/�����&�G�)þ&���Û���������t��!�2�v���/�!�©���C� ?��=û _ �y�������&�G�oþ(���Ý�(�.���I����� ���I�!�¥¦´�f¡����¡���� �³�����Ñ� �N���o��¡������Ñ���C�Z� ��º��L�ú�N� ?�k=û _ �y�/�����&�G�)þ¥���C�?�� û _ �y�������(�G�oþI�������N­8¬��"���L��� �(�t���:¡����)���&¡����(�t���:¡������o�¦§�t�:¡H¦´����������� �����o�t�L���§¡��U�����Z��ÒÖ¯4���t¡����L�ë����� � _�� w ��o�tº����L� �N� `�D�kCû _ �y���Nþ ¦´�f¡������L� ��� � _�� x ��� º����Z� ���`�D�kCû _ �y�G���=�G�&�.��þ�­ Xµ��� ���L���������¥� ±o�����L���z�����%¡³¤)�=�Û���o�¢e�t�����L�Q¢e�N� _�� w �(��� @��G�A�.�&�.���4DA�����(�����!J�������¡����»¡³¤)�C�¢e��� _�� x �����G�A�G�&���Y­?�µ�����L� �N�U¡���� �z��«�¡³¤:��������¡����t����¤¢e���¥���L��������� �t±)�����Z�������N�-¡³¤)�=�L�L°�¯j�Uº����Å¡��C�f¡ _�� x � x_�� w �M° ¯4���L��� � x �o�t����¡��Z�»ã�ÜCátýBß�ooÝ�����­�¬���¡��)���"�L������� ¡���-�Û� �N�N¡�������¦§�t�:¡H¦´�����������U¢e����¦ _�� w �Q¡�� _�� x ���C���������¡��f¡��Q¡����t��������«o¡³¤)�������U���t���f¡����N�Û¯4��¡��<��¡³¤:�=�¥�t���o��z¡��������N¡º�=}��&�&�2�!������� û _�� w �tþ �&� ���A�G�&���Y­V¿²¢ _�� w �j�������������� ¡��������Z�(«)¤����N°f¯4�������»���I¡����4�L������°f¯��j���t¯4���x¡���¡����4£f�����x���«����(�o�tº�����¡������ ��¢ _�� x ¡��´��«�¡������Q¡����ª�o�o� ��¦§�L�N¡&���o�o�L�«=������� ¡��H��¡s¢e���N¦ ¡����N���ª��� ¡��������L�Q«)¤ _��yw � ���������\�z�C����ª¡³¤:�=��� �N���³¡��������:¡Z­^¬��s¢e���z¡����t�¾�t��¦§�������/¡������z¡������ ¡����Z�����¢S� �N���o��¡��������4��¡z¡��������Z�y¡�� _�� w �s����� _�� x ­Â �z��¦§�������Q� �N�:¡�������¦§�t�:¡Q¦´�����������Û�����o�t�L���H�����o� �L�L���¢e���������4¡����¥£M��������«����Z�´�o� º����L�,��� �(�.���/�/��� ���/����«)¤ ������C�¼¡����N��� ���ç��­¾Â¾��¡��������N� _� �x �C���Q���¼�t�������Z�z�=�����o������Ù£f��������«����¼¦´�f¡���� ��� ���N°���¡Û�t��� «C�Å�z�=�L�t�xºC�L� �N�?��=û _ �y���(�.���/���Cþj���t¡��������Z�¥«)¤C���N­jXµ���t��� ¢e�N���N°V�������t���o�¡�������¦��L�:¡ª¦´�����������¥�z¡����f¡��t�N¤��������f¯¾�s¢e���»����«o¡³¤)������� ��� ����f¡������C��«=� ¡³¯��t�L� � � �\£M��������«����Z�������Ý����� �\£f��������«����L�������� ¡������ � ±o�����L���z�����C�t°����,�����o��¡������Û¡��©���L�:�����������©�³¡������h¡��t�� �N���o��¡��������/���+�§¡���������� ���N­v�µ�N�z�Z���N�Q¡������z�C�t� �Z���z¢e���� �N�:¡�������¦§�t�:¡ª¦´�����������N��«=� ¡³¯��t�L� ���f¡�¡��L��� £f��������«����L����¢���������+�H«)¤Q� ±o��������¡������y¡����t�����z��«o¡³¤)�������H���t���f¡������C�t°=¯j��o�t¡��t��¦§�����°������� i ­, ��� ð ��� ï�ð�� � � ~ � � �ö¬�� ���f¯ � ±o��������¡Q¡����©�Z�³¡���«o����������L�×�t���:¡�������¦��L�:¡y¦§�����������N�´¡��U�N«o¡������×� ���t¯4���x¡��������¢�� r f-g�� û_�����f¯4� ��� d^����������ìNþ"¯4��¡��U���L���C�Z�h¡(¡��C����� �£)���L¯ �³¡������ ¡�������­¶¿À���!��� � £)���L¯��z¡������ ¡�������°L¡����j�����z�o� ���f¡����N����t���f¡������C�z�����׫=� ¡³¯��t�t� ���2�&�G�U���������A�G�(�.�Û���"�G�����(������H�����L���t��£��Z�,���C�z���o����¢»�Ö���t¯4��¤×� ���C�³¡������ ¡��Z�×�t���t¦§�t�:¡�>pA�G�=}Y­ ¬��\���t���t�Q���=�L� ��¢e¤Ö��£f��������«���� _ �\��� r f-g�� «)¤`�D û _ ���n?(�&�/�@�G�>pA�G�=}�þ�°µ¯4���L��� _ �!�n?(�&���Ö���t�����Z�z�L�N¡���¡����£f��������«����µ¢e���S¡����4���)��¡��t���t¦§�t�:¡���¢������ �����Z�z����¡��o�o� ��¦��L�:¡L­¬��j¡����t����� ¡������ª¡����������&�.�<CG���A�.�&�.�4�������8��¢=�L�N���ª��«b�³�L�h¡«=��������¡�� _ �´���s���Z�:�������L��«)¤h��­�d��N�"¢_�M£�������«����»���L£:���t¯µ��t���L°C¯��ª���C�Z� ��¢e¤Q�y£M��������«���� _ �/��� r f-g�� ¡��y«=�»«=�������Q¡����¡���¦´�f¡�������������°N¡����"���L��� �t�C�����:¡j�L���L¦§�t�:¡����(�.���I�����"��¢�G�>p=�=�=}���°t���L�z¡������ ¡^��¡��I����¡��j¯4�x¡��(¡����µ�t�������x¡����N�(��¢ �������C���� ¡�������¡����(�t�������Z�z�=���C�o�����»���t£)���t¯��t�Z­! "æÊ �~�tÏs� ÏLÈ �Ê%$ � �Ö��LÍ sÉ8ÏtË»È

^ ���§Â/��ÃS�²·&½ø�z¤o�z¡��L¦ö���§�o�L£��t�����=�L�U���z�����?���M£f�h� Ð$#��­ ��­�Xµ���/¢e���N�N¡��t���y��¢ ¡����&Â/��ÃS�²·&½-�z¤o�z¡��t¦ú�����������������% E #-�"#IQS#I�N!���# �"#%��$�#I���/�W���h����#���'7!"#%�"#I�"!"#%$ ��' ��!"��#I� �"��*�!"L�#

�����"��#I����QF����� T ��'7!6����'28J#%'7!L8/��������'�5q�"!"�6��!"#%5�+ !"�Ö1 =9;@D

1A'�9+)�#U4:.+)akml%9�#I�6C�,�#I'7!��)+9�<<JKW9W.C)�#a,n!�� !"L�#-.)�#a,�����!������a-Mjn/o�dpqHOopr4UPQ OQ\d�e[Qaf6QaO�RaXg 1('�9*)���4:.h)�#a,��"#TD���# � �4,���#4D���# �

E"F <<9J<�)���,��6��!"#Äu�=9�<<JKW9W.C)��4,��"#4D���# �K#I�6iMjn/o�dpqHOopr4UPQ OQ\d�e[Qaf6QaO�RaX &('*),+

dI���N�����(ìAxS K4�t¯4���x¡������´��¢S½%¯4�x¡�� K4�L���C�Z�h¡4¡��zy �

�N�§¡��N�´��¢¶Â¾�C�������¾¬��t« Z)�t��£��L�������hX^��¦´�L�f¡j���t��£)���t¡��t�o��N������­¶¬��¾������¡����"½&������¡����������t�S���C���:���L��¤ª�t���N�����4�M£f���������«����Ö�f¡Ûç:ýÔý o�- .�.Yå ç)Ý�ÿ�o�ã0/Óå è�ã0/�Ü�o�Ý éCé�/ÓÝ�äfÜ1.�2ªù�Ý�Ý+5xýH¡��Å���������¤)¨t�ª����� �L£M�����C�f¡��(¡����»�������o¡s·&½&���t��¤�­4¬��»�������y�o¡�������¨L�¡�����¡³¤)�=�����o¢e�L���L��� �©�����-�z��«�¡³¤:�������Û¦§�L������������¦´�H�������£)���o�L�"«)¤&¡����j· Ð ��� ���z¤o�z¡��L¦bëS� �hî:¢e���I� �N�N¡�������¦§�t�:¡¶¦´������������C­^ ���¾Â/��ÃS�²·&½%��¤)�z¡��L¦ø�����L��Òo��¡����»� �������L�h¡����L������¢S��������t¯4����¡z¡��L�b�:���L�����L�©«)¤b� ��¦§�����������Å¡����t���U���Z�z���x¡�� ¯4�x¡��¡����N��� �����o�o��� �Z�Å«:¤ �o�����Z�h¡���¤<�t£f�������f¡������,¡����Û�������N��������:���L��¤����N�������z¡¶¡����j���t¦§��¡��j���)�t��¦§�t�:¡��L­I¿²¡8�������&���t��£��Z�^�N���¡��L�z¡�«=�L�´¢e����¢e���z¡����t��� ���C�o���h¡������(�����²�o�L�o¡��H� ±o�C�L����¦§�t�o�¡����8�³¡����o���L�s���:£N�L�z¡����N��¡������´¡������:���L��¤\�C�L�z¢e�N��¦´����� �»�N������N�������t£��Z�y«)¤H�����z¯��t���������:���L�����L�µ�C�z�����´�t�N�����L��£)���L¯¾�t­¿À�<�Û�o���z¡�����«���¡��L�<�z¤o�z¡��L¦ ����Ò���¡�����¿À�N¡��t����� ¡Z°/¯����t±)��=�L� ¡8¡�����¡��&�:���L��¤:�²«������L�»�t�����������"��¤)�z¡��L¦Ù����ÒN��Â/��Ã��Ô·&½¯������������M£��ª¡�����¢e�t¡����������Q� �N�z¡��&¢e�N�"�:���L�����L�s���C�z¯��t����«����«)¤ �t�N�����L�©�N���L�L­�¿À�Ö�����ª� ±o�=�t����¦§�L�N¡��t°S¯��H�����z¡�������¡����# ¯��t�L�x¡��N���t��¤»�L���������/�N�§¡³¯j��{ g/¿À·-¦´�����������Z�j�N� ���N���S�¹¶�o�t���I¾���L� g¾� ¡³¯�����Ò©û_¹¶Â_gsþh­ ^ ���»�������y�����/¡�����Â/��Ã��·&½Ö��¤)�z¡��L¦<�����z¡��������L�(�������������f¯¾�Y���z�L���V¡��¾��������¡I�:���t�����L�L°¯4��������¡����´��¡����L�����N�z¡����Q��� ¡���¢µ·s¸ ¹,�o�o�t��¦§�t�:¡��(�N�"����t¦§��¡��"���f¡��´�z�L��£N�t�Z­

0

20

40

60

80

100

120

140

different XML document size

quer

y pr

oces

sing

tim

e (s

ec)

non−caching, containedxcache, contained non−caching, overlap xcache, overlap

small document(175KB)

medium document(890KB)

large document(1800KB)

dI���N����� f&x��j�N¦§����������������¢�½&���t��¤yÄ����o� �Z���������&Xµ��¦§�L�¿À� ������������¡������4�t±)�=�t����¦��L�:¡��L°�¯��H¡��L�z¡H�o� MY�t���t�:¡´·s¸�¹���)�t��¦§�t�:¡s����¨t�L�L­¾¬��ª¯µ����¦Á����Â/��ÃS�²·&½Ù¯4�x¡������t£��L������:���L�����L�^�����ª�o�L�����N�"¡��������L¯U�N���t�����Z� ¡��&«C���t��¡����t�V¡���¡�������¤�t���:¡��������L� ���Ö�N�»�f£N�t����������������¯4��¡�� ��������¢¾¡���� �t�������L�

������� Y�� � ^��Y%i�@Y8d�kme�kmY � � ^�� `�kle�k���� ���%_:b��(_:e�kmY��� k���^���� �"! # kmij^�� ijd$! # kmij^�� ijd$! # kmij^%� ijd$!

&�' (�)*' +-,/.102+43657)*. 8�39;: < <�= > < = ? 9;:1@ = A>�B8< <�= > < = C 9 < A�>�= >9 >8<%< <�= > < = ? C < ? < = C

&' (�DFE 3$($G�.$HIH-0J+�KL)*.1839;: < <�= > 9 = A 9�9 ? A�= ?>�B8< <�= : 9 = A 9 ? :�: >9 >8<%< <�= > 9 = : < C�A�A8<XI��«����h��xSÂ/��Ã��Ô·&½ Ä����o�t�L���z�����jXµ��¦§�»�j��¦§�=�����L�:¡���:���t�����L�L­¶¬��µ�t��¦§�������j¡����4�:���L��¤������o� �Z���������/¡���¦§�4�z�=�t�:¡¯4���L�\�C�z�����´Â/��ÃS�²·&½Å£��L�������µ¯4���L��«)¤:�²�������z�������x¡Z­Xµ�������L������¡H���kd^��������� fU����� ���C�z���³¡��t�:¡y¯4��¡��¼�N���y� ±)��=�L�h¡��f¡����N�V­ Xµ���´�:���t��¤��=�t��¢e����¦§����� �§��¦§�����f£N�t¦§�t�:¡s¢e���¡����¾¡���¡�������¤»�t���:¡��������Z�´�t�N�z�4���������ªÂ/��ÃS�²·&½¼����¡����&¦��:�³¡�����N���xº=�t���:¡Z°^«)¤����©�����o�t�(��¢4¦§��������¡��C�o�´���U�¥¹¶Â gÑ�t�o�£)��������¦��L�:¡L­ Xµ���´�C�L�z¢e�N��¦´���C� �´��¦������f£N�t¦§�t�:¡����»��«=���o¡��­ ff±H¡���¦��Z��¢e���µ¡������f£��L���������������§�t�N�z�Z�t­¬���«����Z��Ò¥�o�f¯4�¥¡����y�:���L��¤ �����o� �Z���������H¡���¦§�§�����L�����¡����(Â/��ÃS�²·&½%��¤)�z¡��L¦Ñ���N¡���¡������t�(�t��¦§�C�N���t�:¡��t°o�Ô­ �N­�°�¡����¡���¦§�j�C�z�Z�"¢e���8�:���t��¤��o�L�t��¦§�C�:�z��¡������¶°L¢e�N�I�:���L��¤��t���:¡������o�¦§�t�:¡8¦§�����������&���������t¯4���x¡������/�����(¢e���8�:���t��¤&�L£f�������f¡����N����L���C�Z�h¡���£��L��¤N­�Â/�ª��«����t��£��Z�����eXI��«����c�N°^¡������t��¦§���o¡��f�¡������C���S�f£��L�����L����¢e���(Â/��Ã��Ô·&½ �����t��¦§�������f¡���£N�t��¤��z¦´�����¯4��¡��Q���Z�z�=�L� ¡µ¡��§¡������f£��t�������Y� �N�z¡L­M §úË»ÈQÎÄ�LÍO(IÏtË»ÈO(

Xµ��� ·&½&���L��¤:�²«������L� Â/��ÃS�²·&½ �:���L��¤ �z¤o�z¡��L¦ ����¦´�§¡��¦§������¦§��¨t�»¡����»¢e� ¡����������\�t�N�z¡&��¢j·&½&���t��¤Q���Z�z����¡��&«)¤ ���o���¯j�L�����������t¯ø�N���t�����Z�»�C�z�����©�t�������L�Ö�:���L�����L�§¯4���t���L£��t��=�N���z��«�����­ ^ ���y�����t����¦§��������¤Ö�t±)�=�t����¦��L�:¡����/���Z�z����¡��H�����������¦§���z������­"Ns��£��t��¡����f¡�������#����L¦§��¡��2%"���f¡��(�z�N����� �L����������Û�¥¹¶Â_gÁ����� ���L��� ����� ¡³¯�����Ò©�o�t���M¤����»¦�������¦´���Ô°�¯j�� ±o�=�L� ¡&¡����f¡"¡����»�=�t��¢e����¦´�����t�»��¦§�����f£N�t¦§�t�:¡/¯��������¥«C��t£N�t��¦§�������o����¦´�f¡����8���(¬��t«(��¤o�³¡��t¦´�V�f£N�t�Y¡�����¿À�:¡��L�����t¡L­

O � P/�/Ê �/ÈQÎ�� (ëS� î Ð ­/������£M�����L����° Ð ­QNs�����t��¦§��°/�����<¸ ­µ¹¶�t��¨L�t�������B­

K4�L¯4����¡������©��¢ K4�L���������yÃ�±)�����Z�������N���§����� K4�L���������ÄS��¡��H½&���t�����L�L­C¿À�SR$�UT�VXWYR�çNè�5�âNä�ÝIo�çNè_âZW"R\[�°��������L��Zí \Z]b� � \�°a�Zí�íNí�­

ë �Mî Ð ­:������£f�����Z�z�N°�N�­ Ð ­^Ns�����t��¦§��°N¸ ­�¹V�L��¨t�L�������Ô°N���C�¸©­*_»­Ly������o�B­\¾����¯��t������� K4�L���������»ÄS�f¡��Ö½&���L�����L�{&�z�����*y&���L¯¾�t­�¿À�a`Zb*TdceWfV âfégT�è_ݲê:ÿIW7b*["°8�������L�� f�íI] ��í f�°G� ����� ­

ë �fî(ª­ # ­Z�j�������o���µ�����"Ä8­Z¸ ­L¸Q�t�������V­ ^ ��¡���¦§����¿À¦§����� �¦§�t�:¡��f¡������C����¢^�j�N���³����� ¡���£��"½&���t�����Z�S���NK4�t���f¡����N�����Ð �f¡����µ�N�z�Z�t­8¿À�hV"i �%bI°������N�L�/ì�ì�])í � °<�Lí:ìNì)­

ë \�î�¹�­4�j���t�-�����×þ­µÂ(­tK4�������t���z¡��L�����L�L­æ d�����¦§�t�¯�����Òy¢e�N�&Â/���z¯��t�������y·s¸�¹Û½&���t�����L�~{&�z������y&���L¯¾�t°¿À�´Ä����N�����L���t­lX¶�Z���������t����K¾�t�=����¡q� ��� �M� ]�°Z¬��N���t�L�z¡��L�Ä8�N��¤:¡��L���������&¿À���³¡��x¡��o¡��N°<� ��� ��­ë �fî�¹�­s�j���L�V°�þ­/ª­ K4�������t���z¡��L�����L�L°������ ZY­4¬�������­·&���N�����s�j Z)�L¦§���:¡����ª�������������hZ)¤o�³¡��t¦Ñ¢e�N�¾·s¸�¹½&���t�����Z�t­ ¿À�jVk`Zlnm �UTäNÝ ü´ÿfé�ã ý_Þ�â�ý_è_ÿ�é ooâ�ooÝtÞW

m¥âNäfè�ãtÿféXWpo\è�ã å�ÿ�é�ãhèÓé�°­������� � ��� �o­ë ]�î��s­¾�j�����o���f£o��Ò)���/�����Ý{ª­µ¸©­m�����������>MI­ Z)�L¦§���:¡�������N���������¥��¢&¬��t«,½&���t�����Z�t­qiVç)ÝsrYt*Tduwv�ÿfÜoÞhéYâ65 °íCûI�fþ9x���]<�fì)°b� ����� ­ë ìMî-Z ­ Ð ���L°�¸©­���­bd�������Ò)�����V°����C���s­&���������z�N�V­�Zo�t¦´���o�¡���� Ð �f¡��¥�������������Q�����xK4�L������� �L¦§�t�:¡L­´¿À�xrYtyTfu�W

u&ÿfüyá�âfßIW"`hé=ä�è_âf°C�����N�L�N� � � ]l� \a�N°a�LíNí ]�­ë f�î Ð ­ld^�������L��� �¶°:ª­:¹¶�t£)¤�°:����� Ð ­ Z)��� ���V­^½&���L��¤y�j�����¡�������¦§�t�:¡"¢e���»�j�N���³����� ¡���£��H½&���t�����L�s¯4�x¡��]K¾�t���������Ã�±o�����Z�����������L­ ¿À�zR �{T�VXW|VYÝ�â�ýBý 5�Ý;Wdo�["°)�����N�L�_���NíI]

��\ f�°<�Zí�í f�­ë í�î���­XNs�������z¡��L���Q�����\Ä8­o¹¶���������V­ ^ �o¡���¦§��¨L�����y½&���t�����Z�

{&�z�����´¸ �f¡��t����������¨t�Z��y/���t¯¾�sxSÂ<Ä����N�h¡����t���B° Zo�L������«����Zo�����o¡������V­s¿À�}Vk`Zlnm �{T�WUV âfé=ýÔâ7u&âfÞLá�âfÞ�âZW~b�â65 è � ÿ�ÞhàéCè_âf°��C�����Z�J� �a��]l� \G��°G� ��� �N­

ëS� � î"®ª­V®¾�:�z�f¤:�H���C�+�s­¶�&­VÄ����t��� �N­(· Ð ��� �.x� Xj¤)�C�Z�·"¸ ¹ Ä����o� �Z���������ø¹^�����N�����N�øûBÄ����t����¦§��������¤ K4� ��=����¡hþh­�¿À��o ÝZá�Tfu�W\T»â65�5�âMãW iCÝ�m:âfãh°��������L� � � ��]�� ��]�°¸��M¤ � ����� ­ëS� �tî"®ª­&®/�N���f¤N�,����� ��­~yS�N�����������%����� �s­s�&­sÄ����t��� �N­

K¾�t����������Ã�±)�����Z�������N� X�¤)�=�L�»¢e����·s¸�¹�°�¸Q���:¡����Z���Ô°������������­8¿À��`Zb*�FR�°��������L��� ��]b� �o° � ����� ­ëS���fî"Ä8­ZÂ(­Z¹¶���������"������®ª­I�I­�_�������­N�j�N¦§���o¡������s½&���t�����Z�¢e����¦ Ð �t����£N�L�\K4�L����¡������V­æ¿À��rYtyTfu�W�VYýÔÿZå;�Lçoÿ65 üpW

V=ù�Ý�ä�Ýté�°��������Z�N�.��íZ] � ]Ní�°)¾���C­K�Lí f.�o­ëS����î"ª­:¹¶�t£)¤�°:Â(­)¸Q�t�����t��¨t���¶°�_�­GZo������£=°:����� Ð ­GZo����£f���z�¡��M£f��­)¾����¯j�L��������½&���t�����Z�L{&�z�����Ãy/���t¯¾�L­)¿À��R$�UT�VXW

V âféhvoÿfãtÝ�W b*["°������N�L�µí.�I]<� � \�°A�������h�LíNí.�o­ëS�,\�î(½»­§¹V����°*��­jd�­hg/�������:¡��N�V°�K(­ # �����z���C��¦ª���z¡��:¤N°Ä8­j������°µ������_»­j¹V�Ô­¼Â/�h¡���£N�¥½&���t��¤ ��������������¢e�N�Ð �f¡���«��N�z�"¬��t«cZ)�L��£N�t���t­I¿À��o ÝMá$Tdu�W"Tªâ5�5�âMãW%i���°�C�����Z�/��íZ]l� \C°l� ����� ­ëS�2�fîs¿h­¾¸ ���������L��� �¶° Ð ­JdI���N���Z���t�V°/����� Ð ­ # �:����¦´�����¶­Â/����¯j�L�������/·"¸ ¹ ½&���t�����Z�¶�N�»®/� ¡��L���N���L���t�N��� Ð �f¡��

Zo������� �Z�t­^¿À��rYtyTfu�Wy�¾ÿ�ü§âZWy` ý²â65 ßf°o�������Z� � \a��]b��� � °Zo�t�o¡Z­F� ��� ��­

ëS��]�î�X"­=¸Q������° Ð ­�Z)��� ���V°=�����"yª­&y/�����)�V­MXj¤)�C�Z�����L��Ò)�����¢e�N�/·"¸ ¹eX^�������z¢e����¦��L���L­�¿À��R$�UT%V=°��������Z� � �;] � ��°¸��M¤ � ����� ­ëS�MìMîf_�­�ÄS������ÒN�����z¡����:¡��������,�����Ýyª­my������������:�t­æ½&���L��¤

K¾�t¯4���x¡������Û¢e���cZ)�L¦§���z¡������h¡������L� y/���t¯¾�L­ ¿À��Vk`Zlµàm �UT�W|RjçNè�5�âNä�Ý+5 o�ç:è_âIW7�yVk["°o�C�����Z� \G���I]G\G] ]�°F�LíNí�í�­

ëS��f�îs¬k�)�&­ ·&½&���t��¤ �N­ � x Â/� ·s¸�¹ ½&���t��¤ ¹¶������N�����N��­ �:¡�¡���x ��� ¯4¯4¯�­ ¯/��­ �N��� � X)K � ±o�:���L��¤ � ° Ð � ��t�t¦»«C�L� � ��� ��­ëS�Lí�îs¬k�)�&­ ·&½&���t��¤ �N­ � d��N��¦´��� Z)�t¦´���N¡����L�t­�:¡�¡���x ��� ¯4¯4¯�­ ¯/��­ �N��� � X)K � �N���t��¤N�À���t¦´���:¡����t� � °

������� � ��� ��­]

Onspace management in a dynamic edgedatacache

Khalil Amiri, Renu Tewari, Sanghyun Park, and SriramPadmanabhan

IBM ThomasJ.Watson Research Center�amirik,tewari r,sanghyun,srp � @us.ibm.com

Abstract

Emergingwebapplicationsareincreasingly serving dynamiccontent generatedby querying back-enddatabaseservers.The servingof dynamic content can be scaled by offloading applications and caching data at edgeservers. In recentwork, weproposed a persistent and self-managing edge-of-network datacachethat is dynamically populatedbased on theapplication query stream and stored locally in a persistentdatabase. In this paper, we discussthe challengesof cachemaintenancein such a dynamic environment,focusingon replacement issuesin the presenceof update-basedconsistencyprotocols. We describe how our experiencewith a real prototype,built using a JDBC driver and DB2, highlights thelimitationsof traditional approaches. Furthermore, we proposea cache replacement mechanismandpolicy that addressthesechallenges.

1 Introduction

Dataaccessedvia the Webis increasingly dynamic,generatedon-the-fly in responseto a userrequest or customerprofile.Examplesof suchdynamic dataincludepersonalizedwebpages,targetedadvertisementsor online e-commerceinteractions.Dynamic datais servedusinga 3-tiered architecture consisting of a webserver, anapplication server anda database;datais stored in the databaseand is accessedon-demand by the application server components and formattedand delivered totheclient by the webserver. To improve scalability and performance,caching at edgeservershasbeenwidely deployedonthewebfor staticHTML pages.For dynamiccontent, which requiresdatabaseaccesses, cachesare typicall y by-passedbymarking the contentuncacheable. Recent work hastargetedextending the static caching concept by storing the resultof adynamic web requestasHTML fragmentsor otherformatsindexedby theexact URL string or HTTPrequestheader[11,9].Consistency andcache spacemanagement issues,however, caneasily limit the scalability of theseschemes.

In morerecent architectures, theedgeserver (whichcollectively refers to client-sideproxies, server-sidereverseproxiesat the edgeof the enterprise,or cacheswithin a content distribution network(CDN) [1]) actsasan applicationserver proxyby offloading application components (e.g., JSPs, servlets, EJBeans) to the edge [7]. Databaseaccessesby theseedgeapplication components,however, are still performed acrossthe wide areanetwork. To accelerateedge applications byeliminating wide-areanetwork transfers, we have recently proposedand implementedDBProxy, a databasecache thatdynamically and adaptively storesdataat theedge [2]. Since theedgeserver is limited in resourcesof spaceandprocessingpower, we rely on efficient cachereplacement policies andmechanisms to keeponly the most beneficial datain the localdatabasecache. Cachereplacement for physical page-based caches,staticfiles, and read-only semantic cacheshasbeenthoroughly studied in the literature. Yet, it introducesnew challengesin the context of a persistent edge cache containinga largenumber of changing and overlapping “materialized views” of previous query results. This is because locally storeddata canbesharedby multiple cached“views”, and can be updatedby a cacheconsistency protocol.

In this paper, we review the design of the dynamic edge datacache in Section2, discussthe limitations of traditionalapproachesto cache replacementand offer alternative solutions in Section 3, briefly review relatedwork in Section4, andsummarize thepaper in Section 5.

2 Background

Weassumein this discussion thatapplicationcomponents(e.g., servlets) arerunningontheedgeserver(e.g., using theIBMWebSphereEdge Server [7]). The edgeserver receivesHTTP client requestsand processesthemlocally; passingrequestsfor dynamiccontentto application componentswhich in turn accessthedatabasethrough a JDBC driver. The JDBC driver

Query Parser

Cache Repository

to back−end

Query Evaluator

JDBC interface

CacheIndex

Edge application

Servlet

DBProxy JDBC driver SQL (JDBC)

CatalogResourceManager

ConsistencyManager

ModuleQuery Matching

Figure 1: DBProxy key components. Thequery evaluatorin-tercepts queriesand parsesthem. The cacheindex is invokedto identify previously cachedqueriesthatoperatedon the sametable(s)and column(s). A query matching module establisheswhetherthenew query’sresultsarecontainedin theunion of thedata retrieved by previouslycachedqueries.A localdatabaseisused to store thecacheddata.

20NULL620 20NULL620

���������������������������������������������������������������������������� ���� ���� � � �������������������������

� ���������������������������������������������������������������������������������� ���� ������ !�!!�!"�""�"##$$%�%%�%&�&&�&''(

( )))***Cached item table:

5 14 8

22

18NULL

16 13

120

340

450

1

770

880

35 30

45 40

Inserted by consistency protocol

Retrieved by Q2

Retrieved by Q

Cached item table:

5 14 8

22

18NULL

16 13

120

340

450

Retrieved by Q1

770

880

35 30

45 40

Inserted by consistency protocol

Retrieved by Q2

15SELECT cost, msrp FROM itemWHERE cost BETWEEN 14 AND 16

WHERE msrp BETWEEN 13 AND 20SELECT msrp FROM item

msrpcostid

Figure2: Localstorage: Thelocal itemtableafterthequeriesQ1 andQ2 areinsertedin the cache. The first threerows arefetchedby Q1 andthemiddle threearefetchedby Q2. SinceQ2 did not fetch the cost column, NULL valuesareinserted.Thebottomtwo rows werenot addedto thetableasa partofqueryresultinsertion,but by the updatepropagationprotocolwhich reflects UDIs performed on theorigin table.

managesremote connections from the edge server to the back-end databaseserver, andsimplifies application dataaccessby buffering result sets,and allowing scrolling andupdatesto be performedon them.

2.1 DBProxy overview

We designed and implemented an edge datacache, calledDBProxy, asa JDBC driver which transparently intercepts theSQL call s issuedby application componentsexecutedon the edge and determinesif they canbe satisfied from the localcache(shown in Figure1). DBProxy is designedto bedeployedon edgeserverswhichcould numberin thetensto hundreds.Consequently, it needsto be self-managing to limit the administrative overheadsof a largescaledeployment.Furthermore,eachedgeservercould serveadifferent population (i.e.,mayobserveadifferent accesspattern) and could containdifferentresource constraints, making manual optimizations impractical. To make DBProxy asself-managing aspossible, whileleveraging the performancecapabilitiesof mature database management systems,we chose to designDBProxy to be: (i)persistent, sothat resultsarecachedacrossinstantiationsand crashesof theedgeserver; (ii) DBMS-based, utilizing astand-alonedatabasefor storageto allow for the efficient execution of complex local queries; (iii) space-efficient, storing queryresults in common tables to avoid redundancy whenever possible; (iv) dynamicallypopulated, populating thecache basedontheapplicationquerystreamwithout theneedfor pre-definedadministrator views; and (v) dynamically pruned, adjustingtheset of cachedqueriesbased on availablespaceandrelativebenefitsof cachedqueries.

2.2 Common store

Data in a DBProxy edge cache is stored persistentlyin a local stand-alone database. The contents of the edge cachearedescribed by a cache index containing the list of queries. To achieve spaceefficiency, datais stored in shared tableswhenever possible suchthatmultiple query results sharethe same physical storage. Queries over the same base table arestored in a single,usuallypartially populated, cachedcopy of thebasetableat theorigin server. Join querieswith thesamejoin condition and over thesamebasetablelist arealsostoredin thesamelocal table. This schemenot only achievesspaceefficiency but alsosimplifies the taskof consistency maintenance,asdiscussedbelow. When a query is worth caching, alocal resulttable is created(if one doesnot already exist) with asmany columns asselectedby thequery. The column typeand metadatainformation areretrievedfrom theback-endserverand cachedin a localcatalog cache. For example, Figure2showsa local tablecachedat theedge.The local itemtable is createdjustbefore inserting thethreerowsretrievedby queryQ1 with theprimary key column(id) and thetwo columnsrequested by thequery (cost and msr p). All queriesarerewrittento retrieve theprimary key so that identical rows in thecachedtableareidentified. Later, to insertthe threerows retrievedby Q2, the table is altered if necessaryto add any new columns not already created. Next, new rows fetched by Q2 areinserted(id + 450 , 620) and existing rows (id + 340) are updated. NotealsothatsinceQ2 did not selectthe cost column,a NULL value is insertedfor thatcolumn. The query matching module ensuresthat the queriesexecutedagainst the cachedo not return any of the “f ake” NULL valuesin local tables.To handle a large andvarying setof cachedviews, thequery

-.-.-.-.-.--.-.-.-.-.-/./././././/./././././0.0.0.0.0.00.0.0.0.0.01.1.1.1.11.1.1.1.1

cost msrp availid

2.2.2.2.2.22.2.2.2.2.22.2.2.2.2.22.2.2.2.2.23.3.3.3.33.3.3.3.33.3.3.3.33.3.3.3.3 4.4.4.4.44.4.4.4.44.4.4.4.44.4.4.4.45.5.5.5.55.5.5.5.55.5.5.5.55.5.5.5.5 6.6.6.6.66.6.6.6.66.6.6.6.66.6.6.6.67.7.7.7.77.7.7.7.77.7.7.7.77.7.7.7.78.8.8.8.8.88.8.8.8.8.88.8.8.8.8.88.8.8.8.8.89.9.9.9.99.9.9.9.99.9.9.9.99.9.9.9.9

:.:.:.:.:.::.:.:.:.:.::.:.:.:.:.:;.;.;.;.;;.;.;.;.;;.;.;.;.; <.<.<.<.<<.<.<.<.<=.=.=.=.==.=.=.=.=>.>.>.>.>>.>.>.>.>>.>.>.>.>?.?.?.?.??.?.?.?.??.?.?.?.?@.@.@.@.@.@@.@.@.@.@[email protected]

D.D.D.D.D.DD.D.D.D.D.DE.E.E.E.EE.E.E.E.EF.F.F.F.F.FF.F.F.F.F.FG.G.G.G.GG.G.G.G.GH.H.H.H.H.HH.H.H.H.H.HI.I.I.I.II.I.I.I.IJ.J.J.J.J.JJ.J.J.J.J.JK.K.K.K.KK.K.K.K.KL.L.L.L.LL.L.L.L.LM.M.M.M.MM.M.M.M.MN.N.N.N.N.NN.N.N.N.N.NN.N.N.N.N.NO.O.O.O.OO.O.O.O.OO.O.O.O.O P.P.P.P.PP.P.P.P.PQ.Q.Q.Q.QQ.Q.Q.Q.Q

R.R.R.R.R.RR.R.R.R.R.RS.S.S.S.S.SS.S.S.S.S.ST.T.T.TT.T.T.TU.U.U.UU.U.U.U V.V.V.V.VV.V.V.V.VW.W.W.WW.W.W.WX.X.X.X.XX.X.X.X.XY.Y.Y.Y.YY.Y.Y.Y.Y Z.Z.Z.ZZ.Z.Z.Z[.[.[.[[.[.[.[ \.\.\.\.\\.\.\.\.\].].].]].].].]^.^.^.^.^^.^.^.^.^_._._._.__._._._._`.`.`.`.``.`.`.`.`a.a.a.a.aa.a.a.a.a b.b.b.bb.b.b.bc.c.c.cc.c.c.c d.d.d.d.dd.d.d.d.de.e.e.ee.e.e.ef.f.f.ff.f.f.fg.g.g.gg.g.g.g h.h.h.hh.h.h.hi.i.i.ii.i.i.i j.j.j.j.jj.j.j.j.jk.k.k.k.kk.k.k.k.kl.l.l.ll.l.l.lm.m.m.mm.m.m.m n.n.n.nn.n.n.no.o.o.oo.o.o.o p.p.p.p.pp.p.p.p.pq.q.q.q.qq.q.q.q.q

r.r.r.rr.r.r.rr.r.r.rr.r.r.rs.s.s.ss.s.s.ss.s.s.s t.t.t.t.tt.t.t.t.tt.t.t.t.tt.t.t.t.tu.u.u.u.uu.u.u.u.uu.u.u.u.uv.v.v.vv.v.v.vv.v.v.vv.v.v.vw.w.w.ww.w.w.ww.w.w.w

cost msrpid

{

{

row updated/inserted

propagate to edge

managerConsistency

query result (Q )

query result (Q )

after Q and Q executedA B

B

A

Origin serverorigin.itemedge.item

Edge cache

insert into table

Figure3: Updatepropagationin DBProxy. Thefigureshowsa table in theorigin databaseserver (item), and itspartially populatedcopyin the edgecache.Query results over the tableareinsertedin the cache. Whena row is inserted,updated, or deleted in the origin, thechangeis reflectedin the cachewithout evaluating whethertherow matchesany of thecachedpredicates.

matching engine of DBProxy mustbehighly optimizedto ensure a fast responsetime for hits. Cachedqueriesin DBProxyareorganizedaccording to amulti-level index of schemas,tablesandclauses for this purpose.

2.3 Update propagation

Updatetransactions in DBProxy arerouted to the back-end databasewithout applying themfirst to the edge cache,andaretherefore guaranteedtransactional semantics. Read-only queries in DBProxy aresatisfied from the cache if the datais locally available. Dataconsistency is ensuredby subscribing to a streamof updatespropagatedby theback-end server.Specifically, thecacheguaranteesδ-consistency, thatis, theview exported by thecachetoaquerycorrespondsto aconsistentpast databasestate that is within δ time units. Other important propertieslike view monotonicity and immediate updatevisibility arealsoguaranteed and theprotocols that ensurethemare described in [2]. It suffices to noteherethat the edgedatacachecanbeupdatedalong two pathsasshown in Figure3: thefirst is throughtheinsertion of anew queryresultupona query miss,andthe second is through the streamof refresh messagespropagatedby the origin server. Refreshmessagescontainupdates,deletesandinserts which are applied to the edge server’s partially populatedcachedcopiesof the origintables, without first checking if the tuplespropagatedto the cachesatisfythepredicate of any cachedquery. We found thatsucha checkcaninduce significant overheaddue to thepotentially large numberof cachedqueriesand the complexity ofquery predicates.This is illustratedin Figure 2, where the two bottom rows in thetableare insertedafterbeingpropagatedfrom theorigin becausethey were writtenby anupdatetransaction. No checkis made asto whether these rows match thepredicatesof queriesQ1 or Q2. Excessrowsinsertedin the table must laterbe“garbagecollected”by thecachereplacementprocess.

3 Cache replacement

To limit space overheadand optimize the usageof usuallylimited edgeresources,the cachespacehasto bemanagedsuchthat unused datagetsevicted safely while preserving dataconsistency. Specifically, the goal of cachereplacement is tomaximize thebenefit of the cache for a limited amount of availablespace.The cachereplacement component of DBProxyconsists of a replacement policy, that determineswhat to replace,anda replacement mechanism, that determines how toremove thedata.

3.1 Replacement policy

Thefunctionof thecachereplacementpoli cy is to determinethesetof queriesto replacefromthecache.Cachereplacementpolicieshavebeenextensively studiedin dif ferentareas,from virtual memory and file buffer caching to, morerecently, webcaching. The policy we useis a combination of previous approachesand is most suited for edge datacaching wherethequery processing costs andsizesaredifferent. Givena spaceconstraint, our policy triesto maximize the benefit of storingthequery resultslocally for the cost of the spaceused,similar to thetraditional knapsack problem of optimizing the cost-benefit. Determining the benefit of a query depends on multiple factors,namely: i) recency of access (that is the factor

usedin theLRU policy); ii) frequency of access (that is usedin theLFU policy and is useful for skewedaccess patterns);iii) miss cost versus hit cost (especiallysincethe query execution costs arehigh and variable); and iv) the frequency ofupdates(sinceanupdateaddsto theoverheadof caching). Weuseanestimatedaccess frequency measure thatbalancestherecency of access(using the lastaccesstime) and the frequency of access.Combining theabove setof factors, the benefitof maintaining anobject in thecache is proportional to theestimatedaccess frequency and the differential missprocessingcost, and is inverselyproportional to the updatefrequency. The updatefrequency term is maintainedat thelevel of tables.The benefitof query is thenoffsetby its space overhead.

Borrowing fromthegreedy heuristic usedin theknapsacksolution for replacing querieswith variabledata-setsizes, weorderthequeriesbasedon theratioof benef it

space used. Theobjectswith thesmallestratioarethenmarkedfor removal. While theparametersin thebenefitcomputationcanbe estimatedby maintaining various statistics, thespaceoverheadof a query, aswediscussin Section3.4, is morechallenging to computeaccurately becausequeriescanhavemultiple overlapping tuples,i.e., thesamerow can“belong” to many cachedqueries.

3.2 Replacement mechanisms: Challenges

The replacement mechanism is the processby which the tuples belonging to queries marked for eviction are removedfrom the local database tables. This process is complicatedby several factors including: the shared storagestrategy, thecontainment checking overheadandthe consistency policy used.

3.2.1 Shared store

Recallthat thetuplesbrought in by different queriesarestored in common tablesasfar aspossible. In contrastto traditionalreplacementof filesandmemory pages, theunderlying tuplescanbesharedacrossmultiple queries in thecache. ConsiderthequeriesQ1 and Q2 of Figure 2, wheretheresult setof query Q1 containsrows with id x 5,120,340 y andthe resultset ofqueryQ2 contains rowswith id x 340, 450, 620 y . If query Q2 is to be replaced while Q1 remains in thecache,thenonly therowswith id x 450,620 y canberemoved. Thespacegainedwill be thatof thetwo rowsdeletedand not thesizeof theentirequery which was3 rows. In generalthe replacement mechanismshould support thefollowing property.

Property 1 Whenevicting a victimqueryfromthecache, theunderlyingtuplesthat belongto thequerycanbedeletedonlyif no other querythat remainsin thecacheaccessesthe sametuples.

Read-only lazy replacement: If we assumethat theback-end database is read-only (i.e., there areno updates),a simple“counter” based mechanism that counts the number of referencesto a given tuple, canbe usedto evict queriesthat sharecommon tuples in the local table. When a new tuple is insertedinto the local table, a reference counter for that tupleis incremented. When a query is marked for deletion, the reference counter for a corresponding tuple is decremented.Eventually, a tuple is “lazily” deletedwhenits referencecounterbecomeszero, i.e., thereare no queries in the cache thataccessthat tuple. Theassumption of aread-only databaseis obviouslyunrealistic in practice.Next, werelaxthis assumptionand discusstheimplicationsof updatesfor the replacement policy andmechanism.

3.2.2 Containment checking

Whena new query is received by the cache, the query matching module (shown in Figure 1) verifies whetherthe newquery’s predicateis more restrictive than(i.e., is contained in) thatof a cached query’s predicate. Furthermore, thequerymatching module ensures that thecached query hasretrieved all the columns required to evaluatethe new query over thelocal cache.Thequery matching modulecancheckif anew predicateis containedin theunion of several cachedpredicates.In caseof a miss,the query is cached and its predicateand other clauses are added to the cacheindex. A new query canpotentially overlapwith alargenumber of cachedqueries,but no attempt is madeto computeor maintain information aboutthis overlap. One approachto eliminatetheproblem of common tuplesbetweenqueriesis to partition the tuplesinto non-overlapping setsandindex each setseparately in the cache[3]. This approach, though theoretically possible, adds to theoverheadof thecontainment checker, and wefoundthat it wasnot practicalwhenthenumberof querieswaslarge. Splittingthequeries increasesthe terms in thepredicateclauses wherefor every pair of queriesQ1 and Q2, their “ intersection” setQ1 z Q2 is indexedasQ1 AND Q2. Sinceeachquery canintersect with multiple queries,thecomplexity of theclausein thenumber of terms,grows linearly with the number overlapping queries. The number of non-overlapping sets,on the otherhand, grows exponentially with the number of queries. The containment checker is optimized to quickly find a matchingquery by usingan index hierarchy startingwith the tablenamesand the column namesusedby thedifferent clauses.This

quickly narrows the number of queriesto checkfor a full containment. For overlap checking, on theother hand, thesetofpossible queriesis muchlarger. Apart from the performanceissue,anotherproblem with assuming non-overlapping querysetsarisesbecauseof consistency maintenancewhentheback-end databaseis not read-only. Whenever tuplesin thecacheareupdated, their membership in thenon-overlapping sets maychange.

3.2.3 Consistency management

The consistency manager, asdescribed earlier, propagatesall changes(UDIs) to theback-endtablesthat have beencachedat theedge server. The updateof a tuple in anedgecache may make it matcha larger (or smaller) number of cachedquerypredicates. It is possiblethat anupdatedtuple ceasesto beuseful, becauseit no longermatchesany cachedpredicate.Thereplacement policy selectsqueries for removal, and the replacement mechanism must remove the underlying tuples thatbelong to these queries’ results but do not belong to the results of any queries that areto remain in the cache. Moreover,thereplacementmechanism should also evict any rows inserted by the consistency manager thatdo not matchany query’spredicate. For example, the rows in Figure2 with id { 770,880 | , thatwere insertedby theconsistency manager but whichdid not belong to either query’s result set,should be evicted. In general, thefollowing property should hold.

Property 2 The tuples insertedby theconsistency manager that do not belong to any of theresults of thecachedqueriesareeventually garbage collected.

Thefollowing observationaffectsthechoice of the replacement mechanism and the containment checking strategy:

Observation 1 The set of tuples forming the result of a cached a query can change dynamically due to possibleUDIoperationsat theback-end that are propagatedto thelocal database.

This observation says that the membershipof cachedtuples in query resultscanchange dynamically, which impliesthat maintaining explicit informationabout this membershipor its overlap is not desirable. In particular, the lazy removalapproachdiscussedabove, which usesareferencecounterwill not havethecorrectcountervalue if thetuplesareadded andupdatedby theconsistency manager. Creatingnon-overlapping setsbecomesnon-trivial when theconsistency managercanadd or updatetuples. OnaUDI, the tuplebelonging to aquerycanchange;sincedetermining thereversemapping, whethera tuple belongsto aquery’s result set,requires there-evaluation of thequery predicateover the new tuple, we cannot easilymaintainaccurateinformation about the membershipof a tuple in oneof thenon-overlapping sets.

3.3 Proposed mechanism: Group replacement

We proposea replacement mechanism that proceeds asa background processconcurrently with query hit and misspro-cessing. The ideais to perform pro-active cleaning such that thecache space usage never exceeds a maximum threshold.Thusreplacement, in our architecture, is not triggeredwhen thereis no spaceto insertthe missresults. While a thoroughdiscussion of alternativemechanismsfor replacement is thesubjectof ongoing work, webriefly describehereoneparticularand promisingmechanism,called group replacement, which is simple to implement andadds no overhead on hit, miss, orupdatepropagation. Thebasic ideaof groupreplacement is to execute thequeries thatareto remain in thecacheagainstallthelocally cachedrows, “marking” any rowsthatmatchany of thequeries’ predicates. A control column, usedas“marked”flag, is createdin eachcached table. This flag is first resetat the beginning of the group replacement cycle, and is setwhenever therow is accessedby thecachedquery. Once all cachedqueriesareexecuted, any unmarkedrowscanbesafelydeleted.Group replacement is usedin conjunction with a replacementpolicy to determine thesetof “vic tim” queriesto bedeleted.Notethatthe overheadof group replacement is linearin the numberof queriesthat remainin the cache.

3.4 Determining query size

Thereplacement mechanismsrely onthereplacementpolicy to determinethequeryor setof queriesto replace. As describedearlier, onefactor usedby thereplacementpolicy, whenhandling varying sizeddata-sets,is thesizeof the query. However,due to overlapping tuplesbetweenqueries,determining the number of tuplesthatwill get replacedis not straightforward.In particular, thefollowing observationholds.

Observation 2 Theactual number of tuples of a query that can be garbage collecteddepends on the setof queriesthatremain in the cache.

The above observation highlights the complexity of using the size as a factor to determine the set of queriesto bereplaced. It resultsin a circular dependency—the replaceablesizedependson what remainsin thecacheandwhat remainsin the cache depends on the ordering by the ratio of benefit to size. We assume, initially, that the total size of the query(as determined by the number of tuples hit in the last access) is a good estimator of the replaceable size of the query.Another approachis to evaluateeachquery and determine its count of “exclusive” (non-overlapping) tuples with respectto all theotherqueriescurrently in the cache. This is a high overheadoperation, requiring a counter to bemaintained foreachtuple to represent thenumberof queriesthataccess that tuple. Usingthis counter, a re-execution of the query canbeusedto determine thesetof “exclusive” tuples that thequery accesses, which becomesa measure of the replaceablesize.Thesetwo approaches—total size and exclusivesize— form two endsof thespectrum of heuristics usedto determine thereplaceable sizeof a query. An intermediateapproach is to order thequeriesby benefit (without using size asa factor) andthen re-executing eachquery in descending order of benefit while counting thenew tuplesaccessedby aquery to representits replaceable size.We are evaluating this andotherapproachesin our ongoing work.

4 Related work

Client-server database systemshave alsoaddressedreplacement issuesin page-basedand tuple-basedclient caches[5, 6].Replacement in a cache indexedby semantic units such asviews or query resultsintroduces different challenges, however.Semantic cacheshave beenproposedin client-server databasesystems [3] but that work addressedonly read-only cachingand useda differentstorageimplementation. Predicate-basedcaches[8] addressedsimilar issues,but optedfor a differentapproach to consistency maintenance. Tuples propagatedasa result of UDIs performed at the origin arematchedwithall cached query predicatesto determine if they should be insertedin the cache. A simplified form of semantic cachingtargeting web workloadsand using queries expressed through HTML formshasbeenrecently proposed[11], but this workdid not address consistency or replacement. The caching of query resultshasalsobeen proposedfor specific applications,suchashigh-volume major eventwebsites[4, 10]. The setof queriesin suchsitesis known a priori and the resultsof suchqueries are updatedandpushedby the origin serverwhenever thebasedatachanges.

5 Summary

Dynamic caching of dataon edgeserversbasedon theapplication query stream promises to beanadaptivecaching solutionwith a low administrativeoverhead.Wehave implementedaprototypeof acaching systemwhichmaintainspreviousqueryresults in sharedtableswheneverpossible. In this paper, wediscuss thechallengesof cachemaintenancein such adynamicenvironment, focusing onreplacement issuesin thepresenceof consistency guarantees.Wedescribeaconsistency protocolwhich propagatesUDIs performed on the origin databaseover cached tablesto the edge cachewithout first checkingwhetherthe new tuples matchthe cached query predicates.A “group replacement” mechanismoperates in thebackgroundand removesall excess tuplesfrom the cache. We discuss the challengesof devising a cache replacement policy for such adynamic environment and proposeinsights into addressingthesechallenges.

References[1] Akamai TechnologiesInc. AkamaiEdgeSuite. http://www.akamai.com/html/en/tc/core tech.html.

[2] K. Amiri, S.Park, R. Tewari, andS.Padmanabhan. DBProxy: A self-managing edge-of-network datacache. IBM Research Report, 2002.

[3] S. Dar, M. J. Franklin, B. T. Jonsson, D. Srivastava, and M. Tan. Semantic data caching and replacement. In VLDB Conference, pages 330–341,1996.

[4] L. Degenaro,A. Iyengar, I. Lipkind, andI. Rouvellou. A middlewaresystemwhich intell igently cachesquery results. In Middleware Conference,pages24–44,2000.

[5] D. DeWitt, P. Futtersack, D. Maier, and F. Velez. A studyof threealternative workstation-server architecturesfor object-orienteddatabasesystems.In VLDB Conference, pages107–121,1990.

[6] M. Franklin. Client data caching: A foundation for high performanceobject databasesystems. Kluwer, 1996.

[7] IBM Corporation. WebsphereEdgeServer.http://www-4.ibm.com/software/webservers/edgeserver/.

[8] A. M. Keller andJ.Basu. A predicate-based caching scheme for client-serverdatabasearchitectures. VLDB Journal, 5(1):35–47,1996.

[9] A. Labrinidisand N. Roussopoulos. WebView Materialization. In SIGMOD Conference, pages367–378,2000.

[10] A. Labrinidis and N. Roussopoulos. Update propagation strategies for improving the quality of data on the web. In VLDB Conference, pages391–400, 2001.

[11] Q. Luo and J.F. Naughton. Form-based proxycaching for database-backedweb sites. In VLDB Conference, pages191–200, 2001.

� ��������� ���������������������

�!#"%$�&('�)*$�+,.-0/214365879/;:=<?>A@CBD3E-F-07G<IH21KJL-0/MJ

NON4PRQ >A/TSVUA5RJKWI/23E-X:ZY P /M7EY [F\]3E-0-^Y 3EWF[

_(`ba�cedgfihVc

jlk#jnmpo^q4rFsgt%u6vXrIsAwyx{z|t}vFuR~EqO�0~�x{��q#rXqIu�x{s�~8�lvO�^qIu�sKqImA�D~�uEqO�^mR���vFu�z�qX~6xTvF�^m���s�~8�VsLsA���n����qO�}w�w%qX~EqO�^qFmRsKmL�}�n����m8~6vIuEqO�FsI��l����t%�%�%�{x{m�o%x{�%�^�%�LvI�^m�x�m8~6sL�^�L��qO�^qI�T�]m�x{mVvO���n����m�t}sK��xT�}�LqO�~6xTvF�^mL��~8�0t�sA�Eo^sA�E��xT�^�^��qO�}w�vIt]~6xTz|x{�AqO~�x{vI��vI���n��� �0�%sAu�x{sAmA¡���}�EovO��~�o%x�m¢�VvIu6�£w]sAt}sA�^w%m¢vF�¤��sAuR~EqOx{�¥qFm�m��%z|t]~�x{vI�}mqI�}vF�]~�jnk#jlmL�isF¡ �}¡T�l~6o%s�qO�^m�sL�^�Ls�vO�¦u6sA�L�%uEmRx{vI�§qO�^w¨�%vI�%�w]sL~�sAu�z|x{�%x{m�zb¡�©ªxT~�o�~6o%x�mn�LvIz|sAm�~�o%s¦�%sLsKwb~�vi«8�}m8~6xM���b~6o%sAm�sqFm�m��%z|t]~�x{vI�}mgqO�FqIxT�}m8~�jlk#jnm�xT�¬~�o^s�u�sKqO�]�VvIu6�{w�¡�k­o%x�mgt^qIt}sAum��%u6rIsL�]m­q¦�0�^z���sLu#vO�ejnk#jlm��LvI�{�TsK�Z~�sKw���u6vIz�~�o%sy©�sL���^qI�^wt%u6vXr�x{w%sAm|m8~EqX~6x{mR~�x��Lm��#xT~�o®u6sAm�t}sK�Z~�~6v¯q°rXqIu�x{s�~8�¯vI�y�Lu�xT~�sAu�x�q�LvIz|z|vI�%�{�±w]x�m��L�^m6mRsKw±x{�b�n����u6sAm�sAqIu6�Eo�¡

² ³�´ cpdgµ�¶�·bhVcg¸�µ ´

k­o%s��^u6mR~.qI�^w|z|vFmR~��#x{w%sL�{�¦�^m�sAw|z|sL~�o%v]w|vO��w]sKm��Lu�x{�%x{�%�y~�o^smR~�u6�^�Z~6�%u6s�vO��qO���l���¢w]v]���^z¬sA�0~�x{mi~�o%s¹j*v]���^z¬sA�0~¦k���t�sjns��^�^xM~6xTvF�§º=jnk#jy»Z¡g¼G~|x{m¬t^qIuR~|vO�#~6o%sb�n���§mR~6qO�}w%qOuEw¢½ ¾X¿G�qI�^w¹z¬vFu�sim�vIt%o^x{mR~�x��LqO~�sAw±~�v�vF�{m­��vFu�m8~6u��}�Z~��^u�x{�%���n�����^m��^�EoqFm��l���À�8Á]�Eo%sAz|q½ Ã%ÄO�­ÄZ¿�qOu6s��^qFmRsKw�vI��jnk#jlmL¡�Å#sK��sA�F~6�T�jlk#jnmyo}q4rIs¬�}sAsL�°��vI�%�^w°�^m�s����%�gx{��xTz|t%�{sLz|sA�F~6xT�^�¹sLÆ���x{sL�0~mR~�vFu6qI�Is¦m��]m8~6sLz�mn��vIuy�n���§½TÄAÇO¿�qO�^w°xT�°~8��t}sK�Eo%sA�E��x{�%�¹t%u�vI��Fu6qIz¬z|x{�%��qO�}w?�0�%sLu6�n��qO�^�I�^qI�IsAm���vFuC�l����½TÄKÃ��0ÄKÈ]�XÉA¿G¡gÁ�vO�Ê~���­qOu6s�m��^�Eo�qImn�n���ÂmR~�vFu6qI�Is?m���mR~�sAz�m*z�q4�bvI�%�{�b�.vFu��¹�VsL�{��#o%sA��~�o%s�m8~6u��^��~��%u6s�vI�V~6o%s±jlk#j£o}qIm?�LsLu�~6qOx{��t%u6vIt�sLu�~�x{sAmA¡Ë^vIun~�o%x�mlu�sKqIm�vI���DqImn�.sA�T�gqFm�~6v¹m�sLs¦�#o%s�~6o%sLuijlk#jnmyÌOz�qI�Ism�sL�^m�sAÌ^�­�.s���vI�{�{sA�Z~6sAwªq����%z¦�}sAu±vI�ijlk#jlm�qI�^w¢qO�}qO�{�0�AsAw~6o%sLx{ugm8~6u��^��~��%u6sI¡pÍ��¬sAqIu��{xTsAu�m��%u6rIsL��½TÄAÎ4¿]vF�¦jlk#jnmgw]x�m6���^m6mRsKmm�vIz|slvO��~6o%sLx{u­�Tx{z|xM~EqX~�x{vI�}m­�%�]~#w]v�sAmV�%vO~�mR~��^w%�|~�o%syt%u6vIt�sLu��~6xTsKm­~�o^qO~��.siqI�^qO�{���LslxT�¹~�o^x{m#t}qOt�sLuK¡Ï �^u*m8~EqX~6x{mR~�x��Lm#�VsLu6si��vF�T�{sA��~�sAw¹����~6o%s�Ð6jnk#jѼ9�^�0�%x{m�xT~�vIuE̽ È4¿G��qªt%u6vI�Fu6qIzÒ~�o}qX~�u6sAqFw%m�q¢jlk#j��yqX~�~�sAz¬t%~6m�~�v¢x�w]sL�%�~6xM����t%u�vF�%�{sLz�mV�#xM~6o�xT~A�^qI�^w±�LvIz|t%�]~6sAm#q¬���%z���sLu­vI���IuEqOt^o]�~6o%sLvFu�sL~�x��yt%u�vFt}sAuR~6xTsKm#vO�exT~A¡­Í���vI�^�Tx{�%s�w]sAz|v��LqO�b��si��vI�^�^wqO~|½ È4¿G¡¦k­o%s�mR~6qO~�x�m8~6x{�AmnxT~?��vF�T�{sA��~6mn�ÓqO�{�gxT�0~6v±~8�Vv¹�AqX~6sL�IvFu�x{sAmA�ÔTÕKÖE×XÔ�Ø w]sKm��Lu�x{�%x{�%�±~6o%s���xT�}w%mivO�­��vF�F~6sL�0~?z|v]w]sL��ml��vI�%�}w�qO~x{�^w]x{r�x{w]�}qO�gsL�{sLz|sA�F~?w]sK����qOuEqX~6xTvF�^mL�ÀqI�^w°Ù ÔTÕIÚE×OÔ^Ø w]sKm��Lu�x{�%x{�%�~6o%s?�IuEqOt^o]�=~6o%sLvFu�sL~�x��nmR~�u6�^��~��%u6sivO�p~�o%s¦jnk#jlm�qO�}w±~6o%s�w]v]����%z|sA�F~Em�~6o^qX~­��vF�]��vIu6zÑ~6v?~6o%sLzb¡�Ë%vFu�sLÛ]qIz|t%�TsF�pm6q4��xT�^�y~�o^qO~~6o%sLu6s*qOu6s­�%v?sA�TsAz¬sA�0~6mg�#xT~�o�ܬÝMÞ�ß6à���vI�0~6sL�0~�x�m�qi�{v��AqO�%t^u�vFt]�sAuR~8�F�X�#o%xT�{s.m6q4��xT�^��~�o}qX~p~�o%sVz�qXÛ]xTz¦�%z t^qX~6o��{sL�%�I~�o¬qO�{�TvX�VsAw����~6o%x{m#jlk#jáx�mVâ¬x�m#q¦�I�{vI�^qI��t^u�vFt}sAuR~8�F¡�ã.�{sAqIu��{�|~�o^sAm�s*~8�Vv

t%u6vIt�sLu�~�x{sAmCqOu6sp�^vO~CsA�F~6xTu6sL�{�*x{�^w]sAt}sA�^w]sA�F~K��q#jlk#j¯~6o^qX~p��vI�%�~EqOx{�^m.�LvIz|t%�{s�Û���vF�0~�sL�0~Vz¬v]w]sA�{mlºÓq��Tv]�AqO��t%u�vFt}sAuR~8�%»�x{m.�Tx{�IsA�T�~6vlqI�T�{vX�¯q*��qOu6�Is��0�^z���sLuevO�}w%x{mR~�x{�^�Z~gt^qO~�o^mpvI��q��FxTrFsL�¬w]sLt]~6oº=q¦�F�TvF�^qO�Àt^u�vFt}sAuR~8�%»Z¡

ägå�ä æ )�çáè æ èêé­ëgìîílï8çk­o%s�jlk#jnm±�.sAu�sbu6sA�LsL�0~��{�¯s�Û�~6u6qF�Z~�sKw¯��u6vIzð~�o%s��l����¡ vIu6�jlk#jÑu6sLt�vFm�xT~�vIu6��½ ÃI¾4¿ñ¡*©°s|qX~�~�sLz|t]~6sAwb~6v��L�TsKqO���%t�qI�T��~�o^sjlk#jnmVx{��x{�%xT~�x�qO�Àm6qOz|t%�{sn���|o^qO�}wÀò]o%vX�VsLrIsAu.m�vIz|sl�.sAu�s*~�v�v���%�{�CvI�psAu�u6vIuEm.~�v�qI�T�{vX�ó�^m#~�v�w]v|~6o%x{mA¡�©�s?�VsLu6sy�TsL�Ê~*�#xM~6o�q~6vO~6qI�ÀvO�gôIõ¬jnk#jlm­~6o^qX~#��vFu�z|sAw�~6o%si�^qIm�x�m.vI�pvF�%u�m��%u�rFsL�F¡Ï �^slz�q4��sLÛ�t�sA��~­~�o^qO~#~�o%simR~�u6�^�Z~6�%u�slvO�eq|jnk#jöw%sLt�sL�^w%m

vF��o%vX�÷xT~�x{m|x{�0~�sL�}w]sAw¯~�v��}s¹�^m�sAwÀ¡�Ë%vFu�~6o%x{m|u6sAqFmRvF��~�o^sjlk#jnm­qIu�slw]xTr�x�w]sAw±xT�0~6v?~�o^u�sAs*�%u6vFqFw��LqO~�sL�FvIu6xTsKmL� ×6øFø �Cà ×OùG×qI�^wªÜ�ß ù9× ¡�jnk#jlm¦~�o^qO~|qIu�s�t%u6xTz�qOu6x{�T��w]sAm�x{�I�%sKw���vIu|w%qX~Eqx{�0~�sLuE�Eo^qI�%�Is��}sL~8�.sAsL�÷qOt^t%�Tx��LqO~�x{vI�^m�qIu�sª�AqO�{�TsKw ×EøIø ��sF¡ �}¡��v�vI��z�qOu6��m�s�Û%�Eo^qO�^�Is�½TÄ4ÉK¿G¡*jlk#jnmn��vFu¬à ×OùG× qOu6si~6o%vFm�s�vF�%sz|x{�Io0~�xTz�qO�FxT�^s�sAqIm�x{�T�¯t^�]~R~6xT�%��x{�¢q�m8~6u��}�Z~��^u�sKw�w^qX~6qI�^qIm�sØ �^qIm�sL�}qO�{�lmR~6qO~�x�m8~6x{�Am�½TÄIÄ�¿ñ�n��vIubs�Û%qIz¬t^�TsF¡£ËpxT�^qI�T�{�ójlk#jnm�#o%v0mRsl���%�}�Z~�x{vI��x�m­~6v�w]sAm6��u6xT��sy~�o^s?m8~6u��^��~��%u6sivO�gw]v]���^z¬sA�0~z�qOu6���%t��lmR�^�Eo§qIm�~�o}qX~¹��vIu�Á�o^qI�IsAm�t�sAqOu6sIú m�t^�{q4�]m�½ Ã4¿ñ�yqOu6s~6sLu6z¬sKwbÜ|ß ù9× ¡gk­o%s*��vI�^�^w%qOu6x{sAm���s�~8�VsLsA�|~�o%sKmRs*~�o%u6sLs*�0x{�^w%mvI�ejlk#jÑqIu�sy�%vI~*���{sAqOuK¡�Ë^vIu�sLÛ]qIz|t%�Ts?q|ûV¼ Ï ����jlk#j£½ ÃOõ4¿�#o%x��Eo¢w]sKm��Lu�x{�}sKm±qI�óqO�%�^vO~6qO~�x{vI����uEqOz|sL�VvIu6�¯��vFu±�%x{vIt�vI�{�0�z|sLu�mRsK�0�%sL�^�LsAm°z|xT�FoF~��}sÂt%�]~�xT�îz|vIu6s�~�o^qI�övI�^sÂ�LqO~�sL��FvIu6xTsKmL¡¹k­o%sL����sAuR~EqOx{�%�T��w]sAm6��u6x{�}s±w^qX~6q^òp~�o%sA��qI�{m�v�qOu6s�w]sL�m�xT�F�%sAw¹qIm*q���vIz|z|vI�bm��0�0~EqXÛ±��vFu#s�Û%�Eo^qI�%�Ix{�%�¬~�o%s¦w%qX~Eq¦��s��~8�VsLsA��m6��x{sL�0~�x�mR~6mA¡�ü�vX�VsLrIsAuA�D��vFuy~�o%s�z|v0m8~?t^qIuR~y~6o%sLu6s|�VqFm�{xM~�~��{s�w]vI�%�%~�qI�}vF�]~|~�o%s��L�{qFm�m�xM���LqX~6xTvF��¡�k­o%sbsA�TsAz¬sA�0~|qI�^wqO~R~�u6x{�%�]~�sy�^qIz|sAm­�^m��^qO�{�{��~�sA�T�Ào%vX�§jnk#jlm�qIu�sl�^m�sAwÀ¡�jlk#jnm�#xT~�o��^qIz|sAm ÚEÕXÔ àX�eÝ ùG×OÔ Ý Ö vIu ø^×Xý�× Ù ý6×6ø%þ qOu6si�{xT�FsL�{�±��sL�{vI�%�|~6v~6o%s�Ü�ß ù9× �AqX~�sA�IvFu��F¡ Ï �%s¦�LqO��qI�{m�v�~�sA�T�C��u�vFz£~6o%s¦mR~�u6�^��~��%u6svI��~�o%sbjlk#jnm Ø z¬xTÛ]sAwÂ��vI�0~6sL�0~¦��vFu¦sLÛ%qOz|t%�{sI¡ Ï �*vF�%u¬ôFõjlk#jnmA��É��VsLu6s ×6øFø �­ÄA¾¹�VsLu6s�à ×Xù9× qO�}w°â0õ¹�.sAu�s�Ü�ß ù9× ¡�Í�LvIz|t%�{s�~�sy�{x{mR~�x{�%�|vO��~�o%sijlk#jöm6qOz|t%�{sy�AqO���}sy��vF�%�^w�xT��½ ô4¿ñ¡

ägåñÿ æ )�çîé��]ë��%+����%+����k­o%sbmR~6qX~6x{mR~�x��±qIu�s��Iu6vI�%t�sAw�xT�0~6vº8Ä4»?�{v]�LqI�­qI�^w¢ºñÃI»��I�{vI�^qI�t%u6vIt�sLu�~�x{sAmA¡

� ÕKÖ6×OÔ�ø�ý6Õ6ø ß ýZù ÝÓß��Z¡�¼9��qFw%w]xT~�x{vI��~�v|�%vO~6xT�^�¦~6o%sit%u6sAm�sL�^�LsvI�pz|xTÛ]sAwbvIu� ����¹��vF�F~6sL�0~A�}�.sisLÛ�~�uEqI�Z~6sAwbq|���%z���sLu*vO�

Ä

��������� ����� ������� "!����#%$'&)()*+( ,.-�/102-�3�4 5�6�/708,�,.9�:�4 3�;�/702;�6�5�4< ,.-�/102-�3�4 3�/702-�-�4 ,�;�/70=9�,�5�4(?>A@ B�/702B�4 B�/70=9)4 ,C/70�3D;�4E¬/GFF3EW P >K-X:83E-X: 9�/702H�4 ,C/70�3?,�4 ,�H�/70=9�6�6�4I.J K >K-FH;< 0 -F>A:MLi/GFI3ZW 4 3�3�/10�5�5�4 3�/702-�3�4 ,�,�/70�3�6�5�4I�N K >K-FH;< ,'9)/102-�9�4 3�6�/70�5�5�:�4 ,�,�/O3�H�;P >�Li\FHM3�F P >K-X:83E-X: ,.H�/10=9�6�4 :�/708,�-�3�4 :�/708,.6�6�4Q}/27G: ,.-�/102-�-�4 3�/702-R,�4 ;�/708,.;R,�4S /2-0UAH23 -�/70�:�4 ,C/70�3�H�4 3�/702;�-�4

k�qI�%�{s¬ÄF�gk­o%sit}sAu6�LsL�0~6qI�IsyqO�^w�~6o%si���%z���sLu#vI��~6o%s?��vF�F~6sL�0~#z|v�w%sL��m­xT�¹~�o^si�%xT�^s?����qIm6mRsKmL¡

mR~6qO~�x�m8~6x{�AmC�LvI�^�LsLu6�%xT�^�#~�o%s.m8~6u��^��~��%u6s�qI�^w?��vFz|t%�TsLÛ]xM~8�lvO�~6o%s¦�LvI�0~�sA�0~*z|v�w%sL��m*x{��rIvI�{rIsKw¹xT��~6o%s�jlk#jnmA¡*©�s¦qI�{m�v�{v�vI�FsAw���vFuit%u6vIt�sLu�~�x{sAml~�o}qX~i�.vF�%��w°q?T�sK�Z~y~�o%s�t}qOuEmRx{�%�vI�D~�o%sljnk#j¨vIu.xM~Em�qOz¦�%x{�I�%xT~8�¦x{����sLx{�%���^m�sAw�qFm�qy~8��t�s½TÄKÈ]��ÄA¾4¿

�OU ÔTÕFÚE×XÔCø�ý�ÕEø ß ýZù ÝÓßC�Z¡ Ï �^s��LqO���LvIz|s��%t��#xM~6o¯qb�%sLrFsLu��sA�^w]x{�%�imRsK�0�%sL�^�Ls­vO�}�Fu6qIt%o]�ñ~�o%sAvIu6s�~6x{�.t%u6vIt�sLu�~�x{sAm�~�viqO�%�qI�T���AsI¡±©�s±�Eo%v0mRs�m�vIz|s|~�o}qX~�z|x{�Io0~?�}s±xTz|t�vIu�~6qO�0~?x{�~6o%sbz�qOt%t%x{�%��vO�n~�o%sb�l���§xT�0~�v�m�vIz|sbw%qX~EqO�^qFmRs���vIu��z�qO~A¡?Ë%vFunsLÛ%qOz|t%�{sI�DxM�.~�o%s¬t%u6sAm�sL�^�Ls�vI�.qWVy�{sLsA�%s¬mR~6qIux{�^w]x��LqO~�sKm�~6o%s¦�^sLsAw���vIuyq�~6qI�%�{s�vIulmRvFz|s�vO~6o%sLuy��vF�T�{sA���~6xTvF��~8��t�si½MÄKÇ4¿ñ�IvI�%sVx{mex{�0~�sAu�sKm8~6sAw?x{�¬o%vX��w]sAsLt%�{�?�%sKm8~6sAw~6o%sAm�si�LqI�±��sI¡ Ï �%slx{m#qI�{m�v�x{�0~�sLu6sAmR~�sKw�x{���#o}qX~­�0x{�^w�vO�u6sA�L�%uEmRx{vI���AqO�±v]�A���%uVxT�¹jnk#jlmA¡g¼G��vI�^s*x�m.x{�0~�sAu�sKm8~6sAw�x{��}�^w]x{�%��m8~6u��}�Z~��^u�s¬��vIuyvFt]~�x{z|xT�KqX~�x{vI��vFun~8��t�s��9�Eo%sA�E��x{�%�qFm­xT��jlqX~EqO�I�^x{w]sKmi½MÄKõ4¿CvF�%sy�.vF�%�{w��{xT�Fsn~6v����%vX�¨qI�}vF�]~~6o%sl�0�^z���sLu­qO�^w��{sL�^�O~�o�vO��t^qO~�o^m.~6o^qX~­qOu6snqO�{�{vX�.sKw|�0�~6o%s?jlk#j�¡

©�s��%sLÛ�~�w]s��^�^s�mRvFz¬s��^vO~6qO~�x{vI�°~�v���s��^m�sAw°~�o^u�vF�%�Io%vF�]~~6o%s�t^qOt�sLuK¡|Í÷jnk#j¤½ ¾X¿��AqO���}s�w%sAm6��u6xT��sAw�qImyqb��vF�T�{sA��~�x{vI�vI��sL�{sLz|sL�0~#w]sA�L�{qIu6qO~�x{vI�^m�vO�À~�o^s*��vFu�zYX[Z]\°�#o^sLu6s�Xlx{m�~�o^ssA�TsAz¬sA�0~��^qOz|s�º�~8�0t�sK»?qO�^w^\¢x�my~6o%s±��vF�0~�sL�0~?z|v]w]sL�ñ¡bk­o%s�LvI�0~�sA�F~¦z¬v]w]sA��x�m�w]sL�^�%sKw��0�D�_\ �{�G` acbedgfCh� �ij kblXb\Mm'\nbo\pb \qbr\tsubo\tv bo\xwF�C�#o%sAu�syaiw]sA�%vO~6sAml~�o^ssAz¬t%~8�l�LvI�0~�sA�F~�z|v]w]sA�=�'dgfCh� �ij nw%sL�%vI~�sAmCm8~6u�x{�%�^��X�w%sL�%vI~�sAmCqO�sA�TsAz¬sA�0~l�^qIz|sI��ÐA� ̹qO�^wóÐ�b Ì�mR~6qI�^w���vFui��vI�}�LqX~6sL�^qO~�x{vI��qI�^w�%�^xTvF���FqI�^w�Ð.zXÌ^��Ð.vlÌyqI�^w�Ð'wKÌ�mR~6qI�^w¦��vIug�LsAu�vyvFu�z|vIu6sI�OvF�%svFu#z¬vFu�siqI�^w±vFt]~�x{vI�}qO�Àv]�L�L�%u�u6sL�}��sAmA¡¼9��~�o%s���vI�{�TvX�#x{�%�±m�sA��~�x{vI�^mn�Vs�t%u6sAm�sL�0~�~6o%s¦u6sAm��%�T~6m*vO��vI�^u

qI�^qO�{�]mRx�mL¡e��v0m8~#vO�p~�o^sn�}�I�%u6sAmVx{��~6o%x�m­t^qOt�sLu�qIu�syqImV��vF�T�{vX��m�%�^�TsKm�mivI~�o%sAu��#x�m�s�mRt�sA�LxM�^sKwÀ¡�jnk#jlm�qIu�s��Fu�vF�%t�sAw°�0�°~�o^sLx{u�AqX~�sA�IvFu�x{sAmA¡pk­�Vv¬rIsLu�~�x��LqI�D�Tx{�%sAm­t}qOu�~�xT~�x{vI��~�o%sir�x�mR�^qI�Tx{�AqO~�x{vI�x{�0~�v�~6o%u�sAs�qOu6sAqFmL¡§k­o%s��{s��Ê~6z|vFmR~A�#z|x{w^w]�Ts�qO�^w®u6x{�Io0~�z|vFmR~qIu�sKqImymRo^vX�Ñ~6o%s�u�sKmR�^�M~i��u6vIz ×6øIø jnk#jlmA�­à ×Xù9× jnk#jlm�qI�^wÜ�ß ù9× jnk#jlm|u�sKmRt�sA��~�x{rIsA�T�F¡^{.qI�Eo¯�}qOu|u�sAt%u6sAm�sL�0~6m¦q°jlk#j�¡k­o%s��^qOuEm�qOu6s�m�vIu�~�sAw®�#xT~�o^xT�ªsKqI�Eo¢qOu6sAq°��vFu±t%u6sAm�sL�0~6qO~�x{vI�t%�^u�t�vFm�sI¡

| } µ�h#f"~o�îdgµ��u��dpcg¸C��a

ÿ­å�ä '°$.&M�]çC&���ð$��*çCï±'�ï8ët�%��+����Dë��%+R$�&©�s¬�L�{qFm�m�xT�^sAwb~6o%s|��vI�0~6sL�0~lz|v]w]sA�pvI�g~�o%s¬~6o%u�sAs¦�AqX~�sA�IvFu�x{sAmA¡k­o%s��L�{qFm�m�xT�}�LqO~�x{vI�®vO�l~�o%s°��vI�0~6sL�0~±z|v]w]sL��m�x�m���vI�{�{vX�i��º8Ä4»d�fCh� �ij ��#º=ÃF»"aK�#ºÓ¾F»� �������ºÓâ0»?z¬xTÛ]sAw¯�LvI�0~�sA�0~A�#º=ÈF»�Ð�b Ì�vF�%�T��%�%~|�^vO~|z¬xTÛ]sAw���vF�F~6sL�0~A�lºÓô0»�ÐA� Ì�vF�%�T�Â��vI�0~6sL�0~A�*ºGÉO»¬��vFz¬�t%�{s�Û ��vF�F~6sL�0~A�°ºÓÎF»��{x�m8~�qI�^wºÓÇ0»�mRx{�%�I�{sI¡ k­o%s®�^uEm8~�~8�Vv�L�{qFm�m�sAm¹qOu6s�m8~6u�x{�%�®qO�^w¨sLz|t]~8�ó�LvI�0~�sA�0~A¡�k­o%s¯����qIm6m� ����w]sA�%vO~6sAmi�^v�u�sKm8~6u�x��Z~6xTvF��vF��~�o%s±m��%��sL�{sLz|sL�0~EmL¡¹��xMÛ]sKw���vI�%�~6sL�0~6mbqOu6s�z|xTÛ�~��%u6s°vI��~6s�Û�~bqI�^w¢sA�TsAz|sL�0~6mA�lË^vIubs�Û%qIz¬t^�TsF�d�Z º+���)��h�b �+ij %���8f�b d�fCh� �ij ]»�s�z|sAqI�^m*q�t^qOuEqO�Fu6qIt%o��LvI�0~6qIxT�^mnqz|xTÛ0~6�%u6s¦vI���%vIu6z�qO�ñ�À�}vF�{w°qO�^w°xT~6qO�{x��?~6s�Û�~A¡�k­o^s¦�^�Ê~�o�����qIm6mx�ml�%�%x{vI��vO�.sA�TsAz|sL�0~6mA��sI¡ �^¡�hR�� i���dgXyZ º���b ��b ��b �i»�¡¬k­o^sm�xMÛ�~6o±����qIm6mL��ÐL� Ì?vF�%�T���LvI�0~�sA�F~Vz|v�w%sL��mL��qOu6s#~��^t%�TsL�ñ�{x{�IsnmR~�u6�^���~6�%u�sF�CsF¡ �}¡T�x %h�h��)X�����Z� %h�h��)�2�8��X)wRm'f��8i���wRm'fC������i�����wF¡bã.vFz¬�t%�{s�Û±�LvI�0~�sA�0~Vx�m­q¦�LvI�0~�sA�F~­z|v]w]sL�D�#xT~�o¹�}vI~�o�ÐL� ÌImVqO�^w�Ð�b ÌImA¡�Cx{mR~�x�mÀ~6o%sV�LvI�0~�sA�0~Cz|v�w%sL�0�#xM~6o�vI�^s�sL�{sLz|sA�F~p�^qOz|sg��vF�T�{vX�VsAw����q Vy�{sLsA�%s¦mR~6qIunvFulq±t%�{�^mL¡�Á]xT�%�F�Ts¬x�m�~6o%s���vI�0~6sL�0~nz|v]w]sL��#xT~�obvI�^slsA�TsAz|sL�0~��^qOz|sl��vI�{�{vX�.sKw±���±qI��vFt]~�x{vI�^qI�#Ð'wKÌ%¡k­o^s°�%u6sAqI�]w]vX�#�ªx{mbm�o%vX�#�¢x{�¢~EqO�%�{sÂÄF¡÷k­o%s�mR~�u6�^��~��%u6s

vI��~�o%s|jlk#jnmyx�my���{vFm�sL�{�¹u6sL��qX~6sAw�~�v�~�o%sAxTui�LqO~�sA�IvIu6�I¡ijlk#jnmx{�¯~6o%s ×6øIø �AqX~6sL�IvFu���o^q4rFs�w]x{rIsAu6m�s�mR~�u6�^��~��%u6sAmA¡�ü�vX�VsLrFsLuK��Vs¬m�sLs|q��{qIu��Fs����%z���sLuyvI�.w%qX~Eq�mR~�u6x{�%�¹qI�^w�~6o%s¬~��%t^�TsL�ñ�{xT�FsmR~�u6�^�Z~6�%u6sAm�x{�°à ×Xù9× jlk#j¦¡�k­o^s�Ü�ß ù9× jlk#jnm­qIu�s*w%vIz|xT�}qX~�sKw����~6o%s?w%qX~Eq|m8~6u�x{�%��qO�^w±~�o^siz¬xTÛ]sAwb��vF�0~�sL�0~K¡¡*vO~6s�~�o^qO~�~�o^s�t%u6sAm�sL�^�Ls�vO�¢ �������vF�F~6sL�0~±z�qO�FsAm|xT~±x{z¦�

t�vFm6mRx{�%�{s�~�v°z|sAqIm��%u6s�mRvFz|s��IuEqOt%o%�=~6o%sLvFu�sL~�x��|t%u�vFt}sAuR~6xTsKm?x{�jlk#j�¡}k­o0�}m­�.six{�I�%vFu�sKw±m��^�Eob��vI�0~6sL�0~�xT�¹vI�^u�qO�^qI�T�]m�x{mA¡

ÿ­åñÿ éV!#&��]ët���%+��§'°$.ìîílï8ç�£�+j�O!k­o%s¦mR���0~6qOÛ���vFu*�n����jlk#j �LvI�0~�sA�0~*z|v�w%sL�eqO�{�TvX��m#vF�%si~6v�#u6xM~6s�qOu6�%xT~�uEqOu6xT�{�*�LvIz|t%�{s�Ûys�Û]t%u6sAm6mRx{vI�^mA¡À©�s�x{�0rFsAmR~�x{�FqO~�se~�o^s�LvIz|t%�{s�Û]xM~8��vO�?�LvI�0~�sA�F~�z¬v]w]sA�nx{�ª~6o%x{m¹m�sA�Z~6xTvF��¡öÍ àFß øDùÊþ���%�}�Z~�x{vI�ªx�m�w]sL�^�%sKw¢qIm�q�u6vI�^�Io�z¬sKqIm��%u6s�vO�y~�o%s°��vI�0~6sL�0~z|v]w]sL���LvIz|t%�{s�Û]xT~8�I¡¤�¥�¦�§�¨2©�¨�ª�§¬«�­® þ ß�Ý=¯Dà)° ÖLù Ý=±Xß±àFß�²M¯^Ý ù Ý Õ ¯ Õ�³¬ùÊþ ßyw]sLt]~6o Õ�³ù�þ ß ÖEÕ ¯ ù ßD¯ ù Ü Õ àFß Ô�´h�X'dAi�µ�º�aZ»t`¢õ^ò�h�X'dAi�µ�º�XX»¶`öÄIò

Ã

Ëpx{�I�%u6s*ÄF�C��qXÛ]x{z��%záw]sAt]~�o¬vO�]~6o%s­��vI�0~6sL�0~�z|v]w]sL��x{�¦jlk#jnm

h�X'dAi�µ�º�\tsO»¶`·h�X'dAi�µ�º�\tvy»¶`lh�X'dAi�µ�º�\xw4»M`¸h�X'dAi�µ�º�\e»¹v¢ÄIòh�X'dAi�µ�ººdgfDh� �ij ]»¶` ÄFòh�X'dAi�µ�º�\t»�m�\�¼?m�½¾½�½�m�\�¿^»¶`¸h�X'dAi�µ�º�\t»?b \�¼RmD½�½¾½=b \�¿^»¶`À �Á�º�h�X'dAi�µ�º�\�Âñ»�»¹v¢ÄRmÄÃ�µgX���X�Ä�Å�MÅ�g¡Ëpx{�I�%u6s�ÄVmRo%vX��mÀ~�o^qO~C~�o^s�z|v]w]s�vI�0~6o%s�z�qOÛ�x{z��^z w%sLt]~6o?vO�

~6o%s�u6sL�I�^�{qIupsLÛ]t%u�sKm�m�x{vI�¬x{m.¾iqO�^w¦~�o^s�z|vFmR~���vIz|t%�{s�Û¬s�Û]t%u6sAmR�m�xTvF��o^qFm�~6o%s¬w%sLt]~6o�rXqO�{�%s|Ç%¡yk­o%x{mlx{�^w]x��LqO~�sAm*~�o}qX~i��vI�0~6sL�0~z|v]w]sL��m�qIu�s­�^m��^qI�T�{���%vI~.��vFz|t%�TsLÛÀ¡pk­o%s*z¬v0m8~��LvIz|t%�{s�Û|��vI�%�~6sL�0~±z|v]w]sL�¬º��#xT~�o®~�o^s�w]sLt%~�o¢ÇF»�x{m|��vF�%�^w®x{�®~�o%s�©¢¾0ãnú mÁ]�AqO��qO�%�{s�ÆgsA��~�vFu�ÇluEqOt%o%x��Lm±jnk#j ½ ÎO¿ñ��m�o%vX�#�®�}sA�TvX�i¡§©�ssLÛ�t}qO�^w]sKw�xT~6m?sL�0~�xT~�x{sAm?qI�^w�mRo^vX�á~�o%s�sA�TsAz|sL�0~�w]sA�L�{qIu6qO~�x{vI���sL�{vX�i¡�k­o%sV��vF�F~6sL�0~�z|v]w]sA�0sA�^��v]w]sKm�~6o%s.m�sLz�qI�F~6x{��vO�^qI�T�{vX�­�x{�%��qX~iz|vFmR~ivI�%s|vO�?àIßC� Ö � ù Ý ùñÔ ßL��qI�^w�Ü�ß ùG× à ×Xù9× sL�{sLz|sA�F~ix{�× ¯AÈ ÕXý àFß ý ¡�Á�t�sA�LxM���LqO�{�{�I�0~�o%slt�vIu�~�x{vI��vI�À~�o%sy��vF�F~6sL�0~.z|v]w]sA�x{���Tx{�%s�vF�%s�o^qImiq�w%sLt]~6o�r4qI�T�^s�ô%¡�©�s��%�^w%sLu6�Tx{�%s�q�Ü�ß ù9×)Éà ×Xù9× sL�{sLz|sL�0~±�}qOz|s�qO�^wÂ~6u6qF��sbvI�ªo%vX�£~�o%s°w]sLt]~6oªrXqO�{�%sx{�^��u6sAqFmRsKmL¡ek­o%syz|sAqI�%xT�^�¦vI��q¦�LvIz|t%�{s�Û±�LvI�0~�sA�F~­z|v]w]sL���AqO�sKqIm�xT�{����sA��vFz|sl�^vI�]�ñ~�u6xTr�x�qO�ñ¡

d� �i�µÊZ ºRºRº�º�h�X)��f?mAºRº�i��8ij��X�m À X�ij �h� �ij w » bMº À X�ij %h� �ij �m�i��+ij�ËX)wK»�» w » bº2i��8ij��X�mAº�º�h�X)��f?m À X�ij %h� �ij %w4»�bMº À X�ij %h� �ij �m�h�X)��fDw4»R»�w4»�bº À X�ij %h� �ij gmKºRºËh�X)��f)m�i��8ij�ËX�w4»�bMº�i��+ij�ËX�m�h�X���fDw4»�»�w4»R»�wK»Cmº� ���� À �ijXÄb ��X�iDb ���� À �ijX�ÌÍ��i������Mb ���� À �ijX)���)������b ���� À �ijX��Î�� ��¹��Ï���� À »�»

ÿ­å+Ð èªç���çC"^ì +R&*+���ìk­o%só�n����Á�~6qO�}w%qOuEw�½ ¾4¿bw]s��}�%sAmóàFß ù ß ý ܦÝ=¯}ÝÑ� ù Ý Ö ��vI�0~6sL�0~z|v]w]sL��me~6v?�}s�~�o%v0mRs�w]vi�^vO~�u6sA�0�%x{u�s#�{v�vI�¦qIo%sAqFw��#o^sL��t^qIu6mR�x{�%�^¡�¡*vI�]�9w]sL~�sLu6z|xT�^x{mR~�x��|��vI�0~6sL�0~iz¬v]w]sA��x�mi�%vO~?qI�T�{vX�.sKw�x{��l���îjlk#jlm�½ ¾X¿G¡£ã.sLu�~6qIxT�ª~8��t�sAwó�n��� �0�%sLu6�Â��qO�^�I�^qI�Isx{z|t%�TsAz|sL�0~6qO~�x{vI�¨½;ÉK¿�qFm�m��%z|sAm|w]sL~�sLu6z|xT�^x{m�zb¡¯Í���s�Û%qOz|t%�{svI�­�^vI�]�9w]s�~6sLu6z|xT�%x�mR~�x�����vF�F~6sL�0~�z|v]w]sL�.x{mÓÒ ×�Ô?Ú b ×�Ô�ÖjÕ qI�^w~6o^qX~VvO�Cw]sL~�sLu6z|xT�^x{mR~�x��*�LvI�0~�sA�F~.z¬v]w]sA�}x�m ×�Ô Ò Ú b Ö�Õ ¡gk­o%s*¼9�%��0�%x�mRxT~�vFu*w]sL~�sA��~6mlȬ�%vI�]�9w]sL~�sLu6z|xT�^x{mR~�x��?��vF�0~�sL�0~�z|v]w]sL��m*qI�^w~6o%sL��qOu6s|��vI�%�^w�x{��â�jnk#jlmL¡¹Ë%vFu?s�Û%qOz|t%�{sI�C~�o%s���vI�{�TvX�#x{�%��%vF�]�9w]s�~6sLu6z¬x{�%x�m8~6x{����vI�0~6sL�0~�z|v]w]sL�nx{m±��vI�^�^wªx{�ª~�o%s�jnk#j��u6vIzá�VvIu6��Ö^vX��z�qO�^qI�IsLz|sA�F~e��v0qO�{xM~6xTvF��½ ÃIÃK¿G��×ÍØ ÙÚÌÍÛ^Zº���X�Ü���X)�DiDbTº2�)X�Ü���X���i�m���X)��dg���¹��XX»R»�¡

ÿ­å�Ý Þ ìOßl+jàMán+��O!©�si��vI�{�{vX�ó~6o%s�w]sL�^�%xT~�x{vI��vO��qIz��%x{�I�%xT~8�±x{�¯½ âO¿G¡VÍ*��s�Û]t%u6sAmR�m�xTvF�"â�x{mpqOz¦�%xT�F�%vI�}m�xT�^qO�^wivF�%�{�lxT�]~�o^sLu6s�s�Û]x�m8~Em�m�vIz|s�mR~�u6x{�%���xT�ÊÛnº�âi»#m��^�Eob~�o}qX~�~6o%sLu6s��LqI���}s¦w]x{mR~�x{�^��~n�­q4��mV~6v�t^qIu6m�smR~�u6xT�^�y�F¡e¼G~�x�m�m8~6u6qIxT�Fo0~R��vIu6�­qOuEw¬~�v�m�o%vX�¢~6o^qX~*qI�0��qIz��%x{�I�%�vF�^m#��vF�F~6sL�0~#z|v�w%sL��x�m­�%vI�%�Gw]sL~�sAu�z|x{�%x{mR~�x��O¡ÆgsAu��|vI�Ê~�sL�C��q¦w%qX~Eq�m�vI�%uE��s*s�Û]t}vFuR~Em�xT~6m­w%qO~6q�xT���l������vIu��

z�qX~?qI�^w�~6o%s�u�sK��sAxTrFsLulz|qIt^mn~�o%s��l����xT�0~6v¹m�vIz|s|vI�]«8sK�Z~Em½TÄ4ÉA¿G¡eÍ�z��^xT�F�%xM~8�¬x{�|jlk#jlm�z|sAqI�^me~�o}qX~g~�o^s�z�qOt%t%x{�%�ix�mg�%vI~�%�^x{�0�%sF¡�Í��¯qIz��%x{�I�^vI�^mi�LvI�0~�sA�F~iz|v]w]sL����u�vFz(q�jlk#j÷��vFuvFt}sA�¹qIt%t%�{x{�AqX~�x{vI��½TÄAô4¿Cx�m�m�o%vX�#�±��sL�{vX�i�d� ���i���X��_Z º2�� À X w�m�����X�i�� À X)w�m�d� ���i�������h%wRmËdg ���i�����i���d�X)wRm

�D�%��f��8��h%w�m� %f�i��8ã�X)w�m�fC�g���)X���f���wRm'h�X)��f����ÑdAi��¹w�mhR���¹����� À ��X��?wRm�ä%�ËX���i��+i�����w�m��� À X s%m�d� ���X���i���h%wRmdg ���i�������h�Á�w�m�d� ���i������) �i8äÄw�mËdg ���i������)�)�ËX�w�mdg �� À X�i�µg�)h�w�m�ij �Á�X�Á�X À dgi�wRm�ij �ÁA��h�w�m�ijX�� À ��h�w�m����X��) ��)X� Äw�m� %h�h��)X)���)sÄm�fC����ij %f�i�sI»

ã.vF�^mRx�w]sAuC~6o%s­�%�^w]sAu��{xT�^sAw?sA�TsAz|sL�0~g�^qOz|sAmA¡C©ªo^sL�¦~�o%s­�n���w]v]�L�%z|sL�0~¦x�m¦mR~�u6sAqIz|sAw�vXrFsLu�q��%s�~8�VvIu6�D��~6o%s ø^×XýZù ¯�ß ý sA�TsL�z|sL�0~yqOu6u�x{rIsKw±��vF�T�{vX�VsAwb���¹qÓ¯ × Ü|ß*sL�{sLz|sL�0~K¡nk­o%s¦u�sK��sLx{rIsAu�AqO�%�%vI~?w]sA�Lx{w]s�xT�#xM~?x�my~6o%s|�^uEm8~?vFt]~�x{vI�^qI�å¯ × Ü|ßyvFui~�o^s�mRsL��0�%sA�^��s�vI�[¯ × Ü�ßRmA¡±��qIt%t%x{�%��xM~�~�v�sLxT~�o^sLu�¯ × Ü�ßi�LqI�°��vFu�zq°t^qOuEm�sI¡¯k­o%s¹¼9�^�0�%x�mRxT~�vFu|w%s�~�sK�Z~6sAw®Ã�qIz��%x{�I�%vF�^m¬��vI�0~6sL�0~z|v]w]sL��mA¡

æ ç ~�µ¦`bf"~�Ñdgµ��u��dpce¸D��a

Эå�ä è ç�ët�^)*ëxßn+Rï�+��O!k­o%s�w%s��^�%xT~�x{vI�±vO�Du�sKqI�Eo^qI�%�Ts­sA�TsAz¬sA�0~��^qOz|s�x�m��FxTrFsL�|��sL�{vX�i¡¤�¥�¦�§�¨2©�¨�ª�§êé�­åë�¯�ß Ô ßLÜ�ßC¯ ù ¯ × Ü|ß�X�ì�ÝÑ�.u�sKqI�Eo^qI�%�Ts ³Eý�Õ Ü7X ÔàFßC¯ ÕXù ß6à Ú È[X[íîX�ì Ô Ý ³ ßLÝ ù�þ ß ý X�Zî\ × ¯Dà�X�ì ÕKÖEÖ ° ý ��Ý=¯y\ Ô�ÕXýX�í]X�ì ì × ¯Dà"X�ì ì�íîX�ì=¡ï ÝÊÜ¦Ý ÔT×OýZÔ È Ô|×�Ö6Õ ¯ ù ßC¯ ù Ü Õ àIß Ô \�ÝÑ�¬w]sLu6xTrXqI�%�Ts ³Eý�Õ Ü × ¯óß Ô ß ÉÜ�ßC¯ ù ¯ × Ü|ßyX Ô àFßC¯ ÕXù ßEà Ú È�Xuíð\ Ô Ý ³ ßLÝ ùÊþ ß ý X^Zñ\ Ô|ÕXýX�í]\�ì Ô X�ìgZî\�ì ì Ô�× ¯Dà�\u`l\�ìG½ X�ì=ò)\�ì ì2¿ Ôtó�þ ß ý ßå\�ìñ½ X�ìÑò?\�ì ìM¿VàIß É¯ ÕXù ß�� ùÊþ ß ÖEÕ ¯ ù ßD¯ ù Ü Õ àIß Ô�ÕIÚLùG× Ý=¯�ßEà Ú È���° Ú � ù Ý ù ° ù Ý=¯0Ù�\�ì ì ³�ÕOý×OÔÊÔ�ÕAÖEÖ ° ýZý ßD¯ Ö ß�� Õ�³ X�ì.Ý=¯ \�ì�ô¤�¥�¦�§�¨2©�¨�ª�§^õ¹­�ë�¯bß Ô ß�Ü�ßC¯ ù ¯ × Ü�ßpX|ÝÑ�pu6sAqF�Eo^qO�^�TsyÝ ³ �"íîX Ôó�þ ß ý ß���ÝÑ� ù�þ ß�¯ × Ü|ß Õ�³?ù�þ ß ý�ÕKÕXù ß Ô ßLÜ�ßC¯ ù ô�ö ù�þ ß ý�ó ÝÑ�Lß|ß Ô ß ÉÜ�ßC¯ ù ¯ × Ü�ßÎX�ÝÑ� Ö6×OÔÊÔ ßEà�°Ä¯ ý ß ×IÖEþ%×FÚ�Ô ß ÕOý °%��ß Ô ßC�.�Cô©�s�qIm6m��%z|s�~6o^qX~�~6o%s�u6v0vI~�vO�*~6o%s�m6qOz|t%�{sbjlk#j¤x�m|�%vI~

���%vX�#���Cx=¡ sI¡±qI�T��sL�{sLz|sL�0~?�^qIz¬sKmi�LqI����s|~�o^s�u�v�vO~isA�TsAz¬sA�0~�^qIz|sI¡�ËpxT�F�%u�sbÃ�m�o%vX��m�~�o%s¹���%z���sLu¬vO�n�%�%u6sAqI�Eo}qO�%�{s±sA�TsL�z|sL�0~e�^qOz|sKmCx{�¦jlk#jnmA¡ek­o%sAm�s.�^�%u�sKqI�Eo^qI�%�{s�sL�{sLz|sL�0~e�^qOz|sKmqIu�s¦sLxT~�o%sAu*~6o%s|u�v�vO~lsL�{sLz|sL�0~y�^qOz|s¬vIul�^m�sL�{sAm6mL¡ik­o%s¬z|v�w%svI��~�o%s¬���%z���sLuyvI�.�%�%u6sAqF�Eo^qO�^�Ts¦sL�{sLz|sL�0~y�^qOz|s¬x{m¬ÄI¡¦k­o0�}mjlk#jnm�rIsLu6�?vI�Ê~�sL��o}q4rIs#qy���{sAqOug�^vO~�x{vI�|vI�}~6o%s�u6v0vI~gsL�{sLz|sL�0~A¡Í*� ×EøIø jlk#jª�#xT~�o�ÃOõn�%�%u6sAqF�Eo^qO�%�{sVsL�{sLz|sL�0~��^qIz¬s­sA�^��v]w]sKmqI��qOt%t%�{x��LqX~6xTvF���DqO�^w�w%x÷TDsLu6sL�0~n��x{�^w%mnvO��m�z�qO�{�Cu6sA�0�%sAmR~nqI�^wu6sAm�t}vF�^m�s?z|sAm6m�qI�IsAm�xT�°�n���ÂmR���0~6qOÛD¡[{�qF�Eo�vO�g~�o^sAm�s¬m�z�qO�{�z|sAm6m6qO�IsKm�qOu6s��%vI~�u6sAqF�Eo^qO�%�{sg��u6vIz qO�{�FvI~�o%sAu�sA�TsAz|sL�0~��^qIz|sAmA¡

¾

Ë�x{�I�^u�s?Ã]�tø*�%u�sKqI�Eo^qI�%�{slsA�TsAz|sL�0~#�^qOz|sAm­x{��jnk#jlm

Á�sAt^qOuEqX~6xT�^��~�o^s¹�^�%u�sKqI�Eo^qI�%�{s�t^qOu�~6m¬x{��jlk#j�x{�F~6v�m�z�qO�{�TsAujlk#jnm*qOt%t�sAqIu­~�v|�}s?q|��s�~�~�sAu*w]sAm�x{�I��¡

Эåñÿ è ç¹��á*"A��+8$�&��©�s�w]s��}�%s#u�sK���%uEm�xTrFsVjlk#jnm.qO�^w¦~8�Vvy�0x{�^w%mevI��u6sA�L�%uEmRx{vI�^meqFm��vF�T�{vX�i¡¤�¥�¦�§�¨2©�¨�ª�§¸ù¹­�ë7ú�®�ú£ÝÑ��u6sA�L�%uEmRx{rIs�Ý ³|× ¯Dà Õ ¯ Ô È�Ý ³ Ý ù.þ^× �× ¯�ß Ô ß�Ü�ßD¯ ù ¯ × Ü�ßÎX ��° ÖEþbù�þ%×Xù X[íîX × ¯Dà�X�ÝÑ� ý ß ×IÖEþ%×FÚ�Ô ß�ô¤�¥�¦�§�¨2©�¨�ª�§üû�­¬ëýú�®�ú ÝÑ�¬�{x{�%sAqIu�u�sK���%uEmRx{rIs�Ý ³�× ¯Dà Õ ¯ Ô ÈÝ ³ Ý ù ÝÑ� ý ß Ö ° ý �ZÝ=±Xß × ¯Dà ³�ÕOý�× ¯gÈ ý ß ×IÖEþ%×IÚLÔ ßbß Ô ß�Ü�ßD¯ ù ¯ × Ü�ß�X× ¯�à × ¯AÈÚXþíÿ\ Ô X ÕAÖEÖ ° ý � ×Xù Ü Õ � ù�Õ ¯ Ö ß°Ý=¯þ\ × ¯Dà ù�þ ßÕKÖEÖ ° ýZý ßD¯ Ö ß�ÝÑ��¯ ÕXù ßC¯ Ö�ÔTÕ �Lß6à¯Ý=¯������ ÕXý ������ô·ëýú�®�ú ÝÑ�� × ÝÓà ù9Õ¦Ú ß[¯ Õ ¯ É9Ô Ý=¯�ß ×Xý�ý ß Ö ° ý �ZÝ=±4ßnÝ ³ Ý ù ÝÑ� ý ß Ö ° ý �ZÝ=±Xß Ú ° ù ÝÑ�ί ÕOùÔ Ý=¯Dß ×Xýiý ß Ö ° ý �EÝ=±Xß�ôã.vF�^m�x{w]sAu¨qI�¥sLÛ%qOz|t%�{sövO�¯q÷�%vF�]�G�Tx{�%sAqIu¢u6sA���^u6m�xTrFs sL�T�

sAz¬sA�0~¥jlk#j ½TÄ4ÉA¿G¡ ³�ÕXÔ àIß ý Z ù Ý ù=Ô ß Ô Ý=¯ ³�Õ Ô àIßC� Ö ÔÒ ÚEÕAÕ�� Ü ×Xý�� b ³�ÕXÔ àFß ý b ×XÔ Ý × �DbG�Lß ø%×Oý�×Xù9ÕXý8Õ ��¡ Í ��vI��w]sAu�o^qFm�m�vIz|svFt]~�x{vI�^qI�gxT�%��vIu6z|qO~�x{vI��qO�^wÀ�pqOz|vI�^��vI~�o%sAuy~�o%x{�%�FmA�Cq¹�{x{mR~?vO���vF�{w]sAu6mA¡ ³�ÕOÔ àIß ý w%v0sKm±�^vO~�v]�A���%u¹xT� Ú6ÕKÕ�� Ü ×Xý�� � ×OÔ Ý × ��qI�^w�Lß ø%×Oý�×Xù9ÕXý ¡�ü*vX�.sArIsAuA� ³�ÕXÔ àIß ý x�m#sA�^���{vFm�sAw±xT��q�Ð.zOÌ%¡¡*v±�{xT�^sAqOunu�sK���%uEm�xTrFs�jnk#j x�m#��vF�%�^w�x{��vF�%unm6qOz|t%�{sI¡lk­o%s

m6qOz|t%�{s.�LvI�0~6qIxT�^mgÉ��OÃ*qO�^w¬ÃOô#�%vF�]�G�Tx{�%sAqIu�u6sA�L�%uEmRx{rIs�jlk#jnmpx{�~6o%s ×6øFø ��à ×Xù9× qO�^w°Ü�ß ù9× �LqX~6sL�FvIu6�¬u6sAm�t�sA�Z~6xTrFsL�{�I¡gk­o%s?qI�^qO�T��]m�x{mVmRo^vX��mg~�o}qX~�~6o%sLu6snqIu�snqi��sL�öºñÃIÈF»g�%vF�]�ñu6sA�L�%uEmRx{rIs�jlk#jnmx{�bvI�%u�m6qOz|t%�{sI¡�à ×OùG× jnk#jlm�qOu6sy�^m��^qO�{�T�±�%vI�]�Gu6sA���^u6m�xTrFsI¡

Эå+Ð é#+Rì ínïRç� bë��])¥ëg&Î� é­+Rì ílï8ç '�! �ÀïRçË^vIug~6o%s#�%vI�%�ñu6sA�L�%u6m�x{rIs­jnk#jlmL�0�Vs#xT��rIsKm8~6xT�0qX~6s.~6o%sLx{u��{vI�%�FsAmR~m�xTz|t%�{s|t^qX~6o�¡�k­o%s|u�sKmR�^�M~ix�mlm�o%vX�#��xT�°�^�F�%u�s|¾%¡|k­o%s|�{s��Ê~R�z|vFmR~lqIu�sKq�x�m�sLz|t]~8��m�x{�^��s¬qO�{��jlk#jnmlx{��~�o%s¦�^uEm8~y�LqO~�sL�FvIu6�qIu�sbu6sA���^u6m�xTrFsI¡®k­o%s�u6sAm��%�M~�m�o%vX��m¬~�o}qX~�~�o^s��TvF�%�IsKm8~�t^qO~�ox{��mR�}�Eo�jnk#jlm�x�m��%vI~�~�v�v��{vI�%��ºÓz|vFmR~��{���{sAm6m­~�o}qO��ÎF»�¡#k­o%sjlk#jÂ��vIuCsLÛ%�Eo^qO�%�FxT�^�­�^�^qO�}��x�qO�Fw^qX~6q�½MÄAâX¿�qI�T�{vX��mD~6o%s��{vI�%�FsAmR~m�xTz|t%�{s�t^qO~�o��#xM~6oÂ~6o%s��{sL�%�I~�o¢ÃOõ^¡®k­o^sbmR~�u6�^��~��%u6s�xT�Â~�o^sjlk#j¤x�m±w]vFz|xT�^qO~�sKwÂ���Â~��%t^�TsKm�qO�^wªmR~�u6x{�%�FmA¡ó©�s�w]v¯�%vI~o^q4rFs�qO��s�Û]t%��qO�^qO~�x{vI��vI��t}sAu�z|xT~R~�x{�%��m��^�Eo�qb�TvF�%�bt^qX~6o°x{�~6o%x{m*jnk#j¦¡©�s*qI�{m�vnx{��rIsKm8~6xT�0qX~�sV~6o%s#���%z���sLu�vI��m�xTz|t%�{s#t^qO~�o^mgx{�|�%vI�%�

u6sA�L�%u6m�x{rIs|jnk#jlm?qO�}w�~6o%s����%z���sLuivI�Vm�xTz|t%�{s����]���{sAmyx{��u�sL�

ËpxT�F�%u6sy¾^�g��vI�^�IsAmR~�mRx{z|t%�{sit^qX~6o�x{�b�%vI�]�Gu6sA���^u6m�xTrFsljlk#jlm

Ëpx{�I�%u6s�â^��k­o^s±���%z���sLu¬vO�*m�x{z¬t^�Ts¹t^qX~6o¯x{���%vF�]�ñu6sA�L�%uEmRx{rIsjlk#jnm

�L�%u6m�x{rIsnjlk#jlmL¡�Á�x{z|t%�{sy�L�]���{snx�m­q�t^qO~�o�xT��~�o%sl��vFu�z X?»4�ÄX�¼O�¡{¡T¡xX�����X?»X�F�#o%sAu�s X?»X��X�¼X��¡T¡{¡tX��nqIu�s�w%x{mR~�x{�^�Z~.sL�{sLz|sL�0~��^qIz|sAmA¡k­o%s¹uEqO�%�Fs±vO�*~�o%sb���%z���sLu|vI�nmRx{z|t%�{s�t^qO~�o^m¬x{��~6o%sbjnk#jm6qOz|t%�{s|x{mi�#x�w]sF¡±Ë%vFui~�o%s����%z¦�}sAu?x�mi�{qIu��FsLul~�o}qO�¯È]Ä4Ã]���Vs~6u�sKqX~nxT~?qIm�Ð8~6v0v�z�qO���]̱qO�}w�w]v¹�%vO~im�o%vX� ~�o%s¬sL�0~6xTu6s¦�}qOuK¡Ëpx{�I�%u6s�â�m�o%vX��m¦~�o%sb���%z¦�}sAu�vO�lm�x{z¬t^�Tsbt^qO~�o®qO�{�TvX�VsAw��0��%vF�]�Gu�sK���%uEmRx{rIs¬jnk#jlmA¡|jlk#jlm?qOt%t�sAqIuysLxT~�o%sAu?qO�{�TvX�#x{�%����sL�vFu­q|��qOu6�Isl���%z���sLu#vI�em�x{z¬t^�Tsyt^qO~�o^mA¡k­o^sV���%z¦�}sAugvO�DmRx{z|t%�Ts#�L���L�Ts­qO�{�{vX�.sKw?���iu�sK���%uEm�xTrFsVjlk#jnm

x�mymRo%vX�#�°xT�°�^�F�%u�s�È]¡¦Ë%vIul~6o%s|�0�^z���sLuyx{my��qOu6�IsAu�~6o^qO��ÈOõ^��Vs¹w%v°�%vI~�mRo%vX��~�o%sbsL�0~6xTu6s��}qOu|�%�]~�~6o%s�qI�Z~6�^qO�­rXqO�{�%s¹x{mm�o%vX�#��vI��~6vIt¯vI��xM~K¡���v0m8~|jnk#jlm��#xT~�oÂ�{qIu��Fs����%z���sLu¦vO�m�xTz|t%�{s��L�]���{sAm�qIu�s�w]sAm�xT�F�%sAw���vFu¦w]v]�L�%z|sL�0~¦z�qOu6���%t�¡�k­o^sm�o%vIu�~i�^qOuEmyx{��Ü�ß ùG× �AqX~�sA�IvFu���qIu�s¦~�o%s�jnk#jlmy��vIu�z|qIu���x{�%��%t°mRz�qO�{�Cz|sKm�m6qO�FsAmA¡�Å#sA�L�%u6m�x{rIs�à ×Xù9× qI�^w ×6øFø jlk#jlm*o^q4rFsm�z�qO�{�À�0�^z���sLu#vO�gm�xTz|t%�{s?���]���{sAmA�]�#xT~�o�mRvFz|slsLÛ%��sLt%~�x{vI�^mA¡

Эå�Ý '�)*ëg+R&¥$���ép�]ëp"g�ÍÑu�sKqO�CsLÛ%qOz|t%�{s±½ ÇO¿pvI��q��Eo^qIxT��vI�.Ã�mR~6qOuEm�x�m*��vI�{�TvX�i�ißC¯ ù Ý ù ÈZ ¯ × Ü|ß� Ô�ÖEÕ ¯ ùG×FÖ�ù � ÔeÔTÕKÖ6×Où Ý Õ ¯�� Ô}ø^þ%Õ ¯�ß� Ô�³�× Þ�� Ô ßLÜ × Ý Ô �pqI�^wÔTÕKÖE×Xù Ý Õ ¯þZ × àFà ý ßC�.��� Ô�Ö Ý ù È� Ô|ý ßGÙOÝ Õ ¯� Ô*ø%Õ � ùG×OÔ Ô�ÖEÕ °Ä¯ ùñý È��¡Í �%�^m�x{�%sAm6mysL�0~�xT~8��o^qFmiq¹�Tx�m8~?vI�.�{v]�LqX~6xTvF�^mA¡�{.qI�Eo��{v]�LqO~�x{vI�o^qFm.m�vIz|syqIw%w%u�sKm�m.�Tx{�%sKmL¡�ͧm8~EqOu��4t%�{�^m.xT�bq¬���]���{sn�{sAqFw%m.~�v|q�Eo^qIxT�°vO�­m8~EqOuEmn�#xT~�o�x{�]�^�%xT~�s��{sL�^�O~�oC¡|Á��}�Eo��Eo^qIxT�}mlqIu�s¬�%vI~x{�^���{�^w]sKw�xT��~6o%x�m�qO�}qO�{��m�x�mL¡¹k­o%s�u6sAm��%�M~?vI�­~�o%x�m�qI�^qO�{�]mRx�myx{mm�o%vX�#��x{���^�I�^u�s�ô%¡®k­o%s��{vI�%�FsAmR~|�Eo}qOx{�ÂvI�ymR~6qIu6m¬x�m|vO�Ê~�sA�

â

Ëpx{�I�%u6sVÈ%�Ck­o%s����%z¦�}sAu�vI�}m�xTz|t%�{s.�L���L�TsKmCx{�?u6sA�L�%u6m�x{rIs�jlk#jnm

ËpxT�F�%u6syô^�g��vI�^�IsAmR~��Eo^qOx{�¹vO�gm8~EqOuEm.x{�b�%vI�%�ñu6sA�L�%u6m�x{rIsnjlk#jlm

m�z�qO�{��ºÊ~�o^siz¬v]w]six�m�¾0»Z¡ Ï �%s?mR~6qOu��ñ��u�sAsn�%vF�]�Gu�sK���%uEmRx{rIsljnk#jx�miw]s�~6sA��~�sAw�¡¦k­o%s|s�Û�~6u�sAz¬s|�LqFmRs¬x�ml��vI�%�^w°x{��~6o%s�jnk#j ��vFusLÛ]�Eo}qO�%�FxT�%�¦�^�^qI�^��x�qO��w^qX~6q¬z|sL�0~6xTvF�%sAw¹x{�bm�sA��~�x{vI�b¾%¡ ¾%¡

Эå�� � á ß[�

Ë}qO�]�GxT�¬vO��qO�¬sL�{sLz|sL�0~g�^qOz|sF�RX0�Xx{mp~6o%s#�LqOuEw]x{�^qO�{xT~8�ivO�^~6o%s#mRsL~� X ì b X ì Zc\ ì qO�}w�X�v]�L�L�%uEm#xT�Ê\ ì�� ¡#Í���sL�{sLz|sL�0~*�^qIz|s?�#xM~6oq���qOu6�Isl�ÓqI�]�ñx{��rXqO�{�%s?x�m*�AqO�{�TsKw þ ° Ú ¡#©�s?t%u6sAm�sL�0~#~6o%s?�ÓqI�]�ñx{�vI��~�o^s¬sA�TsAz|sL�0~6myx{�®à ×Xù9× qO�^w�Ü|ß ù9× x{�°�^�I�%u6s�ɱqI�^w��}�I�%u6sÎ�u6sAm�t�sA�Z~6xTrFsL�{�I¡­©�s¦m��0x{tb~�o^s?u�sKmR�^�M~���vFu�~6o%s¦jlk#jnm*��vIu ×6øIøjlk#jnmn��vFulm�t^qI�Ls¦�LvI�^mR~�uEqOx{�0~A¡�{�qI�Eo��Tx{�%s¬xT��~6o%s¬�^�I�^u�s?u6sLt%�u6sAm�sL�0~6m�q�jnk#j¦¡pÍ£�LqOuEw]x{�^qO���0�^z���sLu�ºÓsL�{sLz|sL�0~��}qOz|s�¼8jy»x�m�qIm6m�xT�F�%sAw�~�v�sAqF�Eo�sA�TsAz|sL�0~��^qOz|sF¡�k­o%sl�ÓqO�]�Gx{�¹rXqI�T�%sKm�qOu6sm�vIu�~�sKw�xT�®w]sKm��LsL�^w]x{�%��vIuEw]sLu¹ºÓqI�T�#�{xT�^sAm|qOu6s±z|vI�^vO~�vF�%x��LqO�{�{�w]sK��u6sAqFmRx{�%�0»�¡�Ë%vFu�jnk#jÑ�#xM~6ob��sL�§sL�{sLz|sL�0~��^qIz¬sKmL�%~�o%si�{xT�^sx�m?m�o%vIu�~A¡�k­o%s|�}�I�%u6s�mRo%vX��ml~6o%s���qOu6�IsAmR~�ÈIñ�ÓqO�%�ñx{��r4qI�T�^sAmA¡Ï �%u?m��%u6rIsA��mRo^vX��ml~�o^qO~?o0�^�^mis�Û]x�m8~?x{�¯jlk#jnmix{�¯qO�{���LqO~�sL��FvIu6xTsKmL¡�¼9�|m�vIz|s#�LqFmRsKmL�Fw]v]���%z|sL�0~Em���vF�F~EqOx{�¦z�qI�0�¦sL�{sLz|sL�0~Em�#o%v0mRs*�^qOz|s*x�mVq?o��%��¡gË^qFm8~­qI�A��sAm6mg~6v�o��%�^m.�LqI���}slx{��o%x{�Iot%u6x{vIu6xM~8�F¡Í*�®vI�^m�sLu6rXqX~�x{vI��x{m¬~6o^qX~�z�qO�����LvI�0~�sA�F~|z|v]w]sL��m�qOu6s�sLÛ��

qF�Z~��{��~6o%s�m�qIz¬sF��q�t%u6s��%Û¯vFu|q�m��]Æ|Û¯vO�yqO�%vI~�o%sAuA¡�k­o%sAm�s�LvI�0~�sA�F~lz|v�w%sL��m*�LqI�^m�s¦m�vIz|s?o%vFu�x{�LvF�F~EqO���{x{�%sAmlqX~lo%xT�Fo��ÓqO�%�x{�br4qI�T�^sAmA¡

ËpxT�F�%u6s?É]�gË^qO�%�ñx{�bvO�p�LvI�0~�sA�0~�z¬v]w]sA�{m­x{�¯à ×OùG× jlk#jnm

ËpxT�F�%u�siÎ^��Ë^qO�%�ñx{��vI�e�LvI�0~�sA�0~­z|v]w]sA�{m­x{��Ü|ß ùG× jlk#jlm

� � µ ´ h ~A·bap¸Lµ ´

Á�vFz|s��n���°m8~6vIuEqO�Fs�mR�]mR~�sLzb�F~8��t}sK�Eo%sA�E��x{�%��qO�}w��0�%sLu6�¦��qO�%��F�^qO�Fs­w%sLt�sL�^w�vF����sAuR~EqOx{��qIm6mR�%z|t]~6xTvF�^mgvF��jnk#jlmL¡g©�s*�LvI�T��{sA��~�sAwbu6sAqI�Cjlk#jnmlqO�^w�qI�^qO�{���LsKw±~6o%s�mR~�u6�^�Z~6�%u6sAm#~6o^qX~nz|q4���s#qIm6mR�%z|sKw�x{�¦�l���bu6sAm�sAqOuE�Eo�¡Ck­o^sVt}qOt�sLugt%u6vXr�x{w]sKmem8~EqX~�x�mR�~6x{�Am­vI��mRvFz|symR~�u6�^��~��%u6sAmVvO�eu6sAqO��jlk#jlmL¡�! �" §�ª$#&%�¥$')(g¥$*r¥%§�©�­�k­o^qO�^��m�~�v,+es�~�sAunûV�%�%sAz�qO��qI�^w

©�sL�]��sAx}Ë}qO�¬��vIue~6o%sLx{u�x{�0rXqI�T�^qI�%�{s���vIz|z|sL�0~�qI�^w¬sL�^�LvI�%uEqO�Fs��z|sL�0~K¡�k­o%s¦qI�]~�o%vFu��VvI�%��w�qO��mRv±�Tx{�Isi~6v�~�o}qO�%�¹z�qO����z|sAz¦���sLuEmyvO�-+esL�%��jlqX~EqO�^qFmRs�Å#sKmRsKqOuE�Eo^Çlu6vI�%t°��vIu¦mR�%�F�IsKm8~6xTvF�^mvF�±~6o%x{m�t^qOt�sLuK¡

. �0/R��dt� ´ hå��a

½TÄ�¿1+e¡�Æ�¡eû.x{u6vI��qO�^w�Í�¡p��qO�{o%vO~6u6q^¡��l���¨Á]�Eo^sLz�q2+gqOu�~Ã%�ejlqX~EqX~8��t�sAmA¡�©¢¾�ãªÅ#sK��vIz|z|sL�}w%qX~6xTvF��¡]Í�rXqOx{�{qI�%�{slqO~354$4�687�9$9;:$:�:=<>:@?A<CBED$F9;G�H@9JILKNM5O�PL3QLKSR5TU �O��q4�¹ÃOõIõ^ÄI¡

½ Ã4¿1V}¡�ûVvFm6qO�D¡ k­o%sW+��{q4�]mÂvO��Á]o^qO�FsAm�t}sKqOu6s¨x{���l����¡Í*r4qIxT��qO�^�TslqX~ 354$4�687�9$9;:$:�:=<XB$R@O$Y�OETBJ6@Q;Z8<CBED$F59P;BJ[5QED@9\@B@OJRJ]^J3@RE]5Q@OL6@Q$RJDQ$U�_�_�<>354LKNM �;VI�%�{��ÄKÇIÇIÇ^¡

½ ¾X¿?ky¡�ûVu6q4�F�`V^¡a+gqOvF�Txñ�¢qO�}w ãn¡¯Á�t}sAu���sLu6�O�9���Lbn�%sLsA��¡{�Û0~6sL�^m�x{�%�Ts®��qIu����%t �CqO�^�I�^qI�IsóºÓ�n���p»�ÄF¡ õ^¡ ©¢¾0ãÅ#sK��vFz¬z|sA�^w%qX~6xTvF��¡DÍ�rXqOx{�{qI�%�{s���u6vIz 354$4E6=7�9$9;:�:$:=<>:@?�<BED$F@9;G$H@9dcLe�e$f59JH$g$hTILKNM$TicLe�e$f�_�U�c�_ �OË^sL��¡�ÄKÇIÇFÎ%¡

È

½ âI¿iÍ�¡Àû.u6�%sA�I�IsAz�qO�%�]��Vy�{sLx{�bqI�^w�j¦¡D©°v�v]wÀ¡ik­o%s�Æ�qI�Tx�w%qO�~6xTvF��vI�gÁAÇy����ã.vI�0~�sA�0~#��v�w%sL��mL��ÄKÇIÇIâ^¡

½ ÈX¿iûl¡�ã.o%vFx=¡�jlk#j÷¼9�^�0�%x�mRxT~�vFu�Ã]¡�Í�rXqIxT��qO�%�{s�qX~ 354$4�687�9$9jE\=<kP5YEOl<>m$6@Q;Z$Z=<XQ�j;mS9@n�]$]SPL3B@YJ9�o$G$odp�U$9 �IÁ�sLt%~A¡}ÃIõIõ%ÄF¡

½ ôO¿iûl¡}ã.o%vFx=¡g©ªo}qX~*qOu6siÅ#sAqI�Cjlk#jnm��CxT�FsI¡�k�sA�Eo%�^x{�AqO��Å#sL�t�vIu�~A�^ÃIõIõ0Ã]¡

½ É4¿iûl¡Cã.o%vFx=�À��¡ÀË%sAu��}qO�^w]sA�I�iV}¡�Á�xTz|sAvI����qI�^w2+e¡D©�qI�{w%sLuK¡ÇyqI�{qOÛ�jnsLz|v^¡VÍ�rXqOx{�{qI�%�{s?qX~ 354$4E6=7�9$9EjJ\=<>\@Q5M�M$TdMJRJ\dOl<PJBLKN9JF5R5M�REI@9 �0ÃOõIõ0Ã]¡

½ ÎO¿�ã.o^u�x�m���x{�{�TsA�§qI�^w j*sKqO�WV0qI�E�]mRvF��¡(Á]�AqO��qO�%�{seÆgsA��~�vFuÇluEqOt^o%x{�Am�º=ÁÄÆÎÇi»Z¡FÍ*rXqOx{�{qI�%�Ts.qX~ 354$4�687�9$9;:$:�:=<>:@?A<CBED�F@9q DRJ6$3�Y�P$OE9�^Jr q 9 �OË%sA��¡}ÃOõFõFÃ%¡

½ ÇO¿iËpx{�^qO�}��x�qO��Á�sLu6r�x{�LsAm�k�sA�Eo%�%vF�TvF�I�Ñã.vI�}mRvFuR~6xT�%zb¡ k­o%sû­qI�%� ¼9�F~6sLu6�%s�~s+gq4��z|sL�0~�Á��]mR~�sAz ºÓûV¼C+.Á%»��?��sKqIw]x{�%�~6o%s®©�q4� ~6v {��TsK�Z~6u�vF�%x{�®ã.vIz|z|sLuE��sF¡ Í�rXqOx{��qO�%�{s�qO~354$4�6=7X9$9;:$:$:8<CtSO�4dPu<XBEDEF@9�65D5B�v$Q5P�4dOJ9�\�Y�6SOE9 ¡

½TÄAõO¿iÅ?¡AÇlvI��w]z�qO��qI�^wwV}¡]©ªx�w]vIzb¡ejnqO~6q�Çl�^x{w]sKmL�¶{��^qO�^�Tx{�%�bn�^sLu6�¬Ë^vIu6z��%��qX~6xTvF��qO�^w Ï t%~�x{z¬x{�AqO~�x{vI��xT��Á]sLz|x{mR~�u6�^���~6�%u6sAw�jlqX~EqO�^qFmRsKmL¡�¼9� ® þ ßyxEz ý à2{C¯ ù ß ý ¯ ×Xù Ý Õ ¯ ×OÔ}|�Õ ¯ ɳ ß ý ßD¯ Ö ß Õ ¯W~}ß ý È ×Oý ÙFß�ú ×OùG×!�*× ��ßC�Z�Ct^qI�IsKmnâ0¾Iô Ø âIâ�È]�ÄKÇIÇ�É�¡

½TÄIÄL¿¢{*¡VÅ?¡­ünqOu6vI��wÀ¡÷û­qIm�sL�^qI�T�yÁ�vI�%uE��s�ã.v]w]s�qI�^w·{�Û%qOz¬�t^�TsKm���u�vFz k­o%s®�n����ûVx{�%�TsF¡ Í�rXqOx{�{qI�%�{s®qO~ 3$4$4�6=79$9;:$:$:8<�Y�\NY�\@M@Y;BA<XB;D$F@9�I;K�M$9EQJIR�K@6SMEQOJ9;\5R@OJQ;\R$M$M$9 �Í*t%uK¡ÀÄAÇIÇFÇ%¡

½TÄKÃX¿iü�¡Cü�v0mRvX�0q�qI�^w�ûl¡�+�x{sLuE��sF¡��njn�^��sF��Í÷k���t�sAw��n���+�u�v]�LsAm6mRx{�%�á�CqI�%�I�^qI�IsF¡ ¼9�c® þ ß�z ý à�{C¯ ù ß ý ¯ ×Xù Ý Õ ¯ ×OÔ� ÕXý�� � þ%ÕEø Õ ¯ ù�þ ß��bß Ú�× ¯DàÊú ×Xù9×IÚE× �LßC�E�.t^qO�FsAm�ÃIÃIô ØÃOâIâ}����q4��ÃOõFõIõ%¡

½TÄA¾O¿iü�¡%ü�v0mRvX�0q%�$V^¡%ÆgvI�^xT�{�TvF���%qO�}w�ûl¡5+�x{sLuE��sF¡¯Å#sL�F�%�{qIu.sLÛ��t^u�sKm�m�xTvF�|~8��t}sKm���vFu.�l����¡p¼9�ê® þ ß�{�¯ ù ß ý ¯ ×Où Ý Õ ¯ ×OÔ)|�Õ ¯ ɳ ß ý ßD¯ Ö ß Õ ¯2��°Ä¯ ÖLù Ý Õ ¯ ×OÔ��Vý�Õ Ù ý�× Ü¦Ü¬Ý=¯0ÙeÒ�{ | � ��Õ �Àt}qO�IsKmÄFÄ Ø ÃIÃ]�]Á�sAt]~A¡}ÃIõIõFõ%¡

½TÄLâI¿y¼8Ëp��Ë%vFu��%zb�.¼9�^�O¡�¼8Ëp�ü�vFz¬sF¡�Í�rXqIxT��qO�%�{sbqX~ 3$4$4�6=79$9;:$:$:8<�Y�t�I5t$BED�mJK�<�BED$F �IÃIõIõ0Ã]¡

½TÄKÈX¿i�¯¡C��qO�FvO~6v^¡|Å {���Í��¤ºÓÅÎ{��I�^�{qIu?�CÍ��^�I�^qI�Is�w]sAm6��u6x{t]�~6xTvF����vFui�n���p»Z¡¦Í*rXqOx{�{qI�%�Ts�qO~ 354$4�687�9$9;:$:�:=<�I;KNM�<�F�D8<vJ6S9JDQ$M�REI@9 �EVF�%�T�¹ÃOõIõFõ%¡

½TÄAôO¿ Ï t�sL�¬Í�t%t^�Tx��LqO~�x{vI�yÇlu�vF�%t�¡ Ï t�sL�¦Í*t%t%�{x{�AqX~6xTvF�_Çlu6vI�%tC¡Í*rXqOx{�{qI�%�Ts qO~ 354$4E6=7�9$9;:�:$:=<XBJ6QJZ@RJ6E6SMY�P;RJ4�YLBJZdOl<BED$F@9 ¡

½TÄ4É4¿�+��0~�o%vF� �l��� Á�t�sA�Lx{qI��¼9�0~6sLu6sAmR~cÇlu�vF�%t�¡ k­o%s�l��� ûVv�vI��z�qOu6� {gÛ%�Eo^qI�%�Is �CqI�%�I�}qO�Is Å#sKmRvF�%uE��s+gqI�IsF¡ Í*rXqOx{�{qI�%�Ts�qO~ 354$4�687�9$9;65��I;KNM�<kO;BJm$DdP;QEt5B;D$F5QA<Z@QE4@9J45BJ6NY�P$OJ9JI�\@Q5ME9 ¡

½TÄAÎO¿yÍ�¡4Á]qIo��%�I�%sL~A¡�{�rIsAu��0~6o%xT�^�=��vI��{�rIsLuC©�qI�F~6sAwy~6vÎVy�%vX�Í*�}vF�]~­jlk#jlmL�%ûV�]~V©°sAu�slÍ#��uEqOx�w�~�v¬Í*m��D¡�¼9�`��ß Ú ú � Éx$�J�J�O�DÃIõIõFõ%¡

½TÄAÇO¿�V}¡^Á�o^qI�%z��%�0qIm��%�^w%qIu6qIzb�%V|¡]k��]�Ê~�sF�Dãn¡S��o}qO�%�}�ÄǬ¡}ü�sI�j¦¡�V}¡?j*sA©ªxM~�~A��qI�^w�V^¡?ËV¡�¡nqO�%�FoF~6vI��¡ Å#sA�{qO~�x{vI�^qI�jlqX~EqO�^qFmRsKm���vFu1bn�^sLu6�0x{�%���n����j*v]�L�%z|sL�0~6mA�l�CxTz|xT~6qO�~6xTvF�^m#qO�^w Ï t%t�vIu�~��%�^xM~6xTsKmL¡e¼9�¸® þ ß1x$� ù�þ {C¯ ù ß ý ¯ ×Xù Ý Õ ¯ ×OÔ|�Õ ¯ ³ ß ý ßC¯ Ö ß Õ ¯�~}ß ý È ×Xý Ù0ß�ú ×Xù9×w�n× �Lß��Z��t^qO�FsAmy¾Iõ0à ؾ^ÄLâ^�ÀÄKÇIÇFÇ%¡

½ ÃOõO¿yj¦¡¹Á0~EqX~6sAm¢qI�^w�+e¡�ÇlvIuEw]vI��¡ k­o%söûV¼ Ï ��� o^vIz|st^qI�IsF¡�Í�rXqOx{�{qI�%�{snqX~ 3$4$4�6=7�9�9;:$:$:=<�\NY;BLKNM�<�PJBLKd9;��p;�;�5�9\NY;BLK�M�<Cj�45j �0��qOuK¡ÀÄAÇFÇIÇ^¡

½ Ã]ÄL¿yü�¡ k­o^vIz|t^m�vI��� j¦¡ ûVsAsA�Eo�� ��¡ ��qO�{vI�^sL�I� qI�^w¡�¡���sA�^w]sL��m�vIo%��¡��l���¢Á]�Eo%sAz�q,+gqIuR~�ÄF��Á0~�u6�^�Z~6�%u6sAmA¡©¢¾�ãöÅ#sK��vFz¬z|sA�^w%qX~6xTvF��¡�Í�rXqIxT��qO�%�{s�qX~ 354$4�687�9$9;:$:�:=<:@?A<XBJD$F@9JG$H9JI;KNMO�PL3@Q�KdR$Tuc �X��q4�¹ÃOõIõ^ÄI¡

½ ÃIÃX¿l©�vIu6��Ö^vX� ��qO�^qI�IsAz¬sA�0~ ã.vFqI�TxT~�x{vI�C¡ ©°vFu���Ö^vX���qI�^qO�FsLz|sL�0~�ã.vFqI�TxT~�x{vI���÷©¢Ë��G�n��� m�t�sA��xT�}�AqX~�x{vI�C¡Í*r4qIxT��qO�^�TslqX~ 354$4�687�9$9;:$:�:=<XB$R@O$Y�OETBJ6@Q;Z8<CBED$F59P;BJ[5QED@9�5�$�$�$�NcL_$R5T5�@M;6$3RA<>354LKNM �XÍ�t%uK¡�ÄKÇIÇIÇ^¡

½ ÃO¾O¿l�l����¡ vIu6�^¡��l����¡ vIu6��Å#sA�Ix�m8~6u��D¡�Í*rXqOx{�{qI�%�Ts�qX~ 354$4�6=79$9;:$:�:=<�I;KNM�<XBED$F@9;I;KNM�9;D5QEF�YJO�4�D��8<Cv@O�6 �IÃOõFõFÃ%¡

ô

ToXgene: An extensible template-based data generator for XML∗

Denilson Barbosa1 Alberto O. Mendelzon1 John Keenleyside2 Kelly Lyons2

1 Department of Computer ScienceUniversity of Toronto

{dmb,mendel }@db.toronto.edu

2 IBM Toronto Lab{keenley,klyons }@ca.ibm.com

Abstract. Synthetic collections of XML documents are useful in many applications, such as benchmarking (e.g., Xmark),and algorithm testing and evaluation. We present ToXgene, a template-based generator for large, consistent collections ofsynthetic XML documents. Templates are annotated XML Schema specifications describing both the structure and the contentof the data to be generated. Our tool was designed to be declarative, and general enough to generate complex XML contentand to capture most common requirements, such as those embodied in current benchmarks. In the paper, we give an overviewof the ToXgene template specification language and the extensibility of our tool; we also report preliminary experiments withToXgene carried out at the IBM Toronto Lab, which show that our tool can closely reproduce the data sets of the TPC-H andXmark benchmarks.

1 IntroductionSynthetic collections of XML documents have many applications in benchmarking and testing algorithms, tools and sys-

tems. Moreover, different applications require documents with different complexities, sizes, etc. For example, a benchmarkfor data-intensive applications might be a large and relatively homogeneous document, with many references among elements,while a test suite for a parser might be an heterogeneous collection with thousand of documents. Given the complexity of writ-ing and/or customizing hard-coded data generators for specific scenarios, we believe a declarative tool for generating syntheticXML documents will prove useful.

We present ToXgene, a template-based tool for generating large, consistent synthetic collections of complex XML docu-ments. ToXgene’s template specification language is a subset of the XML Schema [5] notation augmented with annotationsfor specifying other properties of the intended data, such as probability of occurrences of elements, the vocabulary ofCDATAcontent, etc. This work is part of the ToX project [2], recently started at the University of Toronto.

1.1 Related workA general purpose generator for synthetic XML documents is presented in [1]. Our work differs from that in the following

ways. First, our data generation process is centered on a conceptual description of the desired data (a template). ToXgene givesthe user total control over the data to be generated, unlike the method in [1], where both the structure and the content of thedocuments are randomly generated. We note that our tool allows some controlled randomness in its output and can generatedocuments with irregular structure, as shown in Section 2.4. Second, our tool generates more complex XML content, includingelements with mixed content; attributes; non-gibberish text; and different numerical and date values. Finally, ToXgene supportsdifferent probability distributions.

Another template-based XML data generator is the IBM XML Generator [7]; its inputs are annotated Document TypeDefinitions (DTDs) specifying the structure and the characteristics of the data. There are many limitations to that tool, however.For instance, it allows one tolimit the maximum depth of the document tree, or the number ofID andIDREF attributes inthe documents, but it does not allow one tospecifyactual values for these properties. Moreover, unlike ToXgene, that tooldoes not support different probabilities of occurrence on aper elementbasis, nor the generation ofCDATAcontent of differentdatatypes (e.g., strings, dates, etc.). On the other hand, the IBM generator deals with XML constructs that are not addressed inToXgene, such asENTITY declarations and processing instructions [4].

Furthermore, ToXgene differs from both approaches above in the following ways. First, ToXgene supports element shar-ing, i.e., the values of some elements (or attributes) can be shared among different elements (or attributes) in the same (orin different) XML documents. This allows the generation of collections of corelated documents (i.e., documents that can bejoinedby value). Second, ToXgene can produce data conforming to user-specified integrity constraints, which allows the gen-eration of consistent references within and across documents. Third, ToXgene allows the use of existing data when generatingdocuments; therefore, one can grow an existing collection of documents while maintaining its consistency, or mix real and∗This work is supported by the National Science and Engineering Research Council of Canada, Bell University Laboratories, and the IBM Centre for

Advanced Studies.

1

<tox-distribution name=”c1”type=”exponential” minInclusive=”5”maxInclusive=”100” mean=”35” >

(a) Specification of an exponential proba-bility distribution.

<simpleType name="isbn type"><restriction base="string">

<pattern value="[0-9] {10}"/></restriction>

</simpleType>

<simpleType name="my float"><restriction base="float">

<tox-number tox-distribution=”c1” /></restriction>

</simpleType>

(b) Specification of asimpleType .

<complexType name="my book"><attribute name="isbn" type="isbn type"/><element name="title" type="string">

<tox-string type=”text” maxLength=”30” /></element><element name="author" type="my author"

minOccurs="1" maxOccurs="5"/><element name="price">

<complexType mixed="true"><attribute name="currency">

<simpleType><restriction base="string">

<tox-expr value=”‘CDN‘” /></restriction>

</simpleType></attribute><tox-number minInclusive=”5”maxInclusive=”100” />

</complexType></element>

</complexType>

(c) A complexType specification.

Figure 1: Notation for specifying types, genes and probability distributions in ToXgene.

synthetic data in the generation process (e.g., use real country names as in Xmark). Finally, our tool can be easily extended toallow the generation of domain-specific content, whenever the built-in tools are insufficient.

Annotated XML Schema specifications have also been used for defining relational storage methods for XML docu-ments [3].

The remainder of the paper is organized as follows. Next section gives an overview of the template specification languageand illustrates the kinds of documents ToXgene can produce. Section 3 discusses the extensibility mechanisms of ToXgene.Section 4 reports on experiments carried out with ToXgene. Finally, Section 5 concludes the paper.

2 ToXgene template specification languageAmong the various languages for specifying XML content, we chose XML Schema as the basis for our template language

for two reasons: it is a W3C standard, thus it is expected to become familiar to XML practitioners; and it allows a moredetailed description of XML content than DTDs. In particular, it allows the specification of types, for describing literals (i.e.,CDATAcontent), elements and attributes. However, we note that having an XML Schema specification alone is not enough forgenerating useful synthetic data; at the very least, we need to annotate the schema for specifying which type should be used asthe type for the root element of the documents to be generated.

We also define annotations for other purposes, such as specifying probability distributions of occurrences of elementsand attributes; defining element sharing; defining integrity constraints over XML elements; and providing some controlledrandomness in the structure of the data to be generated. More details on the ToXgene template language can be found in [9].

Types and genes. A typeis a specification of some valid XML content, and can be either asimpleType or acomplex-Type [5]; instancesof a simpleType areCDATAliterals, while instances of acomplexType are either XML elements,CDATAliterals, or a mix of both.

A geneis a specification of either an element (anelementgene) or an attribute (anattributegene), and contains a nameand a type; aninstanceof an element gene with namen and typet is an element whose opening and closing tags are labeledwith n and whose content is an instance oft; similarly, aninstanceof an attribute gene with namen and typet is an attributewhose name isn and whose content is an instance oft. An attribute gene can only be declared within acomplexType , andits type must be asimpleType .

2.1 Specifying probability distributions, types and genesFigure 1 contains the examples we use to illustrate the discussion in this section. All ToXgene-specific annotations are

prefixed by the keywordtox and presented in a different font.

Specifying probability distributions. A probability distribution is declared using thetox-distribution annotation, as shownin Figure 1(a), and referenced via its name, as shown in Figure 1(b). A single template might specify multiple probabilitydistributions, which can be used to determine, for example, the number of occurrence of elements and attributes, the length ofstring literals, and instances of numerical types, as in Figure 1(b). ToXgene currently supports the uniform, normal, exponentialand log-normal distributions, as well as arbitrary discrete distributions, where the user provides all possible outcomes togetherwith their respective probabilities of occurrence. In order to allow the static checking of the consistency of the templates,

2

<tox-list name=”book list” unique=”[book/isbn]” ><element name="b rec" minOccurs="200">

<element name="isbn" type="isbn type"/><element name="author id" type="integer"

maxOccurs="5"><tox-sample path=”[author list/author]”

duplicates=”no” ><tox-expr value=”[id/!]” />

</ tox-sample ></element>

</element></tox-list >

(a) Declaration of a list with 200 book records. Each record has aunique ISBN value and up to 5author id values, sampled (with-out repetition) from a list of authors, declared elsewhere.

<element name="book" minOccurs="200"><complexType>

<tox-scan path=”[book list/b rec]” name=”a” ><attribute name="isbn" type="isbn type">

<tox-expr value=”[isbn/!]” /></attribute><element name="title" type="string"/><attribute name="authors" type="IDREFS"

tox-maxOccurs=”unbounded” ><tox-scan path=”[$a/author id]” >

<tox-expr value=”’author’#[!]” /></ tox-scan >

</attribute></ tox-scan >

</complexType></element>

(b) Specifying queries over lists; the symbol# in the expression corre-sponds to the string concatenation operation.

Figure 2: Defining and querying lists.

we require each probability distribution to have a maximum and a minimum value; all values outside this interval have nullprobability of occurrence.

Specifying simpleType s. A simpleType is a specialization of abasetype, e.g., string, integer, etc.; thedomainof asimpleType is a subset of the domain of the base type it is built upon; and an instance of asimpleType is an elementchosen from its domain. For example, theisbn type , specified in Figure 1(b), is a specialization of the string base type; itsdomain is the set of strings that conform to the given pattern; and1234567890 is one of its instances.simpleType s arethe basic content specification tools for defining genes.

The tox-number , tox-string andtox-date annotations are used to refine aSimpleType definition (e.g., to specify thatinstances ofmy float in Figure 1(b) obey the probability distributionc1, and that the content of instances of thetitleelement in Figure 1(c) are non-gibberish strings1, no more than 30 characters long), or to specify the generation of elementswith mixed content (e.g., theprice element in Figure 1(c)). Constants can be specified using thetox-expr annotation.

SpecifyingcomplexType s and genes. A complexType specification (see Figure 1(c)) contains definitions of elementgenes, attribute genes, or annotations definingCDATAliterals, required for defining elements of mixed content. Both namedand anonymous types are allowed in ToXgene, as shown in the figure. Our tool supports all element content models definedin [4]: character data, as in theisbn element; elements, as in thebook element; and mixed, as in theprice element.

2.2 Specifying element sharing and integrity constraints in ToXgeneToXgene allowselement sharingboth within and across documents; i.e., different elements (or attributes), in the same or

in different documents, can have the sameCDATAcontent. This allows the generation of collections ofcorrelateddocuments(i.e., documents that can bejoinedby value). In our tool, element sharing is achieved by generating all shared content prior togenerating any documents. The shared content is kept in what we calltox-list s; such lists are queried at document generationtime. Integrity constraints over the contents of a list, such as uniqueness of certain values, can be specified, thus ensuring theconsistency of the collections.

Specifying lists. Lists are declared bytox-list annotations. Each list has a unique name, an element gene that defines itscontents, and, optionally, some integrity constraints. Figure 2(a) shows the specification of a list of books, each containingan ISBN and up to 5 references to author elements, stored in another list. The number of elements in a list is determined bythe gene defining it (exactly 200 in the example in the figure). Theunique constraint in the list in Figure 2(a) ensures that noduplicate ISBN values are stored in the list; more complex integrity constraints can be specified usingwhere clauses(see [9]for details).

Querying lists. ToXgene has a language for specifying expressions which are declared usingtox-expr annotations, andare evaluated at content generation time. This language allows arithmetic and string operations; operands can be the resultsof queries, instances ofsimpleType s generated “on-the-fly”, or constants. All expressions are typed; type checking andcasting mechanisms are implemented. Expressions are useful for defining conditional instantiation of genes (see Section 2.4),and for specifying where clauses for lists and selection conditions on cursors (see below). Due to space limitations, we describeexpressions defining queries only; for details, see [9].

The contents of a list are retrieved usingcursorsandqueries, both of which are specified usingpath expressions. A path1Our tool supports two built-in string types:gibberish , andtext , as defined in [10]. However, ToXgene can produce strings using a different vocabulary,

as discussed in Section 4.

3

<tox-document name=”books” copies=”1”<complexType>

<element name="books"><complexType>

<element name="book" type="book"minOccurs="200" maxOccurs="200"><tox-scan path=”[book list/b rec]” >

...</complexType>

</element><complexType>

</ tox-document >

(a) Single file, multiplebook elements.

<tox-document name=”book” copies=”200”<complexType>

<element name="book" type="book"><tox-scan path=”[book list/b rec]” >...

</element><complexType>

</ tox-document >

(b) Collection of files, each having a singlebook ele-ment.

Figure 3: Declaring single documents and collections of documents in ToXgene.

expression is a sequence ofnamesseparated by ’/ ’; a name can be one of: a list name, an element name, a cursor name, or! ,which is shorthand forCDATA. All path expressions are enclosed by square brackets. The elements over which a cursor iteratesare specified by a path expression rooted at a list or at another cursor, and, optionally, a selection condition. ToXgene providescursors for sequential scans (using thetox-scan annotation), and sampling with or without repetition (using thetox-sampleannotation). Different cursors, possibly using different access patterns, might iterate over the same content (thus allowingelement sharing).

A query is an expression that extracts the actualCDATAcontent of the elements in a cursor, and is always evaluated againstthe current element of the cursor it refers to. By default, a query refers to its closest ancestor cursor declaration; an explicitreference to a different cursor can be made by starting the query’s path expressions with that cursor’s name (see Figure 2(b)).

Cursors are always declared within genes, and are updated whenever these genes are instantiated. Thus, the queries definingthe contents of each of the 200 instances of thebook element gene in Figure 2(b) will be evaluated against differentb recelements. Therefore, the output of the template in Figure 2(b) will be 200book elements, each with a distinct value for itsisbn attribute2. Cursors can be nested (see Figure 2(b)); the name of the parent cursor ($a in the example) is used as thestarting node of the path expression defining the nested cursor.

Consider theauthors attribute gene specified in Figure 2(b); its content is obtained by querying theauthor id val-ues of the currentb rec element in the outer cursor. Recall that up to 5 author id’s are generated per book. Thetox-maxOccurs=”unbounded” annotation in the gene specification determines that ToXgene should generate as many instancesof the gene as the number of elements in the cursor defining its content. Therefore, the resulting attribute will be a commaseparated list of author id’s.

The ToXgene query language also defines aggregate and string manipulation functions [9].

2.3 Specifying XML documentsDocuments are specified bytox-document annotations, as shown in Figure 3(a). Similarly to a list, a document specifi-

cation contains a name and a gene, whose instance will be theroot element of the document. Collections of documents arespecified by declaring thecopies attribute in a document declaration (see Figure 3(b)).

ToXgene’s output for the specification in Figure 3(a) is a single XML document, calledbooks.xml , whose root is anelement calledbooks , and whose children are 200book elements copied from a list. ToXgene’s output for the specification inFigure 3(b) is a collection with 200 XML documents, calledbook0.xml , book1.xml , etc.; each document has an elementcalledbook as root, whose contents are copied from the same list. We note that theith book inbooks.xml corresponds tothebook i.xml document in the collection.

2.4 Specifying irregular structures and recursive elementsTemplates can specify control statements that determine, at content generation time, which genes are instantiated. Each

control statement contains one or moreblocksof genes; at processing time,at mostone such block is chosen, and only thegenes in that block are instantiated. ToXgene provides the customary IF-THEN-ELSE statement and also a “lottery” statement,in which a block is randomly chosen according to a probability distribution. Control statements can be nested arbitrarily. Also,the blocks in a control statement might contain completely different genes; note that this provides a way of specifying somecontrolled randomness in the structure of the XML documents produced by ToXgene.

Recursion in element genes. ToXgene also supports the generation of recursive XML content, determined by three param-eters: the total number of (recursive) elements to be generated, the number of children per element, and the number of levelsin the recursion. Any of the supported probability distributions can be used for specifying these parameters.

2Note that ISBN values are unique inbook list .

4

Query sizes ratioToXgene dbgen size time

q14 1 1 1.0000 1.0366q3 10 10 1.0000 1.0972q2 54 44 1.2273 1.2665q16 2789 2762 1.0098 1.2853

(a) TPC-H results.

Query sizes ratioToXgene xmlgen size time

q1 12 12 1.0000 1.0097q11 7661 7661 1.0000 1.0086q12 1358 1436 0.9457 0.9888q17 25595 25535 1.0023 1.0100

(b) Xmark results.

Table 1: Comparison between ToXgene and other data generators. The metric used for part (a) is number of tuples in the resultset; the metric for part (b) is the length (in lines) of Kweelt’s output. All ratios are computed by dividing the measurement withToXgene’s data by the corresponding measurement with the data from other data generator.

3 Extending ToXgeneToXgene is a general purpose tool, and as such provides built-in tools for generating XML content conforming to the most

common datatypes. Although preliminary experimentation with our tool shows it can easily reproduce the synthetic data usedin complex benchmarks, using a combination of queries and string operations, ToXgene was designed to support user-definedCDATAgenerators (e.g., a generator for pseudo-random DNA sequences).

ToXgene was implemented in Java 2, and its architecture was designed in a way that Java classes implementingCDATAgenerators could be added with little effort: all that is required is registering the new code in the Gene Factory module; ofcourse, this new code has to implement the interface defined in our code, which consists of a single method. In principle, thecode that produces instances ofsimpleType s is not supposed to produce XML elements (i.e., strings containing elementtags). However, there is nothing in our code that enforces this behavior. Thus, if one needed XML elements with random tagnames, it would suffice to encapsulate a data generator such as [1] as a newsimpleType in our tool. ToXgene can also becoupled with external tools, since it has the ability of storing and reading lists from files.

4 Experimental resultsOur goal with ToXgene was the generation of “useful” synthetic XML documents. To test our tool, we tried to reproduce

the data produced by the generators of known benchmarks; we chose the generators of TPC-H [10] (dbgen ) and Xmark [8](xmlgen ). We decided to use TPC-H because it is a widely used relational benchmark and because it defines many non-trivial integrity constraints over its database. Xmark was chosen because it was designed specifically for XML, and specifiesa complex document, with different levels of nesting, and thousands of references among its elements.

Experimental setting. For the TPC-H data set, we generated data for a 100Mb relational database (i.e., we used a scalingfactor of 0.1), corresponding to 400Mb of XML data. These documents were loaded into a relational database in DB2 usingIBM’s XML Extender [6]; the schema of this database was identical to the one populated bydbgen . No integrity constraintwas violated by our data. For Xmark, we generated a 100Mb document (i.e., we used a scaling factor of 1). Our documentwas validated against the DTD provided by the benchmark authors without problems.

For each benchmarking data set, we ran both queries from the benchmark workload and somead-hocqueries to test specificdetails of the data generated. We used Kweelt3 to run these queries on the Xmark data sets.

All experiments were run on a 4-way 500 Mhz Pentium III machine with 3Gb of RAM and 18Gb of disk storage. Thedata generation takes about 1 hour for TPC-H, and about 25 minutes for Xmark. The memory required for storing the lists are750Mb and 250Mb of RAM, respectively.

Experimental results. Table 1 shows the results of some of the queries selected from the workloads of the benchmarks. Weshow the size of the result sets of the queries, and the relative times to execute these queries on each data set. We show queriesof different result set sizes, both with fixed (e.g., “find the bestn customers”) or variable result set sizes. As one can see, theresults are comparable.

Thead-hocqueries were designed to compare the distributions of values between the data sets. Table 2 shows the resultof some of these queries. All data values in TPC-H are generated according to uniform distributions; this is not the case forXmark: for instance, the average prices of items in closed auctions obey an exponential distribution, while the number of itemsthat can be bought with a credit card obey a uniform distribution. Again, the results are comparable.

Generating Xmark text. Most of the textual content in Xmark is generated by sampling from the 17000 most commonwords in Shakespeare’s plays, according to their frequency of occurrence. We obtain similar content by loading the list ofwords used byxmlgen into a list, which is sampled accordingly. Sentences are formed by concatenating these words. Wegenerate the final text by copying these sentences intoemph, bold andkeyword elements in a recursive fashion, so that wecan havebold sentences nested withinemph sentences, etc.

3Available athttp://http://kweelt.sourceforge.net/ .

5

Feature Target ToXgene dbgen

average balance on customer accounts 4500 4510.06 4470.50average size of parts 25 25.00 25.00average supply cost of parts 5000 499.74 499.69average tax per item in each order 0.04 0.04 0.04

(a) TPC-H results.

Feature Target ToXgene xmlgen

average price in closed auctions 100.00 100.58 96.83average “happiness” of customers with closed auctions 5.50 5.48 5.43percentage of items in Europe that have “United States” as location 0.75 0.7553 0.7447percentage of items in Asia that can be purchased with credit card 0.50 0.5020 0.4945

(b) Xmark results.

Table 2: Comparison between ToXgene and the other data generators, using thead-hocqueries. The target value on the tables isthe expected value for the corresponding feature, given the probability distribution specified by the benchmark.

5 Conclusions and future workIn this paper we introduced ToXgene: an extensible template-based generator for consistent collections of correlated

synthetic XML documents. We presented an overview of our template specification language; we discussed some of the novelfeatures of our tool, such as element sharing, and described how ToXgene can be extended or coupled with other tools. Finally,we reported on preliminary experiments we conducted with our tool.

Development of ToXgene is continuing. We intend to provide a more general mechanism to allow the generation oftext according to different vocabularies, grammars, and character encoding schemes. This would be of capital importancefor generating testing data for text-intensive applications. We are also working on improving ToXgene’s performance; inparticular, we are interested in exploiting possible parallelism in the data generation.

As future work, we identify the extraction of ToXgene templates from existing documents as an interesting problem.The obvious application of this research would be generating synthetic data closely reproducing the characteristics of realdocuments. This would be especially important when inspecting the data is unfeasible or impossible (e.g., for security reasons).Also, templates capture considerable information about the data they describe, and thus are valuable metadata in themselves.

References[1] A. Aboulnaga, J. F. Naughton, and C. Zhang. Generating synthetic complex-structured XML data. InProceedings of the

Fourth International Workshop on the Web and Databases, pages 79–84, Santa Barbara, CA, USA, May 24-25 2001.

[2] D. Barbosa, A. Barta, A. Mendelzon, G. Mihaila, F. Rizzolo, and P. Rodriguez-Gianolli. ToX - the Toronto XML engine.In Proceedings of the International Workshop on Information Integration on the Web, pages 66–73, Rio de Janeiro,Brazil, April 9-11 2001.

[3] P. Bohannon, J. Freire, P. Roy, and J. Simeon. From XML Schema to Relations: A Cost-based Approach to XML Storage.In Proceedings of the 18th International Conference on Data Engineering (ICDE), 2002.

[4] T. Bray, J. Paoli, C. M. Sperberg-McQueen, and E. Maler. Extensible markup language (XML) 1.0 (second edition) -W3C recommendation. Available athttp://www.w3.org/TR/2000/REC-xml-20001006 , October 6 2000.

[5] D. C. Fallside. XML Schema part 0: Primer - W3C candidate recommendation. Available fromhttp://www.w3.org/TR/xmlschema-0/ , October 24 2000.

[6] IBM DB2 Universal Database XML Extender - administration and programming. Available athttp://www-4.ibm.com/software/data/db2/extenders/xmlext , 1999.

[7] IBM XML Generator. Available fromhttp://www.alphaworks.ibm.com/tech/xmlgenerator .

[8] A. R. Schmidt, F. Waas, M. L. Kersten, D. Florescu, I. Manolescu, M. J. Carey, and R. Busse. The XML BenchmarkProject. Technical Report INS-R0103, CWI, Amsterdam, The Netherlands, April 2001.

[9] ToXgene- the ToX XML data generator.http://www.cs.toronto.edu/tox/toxgene , 2001.

[10] Transaction Processing Performance Council.TPC Benchmark H - Decision Support, 1999. Revision 1.3.0.

6