32
— Draft — HyperQueries: Dynamic Distributed Query Processing on the Internet *† Alfons Kemper Christian Wiesner Universit¨ at Passau Lehrstuhl f ¨ ur Informatik 94030 Passau, Germany hlastnamei@db.fmi.uni-passau.de Abstract In this paper we propose both a reference architecture for building scalable and dynamic vir- tual market places and a new framework for dynamic query processing based on so-called HyperQueries which are essentially query evaluation sub-plans “sitting behind” hyperlinks. Architecting an electronic market place as a data warehouse by integrating all the data from all participating enterprises in one centralized repository incurs severe problems. Using HyperQueries, application integration is achieved via dynamic distributed query evaluation plans. The electronic market place serves as an intermediary between clients and providers executing their sub-queries referenced via hyperlinks. The hyperlinks are embedded as at- tribute values within data objects of the intermediary’s database. Retrieving such a virtual object will automatically initiate the execution of the referenced HyperQuery in order to materialize the entire object. All administrative tasks in such a B2B market place can be modeled as e-services and executed by the participants’ local administrators. Thus, sensi- tive data remains under the full control of the data providers. 1 Introduction Electronic market places and virtual enterprises have become very important applications for query processing [Jhi00]. Building a scalable virtual business-to-business (B2B) market place with hundreds or thousands of participating suppliers requires highly flexible, distributed query processing capabilities. Architecting such an electronic market place as a data warehouse by integrating all the data from all participating enterprises in one centralized data repository incurs severe problems: Security and privacy violations: The participants of the market place have to relinquish the control over their data and entrust sensitive information (e.g., pricing conditions) to the market place host. * This work was supported by the German National Research Council (DFG) under Contract Ke 401/7-1 Some excerpts of this work appeared in the conference publication [KW01]. 1

—Draft— HyperQueries: Dynamic Distributed Query Processing ... · HyperQueries: Dynamic Distributed Query Processing on the Internet y ... Schema integration problems: Using the

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

  • — Draft —

    HyperQueries: Dynamic Distributed Query Processingon the Internet∗ †

    Alfons Kemper Christian Wiesner

    Universiẗat PassauLehrstuhl f̈ur Informatik94030 Passau, Germany

    〈lastname〉@db.fmi.uni-passau.de

    Abstract

    In this paper we propose both a reference architecture for building scalable and dynamic vir-tual market places and a new framework for dynamic query processing based on so-calledHyperQuerieswhich are essentially query evaluation sub-plans “sitting behind” hyperlinks.Architecting an electronic market place as a data warehouse by integratingall the data fromall participating enterprises in one centralized repository incurs severe problems. UsingHyperQueries, application integration is achieved via dynamic distributed query evaluationplans. The electronic market place serves as an intermediary between clients and providersexecuting their sub-queries referenced via hyperlinks. The hyperlinks are embedded as at-tribute values within data objects of the intermediary’s database. Retrieving such a virtualobject will automatically initiate the execution of the referenced HyperQuery in order tomaterialize the entire object. All administrative tasks in such a B2B market place can bemodeled as e-services and executed by the participants’ local administrators. Thus, sensi-tive data remains under the full control of the data providers.

    1 Introduction

    Electronic market places and virtual enterprises have become very important applications forquery processing [Jhi00]. Building a scalable virtual business-to-business (B2B) market placewith hundreds or thousands of participating suppliers requires highly flexible, distributed queryprocessing capabilities. Architecting such an electronic market place as a data warehouse byintegratingall the data fromall participating enterprises in one centralized data repository incurssevere problems:

    • Security and privacy violations:The participants of the market place have to relinquishthe control over their data and entrust sensitive information (e.g., pricing conditions) tothe market place host.

    ∗This work was supported by the German National Research Council (DFG) under Contract Ke 401/7-1†Some excerpts of this work appeared in the conference publication [KW01].

    1

  • • Coherence problems:The coherence of highly dynamic data, such as availability andshipping information, may be violated due to outdated materialized data in the marketplace’s data warehouse.

    • Schema integration problems:Using the warehouse approach all relevant data from allparticipants have to be converted `a priori into the same format. Often, it would be easierto leave the data inside the participant’s information systems, e.g., legacy systems, withinthe local sites, and apply particular local wrapper/transformer operations. This way, datais only convertedon demandand the most recent coherent state of the data is returned.

    • Fixed query operators: In a fully integrated (data warehouse-like) electronic marketplace, all information is converted into materialized data. This is often not desirable insuch complex applications like electronic procurement/bidding. For example, in pricingoffers one would like to have vastly different choices:

    – fixed pricing via materialized data

    – operators which calculate the prices based on a multitude of local and global param-eters (identity of the consumer company, availability, local plant utilization, sub-contractor prices, etc.)

    – even human interaction during the processing of such complex e-procurementqueries is desirable. In some participating enterprises the pricing could be doneby a human via an interactive “query operator”.

    A second challenge for virtual market places is the administration and scalability whenevernew suppliers or clients join or exit the market place. As the requirements for these administra-tion tasks change and the participants should be able to manage their data themselves withoutthe help of the administrator of the market place, it is desirable to support this by the underlyingarchitecture and distribute the administration issues to the participants.

    Meeting all these requirements for architecting highly flexible B2B market places, we pro-pose so-calledHyperQueriesfor the dynamic distributed query processing and theQueryFlowsystemas the general framework for all administrative tasks. HyperQueries are essentially queryevaluation sub-plans “sitting behind” hyperlinks. This way the electronic market place can bebuilt as an intermediary between the client (issuing a query) and the providers executing theirsub-queries referenced via hyperlinks. The hyperlinks to the HyperQueries are embedded asattribute values within data objects (i.e., tuples) of the intermediary’s database. Retrieving sucha virtual object automatically initiates the execution of the referenced HyperQuery in order tomaterialize the entire object. Thus, sensitive data can remain under the full control of the dataproviders. Instead of replicating the data at the intermediary, only the hyperlink is embedded.

    In summary, the HyperQuery framework allows to blur the distinction between the alloca-tion schema and the data—as it is found in clear separation in traditional distributed databases.Thereby, the distribution of query execution plans is highly dynamic and based on attributevalues obtained during query processing. In our implementation, called QueryFlow,1 we dis-tinguish between hierarchical and broadcast processing of HyperQueries. In the hierarchical

    1The name of our system was derived fromqueryprocessing and workflowsystems because processing querieswith HyperQueries bears some similarities with processing distributed workflows by routing documents to theappropriate tasks.

    2

  • processing mode the initiator of a HyperQuery is in charge of collecting the processed data.Under broadcast processing the data objects containing hyperlinks are sent to correspondingHyperQueries which will then be in charge of routing the processed objects to the query initia-tor or to further HyperQueries, if the objects contain additional hyperlinks. Additionally, thesystem provides Java-based, extensible e-services for the administration of a market place, e.g.,for registration and update tasks of clients, suppliers, products, and HyperQueries.

    1.1 Related Work

    Distributed database systems have been studied since the late seventies in projects like Sys-tem R∗ [WDH+81], SDD-1 [BGW+81], and Distributed Ingres [Sto85]. Middleware sys-tems such as TSIMMIS [PGGMU95], DISCO [TRV96], and Garlic [HKWY97] have beenused to overcome the heterogeneity faced when data is dispersed across different data sources.[SL90] discusses reference architectures for federated DBMSs from system and schema view-points. The automatic generation of wrappers to extract data from different sources is shownin [CDSS98]. [PAGM96] addresses the problems of fusing information from heterogeneous in-formation sources: removing redundancies and resolving inconsistencies is a challenging task.Cohera [HSC99], based on the economic model of Mariposa [SAL+96], integrates heteroge-neous data sources using replication tools.

    But all these middleware solutions require the administrators to manually install all thenecessary functionality for query processing. Newer architectures such as MOCHA [RMR00]and ObjectGlobe [BKK+01] integrate dispersed data sources and provide the autonomous load-ing of functionality from an external code repository. With the use of continuous queries theNiagaraCQ system [CDTW00] allows users to receive new results when they become avail-able. NiagaraCQ processes a large number of queries efficiently by grouping queries that sharecommon computation. In other lines of work, researchers have tried to “query the Web” us-ing languages like WebSQL [MMM97]; these efforts, however, only support navigational styleof access to Web pages. The approach of Goldman and Widom [GW00] combines the queryfacilities of traditional databases with existing search engines on the Internet whereby the exe-cution of queries is speeded up by parallelizing the web search. In [LSK95] a central mappinginformation of all participating, distributed data sources is queried.

    Stonebraker et.al. [SAHR84] propose a similar approach to our HyperQueries. But theirwork onQuel as a Data Typeis restricted to stored sub-queries in centralized database systems.HyperQueries are also related to pointer-based join methods [SC90] of relational and object-relational database systems, as the hyperlinks can be viewed as pointers to data at remote sites.[KGM91] describes so-called object assembly which influences the order in which objects areread from disk or retrieved from remote servers in a distributed system in order to reduce (diskor network) I/O cost. Object assembly is specifically designed to assemble complex objectsthat are hierarchically composed of sub-objects. [CDSC00] describes the dynamic modifica-tion of XML documents by invoking external applications from XML. But that work embedsthe invoking code directly into the XML document and does not make use of indirection. Fur-thermore that approach is based on XML-RPC [US99] and thus not well suited for mass dataprocessing.

    In the recent past some work was done on self-adaptive query processing, too. The queryprocessing mechanism called Eddies [AH00] is adaptive on a tuple-by-tuple basis as opera-

    3

  • tors in a query plan are continuously reordered. That work was done in the context of River[ADAT +99], which is a shared-nothing parallel DBMS that provides adaptive partitioning ofdata to obtain a better utilization of the connected resources. Tukwila [IFF+99] provides adap-tive execution and scalability for a large number of data sources across intranets and the Internet.Furthermore endeavors have been made in warehouses for XML, such as the Xyleme system[Xyl01]. The Nimble XML data integration system [DHW01] can query both materialized datain a central warehouse and heterogeneous data sources on demand.

    [YP00] describes a reference architecture for interoperable e-commerce applications. Vir-tual enterprises and B2B e-commerce environments present an important application domainfor our new technique: the automobile industry’s electronic market place endeavor “Covisint”[Cov] and SAP’s “mySAP.com”[SAP99] electronic market places are among the well knownexamples. [Bak98] describes the increasing relevance of electronic market places on the Inter-net, [BSZ98] addresses the benefits of component-based systems that consist of a lightweightkernel to which new features can be added.

    1.2 Running Example

    We demonstrate the HyperQuery technique with a scenario of the car manufacturing industry.We assume a hierarchical supply chain of suppliers and sub-contractors. A typical process ofe-procurement to cover unscheduled demands of the production is to query a market place forthese products and to select the incoming offers by price, terms of delivery, available quantity,etc. The price of the needed products can vary by customer/supplier-specific sales discounts,the quantity of materials to be provided, duties, plant utilization, etc. Thus the price cannot bea materialized attribute as in traditional query processing systems. Instead it is an individuallycalculated, dynamically changing attribute and a hyperlink to the supplier is contained, wherethe price will be computed on demand.

    In traditional distributed query processing systems such a query can only be executed if aglobal schema exists or all local databases are replicated at the market place. Considering anenvironment, where hundreds of suppliers participate in a market place, one global query whichintegrates the sub-queries for all participants would be too complex and error-prone, i.e., if onesupplier’s host is down, the whole query execution would fail.

    Following our approach the suppliers have to register their products at the market place,which they want to participate in, and specify, by which sub-plans the price information canbe computed attheir sites. This calculation can be arbitrarily complex and involve their sub-contractors, too. The allocation schema given by the data at the market place is exploited forexecution.

    Figure 1 shows an SQL-like query, that returns the prices and suppliers of all needed prod-ucts. The execution is stopped at the latest at the given value of theexpires attribute. Onlythe results gathered so far are considered. Figure 2 shows two possible execution traces of thisquery—both are supported by our evaluation technique. In the hierarchical execution of Fig-ure 2(a) the resulting objects flow back to the sites, where the original input objects came from,whereas in the broadcast execution of Figure 2(b) the objects do not flow all the way throughintermediates back to the client, but are routed directly to the client, which issued the query.

    4

  • select p.ProductDescription, c.Supplier, c.AdditionalData, c.Pricefrom NeededProducts p, Catalog@MarketPlace cwhere p.ProductDescription = c.ProductDescriptionorder by p.ProductDescription, c.Priceexpires Friday, May 18, 2001 5:00:00 PM CET

    Figure 1: Example Query of the Car Manufacturer

    Supplier 1 Supplier 2 Supplier 3 Supplier 4

    Sub-Contractor 1 Sub-Contractor 2

    Client

    Market Place

    Supplier 1 Supplier 2 Supplier 3 Supplier 4

    Sub-Contractor 1 Sub-Contractor 2

    Client

    Market Place

    (a) Hierarchical Query Processing (b) Broadcast Query Processing

    Figure 2: Flow of Control and Flow of Objects(Dashed lines indicate the flow of control and intermediate results,

    solid lines indicate the flow of result objects)

    1.3 Overview

    The rest of this paper is organized as follows. Section 2 defines HyperQueries. Section 3describes the reference architecture of a scalable dynamic market place with the example im-plementation of our distributed query processor—the QueryFlow system. Section 4 illustratesthe execution of HyperQueries and shows both hierarchical and broadcast processing mode.Section 5 denotes the security issues arising in our HyperQuery framework. In Section 6 wepropose some optimization techniques for HyperQuery execution and give some details of ourimplementation. We show some the scalability of our approach in Section 7 and conclude inSection 8.

    2 Syntax and Semantics of HyperQueries

    Our framework for dynamic distributed query processing is based on virtual attributes whosevalues are determined by evaluating remote sub-queries on demand. In contrast to [SAHR84]we do not store the queries, i.e., the sub-plans to be executed at data providers, within virtual at-tributes. Instead, hyperlinks referencing the sub-plans are embedded as virtual attributes withinthe data. The queries themselves are located at the distributed data providers (remote hosts).We refer to hosts that manage data with virtual attributes as intermediaries.

    In the following we show how hyperlinks and HyperQueries can be incorporated into thedatabase design. We chose XML as data model for all data being exchanged by HyperQueries,as XML is the emerging de facto standard for data exchange on the Internet and (object-) rela-tional data sources can easily be integrated.

    5

  • ::= "hq://""/" "?" ::= = {"&"=}

    Figure 3: URI Schema of theHyperQueryProtocol

    2.1 Metadata for HyperQuery Processing

    We introducevirtual attributesto encapsulate hyperlinks and the results of the correspondingsub-plans as columns in a database table. The hyperlinks are replaced by the result values ofthe HyperQueries. The schema of these result values is specified by the intermediary usingXML Schema [TBMM00, BM00] and is publicly available at the intermediary. Furthermorethe intermediary defines so-calledapplication-specific parameters, e.g., OrderQuantity ,that can be given by the client and can be used within the HyperQueries to calculate the actualvalue of the virtual attributes.

    2.2 Specification of Hyperlinks

    To obtain general locators that are independent of the physical location of the referred sub-plans,we specify hyperlinks by defining a Uniform Resource Identifier (URI) [BLFM98] schema.The EBNF of this schema is shown in Figure 3. The individual components of these URIs havethe following meaning: the leadinghq denotes ourHyperQueryprotocol, is theDNS name of the host, on which the sub-plan is stored and executed, andis the name of the stored sub-plan within a hierarchically structured repository of plans. The and the are referred to as URI prefix. The subsequencingobject-specific parameters are introduced by “?” and are a “&”-separated key-value list. Theseobject-specific parameters represent foreign keys on a virtual document at the remote host andhave to contain enough information, to calculate the actual value of the virtual attribute at theremote site. All entries of the virtual attribute with the same URI prefix must have the sameobject-specific parameter structure; each URI prefix leads to the instantiation of one sub-planat the remote host. It should be noted that each site can host multiple different locally uniquesub-plans.

    Figure 4 shows a simple extension of theCatalog of the electronic market place exampleof Section 1. This extension is presented in XML style, although the market place is not boundto a specific database. The data can be stored in a relational database and XML is generated ondemand as described in [Rys01]. The virtual attributePrice of the first object denotes, that theprice is calculated at the hostsupplier1.com for an object with the key attributeProdIDand valueCB1232 by using the sub-plan namedElectrical/Price . It is apparent, thatan object can hold multiple virtual attributes that refer to different subplans and even differenthosts.

    2.3 Specification of HyperQueries

    A HyperQuery is the counterpart of the hyperlinks of virtual attributes. These query plans areexecuted on remote hosts and may be arbitrarily complex, integrate applications and legacy

    6

    "hq://""/""?"

    ={"&"=}hq

    supplier1.com

  • Battery, 12V 32ASupplier 1hq://supplier1.com/Electrical/Price?ProdID=CD1232

    Battery, 12V 55ASupplier 1hq://supplier1.com/Electrical/Price?ProdID=CD1255

    Tires 175/65TR14Supplier 2hq://supplier2.com/Price?ProdKey=175/65TR14

    Spark Plug VXSupplier 3hq://supplier3.com/PriceForUSA?ID=1234

    Window RegulatorSupplier 4hq://supplier4.com/SpecialPrice?SerialNo=WR4T

    ...

    Figure 4: Catalog at an Electronic Market Place

    systems, and may contain even user interaction. The most comfortable way for stating Hyper-Queries is to use our SQL dialect2. A HyperQuery accesses a virtual document calledHyper-QueryInputStream that serves as a receiver of the input data objects that “flow through”the hyperlinks. Only the object-specific parameters of the corresponding hyperlink and theapplication-specific parameters are accessible by the HyperQuery. Additional attributes of aninput data object cannot be used for processing the HyperQuery; they are passed through. Inour prototype system, alternatively, a HyperQuery can consist of arbitrarily complex Java op-erations which have to implement the iterator interface of [Gra93]. Section 3 discusses this inmore detail.

    Figure 5 shows the SQL formulations of two example HyperQueries, both determiningthe price of the specified products. The HyperQuery at Supplier 1 assumes a data sourceProducts (in the following we assume thatProducts is an XML data source) with at leastthe attributesProdID andPrice . Price is determined by a simple join on theProdIDof the tableProducts and theHyperQueryInputStream . The functionNextXML pro-duces the XML representation of given list of relational attributes, see the details of the inte-gration of XML in Section 3. The second HyperQuery could be executed at Supplier 4. Itis more complex and assumes two tablesBillOfMaterial(SerialNo,ComposedOf)andParts(PartID,Price) . HerePrice is calculated by summing up the prices of thecomponents and multiplying this with the ordered quantity, which is an application-specific pa-rameter. Note thatp.Price andp.ComponentPrice in both HyperQueries can be virtualattributes, i.e., further HyperQueries on other hosts could be executed to compute the value of

    2We plan to support XQuery [CFR+01], too.

    7

  • selecth.*, p.Price as Price, NestXML(p.PricingConditions, p.PNGImage) as AdditionalDatafrom HyperQueryInputStream h, Products pwhereh.ProdID = p.ProdID

    (a)Electrical/Price at Supplier 1

    selecth.*, (h.OrderQuantity *sum(p.ComponentPrice)) as Pricefrom HyperQueryInputStream h, BillOfMaterial b, Parts pwhereh.SerialNo = b.SerialNoand b.ComposedOf = p.PartIDgroup by h.*

    (b) RegularPrice at Supplier 4

    Figure 5: Two Example HyperQueries in Our SQL Dialect

    these attributes.If an object is sent to a HyperQuery, the URI of the virtual attribute is replaced by the actual

    value. This value is calculated from the object-specific parameters that are given by the URI.Other attributes of the object cannot be used for the computation within the HyperQuery andare passed through.

    The type of the actual value of the virtual attribute must coincide with the schema definitiongiven at the intermediary; objects of incompatible type are discarded—see Section 3 on theintegration of additional XML elements. If the type is single-valued and multiple values for thevirtual attribute are computed, multiple values have to be returned. If the type is multi-valued(set-valued), the values are nested.3

    If the value of a virtual attribute cannot be computed by the HyperQuery, the object is notreturned, which reflects the semantics of a natural join. How the value is calculated is left to theremote site. HyperQueries may initiate the instantiation of other HyperQueries when accessingvirtual attributes. The resulting objects (1) may flow back to the initiator of the HyperQueryand be post-processed or (2) are routed directly to the client. The second kind of HyperQueriesis the delegation of the calculation to other sites.

    3 The Architecture of the QueryFlow System

    In this section we propose a reference architecture for building scalable electronic marketplaces. We implemented this architecture in a prototypical system called QueryFlow. We payedspecial attention to rely on standardized (or proposed) protocols such as: XML [BPSMM00]and XML Schema [TBMM00, BM00] for all data being processed, SQL for queryingdata, X.509 certificates [HFPS99] and XML Signature [XML01] for authentication, andHTTP [FGM+99] and SOAP [BEK+00] for exchanging data between multiple hosts. Figure 6depicts the basic components of the system, that can be described as follows:

    • The query processing capabilities of the QueryFlow system are based on theObjectGlobesystem [BKK+01], which is a distributed and open query processor for data sources onthe Internet.

    3For our running example application we assume thatPrice is a single-valued attribute.

    8

    Electrical/PriceRegularPrice

  • MP Services:

    QueryFlowServices

    Register

    Que

    ryFl

    owSe

    rvic

    eMan

    ager

    Inte

    grat

    ion

    XM

    L

    Aut

    hent

    icat

    orH

    yper

    Que

    ry

    Inst

    antia

    tor

    Hyp

    erQ

    uery

    Cor

    eSys

    tem

    HyperQuery Engine(QueryFlow Operators)

    Que

    ryFl

    ow

    Product

    Mes

    sagi

    ng

    Register

    Serv

    ice

    HyperQuery

    SOA

    P

    ObjectGlobe

    Figure 6: Architecture of the QueryFlow System

    • The communication between multiple instances of the QueryFlow system is implementedvia a proprietaryMessaging Service. Additionally SOAPas the emerging standard fortransferring XML data provides an open interface both to integrate other systems and tobe integrated by other systems.

    • The HyperQuery Enginecombines all operators that are specific for HyperQuery pro-cessing, i.e., for resolving the hyperlinks to HyperQueries (which is done by the so-calledDispatch operator), collecting incoming objects, and operators for optimizationof HyperQuery execution, as described in Section 6.

    • TheQueryFlow CoreSystemmanages the instantiated HyperQueries and the distributionof the execution to multiple physical hosts. Furthermore all additional data structuressuch as caches and administrative data of the executed HyperQueries are managed withinthis component.

    • The HyperQuery Instantiatormanages the instantiation of HyperQueries at the remotesites. The HyperQueries are stored in a hierarchical structured repository that can be ontop of the file system, a proprietary database, or a web server.

    • The certificate-basedHyperQuery Authenticatoris used for signing requests and querieswhen sending them to hosts and verifying them at the destination. This component isbased on the security infrastructure of the underlying ObjectGlobe system and is de-scribed in more detail in Section 5.

    • As already mentioned, the QueryFlow system processes XML data. TheXML Integrationcomponent collects all functionality to access XML data sources and to handle semi-structured data.

    • Finally, we implemented an extensibleQueryFlow ServiceManagerfor executing Java-based e-services. These services are required for administrative issues, e.g., for the reg-istration of suppliers, products, and HyperQueries. The services can be accessed via theMessaging Service or the SOAP interface of the QueryFlow system.

    9

  • Having seen this survey of basic components, we explain the most significant parts moredetailed.

    3.1 The Distributed Query Processor ObjectGlobe

    The idea of ObjectGlobe is to create an open market place for three kinds of suppliers:dataproviders supply data,function providersoffer query operators to process data, andcycleprovidersare contracted to execute query operators. Of course, a single site (even a single ma-chine) can comprise all three services, i.e., act as data-, function-, and cycle-provider. Object-Globe enables applications to execute complex queries which involve the execution of operatorsfrom multiple function providers at different sites (cycle providers) and the retrieval of data anddocuments from multiple data sources.

    The whole system is written in Java for two reasons: First, Java is portable, so that oursystem can be installed with very little effort. In particular, hosts need to install the systemand can very easily join a market place by inserting hyperlinks at the intermediary and provid-ing the corresponding sub-plans. Second, Java provides secure extensibility. Like the systemitself, user-defined query operators are written in Java, they are loaded on demand (from func-tion providers, e.g., the market place host or third-party vendors), and they are executed at thecycle providers in their own Java “sandbox” ([Oak98]). A user-defined, application-specificquery operator must implement theopen , next , close , andreopen methods following theiterator model of [Gra93].

    3.2 Query Operators

    As mentioned before, our QueryFlow system provides extensibility. This capability is impor-tant, as each participating site has several alternatives for implementing the HyperQueries. Soquery-plans can perfectly be adapted to the companies’ local systems. The query plans maycontain different kinds of operators which can be characterized by the origin of data.

    Database Queries The simplest kind of local sub-plans are queries as shown in Section 2.The queries are transformed into a tree containing physical operations of the relational algebrawith traditional database operators, e.g., joins, selections, projections, and sorting. Dynamicloading of operators in our QueryFlow system enables the administrator of the local host tointegrate new and more efficient database operations into the query execution. One exampleof such a new database operation is a wrapper that accesses a relational database system usingJDBC. We implemented this easily adaptable wrapper to extract data from commercial DBMSs.This makes the integration of existing databases straightforward.

    Applications If complex business applications, e.g., ERP systems like SAP R/3, spreadsheetanalysis, etc., or legacy systems have to be accessed, wrappers for these applications have to beintegrated into the query plan. This is done the same way as database systems are integrated.All we require, is that the wrappers obey the iterator interface for query operators. The applica-tions are automatically invoked on any incoming data object and the actual value of the virtualattribute is calculated from the current input object. The connection of the QueryFlow system to

    10

  • Figure 7: A Screen Shot of Our Example Operator for Human Interaction

    legacy systems by wrappers means that data is only converted on demand and the most coherentstate of the data is returned.

    Human Interaction Determining the values of virtual attributes in sub-plans can even bedone by human interaction. In this case a user enters the value of a virtual attribute througha Java applet or a GUI. As these operators are executed at the sites of the owner of the data,sensitive data remains under their full control. These operators have two main parts: a serverpart, which implements the iterator interface, is specified in the query execution plan, and runsas a part of the query execution. The corresponding client part acts as an input interface.

    We implemented a prototypical graphical application that can be used for manually enteringthe real value of a virtual attribute. This application has two main parts: a server part, whichis specified in the query execution plan and runs as a part of the query execution, and a clientpart, which acts as an input interface. On instantiation the operator collects all input objects,opens a network socket, awaits a connection from the client and notifies a specified employeevia e-mail. This employee starts the client and processes the data objects to determine thevalue of the virtual attribute and/or approve offers. The client part is the actual GUI, whichconnects to the server via the socket and fetches all objects and requests and the identity of theinitial user, who has initially stated the query. Thus the employee who has to enter the missinginformation has full knowledge of the identity of the initial user and can submit a specific offer.After having processed an input object the employee can send each object separately back tothe server, which passes on the received object to the subsequent iterator. A screen shot of therunning application is shown in Figure 7.

    3.3 Integration of XML Data Sources

    Integrating XML data sources into the QueryFlow systems can be divided into two parts: First,HyperQueries may access XML data sources; second, HyperQueries can produce additional at-tributes that are not initially requested by the user, e.g., additional pricing conditions or imagesare returned with the price, etc. The QueryFlow system copes with these two aspects the fol-lowing way, although the underlying ObjectGlobe system is primarily not designed to processXML data:

    • Accessing XML data sources at a remote host is done by an XML wrapper that is inte-grated into the query-plan of the HyperQuery and transforms all XML data to relational

    11

  • PriceConditions,PNGImage]}{[h.ProdDesc,h.Price,Price,

    ProdID]}

    ...

    ... ...

    ...

    ...

    ...

    {[h.ProdDesc,h.Price, {[ProdID,Price,

    Join

    ExtractURI PriceConditions,PNGImage]}

    XMLWrapper

    Project

    NestXML

    Send

    Price@Supplier1

    Receive {[h.ProdDesc,h.Price]}

    {[h.ProdDesc,h.Price,Price,AdditionalAttributes}]

    Products.xml

    Figure 8: XML Integration in the Query-Plan of the Example Query of Figure 5(a)(The types of the intermediate generated objects are shown in{[...]} )

    representation. This XML wrapper uses the XScan algorithm proposed in [ILW00].

    • The nodes of XML documents are divided intorequired (relational) attributes that areneeded during query processing, e.g., join and selection attributes, andadditionally re-quested attributes, such as comments, images, etc. These attributes are for additionalinformation of the client and are just displayed. Internally, we store required attributesin standard relational representation as tuple attributes, while additionally attributes aremerged together into one singlecontainer attributethat is specified by the intermediaryinitially.

    • As the internal representation of all objects is relational, it has to be payed attention thatdifferent sites do not return objects of different schema to the intermediary. Thus theremote sites have to merge all their additional attributes into one column of the relationalrepresentation before they ship the objects to the intermediary; the required attributes (theactual value of the virtual attribute belongs to) are treated as normal relational attributes.Thus all HyperQueries obeying these conventions, that are defined by the intermediary,return the same schema. Any objects that do not obey this schema are discarded by theintermediary and the sender is informed. This way schema integration is pushed to theremote sites.

    Figure 8 shows a query-plan where both the wrapper for XML data sources and the nestingof additional attributes are used. It should be obvious that the wrapper transforms the objectsof the XML data source into relational representation. Using these tuples the supplier 1 canprocess the whole data in the traditional way. Before sending the resulting objects to the market

    12

  • place the remote site merges the attributesPricingConditions andPNGImage into thecontainer attributeAdditionalAttributes that is defined by the market place.

    3.4 Providing Administrative E-Services

    We propose QueryFlow to be an architecture for building scalable wide-area distributed marketplaces. Among query processing these market places also need the feasibility of administration.These administrative processes are e-services that the market place provides and both clientsand suppliers make use of them.

    One design goal for the integration of services into our framework was extensibility, i.e., ifthe market place provides new services, they should be integrated into the running system andclients and suppliers should be able to access them without any effort. The other big demandwas the seamless integration of XML into the services, i.e., a service takes one input XMLdocument and produces one output XML document containing the result of the service. Theseinput documents contain both the service description and the data that has to be processed bythe service. Thus, we are able to combine different simple services to obtain more complexservices. We met both requirements by a simple interface specification, where each service hasto implement two methods:

    • boolean isApplicable(XMLDocument) checks, if the service is applicable tothe input document.

    • XMLDocument execute(XMLDocument) executes the service with the input doc-ument and produces an output XML document.

    Thus, the QueryFlow ServiceManager can register new services and the Java classes are loadedon demand into their own sandbox. Using XML Signature [XML01] the service request issigned by the client and the QueryFlow ServiceManager can proof the authenticity and integrityof the request.

    Figure 9 shows the execution of an example service: an XML document is sent (1.) to theQueryFlow ServiceManager, where it is parsed and pre-processed (2.). The QueryFlow Ser-viceManager sends the XML document to all registered services and lets decide them, whetherthe service is applicable to the input document (3.). If a service is applicable4 (4.), it is instan-tiated with the XML document and is executed (5.). The produced XML document is returned(6.) as the result to the origin of the service request.

    The current implementation of our example market place provides services for the registra-tion, viewing, and deletion of clients, suppliers, HyperQueries, and products. Furthermore eventhe execution of a query is provided as a service.

    3.5 Communication Issues

    The requisition on the communication in a distributed, highly flexible query processor are (1)transparency, i.e., hiding the physical location of objects, (2) efficiency, i.e., avoiding unneces-sary overhead when processing messages, (3) openness, i.e., anyone can participate the system

    4If no service is applicable, an error document is returned.

    13

  • determineService execute5.

    OK2.

    ...

    RegisterHyperQuery

    RegisterProduct

    RegisterSupplier

    QueryFlowServiceManager1. 6.

    3. i

    sApp

    licab

    le

    4.

    Spark Plug VX

    hq:// ... ... ...

    Figure 9: Executing E-Services in the Market Place Scenario

    just obeying a predefined protocol, and (4) asynchronous mode, i.e., pushing data to the nextprocessing step, when it is ready. To meet these requirements the QueryFlow system distin-guishes between three kinds of communication: internal communication, external communica-tion, and push-based communication for mass data transfer. All three kinds are discussed in thefollowing:

    Internal communication The communication between two cooperating QueryFlow systemsis done by a proprietary and very efficientMessaging Service, where administrative messagesare exchanged. These messages include status messages of ongoing HyperQueries, registrationof QueryFlow operators at theQueryFlow CoreSystem, the update propagation of executiontraces, cache requests, etc. These are internal messages, that cannot be accessed from outsidethe system. As these messages occur frequently, they have to be processed very efficiently.These messages are based on the serialization mechanism of Java.

    External communication QueryFlow is well suited for being an open system for implement-ing a scalable electronic market place. But if a company does not want to install the wholeQueryFlow system, SOAP [BEK+00] can be used for the communication with other softwarecomponents. We chose SOAP which seems to become the standard mechanism for exchang-ing structured and typed information between peers in a decentralized, distributed environmentusing XML. Our QueryFlow systems implements both a SOAP server and a SOAP client.

    The SOAP server is fully integrated into the system, i.e., no additional WWW server has tobe started to process SOAP requests via HTTP. Offering the functionality of a QueryFlow serveras SOAP services to external clients is easily be done by registeringDeploymentDescriptorsbythe administrator of the server. Thus, we registered the internal communication as a service andthis way we can forward all SOAP messages to the system itself. But as SOAP incurs muchprocessing overhead, this is not recommended for internal communication. The SOAP client is ageneric operator that can be integrated into any query-plan and access external SOAP services.For instance, we can send an SMS (viahttp://www.salcentral.com ) containing the(few) resulting objects of a query.

    14

    http://www.salcentral.com

  • Dispatch Dispatch recv

    send

    (a) Nesting (b) Sequencing (c) Inner

    Figure 10: The Three Possible Templates for Sub-Plans

    Push-Based Communication The QueryFlow system is intended to serve as a platform forthe integration of a large number of hosts connected via the Internet. One possibility of com-munication and data exchange would be the Java Remote Method Invocation (RMI) package.But as RMI acts synchronously and each data object causes too much overhead, it is not wellsuited for the processing of large volumes of data. Therefore, our implementation uses a pro-prietary socket-based protocol for the asynchronous communication between sub-plans. Thewell-known (synchronous) iterator concept is insufficient for distributed applications due todelays, whereas the asynchronous, push-based communication perfectly fits the needs of (wide-area distributed) HyperQuery processing: The execution of the sub-plans is decoupled and datais pushed to the site where it is needed, as soon as its local processing is completed.

    4 Executing HyperQueries in our System

    In this section we illustrate the execution of HyperQueries in our QueryFlow system.

    4.1 Templates for Sub-Plans

    Users of the QueryFlow system specify HyperQueries using our SQL dialect as described inSection 2. A query is transformed into an operator tree that is stored as a sub-plan in a localrepository. The three possible templates for sub-plans in our QueryFlow system are illustratedin Figure 10. They implement theCompositepattern of [GHJV95], i.e., basic components canbe assembled to more complex sub-plans by applying construction rules. These templates canbe characterized as follows:

    Nesting Sub-Plans As shown in Figure 10(a) these sub-plans contain aDispatch operatorthat splits one input stream into multiple output streams that serve as input streams for the nestedsub-plans. This operator is the basic operator for processing HyperQueries (on implementationdetails see Section 6). TheUnion (re-)merges the output streams of the nested sub-plans andproduces one output stream. Thus, the flow of objects is totally encapsulated inside a sub-planof this pattern. This pattern is used to build hierarchies of sub-plans. The client query is alwaystransformed into a plan of this kind.

    15

  • Sequencing Sub-Plans Sequencing sub-plans as shown in Figure 10(b) contain the initialDispatch operator that splits one input stream into multiple output streams; no finalUnionis given, so a surrounding sub-plan with aUnion is required to which the objects of the sub-plans can be routed. Thus, objects that are once sent to the next sub-plan are never sent back tothe delegating sub-plan. So the further processing of the data objects is beyond the control ofthe initiating sub-plans.

    Inner Sub-Plans Figure 10(c) shows sub-plans that have one input stream and one outputstream. These sub-plans form the innermost parts of the query execution where the actualvalues of virtual attributes are determined. As already mentioned above multiple output objectsmay be generated for one input object.

    4.2 Processing Hyperlinks

    In our QueryFlow system the processing of hyperlinks is done by an operator calledDispatch , that splits one input stream into multiple output streams. This splitting is based onthe value of the virtual attribute. If the actual value of the virtual attribute is already material-ized, theDispatch operator returns this value5. But if a hyperlink is encountered, the valueof the virtual attribute is computed by “following” the hyperlink according to this procedure:

    1. The the remote host and the identifier of the sub-plan is determined from the hyperlink.

    2. If the referenced sub-plan has not yet been instantiated at the remote host, an instantiationrequest is sent. This request contains additional data that can are used at the remote hostto parameterize the sub-plan. These “global parameters” stem from external data sources,such as the environment of the client, e.g., the preferred currency of the client, the countryof the client, etc.6. Keeping track of all instantiated sub-plans avoids multiple instancesof the same sub-plan at remote hosts.

    3. Once the sub-plan has been instantiated all data objects with the same URI prefix arerouted to it, whereby whole data objects, i.e, the hyperlinks and the additional elementsof the input object, are transferred. This is usual in traditional databases and compliantwith the iterator model, but different from hypertext processing, where only the requestis sent and the resulting information is returned to the client. In Section 6 we discuss thesplitting of objects and sending only the necessary parts of the object.

    4.3 Processing Simple Queries

    On the basis of the example market place of Section 1, the extension of Figure 4, and theexample query shown in Figure 1 we illustrate the process of incremental plan generation andplan execution. Figure 11 shows the XML fileNeededProducts.xml at the client.

    Due to clarity we substituted in Figure 12 the concrete data objects by symbols, whereand denote the two battery objects, denotes the tires object, and denotes the spark

    5For the remainder of this paper we assume that virtual attributes contain no materialized values.6In our market place example these “global parameters” are taken from the registry information of the clients.

    16

  • Battery, 12V 32A500

    Battery, 12V 55A

    750

    Tires 175/65TR141000

    Spark Plug VX8000

    Figure 11: Some Needed Products of the Car Manufacturer

    plug object and requested no additional attributes. Figure 12(a) shows the start of the queryexecution: The user-stated plan is instantiated with a scan of theNeededProducts.xmlfile at the client. The attributesPrice andSupplier of theCatalog document are joined(indicated by1;) to the input objects. The vertical hatch indicates the enriched objects in thefollowing figures.

    As Price is a virtual attribute, aDispatch operator splits the resulting streamof objects into multiple output streams. In Figure 12(b) the first object for Supplier 1passes theDispatch operator which sends an instantiation request7 for the sub-planElectrical/Price to Supplier 1. Basically this sub-plan consists of a join with a localdocument as shown in the HyperQuery of Figure 5(a). All objects that belong to this sub-planare routed to it by theDispatch operator (Figure 12(c)/(d)). Figure 12(d) also shows theprocessing of the

    ���������

    ���������

    object at the market place. As its pricing information is calculated at Sup-plier 2, the instantiation of the corresponding sub-plan is requested. This sub-plan involves acomplex application to calculate the price of the input objects. Concurrently the price has beenadded to the

    ������������ object at Supplier 1 and the resulting object8 can be forwarded to the final

    Union . So an additional input stream is requested at theUnion .Having registered the new input stream at theUnion , the object is sent to the market

    place. The price is inserted into the next data object������������ and generated a object (Figure 12(e)).

    The���������

    ���������

    object is routed to Supplier 2. The market place requests the instantiation of the sub-plan namedPriceForUSA for the last input object

    ������������ at Supplier 3. In this sub-plan a human

    user enters the pricing information, e.g., using a GUI. In Figure 12(f) the������������ object is routed

    to Supplier 3. Supplier 2 has inserted the pricing information into its���������

    ���������

    object and generateda object, which is sent to its target. So a further input stream is requested at theUnion .Supplier 1 routes its object to theUnion .

    Supplier 2 sent its object to theUnion and Supplier 3 has inserted the pricing informationfor its

    ���������������� object and generated a object (Figure 12(g)). The routing of this data object leads

    to a request of an additional input stream at theUnion . Figure 12(h) depicts the result, where7Note, that all sub-plans are instantiated only once for a query.8Objects with fully materialized virtual attributes are visualized in solid black.

    17

  • 1;Market Place

    Client

    1;

    ������������

    Market Place

    Client

    Supplier 1

    1;

    ������������

    ������������

    Market Place

    Client

    Supplier 1

    1;

    ���������

    ������������

    ���������

    App

    Market Place

    Client

    Supplier 1 Supplier 2

    (a) (b) (c) (d)

    ������������

    ������������

    ������������

    App

    1;Market Place

    Supplier 1

    Client

    Supplier 2 Supplier 3

    ������������

    App

    1;Market Place

    Client

    Supplier 1 Supplier 2 Supplier 3

    1;

    App

    Client

    Supplier 1

    Market Place

    Supplier 2 Supplier 3

    1;

    App

    Client

    Supplier 1

    Market Place

    Supplier 2 Supplier 3

    (e) (f) (g) (h)

    Figure 12: Routing of Objects & Instantiation of Sub-Plans(solid lines indicate the routing of objects, dashed lines indicate the instantiation of sub-plans)

    the actual value of all input objects has been inserted and the resulting objects have reached theUnion . Based on these data objects the query is processed further, i.e., the sorting is done.After the complete execution of the query, all (intermediate) instantiated sub-plans and theuser-stated query plan are closed and cleaned up in reverse order of instantiation. In this simpleexample each input object of a sub-plan generates one output object. But it is also possible thatmultiple output objects are produced for one input object.

    4.4 Processing Multi-Level HyperQueries

    So far we have demonstrated the incremental instantiation of sub-plans for simple one-levelHyperQuery processing. The HyperQuery concept is, of course, not restricted to one level.While processing a HyperQuery, other hyperlinks may be encountered which initiate nestedHyperQueries. Figure 13 illustrates our component-based QueryFlow system on more complexexample applications. We only show the complete query plans (after all sub-plans have beeninstantiated) and omit both concrete data objects and the sequences of the stepwise instantiationof the sub-plans.

    4.4.1 Hierarchical HyperQuery Execution

    If a remote host encounters a virtual attribute that is needed for the further execution of theHyperQuery, the host acts as intermediary and initiates a nested sub-query at another remotehost using the pattern of Figure 10(a). After pre-processing the data objects, they flow fromthe surrounding sub-plan to the nested sub-plans, where the value of the virtual attribute is

    18

  • ∪∪

    Client

    Sub-Contractor 1

    Market Place

    Sub-Contractor 2

    Supplier 1 Supplier 4

    Market Place

    Sub-Contractor 1 Sub-Contractor 2

    Client

    Supplier 4Supplier 1

    (a) Hierarchical HyperQuery Execution (b) Broadcast HyperQuery Execution

    Figure 13: Kinds of HyperQuery Execution

    computed. Then the complete objects are sent back to the surrounding sub-plan, where theyare processed further. Figure 13(a) shows an example. Supplier 4 executes the HyperQuery ofFigure 5(b) and accesses the virtual attributeComponentPrice . Thus, the right hand sub-plan instantiates further nested sub-plans at sub-contractors. Note, that the virtual attributes atthe levels of the nesting need not be the same, e.g., the outer virtual attribute could bePrice ,while the inner isComponentPrice .

    4.4.2 Broadcast HyperQuery Execution

    If a hyperlink is encountered within a HyperQuery and the resulting objects need not be pro-cessed any further, the evaluation can be delegated to other HyperQueries. Using sub-plans ofthe pattern of Figure 10(b) data objects are (after a pre-processing step) forwarded to the se-quencing sub-plans. It is the task of the further sub-plans to determine the value of the virtualattribute and to send the resulting objects back to the initiator of the query. The prerequisite isthat the virtual attributes are the same for both levels of HyperQuery execution. TheUnion ofthe surrounding sub-plan merges the results of the broadcast-like inner sub-plans. Figure 13(b)shows an example for the broadcast execution, where Supplier 4 has two subsidiary compa-nies, each one specialized at some goods, and forwards the received objects to them, withoutpost-processing the results.

    The main advantage of broadcast processing is the quick forwarding of data objects withoutthe need of handling them again at the delegating site. The tradeoffs are (1) that the virtualattributes must coincide in all sequencing sub-plans and (2) that many connections, have toregister at the mergingUnion . We want to emphasize that it is a local decision of each partic-ipating site, what kind of sub-plan is executed, and this decision is not affected by other sites.Thus, it is possible to have both hierarchical and broadcast execution of HyperQueries withinthe execution of one query.

    5 Security Issues

    Safety is one of the crucial issues in an open and distributed query processing system. OurQueryFlow system provides a security system for

    19

  • • privacy, i.e., denying unauthorized sites access to sensitive data,• integrity, i.e., denying unauthorized sites the modification of sensitive data,• authentication, i.e., verifying the identity of a user, and• authorization, i.e., verifying, if a user has the permission to execute a sub-plan or an

    operation.

    Furthermore it is essential for e-commerce applications that no partner can deny the involve-ment in a transaction; this aspect is known as non-repudiation. In the following we describethe security system whereby we use the notation derived from [Sch96] for the cryptographicfunctions. Without loss of generality, there are three communication partners: (1) the userU ,who states the initial query, (2) the intermediary, e.g., the market placeM , that executes aDispatch operator and sends instantiation requests to other hosts, and (3) a remote host, inour example scenario a supplierS, that executes a HyperQuery.

    Privacy & Integrity The communication streams between servers of the QueryFlow systemare protected using the well-established secure communication standards SSL (Secure Sock-ets Layer) [FKK96] and/or TLS (Transport Layer Security) [DA99, TLS] for encrypting andauthenticating (digitally signing) messages. Both protocols can carry out the authentication ofcommunication partners via X.509 certificates [HFPS99], thus ensuring communication withthe desired QueryFlow server.

    But as communication between business partners not always takes place the direct way, i.e.,communication is done via intermediaries (the market place or suppliers and sub-contractors),the use of these secure protocols is not sufficient. The remote host that computes the actualvalue of the virtual attribute has to ensure that only the proper receiver has access to the dataand intermediaries have to be withheld from access to the data. The following procedure (takenfrom [Sch96]) assures privacy and integrity in the granularity of transmitted objects in multi-level HyperQuery execution:

    • The remote host (the supplier)S generates with its private key the signatureSS(T ) forthe sensitive data of the objectO and appends the signature to the data:(O, SS(O)).

    • The supplier encrypts the data and the signature with the user’s public-key certificate:EU(O, SS(O)). Now the data can be sent to the user.

    • Having received the encrypted and signed message, the userU decrypts the message withhis/her private key:DU(EU (O, SS(O))) = (O, SS(O)).

    • The user verifies with the supplier’s public-key certificate the authenticity of the data:VS(O, SS(O)) = O.

    Authentication Knowing the identity of a business partner is a crucial point especially in e-commerce applications. In our market place example authentication is used for two purposes:First, the carrier of the market place has to know for its accounting who states queries. Therequest of an offer may incur costs which have to be shared among the participants. Second,

    20

  • User-statedplan (P )

    SU

    (P)

    SM

    (IS)

    SU

    (P)

    SM

    (P)

    SS(R

    U)

    SM

    (P)

    SU

    (P)

    SS(P

    )

    User-statedplan (P )

    User-statedplan (P )

    Instantiation-request (IS)

    (Subplan)

    Registration-

    (User-stated Plan)

    request (RU )

    User Market Place

    + +

    Supplier

    QueryID QueryID QueryID QueryID QueryID

    Market Place

    Figure 14: Chain of Signatures during HyperQuery Execution

    the suppliers have to know the identity of their clients due to special pricing conditions andcontracting.

    Authentication methods are based on passwords, X.509 certificates, or special protocolssuch as Kerberos [SNS88]. We restrict the further discussion to certificates, i.e., each partici-pant possesses a certificate with the appropriate private key. The certificate contains the publickey that has been signed by an approved certification authority (CA). These digital signaturesserve in e-commerce applications as a legal instrument for binding offers. The traditional ap-proach of authentication in distributed databases would be to sign the query-plan with the pri-vate key of the user. The signature of a query arriving at a provider is verified using the user’scertificate and thereafter the originator and the integrity of the query is known reliably. Butthis technique cannot be applied for HyperQuery execution as the whole plan cannot be de-termined at the user’s client and participants communicate through intermediaries (such as themarket place). The following extended authentication method has been devised for the contextof (multi-level) HyperQuery execution. Our approach is based on handing over the user-statedquery and signing it multiple times.

    Figure 14 illustrates the algorithm with the instantiation of a subplan at supplierS. TheHyperQuery execution starts at the user’s host, where the user-stated planP , that contains aquery identifier, is signed with the user’s private key. The generated signature is appended toP ,leading to(P, SU(P )). Each time aDispatch operator sends an instantiation requestI to a re-mote host bothP andI are signed with the private key of the host that executes theDispatchoperator, generating(P, SU(P ), SM(P )) and(I, SM(I)). Thus, in multi-level HyperQuery ex-ecution multiple signatures are appended to the user-stated plan. As the instantiation requestalso contains the query identifier ofP both parts are marked to belong together and a separationby a malicious third party host is detected by the receiving host. A registration request for theUnion operator at the surrounding sub-plan is treated the same way. Based on this informationthe remote host authenticates both the intermediary and the userU by the following three steps:

    1. The sending host is determined by the use of SSL for directly secure authenticated com-munication. Both received parts of the request have to be signed lastly by this host, i.e.,the user-stated planVM(P, SU(P ), SM(P )) = (P, SU(P )) and the instantiation requestVM(I, SM(I)) = I.

    2. Then the first signature of the user-stated plan is checked with the public-key certificateof the user:VU(P, SU(P )) = P .

    21

  • 3. The next check is done on the query identifier, that must coincide both in the user-statedplan and the instantiation request.

    On success the authentication process is finished and both the userU and the intermediary, i.e.,the market placeM , are known reliably.

    Authorization The proper authentication of the participants is used by the providers to en-force their local authorization policy autonomously. We use a role-based access control (RBAC)model [SCFY96] to specify authorization rules. RBAC distinguishes between users, roles whichare assigned to users, and permissions which are assigned to roles. Using this access controlmodel each host establishes its own rules, that declare which sub-plans may be instantiated bywhich users.

    In our market place scenario, a supplier could have an exclusive agreement with a certaincustomer. This policy can be implemented by using a certain sub-plan that is just used for thiskind of goods. The supplier stores in the RBAC information that only the “exclusive” customerhas the permission to execute this certain sub-plan. On the instantiation of the sub-plan theidentity of the customer is checked and, if authorized, the sub-plan is instantiated; instantiationrequests from unauthorized customers are refused.

    Non-Repudiation It is essential for e-commerce applications that no partner can deny theinvolvement in a transaction. Following the traditional data warehouse approach this would bedifficult, as the market place cannot legally sign the offers of the suppliers. Using HyperQueries,the market place acts as an intermediary and the offers are processed by the suppliers and canbe legally signed by themselves. As it is not clear, if an automatically processed and signeddocument is legal, we can use the iterator for human input, that can sign the produced objectson demand. This way the declaration of intention is expressed explicitly.

    6 Optimization & Implementation Details

    So far we have demonstrated the basic techniques for the evaluation of HyperQueries. Nowwe discuss optimization approaches that can be combined arbitrarily and some details of ourimplementation.

    Bypassing of Attributes If “bulky” attributes, such as images or product descriptions, arerequested in the result, but are not needed for the calculation of the virtual attribute at a remotehost, they can be projected out when passing theDispatch operator. These attributes aresent to the finalUnion and are re-merged to the resulting objects after the virtual attribute hasbeen calculated. Especially in multi-level HyperQuery execution this decreases the amount ofdata transferred over the network and reduces the execution time in slow and bursty networks.During stripping off the bulky attributes a unique sequence number is added both to the bulkobjects and the remaining data objects. Using these sequence numbers the bulk objects can bemerged to the corresponding data objects at theUnion operator. This optimization method issimilar to bulk bypassing ([BCKK00, CKKW00]) in central databases. Figure 15(a) illustratesthe bypassing of bulk objects around the sub-plans in the market place scenario. Bypassing

    22

  • Bulk

    Market Place

    Supplier 1

    Client

    Supplier 2

    σ

    σσ

    Market Place

    Supplier 2Supplier 1

    Client

    Cache

    (3)insert

    (2)found

    (4) retrieve matches

    (1) lookup/insert

    Client

    Supplier 1

    Market Place

    Supplier 2

    Cache

    Client

    Supplier 1 Supplier 2

    Market Place

    (a) Bulk Bypassing (b) Predicate Migration (c) Intra-Query Caching (d) Inter-Query Cachingat the Market Place at Supplier 2

    Figure 15: Illustrating Optimization Techniques

    attributes can also be applied when sensitive information has to be kept back of the remotehosts without expensive encryption of the data.

    Predicate Migration Predicates on virtual attributes of the user-stated query cannot be eval-uated before the actual value has been computed. To reduce the amount of transferred data,these predicates can be pushed from the user-stated query “into” the HyperQueries at the re-mote hosts. Thus, only relevant data objects are returned to the intermediary or client. Addingthe selection predicatec.Price< 1500to the query of Figure 1 we show how predicate migra-tion works: Without optimization the selection is applied late and all products are sent; pushingthe selection down to the HyperQueries, less objects are transferred. The implementation ofthis optimization is straightforward: The selection predicate is sent to the remote site during theinstantiation of the sub-plan. When passing objects through theSend operator, the selectionis performed. Additional profit can be drawn, if the remote hosts incorporate the possibility ofpredicate migration into their HyperQueries, e.g., the remote hosts can place aSelectionoperator into their sub-plans, whose predicate is set during the instantiation. Figure 15(b) illus-trates the migration of predicates.

    Caching at the Intermediary and at the Remote Hosts Due to duplicates (which may beproduced by preceding joins) the same virtual attributes have to be evaluated multiple times.This can be avoided by caching. We show how intra-query caching works for hierarchicalqueries, i.e., theDispatch and theUnion are executed at the same site. The evaluation ofvirtual attributes is similar to the invocation of expensive methods, but in contrast to [HN96] thisis done asynchronously, i.e., objects are sent to sub-plans, before the results of previous objectsare returned.9 Thus it is not sufficient to store only the returned values. We also have to keepbook of objects that were sent to sub-plans and have not yet produced a result. Figure 15(c)depicts the hash table based caching of virtual attributes. On any input object theDispatchoperator probes the hash table (1). A cache hit is directly sent to theUnion , bypassing theHyperQueries (2). Otherwise the object is inserted into the hash table as a request. If it was thefirst request for this URI, the object is sent to the corresponding HyperQuery. If a result froma HyperQuery is received by theUnion , it is inserted into the hash table (3) and the pending

    9For the same reason sorting on the URI does not work well in HyperQuery processing, as suggested forexpensive predicates in [SAC+79], as this would be the worst case for the asynchronous approach.

    23

  • objects with the same URI are returned (4). If assuming that the results are highly dynamic andfor coherence reasons cannot be re-used in another query, the hash table has to be discardedwhen the query execution has finished. But if the remote hosts give runs of validity for theirresults, this approach can be extended to inter-query caching, where results are cached untilexpiry.

    While this caching takes place at the intermediary, another kind of caching can be used bythe remote hosts to reduce the processing overhead at their sites. This can be achieved, if thesupplier stores for each requested URI the resulting object in a hash table and returns on anyfurther request of the same URI the stored object. This way the execution of the HyperQueriesis shortcut. This is especially in multi-level HyperQuery execution very effective. Figure 15(d)shows this caching technique for Supplier 2. This method can be used for inter-query caching.

    Multiple Virtual Attributes If a query requests multiple virtual attributes the na¨ıve executionstrategy would request at first the value of the first virtual attribute, then that of the second virtualattribute, etc. If all virtual attributes of an object are evaluated at the same site, the requests caneasily be bundled. A plan is generated that contains oneDispatch operator for all virtualattributes whose evaluation can be combined. During the execution theDispatch operatorsends the list of all requested virtual attributes with the instantiation request for one remotesub-plan. When an object passes theDispatch operator, it is routed to the sub-plan, wherethe actual values of all virtual attributes are determined at once. This avoids sending one objectmultiple times to the same host. If not all virtual attributes can be evaluated at the same site,e.g., if the price and the rating by an independent organization are requested, the calculation canbe parallelized as follows: TheDispatch operator instantiates multiple remote sub-plans, onefor each new URI prefix of the virtual attributes to be resolved, and sendsoneinput object with aunique sequence number toall its corresponding sub-plans. TheUnion re-merges the resultingdata objects of different sub-plans using the sequence number. Objects are passed to the nextoperator, when all virtual attributes have been inserted by theUnion .

    Operators for HyperQuery Processing The most significant operator for processing Hyper-Queries is theDispatch operator which splits the input stream into multiple output streams.As the number of resulting output streams cannot be determined `a priori, we have to fork oneDispatch operator for each new output stream. One designatedDispatch operator acts asthe coordinator for the other forked operators and keeps book of them. EachDispatch oper-ator runs in a separate thread; allDispatch operators share one common input stream, fromwhich eachDispatch operator selects its relevant objects. Thus we obtain the concurrent andindependent routing of objects to the instantiated sub-plans on all participating hosts.

    Furthermore, it should be noted, that all special operators for HyperQuery processing, e.g.,the asynchronousSend andReceive , theDispatch , and theUnion operator, are pipelin-ing operators and so perfectly suited for query processing on the Internet and processing stream-ing data.

    Long-Durating HyperQueries The query of Figure 1 shows, that queries may run for a longtime, as the calculation of the prices can last for some time—just as in real-world public bid-ding processes. To limit the duration of queries, we introduced theexpires -attribute of the

    24

  • top-level XML tag, which allows us to give a time-to-live for any stated query. Each instanti-ated sub-plan is annotated with this timeout and is monitored on expiry. If the timeout elapses,the local sub-plans are aborted and only the objects gathered so far are considered for the (ap-proximate) result. As the timeout can amount to days or weeks, the impact of long-runningqueries onto the network communication has to be taken into account: Due to network failureand TCP/IP timeouts open network connections are error-prone. If a connection has not trans-ferred data for a certain time but is still active, its socket is closed temporarily, without affectingthe query execution itself. TheSend andReceive operators exchangekeep-state andresume-connection messages upon temporarily closing and re-opening the communica-tion link.

    Fault Tolerance So far we have assumed that the instantiation of a sub-plan at a remote hostsucceeds. But if a remote host does not respond to any request, either a network failure occurredor the host (or the database server) is down. As this could only be a short-term break and theprocessing of a query lasts longer, we store all data objects that belong to the failure host in atemporary file and periodically re-try to connect to the host. If the remaining query has finishedbefore the host responds, it is no longer waited for, the temporary file is deleted, and an e-mailinforms the client about the incomplete result. If the host is accessible again while the remainingquery runs, the sub-plan is instantiated at the host, the temporary file is sequentially read, andthe stored objects are sent to the remote host where query processing continues as regular. Ifthe requested sub-plan on the remote host does not exist, the remote host sends asubplan-unknown message to theDispatch operator which closes the corresponding output streamand does not send any data objects to the remote host. Further the client which stated the queryand the administrator of the remote host are informed via e-mail.

    7 Performance Investigation

    In this section we present a few benchmark results obtained from our QueryFlow system. Inparticular, we concentrate on investigating the scalability of our approach in a distributed envi-ronment and show the effectiveness of the combination of multipleDispatch operators.

    7.1 Experimental Environment

    Our test scenario constitutes a market place with 26 suppliers. The data for our databases wastaken from the well-known TPC-D [TPC99] benchmark suite of scale factor 1.0. To suit ourlimited benchmark environment, we converted the 10000 suppliers round robin bySUPPKEYto only 26 suppliers, each being allocated to an individual host. ThePARTSUPPtable rep-resented the market place and thePART table was partitioned horizontally to obtain severalPART@SUPPi tables that contained those parts that the supplieri produced. Thus, each supplieroffered approximately 30000 parts, whereby each part was produced by 4 suppliers which leadto 800000 entries at the market place and 200000 distinct parts.SUPPLYCOSTandAVAILQTYbecame virtual attributes. The databases were stored in ObjectGlobe proprietary partitions ontop of the file system. Each participant of the market place ran its database server on a separate

    25

  • selectp.PARTKEY, p.SUPPKEY,p.COMMENT, ’[ virtual attrs ]’

    from PARTSUPP pwhere p.PARTKEY< ’[ sel ]’

    selecth.*, p.RETAILPRICE as SUPPLYCOSTfrom HyperQueryInputStream h,

    PART@SUPPi pwhere h.PARTKEY = p.PARTKEY

    (a) The Client’s Query (b) The HyperQuery at Supplieri

    Figure 16: Queries used for Performance Investigation

    host, whereby the market place was placed on a Sun Enterprise 450 with four 400 MHz Ultra-Sparc II processors and 4 GByte memory. The other database servers ran on machines of typeSun Ultra 10 with 1 UltraSparc IIi processor at 333 MHz and 128 MByte memory. All hostswere in the same 100 MBit LAN, running Solaris 2.7 and using Sun’s JDK 1.3. The securitycomponent was deactivated, as it was not our intent to measure the overhead for decryption,encryption, and authentication. Instead we wanted to get results for the pure query processingtime. All HyperQueries executed at the remote hosts were “expensive” queries, as we wantedto simulate multi-user environment at the suppliers. Furthermore access to legacy systems orapplications would also be slow. The query stated at the client and the HyperQuery executed atsupplieri are shown in Figure 16.

    7.2 Scalability of HyperQuery Processing

    Measuring the scalability of our HyperQuery processing technique is divided into three parts,where we determine the behavior of the QueryFlow system under a growing number ofsuppliers, clients, and requests. In these experiments we queried only one virtual attributeSUPPLYCOST.

    7.2.1 Scaling the Number of Suppliers

    In the first set of benchmarks we determined the behavior of the system under a growing numberof suppliers.

    Requesting each Supplier 150/1500/4500 ObjectsWe varied the number of requested sup-pliers from 1 up to 26 and requested each supplier 150/1500/4500 objects. Thus we got anincreasing number of result objects. Figure 17(a) shows the running times of the queries. Itcan be seen that the running times within one test set are almost constant. Thus, the overallperformance is determined by the execution time of the HyperQueries at the remote hosts. Thisis an important insight when information is provided from multiple hosts, as it does not costmuch to query redundantly all hosts and obtain more complete information.

    Adapting the Number of Requested Objects In the previous experiment the number of re-turned objects increased with the number of requested suppliers. In this experiment we adaptedthe number of requested objects to obtain 4500 objects in total. Thus, we measured the paral-lelization of the request on 1 to 26 suppliers. Figure 17(b) shows that the total running time

    26

  • 0

    50

    100

    150

    200

    250

    300

    350

    400

    5 10 15 20 25

    Run

    ning

    Tim

    e [s

    ec]

    Number of Suppliers

    150 Requested Objects Per Supplier1500 Requested Objects Per Supplier4500 Requested Objects Per Supplier

    0

    50

    100

    150

    200

    250

    300

    350

    5 10 15 20 25

    Run

    ning

    Tim

    e [s

    ec]

    Number of Suppliers

    4500 Requested Objects

    (a) Requesting each Supplier (b) Adapting the Number150/1500/4500 Objects of Requested Objects

    Figure 17: Scaling the Number of Suppliers

    0

    10

    20

    30

    40

    50

    60

    70

    80

    90

    100

    5 10 15 20 25 30 35 40 45 50

    Run

    ning

    Tim

    e P

    er Q

    uery

    [sec

    ]

    Number of Clients

    100 Requested Objects1000 Requested Objects2000 Requested Objects

    0

    5

    10

    15

    20

    25

    30

    35

    40

    45

    50

    2 4 6 8 10 12 14 16 18 20

    Run

    ning

    Tim

    e P

    er Q

    uery

    [sec

    ]

    Number of Clients

    100 Requested Objects1000 Requested Objects2000 Requested Objects

    (a) Varying the Number of Clients from 1..50 (b) Varying the Number of Clients from 1..20

    Figure 18: Scaling the Number of Clients

    decreases with more suppliers which shows the benefits of the parallelization which is done bythe market place host.

    7.2.2 Scaling the Number of Clients

    In this test we wanted to demonstrate the scalability of the market place when varying the num-ber of clients that state simultaneous queries at the market place. Each of the clients requested100/1000/2000 products from two suppliers, thus leading to 13 client queries that can be runwithout accessing one of our 26 suppliers multiple times. We varied the number of queryingclients from 1 up to 50. Figure 18(a) gives the running times for the queries. It can be seen thatthe running times are almost constant within the sections from 1 to 13, 14 to 26, 27 to 39, 40to 50 clients and jump up in step function manner. These steps stem from the fact that the 26suppliers are accessed multiple times by different queries. The Figure 18(b) shows a projectiononto the first 20 client queries and gives a closer view of the first step which increases with morerequested objects.

    27

  • 0

    10

    20

    30

    40

    50

    60

    70

    0 5000 10000 15000 20000 25000 30000 35000 40000

    Run

    ning

    Tim

    e [s

    ec]

    Number of Requested Objects

    Varying Number of Requests

    0

    100

    200

    300

    400

    500

    600

    700

    800

    0 50000 100000 150000 200000 250000

    Run

    ning

    Tim

    e [s

    ec]

    Number of Requested Objects

    Sequencing HyperQueriesParallel HyperQueries

    Bundled HyperQueries

    (a) Accessing one Virtual Attribute (b) Accessing Multiple Virtual Attributes

    Figure 19: Scaling the Number of Requested Objects (26 Suppliers)

    7.2.3 Scaling the Number of Requested Objects

    In the last scalability benchmark we varied the number of requested objects accessing constantlyall 26 suppliers. Figure 19(a) gives the results when retrieving 1 up to 40000 objects from allsuppliers. It can be seen that the execution time of the query increases linear with the number ofrequested objects. This is not suprising as each supplier has a proportial ratio of the requestedobjects.

    7.3 Evaluating Multiple Virtual Attributes

    In this experiment we demonstrate the benefits of bundling requests for multiple virtual at-tributes. All 26 suppliers were incorporated in this experiment. The client query accessed thetwo virtual attributesSUPPLYCOSTandAVAILQTY.

    We varied the selectivity ’[sel]’ of the query, i.e., the number of requested data objects from100 up to 260000. Figure 19(b) shows the running times for the alternative execution plans.The na¨ıve plan (Sequencing HyperQueries ) had two sequencingDispatch operatorsrequesting at firstSUPPLYCOSTand thenAVAILQTY. The first optimized variant (ParallelHyperQueries ) parallelizes the evaluation of both virtual attributes and nearly halves therunning times as multiple round trips of objects are avoided. The second optimization variant(Bundled HyperQueries ) combines the evaluation of both virtual attributes in one requestand draws additional profit as the overhead of re-merging the duplicates of one input object(from parallelizing the requests) drops out.

    8 Conclusions

    In this paper we proposed a new architecture for building scalable virtual market places. OurQueryFlow system provides a new framework for dynamic distributed query processing basedon so-called HyperQueries which are essentially query evaluation plans “sitting behind” hyper-links. Architecting an electronic market place as a data warehouse by integrating all the datafrom all participating enterprises in one centralized repository incurs severe problems. Using

    28

  • HyperQueries, application integration is achieved via distributed query evaluation plans. Nowthe electronic market place serves as an intermediary between the client (issuing a query) andthe providers executing their sub-queries referenced by hyperlinks. This pushes the data inte-gration to the data providers. We gave an overview of the whole system and discussed moredetailed the most interesting parts, e.g., the administrative tasks that are accessible as extensiblee-services. We demonstrated how hyperlinks can be embedded in the intermediary’s databaseas so-called virtual attributes. Further we illustrated the execution of HyperQueries with anexample of B2B-electronic market places and pointed out more complex scenarios that canbe divided into hierarchical and broadcast HyperQuery execution. We addressed security is-sues, that are crucial for e-commerce applications. The proposed extension to already existingauthentication methods enables the authentication even in multi-level HyperQuery execution.Next, we demonstrated some effective optimization techniques for HyperQuery processing anddescribed some important implementation details. At last we investigated the scalability of ourexecution technique and showed the benefits of a proposed optimization technique.

    References

    [ADAT +99] R. H. Arpaci-Dusseau, E. Anderson, N. Treuhaft, D. E. Culler, J. M. Hellerstein, D. Pat-terson, and K. Yelick. Cluster I/O with River: Making the Fast Case Common. InProc. of the 6thworkshop on I/O in parallel and distributed systems, pages 10–22, Atlanta, GA, USA, May 1999.

    [AH00] R. Avnur and J. M. Hellerstein. Eddies: Continuously Adaptive Query Processing. InProc. ofthe ACM SIGMOD Conf. on Management of Data, pages 261–272, Dallas, USA, June 2000.

    [Bak98] Y. Bakos. The emerging role of electronic marketplaces on the internet.Communications ofthe ACM, 41(8):35–42, August 1998.

    [BCKK00] R. Braumandl, J. Claussen, A. Kemper, and D. Kossmann. Functional join processing.TheVLDB Journal, 8(3-4):156–177, 2000. Invited Contribution to the Special Issue “Best of VLDB 98”.

    [BEK+00] D. Box, D. Ehnebuske, G. Kakivaya, A. Layman, N. Mendelsohn, H. F. Nielsen, S. Thatte,and D. Winer. Simple Object Access Protocol (SOAP) 1.1.http://www.w3.org/TR/SOAP ,May 2000.

    [BGW+81] P. Bernstein, N. Goodman, E. Wong, C. Reeve, and J. Rothnie. Query Processing in a Sys-tem for Distributed Databases (SDD-1).ACM Trans. on Database Systems, 6(4):602–625, December1981.

    [BKK +01] R. Braumandl, M. Keidl, A. Kemper, D. Kossmann, A. Kreutz, S. Seltzsam, and K. Stocker.ObjectGlobe: Ubiquitous query processing on the Internet.The VLDB Journal: Special Issue onE-Services, 10(3):48–71, August 2001.

    [BLFM98] T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource Identifiers (URI): Genericsyntax.http://www.rfc-editor.org/rfc/rfc2396.txt , August 1998.

    [BM00] P. V. Biron and A. Malhotra. XML Schema part 2: Datatypes. W3C Recommendation, WWW-Consortium, April 2000.http://www.w3.org/TR/xmlschema-2 .

    [BPSMM00] T. Bray, J. Paoli, C. M. Sperberg-McQueen, and E. Maler. Extensible Markup Language(XML) 1.0 (Second Edition).http://www.w3.org/TR/REC-xml/ , October 2000.

    29

    http://www.w3.org/TR/SOAPhttp://www.rfc-editor.org/rfc/rfc2396.txthttp://www.w3.org/TR/REC-xml/

  • [BSZ98] M. Bichler, A. Segev, and J. L. Zhao. Component-based e-commerce: Assessment of currentpractices and future directions.ACM SIGMOD Record, 27(4):7–14, December 1998.

    [CDSC00] I. Cingil, A. Dogac, E. Sevinc, and A. Cosar. Dynamic modification of XML documents:External application invocation from XML.ACM SIGecom exchanges, 1(1):1–6, August 2000.

    [CDSS98] S. Cluet, C. Delobel, J. Sim´eon, and K. Smaga. Your mediators need data conversion! InProc. of the ACM SIGMOD Conf. on Management of Data, pages 177–188, Seattle, WA, USA, June1998.

    [CDTW00] J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A Scalable Continuous QuerySystem for Internet Databases. InProc. of the ACM SIGMOD Conf. on Management of Data, pages379–390, Dallas, USA, June 2000.

    [CFR+01] D. Chamberlin, D. Florescu, J. Robie, J. Sim´eon, and M. Stefanescu. XQuery: A QueryLanguage for XML.http://www.w3.org/TR/xquery/ , June 2001.

    [CKKW00] J. Claussen, A. Kemper, D. Kossmann, and C. Wiesner. Exploiting early sorting and earlypartitioning for decision support query processing.The VLDB Journal, 2000.

    [Cov] Covisint. http://www.covisint.com/ .

    [DA99] T. Dierks and C. Allen. The TLS Protocol Version 1.0.ftp://ftp.isi.edu/in-notes/rfc2246.txt , January 1999.

    [DHW01] D. Draper, A. Y. Halevy, and D. S. Weld. The Nimble XML Data Integration System. InProc. IEEE Conf. on Data Engineering, pages 155–160, Heidelberg, Germany, 2001.

    [FGM+99] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee.Hypertext Transfer Protocol – HTTP/1.1.ftp://ftp.isi.edu/in-notes/rfc2616.txt ,June 1999.

    [FKK96] A. Frier, P. Karlton, and P. Kocher.The SSL 3.0 Protocol. Netscape Communications Corp.,http://home.netscape.com/eng/ssl3 , November 1996.

    [GHJV95] E. Gamma, R. Helm, R. Johnson, and J. Vlissides.Design Patterns – Elements of ReusableObject-Oriented Software. Addison-Wesley, Reading, MA, USA, 1995.

    [Gra93] G. Graefe. Query Evaluation Techniques for Large Databases.ACM Computing Surveys,25(2):73–170, June 1993.

    [GW00] R. Goldman and J. Widom. WSQ/DSQ: A Practical Approach for Combined Querying ofDatabases and the Web. InProc. of the ACM SIGMOD Conf. on Management of Data, pages 285–296, Dallas, USA, June 2000.

    [HFPS99] R. Housley, W. Ford, W. Polk, and D. Solo. Internet X.509 Public Key Infrastructure Certifi-cate and CRL Profile.http://www.rfc-editor.org/rfc/rfc2459.txt , January 1999.

    [HKWY97] L. Haas, D. Kossmann, E. Wimmers, and J. Yang. Optimizing Queries Across Diverse DataSources. InProc. of the Conf. on Very Large Data Bases (VLDB), pages 276–285, Athens, Greece,August 1997.

    30

    http://www.w3.org/TR/xquery/http://www.covisint.com/ftp://ftp.isi.edu/in-notes/rfc2246.txtftp://ftp.isi.edu/in-notes/rfc2616.txthttp://www.rfc-editor.org/rfc/rfc2459.txt

  • [HN96] J. Hellerstein and J. Naughton. Query Execution Strategies for Caching Expensive Methods. InProc. of the ACM SIGMOD Conf. on Management of Data, pages 423–434, Montreal, Canada, June1996.

    [HSC99] J. M. Hellerstein, M. Stonebraker, and R. Caccia. Independent, Open Enterprise Data Integra-tion. IEEE Data Engeneering Bulletin, 22(1):43–49, March 1999.

    [IFF+99] Z. Ives, D. Florescu, M. Friedman, A. Levy, and D. Weld. An Adaptive Query ExecutionEngine for Data Integration. InProc. of the ACM SIGMOD Conf. on Management of Data, pages299–310, Philadelphia, PA, USA, June 1999.

    [ILW00] Z. G. Ives, A. Y. Levy, and D. S. Weld. Efficient evaluation of regular path expressions onstreaming xml data. Technical report uw-cse-2000-05-02, University of Washington, 2000.

    [Jhi00] A. Jhingran. Moving up the food chain: Supporting E-Commerce Applications on Databases.ACM SIGMOD Record, 29(4):50–54, December 2000.

    [KGM91] T. Keller, G. Graefe, and D. Maier. Efficient Assembly of Complex Objects. InProc. of theACM SIGMOD Conf. on Management of Data, pages 148–158, Denver, CO, USA, May 1991.

    [KW01] A. Kemper and C. Wiesner. HyperQueries: Dynamic Distributed Query Processing on theInternet. InProc. of the Conf. on Very Large Data Bases (VLDB), pages 551–560, Rome, Italy,September 2001.

    [LSK95] A. Y. Levy, D. Srivastava, and T. Kirk. Data Model and Query Evaluation in Global Informa-tion