
ORIGINAL ARTICLE

Maya Rodrig · Anthony LaMarca

Oasis: an architecture for simplified data management and disconnected operation

Received: 31 May 2004 / Accepted: 8 June 2004 / Published online: 10 February 2005
© Springer-Verlag London Limited 2005

Abstract Oasis is an asymmetric peer-to-peer data management system tailored to the requirements of pervasive computing. Drawing upon applications from the literature, we motivate three high-level requirements: availability, manageability, and programmability. Oasis addresses these requirements by employing a peer-to-peer network of weighted replicas and performing background self-tuning. In this paper, we describe our architecture, our consistency-control mechanism, and an initial implementation. Our performance evaluation and the implementation of three applications suggest that Oasis offers good availability and performance while providing a simple API and a familiar consistency model.

1 Introduction

The vision of pervasive computing is an environment in which users, computation, and the physical environment are artfully blended to provide in situ interactions that increase productivity and quality of life. While many of the hardware components required to realize this vision are available today, there is a dearth of robust applications to run on these new platforms. We contend that there are so few pervasive computing applications because they are too hard to develop, deploy, and manage. A number of factors that are particular to pervasive computing scenarios make application development challenging: devices are resource-challenged and faulty, and devices may be continually arriving and departing. While prototypes of compelling applications can be deployed in the lab, it is very difficult to build an implementation that is robust and responsive in a realistic pervasive environment.

We argue that the best way to foster pervasive computing development is to provide developers with a comprehensive set of software services, in effect, an "operating system (OS) for pervasive computing." While there has been work in the area of system software for pervasive computing, a number of significant challenges remain [17]. In this paper, we address the challenge of providing pervasive computing support for one of the more traditional services, namely, the storage and management of persistent data. We examined 15 pervasive computing applications described in the literature and we have distilled a common set of requirements that fall in the areas of availability, manageability, and programmability. Based upon these requirements, we have designed and implemented a data management system called Oasis.

The tension between the requirement for high availability and the need to provide strong consistency guarantees and support disconnected operation became apparent during the design of the system. To address this issue, we adapted a classic consistency mechanism developed by David Gifford called weighted voting [10] to operate within a peer-to-peer architecture. We developed a fully decentralized variant of Gifford's scheme and applied it as the consistency-control mechanism for Oasis. We have tested the performance of Oasis and used it to implement three pervasive computing applications in order to understand how well it satisfies the application-derived requirements.

The rest of the paper is organized as follows. In Sect. 2, we identify the data management requirements of pervasive computing applications and draw specific examples from the literature. Section 3 presents the main components of the Oasis architecture. In Sect. 4, we describe the decentralized weighted voting scheme that we have developed, and in Sect. 5, we discuss implementation decisions related to this consistency-control mechanism. Section 6 describes our experiences from constructing three applications on top of Oasis. We discuss the performance of the system as measured with the workload of one of our applications in Sect. 7. In Sects. 8 and 9, we compare Oasis to related work and present our conclusions.

M. Rodrig (✉)
Department of Computer Science and Engineering, University of Washington, Washington, USA
E-mail: [email protected]

A. LaMarca
Intel Research Seattle, Seattle, Washington, USA
E-mail: [email protected]

Pers Ubiquit Comput (2005) 9: 108–121
DOI 10.1007/s00779-004-0315-6

2 Data management requirements of pervasive computing

Through a survey of 15 pervasive computing applications published in the literature [1, 3, 5, 6, 7, 13, 14, 16, 20–22, 26, 28, 31, 32], we have identified what we believe are the important data management requirements of these applications. The breadth of the applications covered in the survey includes smart home applications, applications for enhancing productivity in the workplace, and personal area network (PAN) applications. The specific requirements can be grouped into three areas: availability, manageability, and programmability.

2.1 Availability

Pervasive computing applications are being developed for environments in which people expect devices to function 24 hours a day, 7 days a week. The Aware Home [16] and EasyLiving [6] projects, for example, augment household appliances such as refrigerators, microwaves, and televisions that typically operate with extremely high reliability. For many of these augmented devices to function, uninterrupted access to a data repository is needed; thus, a storage solution for pervasive computing must ensure that data is available in the following conditions:

Data access must be uninterrupted, even in the face of device disconnections and failures Proposed pervasive computing scenarios utilize existing devices in the home as part of the computing infrastructure [6, 20]. A data management solution should be robust to the failure of some number of these devices; turning off a PC in a smart home should not cause the entire suite of pervasive computing applications to cease functioning. The data management system must handle both graceful disconnections and unexpected failures, and the data must remain available as devices come and go.

Data may need to be accessed simultaneously in multiple locations, even in the presence of network partitions The majority of applications we examined include scenarios that require support for multiple devices accessing the same data in multiple locations. Commonly, these applications call for users to carry small, mobile devices that replicate a portion of the user's home or work data [22, 31]. Labscape [3] cites disconnection as uncommon, but would like to support biologists who choose to carry a laptop out of the lab. Finally, some applications involve non-mobile devices sharing data over unreliable channels. The picture frame in the Digital Family Portrait [26], for example, communicates with sensors in the home of a geographically remote family member. In all of these cases, the application scenarios assume the existence of a coherent data management system that supports disconnected operation.

Data can be accessed from and stored on impoverished devices Pervasive computing applications commonly involve inexpensive, resource-constrained devices used for both accessing and storing data. In PAN applications, for example, impoverished mobile devices frequently play a central role in caching data and moving data between I/O and other computational devices [21, 22, 31]. Ideally, a data management system for pervasive computing would accommodate the limitations of resource-challenged devices; challenged devices would be able to act as clients, while data could be stored on fairly modest devices.

2.2 Manageability

Perhaps the single largest factor preventing pervasive computing from becoming a mainstream reality is the complexity of managing the system. We have identified a number of features that are essential to making a data management system for pervasive computing practical for deployment with real users:

Technical expertise should be required only in extreme cases By many accounts, the "living room of the future" will have the computational complexity of today's server room; however, there will rarely be an expert to manage it. In many cases, only non-technical users are present [26], while in extreme applications, like PlantCare [20], there are no users at all. In the spirit of IBM's autonomic computing initiative [35], data management for pervasive computing environments should be self-managing to the largest extent possible.

Adjustments to storage capacity should be easy and incremental Many of the pervasive computing systems we examined could most appropriately be labeled as platforms on which many small applications and behaviors are installed [5, 6, 13]. In such an environment, the data management system should be able to grow incrementally to support changing workloads and capacity needs.

The system should adapt to changes within and across applications The wide variety of devices and applications suggests that the data management system should monitor and adapt to changes in configuration and usage. Consider the location tracking system that is common to many pervasive computing scenarios [3, 6, 32]; its job is to track people and objects and produce a mapping for other applications to use. In some scenarios, this location data is used infrequently, while other scenarios may require hundreds of queries against this data each second. A static configuration runs the risk of either providing poor performance or over-allocating resources. An adaptive solution, on the other hand, could detect the activation of a demanding application and adjust priorities accordingly, ensuring good performance and overall system efficiency.

2.3 Programmability

A distributed, dynamic, and fault-ridden pervasive computing environment is a far cry from the computing platforms on which most software engineers are trained. With this in mind, we have identified a number of requirements intended to lower the barrier to entry and make reliable, responsive pervasive computing applications easier to develop:

The system should offer rich query facilities Pervasive computing applications often involve large amounts of structured sensor data and frequent searches through this data. A common pattern is that an application takes an action if a threshold value is crossed (e.g., going outdoors triggers a display change [26], low humidity triggers plant watering [20], proximity to a table triggers the migration of a user interface [3]). Data management systems that provide indexing and query facilities vastly reduce the overhead in creating efficient implementations of such behaviors.

The system should offer a familiar consistency model Some distributed storage systems provide "update-anywhere semantics" [30] in which clients can read and write any replica of the data at any time, even when disconnected. These systems provide weak consistency guarantees where applications may see writes to the data occurring in a different order than the one in which they were written. These weak guarantees can cause a wide range of unpredictable behaviors that make it difficult to write reliable applications. We feel that a familiar, conservative consistency model is more appropriate for most pervasive computing applications, even if it results in a loss of flexibility.

The system should provide a single global view of the data In some cases, developers need to place replicas on specific devices in order to achieve particular semantics. Many applications, however, merely want to reliably store and retrieve data. Accordingly, a data management system for pervasive computing should include a facility for automatically placing data and present the view of a single global storage space to application developers. These decisions can be guided by hints given by the application, but the developer should not be directly exposed to a disparate collection of storage devices.

3 The Oasis architecture

Oasis is a data management system tailored to the requirements of pervasive computing. In Oasis, users (applications) access data via a client service that in turn communicates with a collection of Oasis servers. The client service stores no persistent data; its only purpose is to run the Oasis consistency protocol. (The client's function has been separated to allow the participation of impoverished devices like sensor beacons.) Figure 1 shows an example of an Oasis configuration in an instrumented home. Data is replicated across the Oasis servers to provide high availability in the event of a device disconnection or failure. Oasis does not depend on any single server; data remains available to clients provided that a single replica is available. In the remainder of this section, we describe the components of the Oasis architecture and how they enable Oasis to meet the requirements described in Sect. 2.

3.1 Data model

From the client's perspective, Oasis is a database that supports the execution of SQL (structured query language) queries on relational data. We chose SQL because it is a widespread standard for accessing structured data. An Oasis installation stores a number of databases. Each database holds a number of tables that in turn hold a set of records. We envision that different databases would be created for different types of data, such as sensor readings, configuration data, sound samples, etc. In Sect. 6, we describe three applications that we have built using Oasis along with their data representations. It should be noted that nothing in the rest of the architecture is specific to the relational data model, and Oasis could manage file- or tuple-oriented data with a few small changes.

[Fig. 1 An example of an Oasis configuration: applications in an instrumented home (a kitchen helper, a PDA, a laptop, sensor nodes) issue reads, writes, queries, and updates through clients to databases (temperatures, locations, medications, RFID readings, configuration, notes) replicated across Oasis servers such as a home server and a media set-top box.]
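As a concrete illustration of this model (the client interface and schema below are our own invention; the paper does not specify the client API), an application might keep one database per data type, as in Fig. 1:

    import java.sql.ResultSet;

    // Hypothetical client-side interface; Oasis's real API is not shown in the paper.
    interface OasisClient {
        void execute(String database, String sql);     // DDL and updates
        ResultSet query(String database, String sql);  // read-only queries
    }

    class DataModelExample {
        static void run(OasisClient client) {
            // A "temperatures" database (cf. Fig. 1) holding one table of readings.
            client.execute("temperatures",
                "CREATE TABLE readings (sensor_id INT, taken_at TIMESTAMP, celsius REAL)");
            client.execute("temperatures",
                "INSERT INTO readings VALUES (7, CURRENT_TIMESTAMP, 21.5)");
        }
    }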

3.2 P2P architecture with replication

As devices may arrive and depart in pervasive computing scenarios, an architecture that supports dynamic membership is needed. A pure peer-to-peer (P2P) architecture provides the desired decentralization, adaptability, and fault-tolerance by assigning all peers equal roles and responsibilities. However, the emphasis on equal resource contribution by all peers ignores differences in device capabilities. To support a wide variety of devices, Oasis is an asymmetric P2P system, or "super-peer" system [34], in which devices' responsibilities are based on their capabilities. Devices with greater capabilities contribute more resources and can perform computation on behalf of others, while impoverished devices may have no resources to contribute.

Data is replicated across multiple Oasis servers to provide high availability. In our initial implementation, replication is done at the database level. (Replicating entire databases simplifies implementation, but potentially overburdens small devices. In Sect. 9, we discuss the potential for partial replication.) An initial replica placement is determined at creation time and is then tuned as devices come and go, and data usage changes. The self-tuning process is described in Sect. 3.4.

3.3 Weighted voting and disconnected operation

All distributed data stores employ an access-coordination algorithm that offers a consistency guarantee to clients accessing the data. To provide developers with a familiar consistency model, we chose an algorithm for Oasis that offers clients sequential consistency [23] when local replicas are available. Sequential consistency guarantees that the operations of all clients execute in some sequential order, and that the operations of each client appear in this total ordering in the same order specified by its program. Basically, sequential consistency provides a set of distributed clients with the illusion that they are all running on a single device.

The traditional way to provide sequential consistency and allow disconnected operation is with a quorum-based scheme in which a majority of the replicas must be present to update the data. We have adapted Gifford's "weighted voting" [10] variant of the basic quorum scheme. As in a quorum-based scheme, data replicas are versioned to allow clients to determine which replica is the most recent. In addition, weighted voting assigns every replica of a data object a number of votes. The total number of votes assigned to all replicas of the object is N. A write request must lock a set of replicas whose votes sum to at least W, while read operations must contact a set of replicas whose votes sum to at least R votes. On a read, the client fetches the value from the replica with the highest version number. On a write, the client must update all of the replicas it has locked. Weighted voting ensures sequential consistency by requiring that R + W > N. This constraint guarantees that no read can complete without seeing at least one replica updated by the last write (since R > N − W). Weighted voting is more flexible than a quorum-based approach because the vote allocation as well as R and W can be tailored to the expected workload. Making R small, for example, boosts performance by allowing clients to access different replicas in parallel. Making R and W close to N/2 allows up to half the servers to fail, thereby increasing fault-tolerance.
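To make the vote arithmetic concrete, here is a minimal sketch in Java (the class and method names are ours, not the Oasis API):

    import java.util.Map;

    // Sketch of Gifford-style weighted-voting arithmetic.
    final class VoteConfig {
        final int n, r, w;                // N total votes, R read quorum, W write quorum
        final Map<String, Integer> votes; // server name -> votes held by its replica

        VoteConfig(int r, int w, Map<String, Integer> votes) {
            this.r = r;
            this.w = w;
            this.votes = votes;
            this.n = votes.values().stream().mapToInt(Integer::intValue).sum();
            if (r + w <= n)               // sequential consistency needs R + W > N
                throw new IllegalArgumentException("need R + W > N");
        }

        // Do the replicas we managed to contact carry enough votes?
        boolean isQuorum(Iterable<String> contacted, int threshold) {
            int sum = 0;
            for (String s : contacted) sum += votes.getOrDefault(s, 0);
            return sum >= threshold;
        }
    }

With the allocation shown later in Fig. 2 (N = 6, R = 3, W = 4; servers 1-3 holding one vote each and server 4 holding three), servers {1, 4} muster 4 votes and form a write quorum, while server 4 alone already forms a read quorum.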

One drawback of Gifford's weighted voting scheme is the restriction that data must be accessed from a single, centralized client. The client, known as the collector, is used to gather a quorum for every request to the data object. Having a single client simplifies the algorithm by providing a centralized location for the information about the data object, or metadata (list of replica locations, versions, and vote allocation, R, W, and N), to be maintained. However, maintaining metadata in a centralized fashion is not suitable in a distributed system that experiences frequent device disconnections. As devices come and go, changing the number and placement of the data object's replicas may be required. In Sect. 4, we describe the decentralized variant of Gifford's scheme that we developed in order to support multiple clients, replica reorganization, and metadata changes.

3.4 Online self-tuning and adaptability

Oasis was designed with self-tuning in mind. The SQL data model provides the opportunity to add and delete indices. Our weighted voting scheme permits flexibility as to how many and where the replicas are created, and what R and W should be. Finally, our consistency scheme allows these parameters to be adjusted during a stream of client requests. This allows Oasis to be tuned in an online fashion without denying applications access to the data or requiring user intervention.

Applications have the choice of managing their own replica placement and vote assignment, or allowing Oasis to manage the data on their behalf. For applications that do not want to manage their own replica placement, Oasis includes a self-tuning service that automatically handles replica configuration. When databases are created in Oasis, performance and availability expectations can be provided by applications that want auto-tuning. Oasis servers publish their performance and availability characteristics and the self-tuner uses these along with the application expectations to make its configuration decisions. The self-tuner periodically examines each database's expectations and checks whether they are best served by their current replica placement and vote assignment, making adjustments if appropriate. As we discuss in Sect. 9, we see the development of more sophisticated self-tuning behaviors based on machine learning techniques as a promising direction to pursue for future research.

4 Decentralized weighted voting

To manage replicated data in a dynamic distributed system with multiple clients and servers and frequent disconnections, we developed a variant of Gifford's weighted voting scheme. We decentralized the functionality of the collector (the centralized client) by distributing versioned copies of the metadata along with the data in each replica and providing all clients with the ability to access the metadata. By versioning the metadata and imposing the same quorum requirements to update metadata as those to update data, we ensure that quorums are based on the most current metadata, and, thus, every request accesses the most current data. This allows data and metadata operations to be safely interleaved, enabling the system to perform self-tuning. To guarantee sequential consistency, we enforce the additional constraint that W = R (see Sect. 4.2). With the exception of this additional constraint, our decentralized weighted-voting scheme provides the same flexibility in replica placement and vote assignment as the original.
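Continuing the sketch from Sect. 3.3, the per-database state each server might hold under the decentralized scheme pairs a data version (DV in Figs. 2-4) with a versioned metadata copy (MV). The types are again our own illustration, not the actual implementation:

    // Versioned metadata distributed with every replica (illustrative types).
    final class Metadata {
        final long version;       // MV: bumped by every committed metadata update
        final VoteConfig config;  // replica locations, vote allocation, R, W, N

        Metadata(long version, VoteConfig config) {
            this.version = version;
            this.config = config;
        }
    }

    // Per-database state a server stores: the data plus a metadata copy.
    final class ReplicaState {
        long dataVersion;         // DV: incremented by every committed write
        Metadata metadata;        // locally stored copy of the object's metadata
        boolean locked;           // persistent lock held by an in-flight quorum
        boolean valid = true;     // false after an interrupted update (Sect. 4.1.6)
    }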

4.1 The protocols

In this section, we describe the protocols that are central to our decentralized weighted-voting scheme. The protocols involve two entities: servers and clients. Servers are storage repositories for replicas of data objects; clients issue read and write requests to the data on behalf of users (applications), thus controlling access to the data and ensuring sequential consistency. Messages are used for communication between clients and servers as well as among servers.

At a high level, our decentralized weighted voting scheme involves three steps. First, a client employs a bootstrapping mechanism to obtain a copy of the metadata for the data object to be read or written. Second, the client uses information found in the acquired metadata to establish a quorum. For write requests, establishing a quorum requires locking a set of replicas with W votes. For read requests, a quorum is obtained by fetching the version of the data object from a set of replicas with R votes and then locking one of the replicas with the most current version of the data. Finally, once the appropriate quorum is established, the client requests the servers storing the locked replicas to execute one of three operations: write data, read data, or update metadata. Once the operation is executed, the replicas involved are unlocked and a reply is sent back to the client. If a write operation fails to complete, a failure handling mechanism is used to ensure the integrity of the data and consistency across replicas.

We begin our protocol description with the bootstrapping and locking mechanisms. We then describe the three operations (write, read, and metadata update) that can be executed once an appropriate quorum is obtained. We end this section with a brief description of failure handling.

4.1.1 Bootstrap

To satisfy a user's read or write request, a client must find metadata for the data object of interest. The client first performs a lookup in its local metadata cache for the desired metadata. If the metadata is found locally, the client can proceed to contact the servers listed in the metadata to acquire a quorum. However, if the metadata is not locally available, the client must determine which servers (potentially storing the desired metadata) are currently available. We assume the existence of a mechanism to locate the locally available servers. The search for servers can be based on a discovery service or one of many distributed search schemes, such as distributed hash tables (DHTs) [29]. Once the servers are located, the client contacts them in a random order to find the metadata. A more sophisticated search mechanism can replace this random scan when a large number of servers are available (although this was not the case in the pervasive applications we examined). If metadata for the object of interest is not found on any of the locally available servers, the request cannot be processed at that time. If the metadata is found, the bootstrapping process continues.

Based on the acquired metadata, the client selects the servers to contact in order to establish a read or write quorum as needed. The initially acquired metadata may not be up to date. However, since metadata is only updated with a write quorum, at least one of the contacted servers is guaranteed to store an up-to-date version of the metadata. If a newer version of the metadata exists, it is sent back to the client by at least one of the servers. The client may need to contact additional servers to acquire a quorum based on the information in the new metadata. A quorum composed of replicas on servers listed in the current version of the metadata is guaranteed to include an up-to-date version of the data.
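A sketch of the bootstrap step, using the illustrative types above (the cache, the discovery call, and the AcquireMetadata message are named after Fig. 2, but the code is our own reading, not the Oasis implementation):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class BootstrapClient {
        interface Server { Metadata acquireMetadata(String db); } // null if unknown
        interface Discovery { List<Server> availableServers(); }  // e.g., a DHT [29]

        private final Map<String, Metadata> cache = new HashMap<>();
        private final Discovery discovery;

        BootstrapClient(Discovery discovery) { this.discovery = discovery; }

        Metadata bootstrap(String db) {
            Metadata md = cache.get(db);            // 1. try the local metadata cache
            if (md != null) return md;
            List<Server> servers = new ArrayList<>(discovery.availableServers());
            Collections.shuffle(servers);           // 2. random scan of reachable servers
            for (Server s : servers) {
                md = s.acquireMetadata(db);
                if (md != null) {
                    cache.put(db, md);
                    return md;                      // possibly stale; refined while locking
                }
            }
            return null;                            // 3. nobody has it: fail for now
        }
    }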

4.1.2 Locking

A client must establish a quorum of votes to access a data object and process the user's request. To acquire the votes, the client contacts the servers storing replicas of the data object (as listed in the metadata) to obtain a lock on their replicas. The client sends a lock request to a server for a specific data object. The server replies with a "yes" or "no," indicating whether the lock was acquired or not, the version of the data object it is storing, and the metadata of the object. If the metadata sent back from the server is newer than the metadata available to the client, the new metadata replaces the older version on the client. New metadata may require the client to send additional lock requests if replica locations, vote distribution, R, or W have changed. This is shown in Fig. 2 when the client receives a more up-to-date version of the metadata from server 4.
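The lock phase, sketched with the same illustrative types; the loop that chases newer metadata corresponds to the extra lock request in Fig. 2 (the message shapes are our invention):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.Map;

    // A server's reply to a lock request: granted?, its DV, and its metadata copy.
    record LockReply(boolean granted, long dataVersion, Metadata metadata) {}

    interface ServerProxy { LockReply lock(String db); }

    class LockPhase {
        // Lock replicas until granted votes sum to W, adopting newer metadata on the way.
        static Metadata lockWriteQuorum(String db, Metadata md,
                                        Map<String, ServerProxy> proxies,
                                        Map<String, LockReply> granted) {
            Deque<String> toContact = new ArrayDeque<>(md.config.votes.keySet());
            while (!toContact.isEmpty()) {
                String name = toContact.poll();
                if (granted.containsKey(name)) continue;
                LockReply reply = proxies.get(name).lock(db);
                if (reply.metadata().version > md.version) {
                    md = reply.metadata();                      // newer metadata wins...
                    toContact.addAll(md.config.votes.keySet()); // ...and may add servers
                }
                if (reply.granted()) granted.put(name, reply);
                int votes = 0;                                  // recount under current md
                for (String s : granted.keySet())
                    votes += md.config.votes.getOrDefault(s, 0);
                if (votes >= md.config.w) return md;            // quorum of W votes held
            }
            return null;  // quorum unattainable now; caller unlocks and fails the request
        }
    }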

4.1.3 Write operations

To process a write request, a client must first obtain the lock on a set of replicas whose votes sum to at least W. Once a quorum of W votes is acquired, at least one up-to-date replica is guaranteed to be locked (see Sect. 4.2). The client selects an up-to-date replica from the quorum and sends a request to the server telling it to update all replicas in the quorum that do not have the most current version of the data. The server contacts the servers holding stale replicas with the list of updates necessary to bring their data replicas up to date. Once the updating server receives a reply from the servers that required updating, it sends a reply back to the client indicating whether all replicas in the quorum were brought up to date. If the reply is positive, the client sends the write request to all the servers holding replicas in the quorum. The servers execute the write request, increment the version of the data, send the result back to the client, and unlock their replicas. Figure 3 shows an example of an update to a collection of servers. If a client cannot acquire a quorum of W votes, the write request cannot proceed.
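Once W votes are locked, the write proceeds in two rounds, as in Fig. 3. A sketch under the same naming assumptions (updatePeers and write are invented stand-ins for the real messages):

    import java.util.List;
    import java.util.Map;

    interface WritableServerProxy extends ServerProxy {
        boolean updatePeers(String db, List<String> staleServers); // bring peers to my DV
        boolean write(String db, String sql);                      // execute, bump DV, unlock
    }

    class WritePhase {
        static boolean write(String db, String sql,
                             Map<String, WritableServerProxy> proxies,
                             Map<String, LockReply> quorum) {
            long current = quorum.values().stream()
                    .mapToLong(LockReply::dataVersion).max().orElseThrow();
            String fresh = quorum.entrySet().stream()          // any up-to-date member
                    .filter(e -> e.getValue().dataVersion() == current)
                    .findAny().orElseThrow().getKey();
            List<String> stale = quorum.entrySet().stream()    // members missing updates
                    .filter(e -> e.getValue().dataVersion() < current)
                    .map(Map.Entry::getKey).toList();
            // Round 1: the fresh server pushes missed updates to the stale members.
            if (!proxies.get(fresh).updatePeers(db, stale)) return false;
            // Round 2: every quorum member executes the write and increments its DV.
            boolean ok = true;
            for (String s : quorum.keySet()) ok &= proxies.get(s).write(db, sql);
            return ok;
        }
    }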

4.1.4 Read operations

To process a read request, a client must first fetch the versions of the data object from a set of replicas whose votes sum to at least R. Once the client obtains replies to version requests from a quorum, our algorithm guarantees that at least one of the replies includes the most current version of the data (see Sect. 4.2). The client sends the read request to one of the servers that responded with the most current version. The server locks its replica, executes the read request, unlocks the replica, and sends the results back to the client. Figure 4 shows an example of a read request being processed.
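The read path, sketched the same way (versionOf and read are invented message names, cf. the VersionRequest/Read exchange in Fig. 4; the metadata-refresh loop shown for locking is omitted here for brevity):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    interface ReadableServerProxy {
        Long versionOf(String db);           // null if the server is unreachable
        String read(String db, String sql);  // lock, execute, unlock, reply
    }

    class ReadPhase {
        static String read(String db, String sql, Metadata md,
                           Map<String, ReadableServerProxy> proxies) {
            Map<String, Long> versions = new HashMap<>();
            int votes = 0;
            for (Map.Entry<String, Integer> e : md.config.votes.entrySet()) {
                Long v = proxies.get(e.getKey()).versionOf(db);
                if (v == null) continue;              // replica unavailable
                versions.put(e.getKey(), v);
                votes += e.getValue();
                if (votes >= md.config.r) break;      // R votes have answered
            }
            if (votes < md.config.r) return null;     // no quorum: read unavailable
            String freshest = Collections.max(        // R > N - W: one reply is current
                    versions.entrySet(), Map.Entry.comparingByValue()).getKey();
            return proxies.get(freshest).read(db, sql);
        }
    }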

[Fig. 2 An example of bootstrapping and locking to acquire a write quorum (W = 4). The client acquires metadata MV1 (N = 6, R = 3, W = 4; servers 1-3 hold one vote each, server 4 holds three) from server 4, sends lock requests, and receives newer metadata MV2 (N = 6, R = 3, W = 4; servers 2 and 4 now hold two votes each) with a lock reply, prompting one further lock request.]

If a client cannot obtain versions from a quorum of replicas with at least R votes, sequential consistency cannot be guaranteed and, thus, the read request cannot be processed. These periods of read-unavailability can be eliminated by having a device maintain an up-to-date zero-vote replica. Replicas with zero votes cannot help in the establishment of a quorum, and, thus, can be added to any device without affecting any other device's access to the data. These local replicas provide a local copy of the data that can always be accessed, even if a quorum cannot be acquired. Keeping a local replica up to date requires a change to the read protocol. The server processing the read request piggybacks with the reply a set of changes that can be applied to the client's local copy in order to update it to match the version on the server. The client is required to apply these changes to the zero-vote replica before returning the read result to the user. In this way, even if the client device disconnects, future reads can be serviced from the local copy. While this local copy of the data is potentially stale, sequential consistency is ensured.

[Fig. 3 An example of a write operation following the quorum acquisition. During the operation, the servers move from the state shown in the left column to the state shown on the right: the stale quorum members are brought up to date, the write executes on all quorum members, and their data versions advance (DV2 to DV3).]

[Fig. 4 An example of bootstrapping to acquire a read quorum (R = 3) and the corresponding read operation: the client acquires metadata MV1, issues version requests, receives newer metadata MV2 with one reply, contacts an additional server, and sends the read to a server holding the most current data version.]
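A sketch of the piggybacked-delta refresh for zero-vote local replicas described above (the reply shape and the LocalReplica interface are our own invention):

    import java.util.List;

    // Read reply carrying deltas that advance the local copy to the server's DV.
    record ReadReply(String result, long serverVersion, List<String> deltas) {}

    interface LocalReplica {
        long version();
        void apply(String delta);        // replay one missed update
        void setVersion(long v);
    }

    class LocalRefresh {
        static String readAndRefresh(LocalReplica local, ReadReply reply) {
            for (String delta : reply.deltas())
                local.apply(delta);                   // catch the local copy up first
            local.setVersion(reply.serverVersion());
            return reply.result();                    // disconnected reads now possible
        }
    }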

4.1.5 Metadata updates

Updates to the metadata of a data object are either requested by an application or triggered by a self-tuning service that we employ to improve replica placement and vote distribution in the system. Metadata writes are processed much like writes to the data object itself. The same W votes that are required for a quorum to write to the data govern updates to the metadata. A write to metadata may change R, W, N, the list of servers storing replicas, or the distribution of votes across replicas. Updates to a data object's metadata must be revealed as changes to its replica configuration in the system. Thus, once a client establishes a write quorum based on replicas in the existing metadata and sends them metadata update requests, some reorganization needs to take place to reflect the migration to the new metadata. Replicas of the data object and the new metadata are sent to new servers on the metadata list, while replicas on servers that are no longer on the metadata list are deleted. Once the changes have been made, the servers participating in the quorum unlock their replicas and send the client a reply (success/failure). Figure 5 shows an example of a metadata update.

4.1.6 Handling failures

The failure of an update request to complete can render replicas inconsistent. To ensure the consistency of a data object, clients use a two-phase commit protocol when acquiring votes and executing updates on a replica. Two situations may cause an update request to fail: the client issuing the request fails after locking all or part of a quorum, or a server storing a replica fails while the replica is locked as part of a write quorum. In the former case, the servers receive a lock request from the client, but never receive the expected request to update their replicas. After a predetermined period of time, the servers assume that the client has failed while processing the request and invalidate their replicas of the data object. In the latter case, the server invalidates the replica once it recovers from the failure. By making locks persistent, the server knows which replicas were locked at the time of failure and, thus, which require recovery. Invalid replicas cannot participate in client operations until a distributed recovery algorithm [11] has been successfully executed.
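The two failure cases on the server side, sketched with the ReplicaState fields introduced earlier; because locks are persistent, a restarting server can tell which replicas were mid-update:

    class FailureHandling {
        // Case 1: lock held but the client's update never arrived within the timeout.
        static void onClientTimeout(ReplicaState r) {
            r.locked = false;
            r.valid = false;   // sit out quorums until distributed recovery [11] runs
        }

        // Case 2: the server itself crashed while a replica was locked for a write.
        static void onRestart(Iterable<ReplicaState> replicas) {
            for (ReplicaState r : replicas) {
                if (r.locked) { r.locked = false; r.valid = false; }
            }
        }
    }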

4.2 Guarantees

Adapting Gifford's weighted voting algorithm to a P2P environment introduces two challenges beyond ensuring sequentially consistent access to data objects. First, maintaining metadata in a distributed fashion and ensuring the use of up-to-date metadata to establish quorums. Second, propagating version numbers to allow each quorum to determine what the most current version of the data is. In this section, we describe the requirements we impose in order to guarantee the use of correct metadata and version propagation.

[Fig. 5 An example of a metadata update following the quorum acquisition. During the operation, the servers move from the state shown in the left column to the state shown on the right.]

4.2.1 Correct metadata

Our decentralized weighted voting scheme relies on the requirement introduced in Gifford's scheme that R + W > N to ensure that every read request accesses the most current version of the data. Since R > N − W, every read quorum is guaranteed to include at least one replica that was part of the last write quorum, and, thus, have access to up-to-date data. We can provide the same guarantee for accessing metadata since the read and write quorums for accessing metadata must also conform to R + W > N. Guaranteeing access to the most current version of the metadata is necessary in order to ensure that quorums are gathered based on metadata representing the current replica configuration in the system. As an alternative, the metadata could be managed as a separate data object with its own votes and quorum values, allowing it to be replicated and configured independently of the data itself. This flexibility would come at the cost of two locking phases, first the metadata and then the data, while our algorithm combines them into one.

During the bootstrapping process, a client may initially acquire metadata that is not up to date and begin sending lock requests to servers listed in that metadata. In rare instances, a client may obtain very stale metadata for which none of the listed servers still hold replicas. In this case, the client must discard the old metadata and re-bootstrap to find a newer version. More commonly, stale metadata leads a client to servers with more and more up-to-date metadata. Having servers piggyback their metadata along with their lock replies guarantees that the client will acquire the current metadata before a quorum is obtained.

4.2.2 Version propagation

In Gifford's algorithm, the collector has global knowledge of the version of the data. The replicas that are updated under a write quorum are informed by the collector of their new data version. Even in the case where W < R, in which a write quorum may not include any replica from a write quorum preceding it (since W < N − W), the correct version will be propagated to the replicas in the second write quorum by the collector.

Distributing the metadata and doing away with the centralized client model requires us to enforce one additional constraint on the choice of R and W in our scheme. By adding the requirement W = R, we ensure that W = R > N − W. As a result, we guarantee that every write quorum includes at least one replica from the write quorum preceding it, and, thus, the correct version of the data or metadata can propagate from one quorum to the next.
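For reference, the two overlap arguments can be written out (this restates the text; no new assumptions):

    R + W > N  implies  R > N − W : every read quorum intersects the last write quorum.
    W = R      implies  R + W = 2W > N, hence  W > N − W : every write quorum intersects the write quorum preceding it.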

5 Implementation details

Our initial implementation of Oasis was written in Java and our servers and mediators communicate using XML over HTTP. Oasis was implemented as a meta-database that delegates storage and indexing to an underlying database. The Oasis server has been written to run on top of any JDBC-compliant database that supports transactions. Our initial deployments have used a variety of products: PostgreSQL and MySQL have been used on PC-class devices, and PointBase, an embedded, small-footprint database, has been used with iPAQs and other ARM-based devices.

In this section, we discuss several implementation decisions we made that pertain to our weighted voting algorithm.

5.1 Data object creation

In Oasis, clients handle requests made by applications to create new databases. Before a database can be created, its replica locations and vote distribution must be determined. Many real-world applications require a specific replica layout (e.g., all the votes go to a primary server and other servers get zero-vote replicas to serve as caches; or four servers get one-quarter of the votes each, with W = N and R = 1 in order to obtain high read performance). Oasis supports explicit configuration by allowing applications to provide the client with a specific replica distribution. Oasis also supports a second method in which applications specify expected performance and expected reliability, and Oasis makes the replica configuration decisions itself. Oasis servers describe their own performance and reliability and the clients use a simple additive greedy algorithm to match applications' specifications to the characteristics of available servers in order to decide on the new database's replica configuration. For database creation, a client acquires a quorum consisting of all the servers to receive replicas of the new database and sends them requests to create the new database.
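The paper describes the matching step only as a "simple additive greedy algorithm"; one plausible reading (the scoring model below is entirely our assumption) is:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    class GreedyPlacement {
        record ServerAd(String name, double perf, double reliability) {} // self-published

        // Add the most capable servers until the application's stated
        // performance and reliability expectations are (additively) met.
        static List<ServerAd> place(List<ServerAd> candidates,
                                    double perfGoal, double reliabilityGoal) {
            List<ServerAd> sorted = new ArrayList<>(candidates);
            sorted.sort(Comparator.comparingDouble(
                    (ServerAd s) -> s.perf() + s.reliability()).reversed());
            List<ServerAd> chosen = new ArrayList<>();
            double perf = 0, rel = 0;
            for (ServerAd s : sorted) {
                if (perf >= perfGoal && rel >= reliabilityGoal) break;
                chosen.add(s);
                perf += s.perf();          // additive model: contributions simply sum
                rel += s.reliability();
            }
            return chosen;
        }
    }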

To ensure the uniqueness of database names across an Oasis installation, we added a distinguished database that holds the names of all the databases in the system. If a new database is to be created, it must first be added uniquely to this distinguished database. While this simple approach implies that databases cannot be created in multiple locations at the same time, it would not present a problem for the applications we examined in our study; in general, databases were created at installation time only. In order to support the creation of temporary "scratch" storage, Oasis can create databases with random unique names without requiring that the distinguished database be available for writing.


5.2 Locking for read operations

The protocol for the read operation described in Sect. 4.1.4 requires a client to lock a single replica with the current version of the data once it has obtained replies to version requests from a set of replicas whose votes sum to R. This lock request can be eliminated completely if the underlying data object (database, file, record, etc.) supports transactional updates. In this case, updates are atomic, ensuring that any read request sees either the value before or after an update, and not some invalid state in between.

5.3 Avoiding deadlocks

Clients are required to lock a set of replicas to update a data object. If multiple clients attempt to update a data object at the same time, their lock requests may interleave in such a way that each client acquires the lock on a number of replicas, and gets queued for the lock of at least one replica. Thus, no client is able to obtain a full quorum (W votes) in order to process the write request, resulting in deadlock. The simplest way to eliminate deadlock is to establish a total order on the servers and require that they be locked in this order. Since the number of storage devices in most pervasive environments is on the order of tens or hundreds, establishing this total order is not difficult. In Oasis, we give each server a unique name, e.g., wireless_server_7, and sort based on this name.

While this scheme ensures correctness, it also eliminates the parallelism that was originally available in the lock-acquisition process. In order to avoid deadlock while exploiting parallelism when possible, Oasis clients track the contention for a database. If contention is low, a client may try to acquire locks in parallel. However, if one of the lock requests gets queued by any of the servers, the client releases the locks it had acquired and switches to a sequential, ordered-lock acquisition.
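A sketch of this adaptive strategy (the contention flag and the lock interface are condensed into stubs of our own; the total order by unique server name is from the text):

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    class DeadlockAvoidance {
        interface NamedServer {
            String name();               // unique, e.g., "wireless_server_7"
            boolean tryLock(String db);  // non-blocking; false if we would be queued
            void lock(String db);        // blocking; safe only under the total order
            void unlock(String db);
        }

        static void lockAll(String db, List<NamedServer> servers, boolean lowContention) {
            if (lowContention) {
                List<NamedServer> got = new ArrayList<>();   // optimistic parallel attempt
                boolean queued = false;
                for (NamedServer s : servers) {
                    if (s.tryLock(db)) got.add(s); else queued = true;
                }
                if (!queued) return;                         // all locks held, no waiting
                for (NamedServer s : got) s.unlock(db);      // back off before retrying
            }
            List<NamedServer> ordered = new ArrayList<>(servers);
            ordered.sort(Comparator.comparing(NamedServer::name)); // total order on names
            for (NamedServer s : ordered) s.lock(db);        // queuing here cannot deadlock
        }
    }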

5.4 Partial updates

In a write operation, all replicas in the write quorum must be brought up to date before the new update can be applied. This update can be applied either by shipping the entire database or by shipping a set of delta queries that will bring the server's replica up to date. Oasis servers maintain a sliding window of recent queries, and if a partial update can be achieved using this set of deltas, the partial update is favored over the full update.
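A sketch of the choice between delta shipping and a full copy (the sliding-window log is described in the text; the interfaces are our own):

    import java.util.List;

    class PartialUpdate {
        interface QueryLog { List<String> since(long version); } // null once evicted

        interface Replica {
            long version();
            void apply(String updateQuery);  // replay one missed write
            void replaceWithFullCopy();      // ship the whole database instead
            void setVersion(long v);
        }

        static void bringUpToDate(Replica stale, long current, QueryLog window) {
            List<String> deltas = window.since(stale.version());
            if (deltas != null) {            // still inside the sliding window
                for (String q : deltas) stale.apply(q);
            } else {
                stale.replaceWithFullCopy(); // too far behind: full transfer
            }
            stale.setVersion(current);
        }
    }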

5.5 Permanent failures

One of the key features of our weighted voting scheme is support for disconnected operation. A replica on a mobile device can be assigned the majority of the votes to allow reads and writes to the data while the device is disconnected. While disconnection may be the norm in some applications, it is important to distinguish between disconnections and permanent failures. If a device fails, the replicas it stores become inaccessible and can no longer participate in quorums for read or write operations. To ensure that inaccessible replicas do not render data objects unavailable for prolonged periods of time, a lease mechanism to invalidate replicas is necessary. Oasis assigns each replica a lease period and provides a mechanism to reallocate votes among replicas. We have not yet implemented the mechanism to trigger the shift of votes away from a timed-out replica.

6 Applications

To investigate usability, we implemented three applications on top of Oasis. Two of these are variants of existing applications from the literature while Guide is a new application that has been developed in our laboratory. While we did not undertake a rigorous evaluation of our implementations, our experience suggests that Oasis is well suited for the pervasive computing domain. More interestingly, for all three applications, we encountered ways in which the capabilities of Oasis transparently augmented or extended some basic function of the application.

6.1 Portrait display

The portrait display is an ongoing project in our laboratory, motivated by Mynatt et al.'s Digital Family Portrait [26]. The Digital Family Portrait tries to increase the awareness of extended family members about an elderly relative living alone. Information about an elderly person living alone (health, activity level, social interaction) is collected by sensors in his instrumented home, and is unobtrusively displayed at the remote home of an extended family member on a digital picture frame surrounding his portrait. Researchers in our laboratory have been using the digital family portrait scenario to explore various approaches for displaying ambient data about elders who require home care. In conjunction with their investigation, we have implemented a digital portrait inspired by the original that runs on top of Oasis. The four categories of information displayed in our digital portrait are medication intake, meals eaten, outings, and visits. The data used to generate the display comes from a variety of sources. In our prototype, medication and meal information are gathered using Mote-based sensors [12] and cameras, while information about visits and outings is currently entered by hand using a Web interface.

The relational data model provided by Oasis is well suited for describing the regular, structured data used by the portrait display application. Similarly, the types of queries needed to extract results to display are easily expressed in SQL.

Oasis effectively supports the availability requirements of the portrait display. The portrait display uses a separate Oasis database for each category of information collected (meals, visits, etc.). Each database is explicitly configured with a four-vote replica that resides on the device where the data is gathered and a one-vote replica on the portrait display device (N = 5, R = 3, W = 3). This configuration allows the data to be read and updated at its source, and, when a connection exists, allows the portrait display to obtain recent changes. Note that this remains true even if the data source itself is disconnected. For example, after visiting the elder, a care provider can enter notes about the visit while sitting in his car or back at his office, a practice mentioned in our fellow researchers' interviews with care providers. While the ability to record information when disconnected was not part of the original scenario, the capability is provided by Oasis transparently by placing the four-vote replica on the care provider's laptop or PDA.

This configuration also supports unplanned disconnections by the portrait display itself. The original Digital Family Portrait used a simple client–server model in which the display was rendered as a Web page fetched from a server running in the elder's home. While suitable for a prototype, it would not work well in a real deployment in which DSL lines and modem connections do, in fact, go down at times. Implementations that rely on a Web client–server model must either display an error page or leave the display unchanged in the case of a disconnection. With Oasis, disconnections are exposed in the form of stale query results, giving the application the opportunity to display the uncertainty in an appropriate way.

6.2 Dynamo: smart room file system

Stanford's iRoom [13] and MIT's Intelligent Room [5] are examples of "productivity enhanced workspaces" in which pervasive computing helps a group of people work more efficiently. In their scenarios, people gather and exchange ideas while sharing and authoring files using a variety of viewing and authoring tools. Generally, in these scenarios, either: (1) the files are stored on a machine in the workspace and users lose access when they leave the space; or (2) files reside on a user's personal device (like a laptop) and everyone else in the workspace loses access when the user departs.

For our second application, we developed a system called Dynamo, which allows file-oriented data to be consistently replicated across personal devices. In Dynamo, each user or group owns a hierarchical tree of directories and files, much like a home directory. Users can choose contexts in which to share various portions of their file system with other users (example contexts are "code review" or "hiring meeting"). The collective sum of files shared by the users that are present makes up the files available in the workspace. In this manner, everyone present at a hiring meeting, for example, can share their proxies and interview notes with the other participants without exposing other parts of their file space.

Dynamo was written as an extension to Apache's WebDAV server that stores a user's files in an Oasis database. Microsoft's Web Folders are used to mount the WebDAV server as a file system, allowing Dynamo's file hierarchy to be accessed using standard desktop applications. Implementing Dynamo on top of Oasis required a small number of changes to the original WebDAV server (less than 400 lines). Despite this, the relational data model was not a good fit for the file-oriented data stored in Dynamo. Mapping the hierarchy of the file system into relations required a translation step not needed in our other two applications.

The flexibility of Oasis enabled a variety of semantically interesting configurations. If desired, Dynamo can create a one-vote replica of a user's files on a device that resides in the workspace. This permits the user to disconnect and leave while enabling the remaining participants to view (but not write) the files that have been shared. These stale read-only files remain in that context until the user returns, at which time a more up-to-date, writeable version would be seen. For files owned by a group, interesting ownership policies can be arranged by assigning votes based on the user's roles and responsibilities. This can be used to enforce policies ranging from basic majority schemes in which all replicas are equal to more complex specifications such as "the budget cannot be edited unless the boss plus any two other employees are present." While this flexibility raises a number of privacy and interface challenges, it shows how Oasis can add rich semantics to a simple application.
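As a worked example of such a policy (the numbers are our own, not from the paper): give the boss's replica 3 votes and each of four employees' replicas 1 vote, so N = 7, and set W = R = 5. The boss plus any two employees muster 3 + 1 + 1 = 5 votes, while all four employees together reach only 4, and the boss with one employee only 4; the budget can therefore be edited exactly when the boss and at least two other employees are present, and R + W = 10 > 7 preserves the consistency guarantee.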

6.3 Guide

The Guide project [9] aims to use passive radio frequency identification (RFID) tags for the purpose of context inference. The project involves tagging thousands of objects in a workspace with RFID tags and tracking their position using RF antennas mounted on a robot. Tagged objects include books, personal electronics, and office/lab equipment. As the robot moves around the environment, the antennas pick up the ID of nearby tags. For each tag ID i discovered at time t and location l, the platform writes the tuple (i, t, l) to a database. The database thus accumulates the location of objects over time. The goal of Guide is to determine high-level relationships between objects based on the accumulated data.

To help in our evaluation, the Guide team implemented their system on top of Oasis. The relational data model was an ideal match for Guide's structured RFID readings and all of the Guide queries could be easily expressed as SQL statements. The indexing provided by the underlying database was essential in reducing the time to process Guide queries.


Guide has demanding performance, reliability, and availability requirements. First, it is expected to generate large quantities of data; the database is expected to grow to contain millions of readings in months. Given that the Guide database is intended to be used as a common utility, it is quite possible that tens or hundreds of clients will query the Guide database. The database must, therefore, scale to support large numbers of potentially complex queries in parallel. Second, this large quantity of data must be stored reliably. Since the data may represent months of activity (and it is impossible to regenerate the data), and the entire period may well be relevant, losing the data would be detrimental. Third, since the queries may be part of standing tasks (such as context-triggered notification), it is important that the database be highly available. Based on Guide's goals of high availability and performance, the Oasis self-tuner configured the Guide database with three one-vote replicas (N = 3, R = 2, W = 2). This configuration provides high reliability, good performance, and continuous access to the data, provided that any two of the three servers are available.

7 Performance

To measure the performance of Oasis using a realistic workload, we constructed an experiment based on the Guide application described in Sect. 6.3. The Guide database comprises three tables: a reading table tracking when and where an RFID tag was seen; an object table that relates RFID tags to object names (like "stapler"); and a place table that records the geometric bounds of rooms. Our experimental data set was seeded with 1,000,000 records in the reading table, 1,000 records in the object table, and 25 records in the place table. This approximates the number of tagged objects in our laboratory and the number of readings we expect to record in a month.
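A sketch of what these three tables might look like as DDL, consistent with the earlier logging sketch; the column names are our own guesses, since the paper does not list the exact schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// A minimal sketch (hypothetical column names) of the three Guide tables
// described above, created through JDBC against PostgreSQL.
public final class GuideSchema {
    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/guide");
             Statement ddl = db.createStatement()) {
            ddl.executeUpdate(
                "CREATE TABLE reading (tag_id TEXT, seen_at TIMESTAMP, x REAL, y REAL)");
            ddl.executeUpdate(
                "CREATE TABLE object (tag_id TEXT PRIMARY KEY, name TEXT)");
            ddl.executeUpdate(
                "CREATE TABLE place (room TEXT PRIMARY KEY, " +
                "x_min REAL, y_min REAL, x_max REAL, y_max REAL)");
            // An index over (tag, time) keeps last-seen lookups fast as the
            // reading table grows toward millions of rows.
            ddl.executeUpdate(
                "CREATE INDEX reading_by_tag ON reading (tag_id, seen_at)");
        }
    }
}
```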

In our benchmark, a set of clients alternates between performing queries and updates on the database. The queries are all of the form "where was object X last seen?" These are fairly computationally intensive queries that join across the reading and place tables. The updates are inserts of new records into the reading table. The ratio of queries to updates performed by the clients is 50:1, again approximating the expected workload in a Guide deployment. To show the tradeoffs offered by Oasis, we measure two Oasis configurations: one that offers the highest query performance (R=1, W=N) and another that offers the highest tolerance to server failure (R=N/2, W=N/2+1). To show the overhead that Oasis introduces, we compare its performance to direct accesses to the underlying PostgreSQL database. In our experiments, the number of clients is fixed at ten and the number of replicas is varied from one to six. Each client and server in the test ran on its own Pentium 4 PC running Windows 2000, connected via 100 Mb/s Ethernet. The Oasis servers and clients ran on Sun's JVM 1.3.1, and the underlying data was stored in PostgreSQL 7.3.
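Under the same hypothetical schema as above, the benchmark query can be written as a single join that finds the most recent sighting of a named object and maps its coordinates to a room.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// A minimal sketch (reusing the hypothetical schema from Sect. 7) of the
// benchmark's "where was object X last seen?" query.
public final class LastSeenQuery {
    public static void main(String[] args) throws Exception {
        try (Connection db = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/guide");
             PreparedStatement q = db.prepareStatement(
                 "SELECT p.room, r.seen_at " +
                 "FROM reading r " +
                 "JOIN object o ON o.tag_id = r.tag_id " +
                 "JOIN place p ON r.x BETWEEN p.x_min AND p.x_max " +
                 "           AND r.y BETWEEN p.y_min AND p.y_max " +
                 "WHERE o.name = ? ORDER BY r.seen_at DESC LIMIT 1")) {
            q.setString(1, "stapler");
            try (ResultSet rs = q.executeQuery()) {
                if (rs.next()) {
                    // e.g. "lab-2 at 2004-05-31 14:02:11"
                    System.out.println(rs.getString("room")
                        + " at " + rs.getTimestamp("seen_at"));
                }
            }
        }
    }
}
```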

Figure 6 shows the total throughput achieved by the set of clients. The graph shows that, for a singly replicated database, Oasis achieves lower throughput than PostgreSQL. This is expected, since Oasis incurs additional overhead from running our locking protocol. The graph shows that, as replicas are added, read queries are able to take advantage of the increased parallelism each new server offers. This parallelism is greater in the high-performance configuration, in which a read query can be fully serviced by any one replica. For all multiple-replica configurations, however, Oasis achieves higher throughput than direct access to a single PostgreSQL server.

Figure 7 shows a latency breakdown for the Guide queries executed against a two-way replicated Oasis database. The breakdown shows that read queries spend more time executing in the database than the writes. It also shows that the Oasis overhead is higher for the writes than the reads. (With two replicas, read operations can piggyback the query on the lock request, requiring fewer messages.) This figure also suggests that optimizing our XML/HTTP messaging could offer substantial performance gains.
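The piggybacking remark can be made concrete with a back-of-the-envelope message count; this is our own simplified model, not Oasis's measured wire protocol.

```java
// A minimal sketch (our own model, not the Oasis implementation) of the
// message counts behind Fig. 7: with R=W=2 over two replicas, a read can
// piggyback its query on the lock request, while a write needs an extra
// round to ship the update after its locks are granted.
public final class MessageCount {
    static int readMessages(int quorumSize) {
        // One combined lock+query request and one reply per replica.
        return 2 * quorumSize;
    }

    static int writeMessages(int quorumSize) {
        // Lock request/reply, then update/ack, per replica in the quorum.
        return 4 * quorumSize;
    }

    public static void main(String[] args) {
        System.out.println("read:  " + readMessages(2) + " messages");
        System.out.println("write: " + writeMessages(2) + " messages");
    }
}
```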

Fig. 6 This graph compares the throughput (queries/s, plotted against the number of replicas) of two Oasis configurations, one configured for performance and one for fault tolerance, and a single-server PostgreSQL baseline. In the experiment, ten clients are running the Guide workload concurrently

Fig. 7 The latency breakdown (in ms) of read and write queries in the Guide workload for Oasis configured with two replicas, divided into locking, messaging, and query-execution time

8 Related work

There are many existing storage management systems available to pervasive computing developers, including distributed file systems, databases, and tuple stores. These distributed systems exhibit a variety of behaviors when clients disconnect from the network. In most systems, disconnected clients are unable to read or write data; others offer limited disconnected operation [27], while some systems give clients full read and write facilities while disconnected [15, 18, 30]. We now review the storage management systems that are most relevant to Oasis and discuss how they compare.

A number of data management systems permit clients to perform updates to a local replica of a data object at any time, even when disconnected from all other replicas. These so-called "update anywhere" systems are attractive because they never deny the client application the ability to write data and they guarantee that the update will eventually be propagated to the other replicas. There are update-anywhere file systems, such as Coda [18], as well as update-anywhere databases, such as Bayou [30] and Deno [15]. As data can be updated in multiple locations at the same time, these systems offer weaker consistency guarantees than Oasis. To achieve eventual consistency, update-anywhere systems employ varying mechanisms to resolve conflicts that arise between replicas. Coda [18] relies on the user to merge write conflicts that cannot be trivially resolved by the system. This technique is a poor fit for pervasive computing environments, where the user may not be near a computer to provide input or may not have the necessary level of expertise. In Bayou [30], writes are accompanied by fragments of code that travel with the write request and are consulted to resolve conflicts. While these migrating, application-specific conflict resolvers are a potentially powerful model, we believe that writing them is far beyond the technical abilities of an average software engineer. Finally, Deno [15] uses rollback to resolve conflicts between replicas. Rollback is difficult to cope with in pervasive computing environments, in which physical actuations take place that cannot be undone.

While P2P file sharing systems like Gnutella satisfy a number of our requirements, they do not provide a single consistent view of the data as servers connect and disconnect. Systems like Farsite [4], OceanStore [19], and CFS [8] improve on the basic P2P architecture by incorporating replication to probabilistically ensure a single consistent view. While these systems share our goals of availability and manageability, there are significant differences that make them less than ideal for pervasive computing environments. Farsite was designed for a network of PCs running desktop applications. OceanStore is geared for global-scale deployment and depends on a set of trusted servers. Finally, CFS provides read-only access to clients and is not intended as a general-purpose file system.

A few storage systems have been designed specifically for pervasive computing environments. The TinyDB system [25] allows queries to be routed and distributed within a network of impoverished sensor nodes. Systems like PalmOS allow PDA users to manually synchronize their data with a desktop computer. TSpaces [24] is one of a number of centralized tuple-based systems that were written for environments with a changing set of heterogeneous devices.

Self-tuning has been incorporated into several storage systems outside the domain of pervasive computing. HP AutoRAID [33] automatically manages the migration of data between two different levels of RAID arrays as access patterns change. Similarly, Hippodrome [2] employs an iterative approach to automating storage system configuration.

9 Conclusions and future work

It is challenging to write responsive and robust pervasive computing applications using traditional data management systems. To help address this issue, we have built Oasis, a data management system tailored to the characteristics of pervasive computing. Oasis presents a SQL (structured query language) interface and a relational data model, both of which are well suited to the data usage of typical pervasive computing applications. A peer-to-peer (P2P) architecture coupled with a decentralized weighted-voting scheme provides sequentially consistent access to data while tolerating device disconnections. We have validated our initial implementation by showing that it exhibits good performance, and we have used Oasis to implement three typical pervasive computing applications. Our initial experience with Oasis suggests that its feature set fits both the functional and performance requirements of such applications.

The largest practical drawback of using Oasis is the requirement that databases be fully replicated. A number of partial replication alternatives exist that allow data to be replicated at the table, record, or SQL-view granularity. We plan to investigate which style of replication best fits pervasive computing and what changes our APIs and consistency mechanism would require in order to support various partial replication schemes.

We also plan to investigate how machine learning techniques can be used to guide the placement of replicas, the creation of indexes, and the adjustment of the weighted-voting parameters. By tracing application data usage and device migration over time, we hope to build a system that can make good automated decisions on where data should be placed, as well as how it should be indexed. We believe that large gains in both availability and query performance can be attained by exploiting the flexibility offered by weighted voting to create a self-tuning, self-managing storage system.

References

1. Abowd GD, Atkeson CG, Feinstein A, Hmelo C, Kooper R, Long S, Sawhney N, Tani M (1996) Teaching and learning as multimedia authoring: the Classroom 2000 project. In: Proceedings of the 4th ACM international conference on multimedia (MM'96), Boston, Massachusetts, November 1996, pp 187–198

2. Anderson E, Hobbs M, Keeton K, Spence S, Uysal M, Veitch A (2002) Hippodrome: running circles around storage administration. In: Proceedings of the first USENIX conference on file and storage technologies (FAST 2002), Monterey, California, January 2002

3. Arnstein L, Sigurdsson S, Franza R (2001) Ubiquitous computing in the biology laboratory. J Lab Automation 6(1)

4. Bolosky W, Douceur J, Ely D, Theimer M (2000) Feasibility of a serverless distributed file system deployed on an existing set of desktop PCs. In: Proceedings of the ACM international conference on measurement and modeling of computer systems (SIGMETRICS 2000), Santa Clara, California, June 2000

5. Brooks R (1997) The Intelligent Room project. In: Proceedings of the 2nd international conference on cognitive technology (CT'97), Aizu, Japan, August 1997

6. Brumitt B, Meyers B, Krumm J, Kern A, Shafer S (2000) EasyLiving: technologies for intelligent environments. In: Proceedings of the 2nd international symposium on handheld and ubiquitous computing (HUC2K), Bristol, UK, September 2000, pp 12–29

7. Card SK, Robertson GG, Mackinlay JD (1991) The information visualizer: an information workspace. In: Proceedings of the ACM conference on human factors in computing systems (CHI'91), New Orleans, Louisiana, April/May 1991, pp 181–188

8. Dabek F, Kaashoek MF, Karger D, Morris R, Stoica I (2001) Wide-area cooperative storage with CFS. In: Proceedings of the 18th ACM symposium on operating systems principles (SOSP 2001), Banff, Canada, October 2001

9. Fishkin KP, Fox D, Kautz H, Patterson D, Perkowitz M, Philipose M (2003) Guide: towards understanding daily life via auto-identification and statistical analysis. In: Proceedings of the 2nd international workshop on ubiquitous computing for pervasive healthcare applications (UbiHealth 2003), Seattle, Washington, October 2003

10. Gifford DK (1979) Weighted voting for replicated data. In: Proceedings of the 7th symposium on operating systems principles (SOSP'79), Pacific Grove, California, December 1979, pp 150–162

11. Goodman N, Skeen D, Chan A, Dayal U, Fox S, Ries D (1983) A recovery algorithm for a distributed database system. In: Proceedings of the 2nd ACM SIGACT–SIGMOD symposium on principles of database systems (PODS'83), Atlanta, Georgia, March 1983

12. Hill J, Szewczyk R, Woo A, Culler D, Hollar S, Pister K (2000) System architecture directions for networked sensors. In: Proceedings of the 9th international conference on architectural support for programming languages and operating systems (ASPLOS 2000), Cambridge, Massachusetts, November 2000

13. Johanson B, Fox A, Winograd T (2002) The Interactive Workspaces project: experiences with ubiquitous computing rooms. IEEE Pervasive Comput 1(2):67–74

14. Johanson B, Fox A (2002) The Event Heap: a coordination infrastructure for interactive workspaces. In: Proceedings of the 4th IEEE workshop on mobile computing systems and applications (WMCSA 2002), Callicoon, New York, June 2002

15. Keleher P (1999) Decentralized replicated-object protocols. In: Proceedings of the 18th ACM symposium on principles of distributed computing (PODC'99), Atlanta, Georgia, May 1999, pp 143–151

16. Kidd C, Orr R, Abowd GD, Atkeson CG, Essa IA, MacIntyre B, Mynatt E, Starner TE, Newstetter W (1999) The Aware Home: a living laboratory for ubiquitous computing research. In: Proceedings of the 2nd international workshop on cooperative buildings (CoBuild'99), Pittsburgh, Pennsylvania, October 1999

17. Kindberg T, Fox A (2002) System software for ubiquitous computing. IEEE Pervasive Comput 1(1):70–81

18. Kistler J, Satyanarayanan M (1992) Disconnected operation in the Coda file system. ACM Trans Comput Syst 10(1):3–25

19. Kubiatowicz J, Bindel D, Chen Y, Czerwinski S, Eaton P, Geels D, Gummadi R, Rhea S, Weatherspoon H, Weimer W, Wells C, Zhao B (2000) OceanStore: an architecture for global-scale persistent storage. In: Proceedings of the 9th international conference on architectural support for programming languages and operating systems (ASPLOS 2000), Cambridge, Massachusetts, November 2000

20. LaMarca A, Brunette W, Koizumi D, Lease M, Sigurdsson S, Sikorski K, Fox D, Borriello G (2002) PlantCare: an investigation in practical ubiquitous systems. In: Proceedings of the 4th international conference on ubiquitous computing (UbiComp 2002), Goteborg, Sweden, September/October 2002, pp 316–332

21. Lamming M, Flynn M (1994) Forget-me-not: intimate computing in support of human memory. In: Proceedings of the international symposium on next generation human interface (FRIEND21), Meguro Gajoen, Japan, February 1994

22. Lamming M, Eldridge M, Flynn M, Jones C, Pendlebury D (2000) Satchel: providing access to any document, any time, anywhere. ACM Trans Comput Hum Interact 7(3):322–352

23. Lamport L (1979) How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans Comput 28(9):690–691

24. Lehman TJ, McLaughry SW, Wyckoff P (1999) TSpaces: the next wave. In: Proceedings of the 32nd annual Hawaii international conference on system sciences (HICSS-32), Maui, Hawaii, January 1999

25. Madden S, Franklin M, Hellerstein J, Hong W (2003) The design of an acquisitional query processor for sensor networks. In: Proceedings of the 2003 ACM SIGMOD/PODS international conference on management of data, San Diego, California, June 2003

26. Mynatt E, Rowan J, Craighill S, Jacobs A (2001) Digital family portraits: providing peace of mind for extended family members. In: Proceedings of the ACM conference on human factors in computing systems (CHI 2001), Seattle, Washington, March/April 2001, pp 333–340

27. Oracle Technology Network (2002) Oracle9i Lite developer's guide for Windows CE, release 5.0.1. Available at http://www.oracle.com/technology/documentation/oracle9i_arch_901.html

28. Sumi Y, Mase K (2001) Digital system for supporting conference participants: an attempt to combine mobile, ubiquitous and web computing. In: Proceedings of the 3rd international conference on ubiquitous computing (UbiComp 2001), Atlanta, Georgia, September/October 2001

29. Stoica I, Morris R, Karger D, Kaashoek F, Balakrishnan H (2001) Chord: a scalable peer-to-peer lookup service for internet applications. In: Proceedings of the 2001 annual ACM SIGCOMM conference, San Diego, California, August 2001, pp 149–160

30. Terry D, Theimer M, Petersen K, Demers A, Spreitzer M, Hauser C (1995) Managing update conflicts in Bayou, a weakly connected replicated storage system. In: Proceedings of the 15th ACM symposium on operating systems principles (SOSP-15), Copper Mountain Resort, Colorado, December 1995, pp 172–183

31. Want R, Pering T, Danneels G, Kumar M, Sundar M, Light J (2002) The personal server: changing the way we think about ubiquitous computing. In: Proceedings of the 4th international conference on ubiquitous computing (UbiComp 2002), Goteborg, Sweden, September/October 2002

32. Weiser M (1991) The computer for the twenty-first century. Sci Am 265(3):94–104

33. Wilkes J, Golding R, Staelin C, Sullivan T (1996) The HP AutoRAID hierarchical storage system. ACM Trans Comput Syst 14(1):108–136

34. Yang B, Garcia-Molina H (2002) Designing a super-peer network. Technical report, Stanford University

35. IBM (2003) Autonomic computing manifesto. Available at http://www.research.ibm.com/autonomic/manifesto/autonomic_computing.pdf
