Building a collaborative peer-to-peer wiki system on a structured overlay


Gérald Oster a, Rubén Mondéjar b,*, Pascal Molli a, Sergiu Dumitriu c

a Score Team, Université de Lorraine, Université Henri Poincaré, INRIA, LORIA, France
b Department of Computer Science and Mathematics, Universitat Rovira i Virgili, Spain
c XWiki SAS & Score Team, Université de Lorraine, Université Henri Poincaré, INRIA, LORIA, France

* Corresponding author. E-mail addresses: oster@loria.fr (G. Oster), ruben.mondejar@urv.cat (R. Mondéjar), molli@loria.fr (P. Molli), sergiu@xwiki.com (S. Dumitriu).

Computer Networks 54 (2010) 1939–1952. doi:10.1016/j.comnet.2010.03.019. Available online 4 June 2010.

Keywords: Collaborative editing, Peer-to-peer, Distributed interception, Optimistic replication

Abstract

The ever growing request for digital information raises the need for content distribution architectures providing high storage capacity, data availability and good performance. While many simple solutions for scalable distribution of quasi-static content exist, there are still no approaches that can ensure both scalability and consistency for the case of highly dynamic content, such as the data managed inside wikis. We propose a peer-to-peer solution for distributing and managing dynamic content that combines two widely studied technologies: Distributed Hash Tables (DHT) and optimistic replication. In our universal wiki engine architecture (UniWiki), on top of a reliable, inexpensive and consistent DHT-based storage, any number of front-ends can be added, ensuring both read and write scalability, as well as suitability for large-scale scenarios. The implementation is based on Damon, a distributed AOP middleware, thus separating distribution, replication, and consistency responsibilities, and also making our system transparently usable by third party wiki engines. Finally, UniWiki has been proved viable and fairly efficient in large-scale scenarios.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Peer-to-peer (P2P) systems, which account for a significant part of all internet traffic, rely on content replication at more than one node to ensure scalable distribution. This approach can be seen as a very large distributed storage system, which has many advantages, such as resilience to censorship, high availability and virtually unlimited storage space [2].

Currently, P2P networks mainly distribute immutable content. We aim at making use of their characteristics for distributing dynamic, editable content. More precisely, we propose to distribute updates on this content and manage collaborative editing on top of such a P2P network. We are convinced that, if we can deploy a group editor framework on a P2P network, we open the way for P2P content editing: a wide range of existing collaborative editing applications, such as CVS and wikis, can be redeployed on P2P networks, and thus benefit from the availability improvements, the performance enhancements and the censorship resilience of P2P networks.

Our architecture targets heavy-load systems that must serve a huge number of requests. An illustrative example is Wikipedia [40], the collaborative encyclopedia that has collected, until now, over 13,200,000 articles in more than 260 languages. It currently registers at least 350 million page requests per day, and over 300,000 changes are made daily [42]. To handle this load, Wikipedia needs a costly infrastructure [3], for which hundreds of thousands of dollars are spent every year [11]. A P2P massive collaborative editing system would allow distributing the service and sharing the cost of the underlying infrastructure.

Existing approaches to deploy a collaborative system on a distributed network include Wooki [38], DistriWiki [20], RepliWiki [29], Scalaris [32], Distributed Version Control systems [1,13,43], DTWiki [8] and Piki [21]. Several drawbacks prevent these systems from being used in our target scenario. They either require total replication of content, meaning that all wiki pages are replicated at all nodes, or do not provide support for all the features of a wiki system, such as page revisions, or provide only a basic conflict resolution mechanism that is not suitable for collaborative authoring.

This paper presents the design and the first experiments of a wiki architecture that:

• is able to store huge amounts of data,
• runs on commodity hardware by making use of P2P networks,

• does not have any single point of failure, or even a relatively small set of points of failure,

• is able to handle concurrent updates, ensuring eventual consistency.

To achieve these objectives, our system relies on the results of two intensively studied research domains: distributed hash tables (DHT) [30] and optimistic replication [31]. At the storage level, we use DHTs, which have been proved [16] quasi-reliable even in test cases with a high degree of churn and network failures. However, DHTs alone are not designed to consistently support unstable content with a high rate of modifications, as is the case with the content of a wiki. Therefore, instead of the actual data, our system stores in each DHT node operations, more precisely the list of changes that produce the current version of a wiki document. It is safe to treat these changes like the usual static data stored in DHTs, given that an operation is stored in a node independently of other operations and is never modified afterwards.

Because updates can originate from various sources, concurrent changes to the same data might occur, and therefore different operation lists could be temporarily available at different nodes responsible for the same data. These changes need to be combined so that a plausible most recent version of the content is obtained. For this purpose, our system uses an optimistic consistency maintenance algorithm, such as WOOT [23] or Logoot [39], which guarantees eventual consistency, causal consistency and intention preservation [35].
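As a rough sketch of this storage model (all names below are illustrative assumptions, not the actual UniWiki code), a DHT node can keep, for each page key, a set of immutable operations rather than the rendered page, and reconcile two replicas of the same key by a plain union of their operation sets; ordering and integrating the merged operations is left to the consistency algorithm (WOOT, Logoot):

```java
// Illustrative sketch only; class and method names are assumptions, not the UniWiki API.
import java.util.*;

final class Operation {
    final String id;       // globally unique id, e.g. siteId + local counter
    final String type;     // "insert" or "delete"
    final String payload;  // affected content (a line or a character)

    Operation(String id, String type, String payload) {
        this.id = id;
        this.type = type;
        this.payload = payload;
    }
}

final class DhtNode {
    // pageKey -> operations known by this replica, indexed by operation id
    private final Map<String, Map<String, Operation>> store = new HashMap<>();

    // An operation is written once and never modified, so it can be handled
    // exactly like the static values a DHT normally stores.
    void put(String pageKey, Operation op) {
        store.computeIfAbsent(pageKey, k -> new HashMap<>()).put(op.id, op);
    }

    Collection<Operation> operations(String pageKey) {
        return store.getOrDefault(pageKey, Collections.<String, Operation>emptyMap()).values();
    }

    // Reconciling two replicas of the same key is a plain union of their
    // operation sets; the consistency algorithm later integrates the merged
    // operations into a single, convergent document state.
    void mergeFrom(DhtNode other, String pageKey) {
        for (Operation op : other.operations(pageKey)) {
            put(pageKey, op);
        }
    }
}
```

Since stored operations are immutable, a temporarily diverging replica only delays convergence: once the missing operations are exchanged, every replica holds the same set, and the consistency algorithm derives the same document state from it.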

To reduce the effort needed for the implementation, and to make our work available to existing wiki applications, we built our system using a distributed AOP middleware (Damon, http://damon.sf.net). Thus, we were able to reuse existing implementations for all the components needed, and to integrate our method transparently.
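The interception idea behind this transparent integration can be pictured with a plain JDK dynamic proxy. The sketch below is only an analogy under assumed interface names (WikiStorage, save, load); it is not Damon's API, which in addition distributes the interception across the peers of the overlay:

```java
// Simplified analogy of interception around an unmodified wiki engine; not the Damon API.
import java.lang.reflect.*;

interface WikiStorage {
    void save(String pageName, String newContent);
    String load(String pageName);
}

// The wiki engine's own storage component, left untouched.
class LocalWikiStorage implements WikiStorage {
    public void save(String pageName, String newContent) {
        System.out.println("local save of " + pageName);
    }
    public String load(String pageName) {
        return "content of " + pageName;
    }
}

// The replication concern is added around the existing component.
class ReplicationInterceptor implements InvocationHandler {
    private final WikiStorage target;

    ReplicationInterceptor(WikiStorage target) {
        this.target = target;
    }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Object result = method.invoke(target, args);
        if (method.getName().equals("save")) {
            // A real deployment would derive the editing operations of this save
            // and publish them to the DHT node responsible for the page.
            System.out.println("intercepted save of " + args[0] + ", forwarding to the DHT layer");
        }
        return result;
    }

    static WikiStorage wrap(WikiStorage target) {
        return (WikiStorage) Proxy.newProxyInstance(
                WikiStorage.class.getClassLoader(),
                new Class<?>[] { WikiStorage.class },
                new ReplicationInterceptor(target));
    }
}
```

Wrapping the engine's storage with ReplicationInterceptor.wrap(new LocalWikiStorage()) runs the original save and the replication hook around it; this mirrors, at a very small scale, the separation of responsibilities that the distributed AOP middleware provides across the network.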

Section 2 presents approaches related to our proposition and analyzes their strong and weak points. In Section 3, an overview of DHT characteristics and optimistic consistency maintenance algorithms is presented. The paper further describes, in Section 4, the architecture of the UniWiki system and its algorithms. An implementation of this system is presented in Section 5. In Section 6, a validation via experimentation on a large-scale scenario is demonstrated. The paper concludes in Section 7.

2. Related work

A number of systems, such as Coral [12], CoDeeN [37] or Globule [24], have been proposed during the last years to address the issue of web hosting on peer-to-peer networks. All these systems belong to the category of Content Delivery Networks (CDN). Such systems offer high availability of the data stored on a peer-to-peer network, achieved by combining simple replication techniques with caching mechanisms. The main drawback of these approaches is that they consider only static content, while we are interested in storing dynamic content which might be edited concurrently.

One of the most relevant architectural proposals in the field of large-scale collaborative platforms is a semi-decentralized system for hosting wiki web sites like Wikipedia, using a collaborative approach. This design focuses on distributing the pages that compose the wiki across a network of nodes provided by individuals and organizations willing to collaborate in hosting the wiki. The paper [36] presents algorithms for page placement, so that the capacity of the nodes is not exceeded and the load is balanced, and algorithms for routing client requests to the appropriate nodes.

In this architecture, only the storage of the wiki page content is fully distributed. Client requests (both reads and writes) are handled by a subset of trusted nodes, and meta-functionalities, such as user pages, searching, access controls and special pages, are still dealt with in a centralized manner.

While this system can resist random node departures and failures (one of the inherent advantages of replication), it does not handle more advanced failures, such as partitioning. Moreover, when concurrent updates of the same content (page) happen, a simple tie-breaking rule is used to handle conflicts. Such an approach guarantees consistency, but it does not fulfill wiki users' expectations, since some of the user contributions will be ignored by the system when the tie-breaking policy is applied.

    Approaches that relate more to the P2P model include:

• DistriWiki [20], based on the JXTA [14] protocol, provides a P2P wiki architecture where each node represents a set of wiki pages stored on a user's computer. It concentrates on the communication aspect, but ignores wiki-specific features, such as versioning, and does not solve the problem of merging concurrent contributions from different users.

• DTWiki [8] uses a delay tolerant network (DTN) [10] as the basis for building a distributed and replicated storage mechanism, enabling offline work and access from terminals with intermittent internet access.

• Piki [21] adapts Key Based Routing [5] and Distributed Hash Tables [30] (DHTs) t
