Replica Placement for Route Diversity in Trees-based Routing Distributed Hash Tables

Embed Size (px)

Citation preview

  • 8/6/2019 Replica Placement for Route Diversity in Trees-based Routing Distributed Hash Tables

    1/15

    Replica Placement for Route Diversity inTree-Based Routing Distributed Hash Tables

    Cyrus Harvesf and Douglas M. Blough, Senior Member , IEEE

    Abstract Distributed hash tables (DHTs) share storage and routing responsibility among all nodes in a peer-to-peer network. Thesenetworks have bounded path length unlike unstructured networks. Unfortunately, nodes can deny access to keys or misroute lookups.We address both of these problems through replica placement. We characterize tree-based routing DHTs and define M AXDISJOINT , areplica placement that creates route diversity for these DHTs. We prove that this placement creates disjoint routes and find thereplication degree necessary to produce a desired number of disjoint routes. Using simulations of Pastry (a tree-based routing DHT),we evaluate the impact of M AXDISJOINT on routing robustness compared to other placements when nodes are compromised atrandom or in a contiguous run. Furthermore, we consider another route diversity mechanism that we call neighbor set routing and showthat, when used with our replica placement, it can successfully route messages to a correct replica even with a quarter of the nodes inthe system compromised at random. Finally, we demonstrate a family of replica query strategies that can trade off response time andsystem load. We present a hybrid query strategy that keeps response time low without producing too high a load.

    Index Terms Distributed systems, peer-to-peer networks, distributed hash tables, routing, replica placement, robustness.

    1 INTRODUCTION

    P EER-TO-PEER (p2p) networks are a popular substrate for building distributed applications because of their effi-ciency, scalability, resilience to failure, and ability to self-organize. The p2p architecture relies on the distribution of responsibility among hundreds of thousands, i f not millions,of nodes in the network. Therefore, if a small s et of nodes failto serve data objects, properly maintain routin g information,or route messages, the integrity of a very larg e-scale systemmay be compromised.

    The efficiency of lookups has become a ce ntral focus of

    p2p design because many popular applicatio ns, like nameresolution, publish-subscribe, and IP commu nication, relyon a lookup service as a core functionality. A p2p distributedhash table (DHT) may be used to provide this functionality.DHTs [25], [26], [33], [34] structure the network topology in away that enables routing algorithms to produce lookuppaths of bounded length (typically Olog N ).

    Unfortunately, when deployed over the Internet, DHTsmay be impacted by the failure or compromise of peers inthe overlay and performance guarantees no longer hold. Infact, it may not be possible to fetch a desired object at all.Many p2p networks allow nodes to join without prejudice,leaving the network vulnerable to attack. Furthermore, thenetwork could face coordinated attacks from competitors orother groups that have an interest in the failure of thenetwork. This type of coordinated attack behavior has been

    reported, for example, i n p2p file sharing systems. DHTsare inherently less resili ent to these attacks than unstruc-tured networks because unstructured networks typically broadcast messages, wh ich is a more robust (and muchmore expensive) mechan ism.

    Sit and Morris [31] c lassify attacks on DHTs into threecategories:

    1. storage and ret rieval attacks, which target themanner in which peers manage data items;

    2. routing attacks, which target the manner in whichpeers route mess ages; and

    3. miscellaneous attacks, which target other aspects of the system, such as admission control or the under-lying network routing service.

    The first class of attack is commonly addressed withreplication. Objects are replicated at several peers in thenetwork to increase the likelihood that there will be acorrect replica available. The benefits of replication on load balancing and overall performance have also been studied.To our knowledge, ours is the first work that considers howthe placement of replicas affects object reachability throughthe routing infrastructure.

    Numerous works [2], [3], [32] have relied on routediversity to mitigate the effects of routing attacks. Srivatsaand Liu [32] introduced the notion of independent lookup paths to improve routing robustness. Two paths are said to be independent if they share no hops other than the sourceand destination peers. It is worth noting that route diversityhas benefits in addition to improving routing robustness.For example, diverse routes can be used to improve load balance and fairness or to circumnavigate congested areasof the network.

    Our work realizes the benefits of replication and routediversity in concert through replica placement. In thispaper, we consider a class of DHTs that route messagesusing a scheme which we call tree-based routing. We show

    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011 419

    . C. Harvesf is with Microsoft Corporation, 1 Microsoft Way, Redmond, WA98052. E-mail: [email protected].

    . D.M. Blough is with the Department of Electrical and ComputerEngineering, Georgia Institute of Technology, KACB, Room 3356, Atlanta,GA 30332-0765. E-mail: [email protected].

    Manuscript received 14 Oct. 2008; revised 24 June 2009; accepted 22 Sept.2009; published online 4 Dec. 2009.Recommended for acceptance by A. Schiper.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TDSC-2008-10-0158.Digital Object Identifier no. 10.1109/TDSC.2009.49.

    1545-5971/11/$26.00 2011 IEEE Published by the IEEE Computer Society

    h t t p : / / w

    w w . i e e e

    x p l o r

    e p r o j

    e c t s . b l o

    g s p o

    t . c o m

    http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/
  • 8/6/2019 Replica Placement for Route Diversity in Trees-based Routing Distributed Hash Tables

    2/15

    that there exists a replica placement, which we callMAXDISJOINT, that creates disjoint routes in DHTs of thistype. We prescribe the number and placement of replicasnecessary to produce d disjoint routes from any sourcenode to the replica set. With this scheme, we are able totolerate d 1 malicious peers, whether they are attackingthe storage and retrieval of data items or the routinginfrastructure. Our approach is targeted specifically at

    DHT-based (structured) p2p systems with a multilevelrouting structure. So-called one-hop DHTs [14], [15] arediscussed in Section 6.

    In order for disjoint routes to improve the robustness of the system, the client must be able to verify the integrity of data items. This is necessary for the client to detect when amalicious peer has tampered with the result of a data itemlookup. Therefore, we assume that data in the system areself-certifying. This assumption is quite common in peer-to-peer systems [3], [7], [27] and is discussed in more detail inSection 6.

    Using Pastry as an example, we evaluate M AXDISJOINTthrough simulation and show that a DHT with typicalconfiguration parameters can benefit from our replica

    placement. Our experiments show that wi th only eightreplicas and a quarter of nodes compromised at random, anode can find a route, which consists of on ly uncompro-mised nodes, to a correct replica with greater t han 97 percentprobability. M AXDISJOINT also tolerates run s of compro-mised nodes; with 16 replicas and 85 percen t of the DHTcompromised in a run, lookups can be resolve d with greaterthan 96 percent probability. Furthermore,we u se a techniquewhich we call neighbor set routing to increase r oute diversityand improve the probability of lookup success . For example,a lookup performed with neighbor set routing and M AXDIS- JOINT placement can be resolved successfull y with greaterthan 97 percent probability with 40 perc ent of nodescompromised at random. Finally, we demon strate that thestrategy used to query replicas can have a sign ificant impacton performance and we propose a hybrid quer y strategy thatcan beusedto trade off response timeand system load for the best performance.

    2 RELATED WORKTo place our work in context, we discuss related work onreplica placement, peer-to-peer routing security, and gen-eral peer-to-peer security issues.

    2.1 Replica PlacementReplica placement has long been studied in the realm of distributed computing. Many studies have compared theperformance of different placement schemes in terms of quality of service, availability, and time to recovery indifferent types of serverless systems [6], [9], [19], [22]. Thefirst DHT-based replication schemes were only concernedwith availability and thus local replication, i.e., replicasplaced close to the master copy in the ID space, was used[26], [33]. As detailed herein, such placements have verylittle routing robustness.

    A very important paper, which proposed the firstdeterministic nonlocal replica placement scheme forDHT-based systems, was that of Ghodsi et al. [12]. Thispaper discusseda setof symmetric replica placement schemesthat, for a replication degree of d , divide the ID space into

    equivalence classes, each of size d . If an object with its ID in aparticular equivalence class is replicated, replicas are placedat all IDs in the class. This is a very general definition, whichis completely independent of routing. Thus, for a particularDHT structure, some such schemes could produce a largenumber of disjoint routes while others might produce veryfew. To realize benefits of this approach for routing security,it is therefore necessary to instantiate particular schemes for

    different DHTs or classes of DHTs and evaluate their routingcharacteristics. This is exactly the problem considered in thispaper. The instantiation of symmetric replication that waspresented in [12] was equally spaced replication. The papercontained a thorough evaluation, which showed that thetechnique reduces the message overhead in node joins andleaves, provides better load balance, and improves faulttolerance. However, routing robustness was not considered.It is worth noting that the M AXDISJOINT placement isequivalent to equally spaced replication in Chord. In otherDHT implementations, however, M AXDISJOINT providesadded flexibility in terms of tuning routing robustness thatequally spaced replication does not provide.

    Our prior work is the first that considers the impact of replica placement on rou ting robustness in DHTs. Our initialwork focused on the ben efits of equally spaced placement inChord [16] and has e xpanded into the M AXDISJOINTplacement [17], a genera l solution that gives the benefits of equally spaced placemen t in Chord to all DHTs that employa tree-based routing sche me.

    2.2 Peer-to-Peer Rou ting SecurityA number of works tha t look to improve routing securityare centered around th e notion of route diversity. Forexample, Artigas et al. propose Cyclone, an equivalence- based routing scheme de ployed over an existing structuredpeer-to-peer overlay [2 ]. Independent lookup paths are

    created by routing acr oss different equivalence classes.Since the paths are ind ependent and do not differ in thedestination, Cyclone does not naturally mitigate the effectsof storage and retrieval attacks. Furthermore, each peer isrequired to maintain additional routing information, whichincurs overhead. In contrast, our replica placement createsdisjoint routes without requiring any additional routingstate or modifying the underlying routing scheme.

    Portmann et al. use route diversity to provide messageconfidentiality [24]. Messages are split in two, encrypted,and sent to the destination across diverse paths. Routediversity is created by routing messages through therouting table entries that minimize route overlap. Theappropriate entries are chosen using empirical results thatdepend on the network size. They show that, in the bestcase, this method results in an average path overlap of 15-25 percent between a pair of routes. In other words, at best, 15 percent of the routes will be common to bothpaths. We show analytically that our placement createsmultiple nonoverlapping routes.

    Castro et al. combine secure node identifier assignment,secure routing table maintenance, and secure messageforwarding to create a secure routing primitive [3]. Securenode identifier assignment ensures that an adversary cannottake arbitrary identifiers. It also ensures a uniform distribu-tion of compromised nodes. Secure routing table mainte-nance ensures that the average fraction of compromised

    420 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

    h t t p : / / w

    w w . i e e e

    x p l o r

    e p r o j

    e c t s . b l o

    g s p o

    t . c o m

    http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/
  • 8/6/2019 Replica Placement for Route Diversity in Trees-based Routing Distributed Hash Tables

    3/15

    routing table entries does not exceed the fraction of compromised nodes in the network.

    Secure message forwarding guarantees that a messagesent to a key is delivered to all of its replicas with highprobability. This component most closely resembles ourwork. It relies on route diversity to deliver messages to theneighborhood of destination. Unlike our approach, thisiterative redundant routing scheme requires modification to

    therouting infrastructure of theDHT. Onesuch modificationis the forwarding of messages through the neighbors of thesource node, which we call neighbor set routing. We will showthat neighbor set routing is useful in creating route diversityand can improve on the benefit of our replica placement.

    Mickens and Noble develop a framework for diagnosing broken overlay routes, whether they result from IP-levellink failures or malicious peers in the overlay [21]. Once thecause of the broken route is detected, the IP-level link iscircumnavigated or the malicious peer is excluded from thesystem. Rather than excluding peers that may have beenfalsely diagnosed as malicious, we use route diversity toavoid faulty nodes.

    It is worth noting that path disjointedn ess has beenconsidered in other contexts as well. Castro et al. propose ap2p multicast system for high bandwidth content (e.g.,streaming video) [5]. Content is split into stri pes such thatthe quality of the content improves with t he number of stripes downloaded simultaneously. The stri pes are deliv-ered to subscribers via multicast trees. To e nsure that thefailure of a single node does not compromise all stripes, thetrees are constructed to be inner node disj oint. In otherwords, no node will be an inner node of m ore than onemulticast tree. Therefore, if a single node fails, no more thanone stripe is lost as a result. Inner node disj oint trees areconstructed by selecting roots that vary in ter ms of the nodeprefixes. M AXDISJOINT creates disjoint paths in much the

    same way in Pastry.2.3 General Issues in Peer-to-Peer Network SecurityConsistent node identity is critical to key placement androuting. Most structured networks place a key at the nodewith identifier closest to the key identifier. If nodes do notmaintain a single, consistent identity, key placement can becompromised. Furthermore, an adversary that can takemultiple identities has the ability to partition the network[8], [30]. Secure admission control protocols are necessary tolimit the number of identities an entity can obtain [28].Other works have implemented node identities in acensorship-resistant, anonymous fashion [10], [29].

    Nodes must also be able to store keys fairly in a mannerthat allows for verification. A number of works have aimedto create self-certifying data. CFS [7] uses the cryptographichash of a data item as its key. PAST [27] relies oncryptographic signatures to verify data integrity. Analternative is to use replication and Byzantine-fault-tolerantalgorithms [4] to maintain data consistency and correctness.Self-certifying data are discussed in more detail in Section 6.

    3 DISTRIBUTED HASH TABLES WITH TREE -BASEDROUTING

    Distributed hash tables are often referenced by theirgeometry, i.e., ring (Chord [33]), torus (CAN [25]), tree

    (Plaxton [23]), or some hybrid (Pastry [26], Tapestry [34]).The geometry impacts neighbor and route selection, whichcan have an impact on flexibility, resilience, and proximityperformance as studied in [13]. Although we use the termtree-based routing, we are not referring to the geometry, but the routing algorithm. Tree-based routing algorithmshave specific properties that we define herein.

    3.1 Tree-Based Routing DHTsConsider a DHT with an ordered id space I with size N

    jI jand a branching factor B such that log B N is integral. The branching factor is used by each node to construct itsrouting table. The routing table of a node u has thefollowing properties:

    1. The node u partitions the entire id space intocontiguous segments and selects one node fromeach id segment to include in its routing table. Thepartitioning is performed as follows:

    a. The node u partitions the id space into B equalsize contiguous parts.

    b. Of the B id se gments, u selects the id segment I 0of which it is a member.

    c. Steps 1aand 1 b are repeated to repartition I 0untilpartsofsizeo nearecreated.The part that consistsofthenode u is discarded (thereis no needfor u tomaintain a ro uting table entry to itself).

    d. At the end o f the partitioning process, u willhave created B 1log B N contiguous parts of the id space, with sizes N B ;

    N B 2 ; . . . ;

    N B logB N 1 ; and 1 ,

    and with B 1 parts of each size.2. For each part P , u selects a node v 2P that covers P and places it in i ts routing table. A node is said to

    cover a part P ifits routing table contains k > 1 entries

    that cover the n onempty parts P 1; P 2; . . . ; P k andP P 1 [ P 2 [ [ P k. By definition, a node u is saidto cover the part consisting of its id. This definition,combined withpartitioning of P into nonempty parts,ensures that the recursive definition of coverageterminates.

    The partitioning and routing table construction is showngraphically in Fig. 1.

    Note that tree-based routing DHT implementationsdiffer in the manner in which Property 2 is satisfied. Forexample, in Chord [33], since routing is performed in theclockwise direction, the most counterclockwise node in eachpart must be selected because it is the only node that coversthe entire id segment. In prefix-matching routing DHTs, allof the nodes within a part share a common prefix and coverthe entire id segment; therefore, any node within the partmay be chosen as a routing table entry. This explains theflexibility in choosing routing table entries in DHTs likePastry [26] and Tapestry [34].

    Routing is performed by forwarding the messagedestined for the id d to the entry that covers the id segmentthat contains d . We define a DHT constructed in thismanner to be a tree-based routing DHT. If the paths from anysource node to all possible destinations are aggregated, theresulting topology is a tree, which is how tree-based routinggets its name. Note that many popular DHT implementa-tions [20], [23], [26], [33], [34] exhibit these properties and,

    HARVESF AND BLOUGH: REPLICA PLACEMENT FOR ROUTE DIVERSITY IN TREE-BASED ROUTING DISTRIBUTED HASH TABLES 421

    h t t p : / / w

    w w . i e e e

    x p l o r

    e p r o j

    e c t s . b l o

    g s p o

    t . c o m

    http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/
  • 8/6/2019 Replica Placement for Route Diversity in Trees-based Routing Distributed Hash Tables

    4/15

    therefore, employ a tree-based routing scheme . These DHTshave a number of useful properties, which we prove below.

    First, tree-based routing DHTs are determ inistic; that is,given a message destined for a node d , each node has oneand only one routing table entry throug h which themessage can be forwarded to d .Lemma 1 (Determinism). For the routing table o f any node in a

    tree-based routing DHT, if the entries e1 and e2 cover idsegments I 1 and I 2 , respectively, then I 1 \ I 2 ;.

    Proof. This follows naturally from the partit ioning of theid space. tuRoutes in tree-based routing DHTs are guaranteed to

    converge. This property holds when the DHT is full1

    ; that is,every possible id is represented by a node.Lemma 2 (Routing Convergence). Consider the route in a full

    tree-based routing DHT from a source nodes to a destination d represented as a series of nodesn1 s; n 2; . . . ; nk1; nk d such that n i is some entry e j from the routing table of noden i1 for i > 1. Suppose that n 1; n2; n3; . . . ; nk cover the idsegments I 1 I; I 2; I 3; . . . ; I k.

    2 Then,

    I k fd g& I k1 & I k2 & &I 2 & I 1:Proof. Consider the hop from n j to n j 1 . Since the node n j

    covers the id segment I j , it must partition I j into B equalsized id segments. Since n j 1 covers I j 1 , which is one of the parts of I j , then I j 1 & I j . Furthermore, since theDHT is full, the node nk1 has a part that contains onlythe destination d . tuFurthermore, it is possible to bound the number of hops

    in every route; we state this formally below.Lemma 3 (Bounded Path Length). Any path in a full tree-

    based routing DHT has at most logB N hops.

    Proof. Since the id segment covered by each hop must be asubset of the id segment covered by the previous hop,the minimum ratio between the size of id segmentscovered by consecutive hops is the branching factor B .Therefore, the longest path repeatedly divides N by B with each hop until it reaches the destination. Thisrequires logB N hops. tu

    3.2 Creating Disjoint Routes with Tree-BasedRouting

    Determinism and routing convergence provide a naturalavenue for creating disjoint routes. Routing convergenceguarantees that once a path enters a segment of the id space,it will never proceed to a node that is outside of that segment.If two paths can be created originating in different segments,then the paths are guaranteed to be disjoint. Furthermore,the determinism property ensures that any two routing tableentries will route to different segments. Therefore, we cancreate disjoint routes simply by routing through differentrouting table entries. This is stated formally in the followinglemma.Lemma 4. In a full tree-based routing DHT, routes originating at

    a common source nodewith different first hops are disjoint.Proof. Suppose two rout es originating at a common source

    node n have first hop s e1 and e2 , such that e1 and e2 aredifferent routing tabl e entries of n . Using the determin-ism property, if e1 an d e2 cover id segments I 1 and I 2 ,respectively, then I 1 \ I 2 ;.Consider any hops h1 and h2 in the routes beginningwith first hops e1 and e2 , respectively. Suppose hops h1and h2 cover the id segments I 01 and I 02 , respectively.Using the routing c onvergence property, I 01 & I 1 andI 02 & I 2 . Since I 1 \ I 2 ;, I 01 \ I 02 ; and, therefore, theroutes are disjoint.

    tuThe source node can use any of its routing table entries

    as the first hop in a route; therefore, we can state thenumber of disjoint routes that can be created from anysource node by counting routing table entries.Lemma 5. In a full distributed hash table that employs tree-based

    routing, there are at most d disjoint routes from any sourcenode, whered B 1log B N .

    Proof. EmployingLemma 4, we can createa disjoint route foreach of the d routing table entries by routing to adestination in the segment covered by each entry. Wecannot create d 1 or more disjoint routes because two ormore routes would share the first hop and overlap. tuThe proof for Lemma 5 alludes to choosing multiple

    destinations to create disjoint routes. This lends itself naturally to a replica placement. In the following section,we propose a replica placement that creates disjoint routes,which we call M AXDISJOINT .

    4 THE MAXDISJOINT REPLICA P LACEMENTUsing Pastry as an example in the following sections, wewill demonstrate how the properties of tree-based routingcan be used to construct a replica placement that creates

    422 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

    Fig. 1. Pastry routing table structure for the hypothetical node 121(N 64; B 4). The space is partitioned into B 1logB N 9segments. The highlighted region marks the segment to which node121 belongs.

    1. Some tree-based routing DHT implementations have to provide anadditional mechanism to ensure routing convergence when the DHT is notfull. Pastry, for example, maintains the neighborhood and leaf sets for thispurpose.

    2. Any node can cover the entire id space; therefore, we can state that n 1covers I .

    h t t p : / / w

    w w . i e e e

    x p l o r

    e p r o j

    e c t s . b l o

    g s p o

    t . c o m

    http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/
  • 8/6/2019 Replica Placement for Route Diversity in Trees-based Routing Distributed Hash Tables

    5/15

    disjoint routes. We begin with an example placement toprovide the reader with some intuition and then movetoward a more formal definition. After defining theplacement, which we call M AXDISJOINT, we will evaluatethe necessary replication degree to create a desired numberof disjoint routes. Then, we will introduce the notion of arun and provide an expression for the maximum tolerablerun length for a given replication degree. Next, we willdiscuss why M AXDISJOINT is a more adaptive and flexiblesolution than equally spaced replication. Finally, weoutline the basic elements of an implementation of theMAXDISJOINT placement.

    4.1 Intuition behind MaxDisjoint PlacementTo create route diversity in Pastry via replica placement, itis necessary to place replicas such that a given replica setwill use a diverse set of routing table entries for everypossible source node. We use an example to provide thenecessary intuition. Consider a Pastry ring with id space of size 64 and prefix matching in base-4 digits. We show thePastry routing table structure graphically for a hypothetical

    node 121 in Fig. 1.Suppose we wish to replicate an object with id 101 in thisPastry ring. Node 121 routes to this object through therouting table entry marked 10x in Fig. 1. Suppose wereplicate the object with the id 111 to target the routing tableentry 11x in the example. This approac h creates anadditional disjoint route for any lookups f or object 101originating at node 121. One route is forw arded via theentry 10x and the other is forwarded via 1 1x. However,consider another source node 221. This node routes to theobject 101 and 111 through the same entry marked 1xxand, therefore, does not gain an additional di sjoint route.

    To move toward a more effective approac h, consider all

    the replicas of object 101 that would create an additionaldisjoint route for node 121. These are: 001, 111 , 120, 122, 123,131, 201, and 301. Note that there are total of nine possibledisjoint routes 3 (including the route to the object 101), whichis the number of routing table entries for node 121. Of thesereplicas, there are only three that can create an additionaldisjoint route for every possible source node: 001, 201, and301. These replicas create disjoint routes by targeting entriesin the first row of the routing table. Note that targeting anentry in the first row of the routing table requires a singlereplica whose id differs from that of the master object in thefirst digit. To target entries deeper in the routing table, alarger number of replicas are required.

    Suppose we wish to create five disjoint routes for allpossible source nodes. Four routes can be created for everypossible source node using the three replicas we havealready discussed (001, 201, and 301) in addition to theobject 101. To create the fifth route, we must target an entrydeeper in the routing table. In the case of node 121, we maychoose the replica 111. As alluded to before, this replicaonly creates a disjoint route for those source nodes whoseids start with the prefix 1 because these are the onlynodes with an entry for 11x. Since there are four possiblevalues for the prefix ( B 4), four replicas are required to

    target this routing table entry: 011, 111, 211, and 311. One of these four replicas will create an additional route for everypossible source node depending on its prefix. The remain-ing three will be routed through previously used routingtable entries overlappin g a previous route. This is showngraphically in Fig. 2. Fi ve disjoint routes are created fornode 121, one each for the replicas R001 (or R011), R101,R111, R201 (or R211), and R301 (or R311). In a similarfashion, we can create a sixth disjoint route using thereplicas 021, 121, 221, and 321; and a seventh using 031, 131,231, and 331.

    This pattern continu es until the entire id space isexhausted. Note that in Pastry each node partitions theid space using prefixes a nd, therefore, we place replicas byvarying their prefixes. H owever, in general, we are simplyspacing replicas such tha t replicas exist in different parts of each nodes partitioning of the id space. In the next section,we provide an algorithm that generates these replica ids.

    4.2 Definition of MaxDisjoint PlacementMAXDISJOINT assigns each replica an identifier, which isused to determine its placement. The placement algorithmtakes as input N , the identifier space size of the DHT; B , the branching factor; and d , the desired number of disjointroutes. We will prove that M AXDISJOINT creates the desirednumber of disjoint routes in a later section.Algorithm 1 (M AXD ISJOINT Replica Placement). To create

    d disjoint routes, replicas are placed inm 1 rounds, wherem bd 1B 1c. Each round consists of B 1 steps except forthe final round, which consists of n steps, wheren d 1mod B 1. In the ith round, B i1 replicasare placed at equally spaced locations over the entireidentifier space at each step. In stepj of round i, thereplica locations are given by:

    R i;j fki;j ; ki;j s i ; ki;j 2s i ; . . .. . . ki;j B i1 1s ig mod N ;

    where ki;j k j N B i .

    Looking at the example in Fig. 2, consider the placementof an object 101 in a DHT with B 4 and N 64. Supposewe want to create d 5 disjoint routes. Then, we perform

    HARVESF AND BLOUGH: REPLICA PLACEMENT FOR ROUTE DIVERSITY IN TREE-BASED ROUTING DISTRIBUTED HASH TABLES 423

    Fig. 2. MAXDISJOINT placement for object 101 to create five disjointroutes from query node 121 ( N 64 ; B 4).

    3. There is actually a tenth zero-hop path that can be created byplacing a replica at 121.

    h t t p : / / w

    w w . i e e e

    x p l o r

    e p r o j

    e c t s . b l o

    g s p o

    t . c o m

    http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/
  • 8/6/2019 Replica Placement for Route Diversity in Trees-based Routing Distributed Hash Tables

    6/15

    m 1 2 rounds of the replica placement algorithm andn 1 step in the final round. The replica locations andthe corresponding rounds and steps of the algorithm aregiven below:

    Round 1 ; Step 1 : 201Round 1 ; Step 2 : 301Round 1 ; Step 3 : 001

    Round 2 ; Step 1 : 111 ; 211 ; 311 ; 011As described in the replica placement algorithm, in each

    round replicas are placed starting at the master key andworking in the direction of increasing identifiers. Thealgorithm is presented as such for its simplicity. However,the steps within each round can be performed in any order.Each step is functionally equivalent to the others in itsround. Therefore, a real implementation may reorder thesteps in each round to distribute the replicas more uniformlyacross the identifier space. This will help to provide load balance and tolerate runs of contiguous failed nodes. Forinstance, in the example above, to create two disjoint routes,we need only the master object (101) and one of the replicas

    created in round 1 (001, 201, or 301). Any of these replicaswould create the second disjoint route, but ch oosing replica301 creates a more uniformly distributed repl ica set.

    4.3 Evaluation of Disjoint RoutesThe desired number of disjoint routes d is one of the tunableinputs to the M AXDISJOINT algorithm. Contro lling the levelof fault tolerance is an important design p arameter and,therefore, d is a very useful input. In this se ction, we willformally prove that the algorithm indeed cre ates d disjointroutes in support of the intuition provid ed in earliersections.

    In our analysis, we assume that routing is performed inan identifier space of size N with branching f actor B . All of our analytical results are proved within the c ontext of a fullDHT, but we will show, through experimentation, thatthese properties hold even in sparsely populated DHTs.

    Our principal goal is to prove the following theorem:Theorem 1. The M AX DISJOINT Algorithm produces d

    B 1logB N disjoint routes from any query node to akey k in a full tree-based routing DHT.

    Proof. Every set R i;j is a unique set of B i1 replicas equallyspaced over the entire id space, which implies aninterreplica spacing of N B i1 . Consider a source node uand one of its routing table parts P u; u N B i1. Foreach R i;j , there exists one replica r k 2R i;j such thatr

    k 2P

    . Furthermore, the replicas fr

    kgwill be equallyspaced within P , separated by an interreplica spacing of N B i . Regardless of how u chooses to partition P for itsrouting table, there exists a replica in each part of P .Therefore, there is a unique routing table entry for eachreplica r k

    4 and a disjoint route is created in every step of the algorithm. Using the definitions of m and n in thealgorithm, it is easy to show that d 1 steps areperformed. The d 1 disjoint routes created in thealgorithm are combined with the route to k to create atotal of d disjoint routes. tu

    As a corollary, we give the necessary replication degreeto create d disjoint routes.Corollary 1. To produced B 1log B N disjoint routes from

    any query node to a keyk in a full tree-based routing DHT with branching factor B > 1, the keyk must be replicated atn 1B m locations determined by the M AX DISJOINT Algorithm, wherem bd 1B 1cand n d 1mod B 1.

    Proof. Since we have proved that d disjoint routes areindeed created by the algorithm, we need to show thatperforming the algorithm with input d places a copy of kat n 1B m locations in the id space. The key k accountsfor one location and the remaining locations aredetermined by the replica placement. Since, B i1 replicasare placed at each step in round i, the total number of replicas is given by:

    key replicas in

    first m rounds ! replicas inlast n steps ! 1

    Xm

    i1

    B 1B i1 nB m

    n 1B m :

    utWe give our replica p lacement the name M AXDISJOINT

    because Corollary 1 pr escribes the minimum replicationdegree to create the d esired number of disjoint routesusing this placement. We state this formally in thefollowing theorem:Theorem 2. To produce d disjoint routes from any query node to

    a key k in a tree-based routing DHT with B > 1, the key kmust be replicated at no fewer than n 1B m locationsdetermined by the M AX DISJOINT Algorithm, where m

    bd 1B 1cand n d 1mod B 1.Proof. Assume d disjoint routes can be created with n 1

    B m 1 locations determined by the M AXDISJOINTAlgorithm, where m bd 1B 1c an d n d 1modB 1. In other words, d disjoint routes are createdwith one fewer replica than prescribed by Theorem 1. Letr be the missing replica. We will show that there exists aquery node q for which d disjoint routes are not created.Suppose r 2R i;j , then let q be a node in r; r j N B i. Thereplica r is the only replica in R i;j that can create adisjoint route for the query node q . Therefore, step j inround i does not create a disjoint route and we cannotform d disjoint routes with one fewer replica. tuIt is worth noting that M AXDISJOINT provides these

    properties without modifying the underlying routingmechanism. M AXDISJOINT naturally creates disjoint routesusing the properties of the tree-based routing scheme.

    4.4 Chord as a Tree-Based Routing DHTTo provide consistency with previous work [16], wereconsider Chord as a tree-based routing DHT. It isstraightforward to show that Chord finger tables areconstructed like tree-based routing tables with B 2. There-fore, we can apply Theorem 1 to give the following corollary:Corollary 2. To produce d log B N disjoint routes from any

    query node to a keyk in a full tree-based routing DHT with

    424 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

    4. It is possible that one of the replicas is placed at u. Even though norouting is performed to fetch this replica, we count it as a route of zero hops.

    h t t p : / / w

    w w . i e e e

    x p l o r

    e p r o j

    e c t s . b l o

    g s p o

    t . c o m

    http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/
  • 8/6/2019 Replica Placement for Route Diversity in Trees-based Routing Distributed Hash Tables

    7/15

    branching factor B 2, the keyk must be replicated at2d 1

    equally spaced locations in the ring.Proof. When B 2, m d 1 and n 0. Therefore,

    d disjoint routes are created by replicating k at 2d 1locations in the ring. Furthermore, round i will place2i1 replicas equally spaced over the id space.Aggregating the replicas placed in each round willresult in 2d 1 replicas equally spaced over the id spacewith interreplica spacing N 2d 1 . tuNote that the claim in Corollary 2 is consistent with the

    findings in [16].

    4.5 Toleration of RunsWe define a run of length l starting at peer m to be thecontiguous set of peers with identifiers in the intervalm; m l. As indicated in [33], an adversary can createimbalance in the distribution of peers in the DHT byappropriately selecting identifiers. In the worst case, anadversary can take control of a contiguous sequence of identifiers or, using our terminology, a run of peers.

    We claim that M AXDISJOINT replication can tolerateadversarial runs of bounded length in tree- based routingDHTs. Before proving the tolerable length of a run, weprovide some intuition of how a run may be u sed to disruptrouting in the DHT. Consider a query node q . An adversarycan reduce the number of replicas reachable f rom q by 1B bycontrolling the peer with identifier q N B , where N is theidentifier space size and prefix matching is performed in base B . This is because all of the replicas i n the intervalq N B ; q 2

    N B are routed through the node q

    nB (

    1B of all

    replicas are in this interval). If the adversary controls a runof nodes ending at q B 1B

    N B , he can control a l arger number

    of replicas.Theorem 3. A full tree-based routing DHT with an identifier

    space of sizeN , and r ! B replicas placed using M AX DIS- JOINT placement can tolerate any adversarial run of length1 N B 1B

    1B m , where m blogB rc.

    Proof (By Induction). Since we are using M AXDISJOINTplacement, it is straightforward to show that m

    blog B rcbd 1B 1c. This implies that the maximum lengthrun tolerable by the DHT changes with each round of theMAXDISJOINT algorithm.

    Consider a peer q . For B r < B 2 , m 1 and thereexists a replica k in the interval q; q N B . If we assumethat the adversary has control of the peer q B 1N B (which is the worst case), then we must ensure that k isnot in the run. Thus, the maximum length run isq N B ; q B 1N B , which has length B 1N B N B 1 or 1 N B 1B

    1B .

    Assume that the maximum length run tolerable withr replicas is 1 N B 1B

    1B m , where m blogB rc. Con-sider a query node q . If we assume the adversary takes

    control of the peer q B 1N B , then he does not controlany peers in the interval q; q N B m . Furthermore, theremust be at least one replica in this interval. If anotherround of the M AXDISJOINT algorithm is performed, thenthere will be B replicas in this interval separated by aspacing of N B m 1 . Thus, the length of the tolerable runincreases by N B m 1 to 1 N

    B 1B

    1B m 1. tu

    In a Pastry DHT with a typical value of B 16 and16 replicas, an adversary may control a run of more than85 percent of the identifier space and there will be a route toat least one replica for every possible query. As thereplication degree approaches the identifier space size, themaximum length run tolerable by the DHT approachesB 1

    B N .4.6 Adaptivity and F lexibility of MaxDisjointThere is a noted similari ty between the M AXDISJOINT andequally spaced placeme nts. In fact, when n 0, MAXDIS- JOINT produces equally spaced replica locations. One mayargue that equally space d replica placement is as effective asMAXDISJOINT in creatin g disjoint routes. However, replicaplacements have other d esirable properties other than theirability to create route d iversity; equally spaced placementfails to deliver some of t hese benefits when B > 2.

    Twoproperties in part icular areadaptivity and flexibility.Adaptivity is the abilit y to easily change the replicationdegree of an object witho ut incurring a large overhead. If thereplication degree of an object is changed, we would like tominimize the number of messages exchanged and objectsshifted. Flexibility is the ability to easily vary the replicationdegree of different objects without incurring a large over-head. Certainly, some objects maybe more popularor criticalthan others and require a higher degree of replication. Theplacement should replicate objects to varying degreeswithout using excessive time or state at the time of insertionand lookup.

    MAXDISJOINT provides adaptivity more effectively thanan equally spaced placement. A change in the replicationdegree must be handled carefully to maintain equalspacing. Consider an increase in the replication degree to

    add an additional disjoint route. With M AXDISJOINT, theadditional replicas are placed at the locations determined by performing another step in the placement algorithmleaving the existing replicas in their current locations.

    With equally spaced replicas, additional replicas can beintroduced in two different ways. The first option is tocompute the equally spaced replica locations for the newreplication degree and shift existing replicas, if necessary. Insome cases, the existing replica locations will not be a subsetof the new replica locations. This implies that existingreplicas will have to be shifted, which has a non-negligiblecost. The second option is to double the current replicationdegree such that no existing replicas must be shifted. Thesetwo options are depicted graphically in Fig. 3.

    HARVESF AND BLOUGH: REPLICA PLACEMENT FOR ROUTE DIVERSITY IN TREE-BASED ROUTING DISTRIBUTED HASH TABLES 425

    Fig. 3. Two methods for increasing the replication degree when usingequally spaced replica placement.

    h t t p : / / w

    w w . i e e e

    x p l o r

    e p r o j

    e c t s . b l o

    g s p o

    t . c o m

    http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/
  • 8/6/2019 Replica Placement for Route Diversity in Trees-based Routing Distributed Hash Tables

    8/15

    Doubling the replication degree avoids the cost of shifting existing replicas, but may come with the added burden of storing an excessive number of replicas. Thenumber of replicas prescribed by Theorem 1 is sufficient.For example, in Fig. 3, only four additiona l replicas areneeded to create an additional disjoint ro ute; however,doubling the replication degree introduce s eight newreplicas. The excess storage burden created w hen doublingthe replication degree is shown quantitatively for B 16 inFig. 4. MAXDISJOINT is able to create a desir ed number of disjoint routes more effectively than equally spaced place-ment when there is a change in the replicatio n degree.

    A sound replica placement also provid es flexibility;that is, the replication degree of one data item is notdependent on the replication degree of an y other dataitem. Fortunately, flexibility can be provid ed easily byMAXDISJOINT placement.

    The problem of flexibility arises at the ti me of lookup.Without knowledge of the replication degree of the target,the replica locations cannot be determined. As a solution,rather than fixing a system-wide replication degree for alldata items or storing the replication degrees for all objects,we fix a maximum replication degree r max for all objects. Fora data item with replication degree r r max , the r replicaswill be located at a subset of the r max replica locations. As aresult, for some lookups, we may query a replica that doesnot exist, but we trade off these extra messages for thereduction in node state.

    4.7 Implementation

    To uniquely identify each replica, we use a key identifier pairk; v, where k is the key identifier and v is virtual keyidentifier. For each replica, v gives the location of the masterkey. By definition, the master key k is denoted by the pairk; k. We require an ordered pair because two data itemsmay have replicas that reside on the same peer.

    When a key k is inserted into the DHT with replicationdegree d , we first compute the key identifier pairs for eachreplica: k; k; k1; k; k2; k; . . . kd 1; k. Once the key iden-tifier pair for each replica is computed, we use the DHT keyinsertion mechanism to insert the replicas. That is, weperform a lookup for each key identifier and store thereplica in their respective locations.

    Next, the DHT lookup primitive must be modified toaccommodate the new replication scheme. When a peer isqueried for a key, the query node must compute thelocations of all replicas. The key identifier is used to routeto the replicas. Once a replicas home node is found, thekey identifier pairs are compared to return the appro-priate replica.

    It is worth noting that no additional routing table entriesare required to route to the replicas. In addition, if the querynode dispatches the lookups for the entire replica setsimultaneously, there may be an improvement in perfor-mance because the query node can return the first correctresponse received (which may have returned along a routeshorter than the route to the master key). However, if theadded load of the extra lookups puts strain on the system,theperformance may improve only slightly or even degrade.We evaluate this hypothesis through experimentation.

    Finally, when peers join or leave the DHT, the DHT joinand leave mechanisms can be used by simply ignoring thevirtual peer identifiers in each key identifier pair. Note thatDHT replica placement schemes that place replicas in theneighborhood of the ma ster key require modification to thenode join and leave me chanisms. To maintain the replica-tion degree, replicas wil l need to be shifted for every joinor leave. Ghodsi et al. [ 12] discuss the effect of churn onsymmetric (equally spac ed, M AXDISJOINT) replication andshow that only O1 messages are needed to maintain thereplication degree for every join or leave compared to

    rmessages for a succ essor-list (neighbor set) placement,where r is the replicatio n degree.

    5 EXPERIMENTSTo confirm that our an alytical results hold for sparsely

    populated DHTs or DH Ts with clustered id spaces, weconducted a series of exp eriments through simulation. First,to measure the impact of replica placement on routingrobustness, we consider the number of disjoint routescreated for several replica placements. Furthermore, wefind the probability of lookup success when nodes arecompromised at random or in a run of several nodes.

    Second, we consider a heuristic used for creating routediversity in [3] that we call neighbor set routing. We measureits ability to create route diversity and the impact on theprobability of lookup success.

    Finally, having shown that replica placement canimprove routing robustness, we consider the impact of parallel queries on response time.

    1. Simulation Environment : All of our experiments wereperformed using a Java-based simulator we devel-oped. We are able to model Chord and Pastryrouting, uniform and clustered node distributions,and two adversarial models. Nodes may be com-promised at random with some failure probability orin a run of several nodes. The simulator is extensibleto model other DHT implementations, node dis-tributions, and adversarial models.

    Each data point in our results is representative of over 100,000 lookups performed in 10 differentrandom node distributions. We simulate a lookup

    426 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

    Fig. 4. Replication degree for increasing numbers of disjoint routes(B 16).

    h t t p : / / w

    w w . i e e e

    x p l o r

    e p r o j

    e c t s . b l o

    g s p o

    t . c o m

    http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/
  • 8/6/2019 Replica Placement for Route Diversity in Trees-based Routing Distributed Hash Tables

    9/15

    by randomly selecting an uncompromised querynode and a target key. In reality, if the query node iscompromised, it can affect the outcome of the lookup.However, if we assume that data items are self-verifying, a compromised query node can only causethe client to time out and select another query node.We deem a lookup successful if there exists a routeconsisting of only uncompromised nodes from thequery node to any replica of the target key. If allroutes from the query node to the replica set contain acompromised node, then the lookup is deemedto fail.

    For most experiments, it is sufficie nt to computeroutes in the network using the appropriaterouting protocol. However, to meas ure responsetime, it was necessary to modify our simulator to be event driven. The remaining sim ulation para-meters are summarized in Table 1.

    2. Replica Placements Considered: In our ex periments, weconsider four replica placements: M AXDISJOINT;neighbor set, where replicas are plac ed at distinctnodes in the neighborhood of the roo t (e.g., Chordsuccessor list, Pastry leaf set); random, where replicalocations are uniformly distributed; and spaced,where replica identifiers are separated by a uniformspacing s. It is worth noting that two re plicas may beplaced at the same node with spaced replication.

    In the case of neighbor set placement, someimplementations may attempt to reduce load byquerying the entire replica set with a single lookupmessage. This naturally creates route overlap; for afair assessment, we dispatch a separate lookup foreach replica in the replica set.

    5.1 Measurement of Disjoint RoutesFirst, to verify the correctness of our analysis, we measurethe average number of disjoint routes created using theconsidered replica placements. These results are depictedin Fig. 5. For the parameters tested, M AXDISJOINTplacement outperforms the other placements in creatingdisjoint routes.

    The neighbor set placement does not create a significantnumber of disjoint routes as expected. Routes toward keysthat are close to each other in the identifier space are likelyto converge. Since the neighbor set placement clustersreplicas, an adversary can eliminate the entire replica set if he can compromise a node common to all routes destinedfor that neighborhood. By increasing the route diversity, weeliminate these single points of vulnerability.

    The performance of the spaced placement scheme isdependent on the spacing chosen. If the spacing is small,then the spaced placement is very similar to the neighbor

    set placement. In the extreme case, if the replica spacing ismuch less than the average internode spacing, two or morereplicas may be placed at the same node. We observe thisphenomenon for the s paced placement with s 105 inFig. 5. As we increase t he spacing, there is a tendency toincrease the number of disjoint routes. However as wecontinue to increase t he replication, we will reach asaturation point where replica locations wrap around theidentifier space and no additional disjoint routes will becreated. We observe thi s phenomenon when the spacings N=6. This implies th at the spacing should be a functionof the replication degre e, which is fundamental to howMAXDISJOINT creates di sjoint routes.

    The random placeme nt creates nearly as many disjointrouteson average becaus e replicas are uniformly distributed.However, there is a signi ficant difference between M AXDIS- JOINT and random place ment in the worst case. We present

    an argument in supporto f this claim at the end of this section.To ensure that this n umber of disjoint routes could becreated in more sparsely populated id spaces, we varied thenumber of nodes in the DHT starting from 32 and measuredthe number of disjoint routes for various replication degrees.These data are depicted in Fig. 6. The actual number of disjoint routes converges quickly to the theoretical value,which implies that M AXDISJOINT placement can be usedeffectively in very sparsely populated networks. In these

    HARVESF AND BLOUGH: REPLICA PLACEMENT FOR ROUTE DIVERSITY IN TREE-BASED ROUTING DISTRIBUTED HASH TABLES 427

    TABLE 1Pastry Simulation Parameters

    Fig. 5. Number of disjoint routes with increasing replication degree inPastry ( N 228 ; n 8;192 ; B 16).

    Fig. 6. Number of disjoint routes with increasing number of nodes inPastry ( N 228 ; B 16 ; d 2 f2; 3; 4; 5; 6g).

    h t t p : / / w

    w w . i e e e

    x p l o r

    e p r o j

    e c t s . b l o

    g s p o

    t . c o m

    http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/
  • 8/6/2019 Replica Placement for Route Diversity in Trees-based Routing Distributed Hash Tables

    10/15

    results, the theoretical number of disjoint routes is achievedfor loads greater than 1,000 nodes, which is less than athousandth of a percent of the identifier space.

    5.2 Resilience to Node ClusteringTo model the population distribution that m ay result fromthe collusion of several malicious nodes, a series of experiments were run on clustered DHTs. To model aclustered distribution, four node ids we re randomlyselected as cluster means such that the clusters arenonoverlapping and unpopulated gaps ex ist in the idspace. The cluster density was tuned using the standarddeviation of the Gaussian distribution centere d around eachcluster mean. The number of disjoint routes c reated for 215 and 216 are shown in Figs. 7 and 8, re spectively.

    Clustering can marginally reduce the num ber of disjointroutes created for small replication degrees, b ut the impact

    is more dramatic when creating a larger num ber of disjointroutes. The impact of clustering is intensifie d with tighterclusters (i.e., decreasing ) because replicas are not locatedat the positions that M AXDISJOINT prescribes. Although thereplica ids are properly assigned, the replicas themselvesare confined to the clusters. One or more of the replicas maylay in the unpopulated gaps between clusters. Thesereplicas get pushed to next cluster in the id space,possibly eliminating a disjoint route. This is more likely to

    happen when creating a large number of disjoint routes(which requires more replicas).

    We computed the percentage decrease in disjoint routes

    that results from cluster ing. This analysis produced somevery interesting results that are depicted in Fig. 9. Oneinteresting conclusion is that not all replica placements arenegatively affected by n ode clustering. In particular, whenreplicas are placed clo se together in the id space, e.g.,neighbor set or spaced (f or small spacings) placement, therecan actually be an increa se in the number of disjoint routescreated. This is because a long string of closely packedreplicas may span a ga p between clusters, which pushessome of the replicas t o another cluster. The clusteringactually helps distribut e the replicas better resulting inmore disjoint routes. Ne vertheless, the 4-6 percent increasein the number of disjoi nt routes is insubstantial becausethese placements fail t o create a sufficient number of

    disjoint routes with unif ormly distributed nodes.To the contrary, the MAXDISJOINT and random place-ments are affected negatively by clustering. Placements thatdistribute replicas in the id space may place a replica in theunpopulated gaps between clusters. These replicas getpushed to a cluster and a disjoint route may be lost. Thisphenomenon takes effect for M AXDISJOINT when r 8. Atthis point, the interreplica spacing is 225 . If we use twostandard deviations to capture 95 percent of each cluster, wecan estimate the intercluster gaps to about 226 , which is twicethe interreplica spacing. That implies that about half of allreplicas are located in the gaps between clusters. Note thatthe random placement suffers from this problem over theentire range of replication degrees because it is not

    guaranteed to distribute replicas across the entire id space.Furthermore, as we increase the replication degree, morereplicas will be located in the gaps and we lose the benefit of increased replication degree. Nevertheless, the three to fivepercent decrease in the number of disjoint routes is relativelyinsignificant because M AXDISJOINT creates so many moredisjoint routes than the other placements in a uniformlypopulated id space.

    5.3 Impact of Replica Placement on RoutingRobustness

    To demonstrate that the number of disjoint routes has asignificant impact on the robustness of the DHT, wemeasure the probability of lookup success with a random

    428 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

    Fig. 7. Disjoint routes for tightly clustered nodes in Pastry(N 228 ; n 8;192 ; 215 ).

    Fig. 8. Disjoint routes for loosely clustered nodes in Pastry(N 228 ; n 8;192 ; 216 ).

    Fig. 9. Percent decrease in disjoint routes from uniformlyd is tr ibuted nodes to a c lust ered d is tr ibut ion in Pas try(N 228 ; n 8;192 ; C 4; 1016 ).

    h t t p : / / w

    w w . i e e e

    x p l o r

    e p r o j

    e c t s . b l o

    g s p o

    t . c o m

    http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/
  • 8/6/2019 Replica Placement for Route Diversity in Trees-based Routing Distributed Hash Tables

    11/15

    fraction of faulty nodes. A faulty node may be a failed orcompromised node. The probability of routing success isshown in Fig. 10.

    These results indicate a correlation betwee n the numberof disjoint routes and the probability of lo okup success.The MAXDISJOINT and random placement s most effec-tively create disjoint routes and, thus, ha ve the mostpositive impact on the probability of rou ting success.Furthermore, we can conclude that using MAXDISJOINTplacement instead of neighbor set placemen t can drama-tically improve the probability of routing su ccess. With aquarter of nodes compromised at random, greater than97 percent of all lookups can be resolved suc cessfully withMAXDISJOINT placement compared to onl y 60 percentwith neighbor set placement.

    With the network configuration in Fig. 10, the M AXDIS- JOINT placement creates eight disjoint routes. Therefore, anadversary could prevent the correct resoluti on of a givenquery by compromising only eight nodes. H owever, withfar more nodes than that compromised at random, nearlyall queries are resolved successfully. This is a strongindication that replica placement plays a critical role inproviding robustness in DHTs.

    Our analysis indicates that M AXDISJOINT should be ableto toleraterunsof compromised nodesbetterthan placementsthat cluster replicas closely together. Experimental resultsthat confirm this hypothesis are depicted in Fig. 11. With16 replicas and 85 percent of the identifier spaced compro-misedinarun,greaterthan96percentoflookupsareresolvedsuccessfullywith M AXDISJOINT placementcompared toonly13 percent with neighbor set placement. Furthermore, these

    results demonstrate an exploit of random placement. Withmoderately long runs, M AXDISJOINT is able to successfullyresolve a higher fraction of queries than the randomplacement. For example, with 85 percent of the identifierspaced compromised in a run, only 66 percent of lookups areresolved successfully with a random placement. This is because the random placement may cluster replicas for asignificant fraction of keys. We investigate this further in thefollowing section.

    5.4 MaxDisjoint versus Random PlacementIn the experimental data presented thus far, randomplacement seems to have performed on par with M AXDIS- JOINT on average. We argue, however, that a truly random

    placement is expensive to implement in practice and theaverage performance of a random placement does nottrickle down to the worst case.

    The worst case attack against a replica placement is totarget the replica location s themselves. One may argue that aplacement with random locations would make it moredifficult for an adversary to determine and target the replicaset than a determinist ic approach like M AXDISJOINT.However, a truly rand om placement also complicatesmatters for the query n ode. The query node must be ableto compute the replica lo cations for any object at the time of lookup. To avoid keepin g a considerable amount of state ateach node, the only pract ical proposal of which we are awareis to use multiple hash functions to generate randomreplica locations. Howev er, this implementation is not trulyrandom and, with know ledge of the hash functions, it is asvulnerable as M AXDISJOINT to attacks that target all replicalocations for a particul ar object. This makes randomreplication simply a different deterministic placement thatis, in a sense,an approximation of equallyspaced replication.Yet, as we show next, theperformance of random replicationfalls considerably short of M AXDISJOINT.

    The performance results we have shown thus far depictthe average number of disjoint routes created. To considerthe worst case performance, we measured the minimumnumber of disjoint routes created for M AXDISJOINT andrandom placement. These results are depicted in Fig. 12.

    The results in Fig. 12 confirm our hypothesis of the worstcase performance of a random placement. For a few objects,the placement may create as few as half of the desired

    number of disjoint routes. To establish that the worst casewas not an isolated case, we measured the number of disjoint routes created over all lookups. The cumulativedistribution of the number of disjoint routes with eightreplicas is shown in Fig. 13.

    For the case depicted in Fig. 13, M AXDISJOINT createdthe expected eight disjoint routes for every single lookup.Random placement created six or fewer disjoint routes for45 percent of lookups and, in some cases, it created as fewas four, or only half of the number produced byMAXDISJOINT. We observed the same behavior in othercases as well. Therefore, if delivering a consistent level of fault tolerance across all lookups is a design constraint,random placement is not a reasonable solution.

    HARVESF AND BLOUGH: REPLICA PLACEMENT FOR ROUTE DIVERSITY IN TREE-BASED ROUTING DISTRIBUTED HASH TABLES 429

    Fig. 10. Probability of routing success with uniformly compromisednodes in Pastry ( N 228 ; n 8;192 ; B 16 ; r 8).

    Fig. 11. Probability of routing success with runs of compromised nodesin Pastry ( N 228 ; n 8;192 ; B 16 ; r 16).

    h t t p : / / w

    w w . i e e e

    x p l o r

    e p r o j

    e c t s . b l o

    g s p o

    t . c o m

    http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/
  • 8/6/2019 Replica Placement for Route Diversity in Trees-based Routing Distributed Hash Tables

    12/15

    5.5 Replica Placement and Neighbor Set RoutingWe believe replica placement is an efficient way of creatingdisjoint routes because it does not requi re significantmodification to the underlying DHT routing s cheme. Otherworks have considered reusing the exis ting routinginformation to create route diversity [3], [ 24]. Althoughthese approaches do not create provably disjo int routes, we believe there is value in introducing some ad ditional formof route diversity. Furthermore, we belie ve that thesetechniques could be combined with our repl ica placementto provide additional benefit. To evaluate t his claim, weconsider the route diversity technique introdu ced by Castroet al. [3], which we call neighbor set routing.

    Castro et al. use neighbor set routing to find diverseroutes toward the neighborhood of a key. To create diverseroutes, messages are routed via the neighbors of the sourcenode. This is depicted graphically in Fig. 14 . Castro et al.claim that this technique is sufficient in the case whenreplicas are distributed uniformly over the identifier space,as in CAN and Tapestry. We consider the ability of neighbor set routing to create diverse routes to a replicato enhance the routing robustness of M AXDISJOINT.

    To evaluate the relative impact of replica placement andneighbor set routing, we consider four scenarios:

    1. neither replication nor neighbor set routing,2. only neighbor set routing through eight neighbors,3. only M AXDISJOINT placement with eight replicas,

    and4. both neighbor set routing and M AXDISJOINT

    placement.These results are depicted in Fig. 15.

    Both MAXDISJOINT placement and neighbor set routingcan have a positive im pact on the probability of lookupsuccess. However, the trend is stronger with replicaplacement. With a qu arter of nodes compromised atrandom, over 97 percen t of lookups are resolved success-fully with M AXDISJOINT placement compared to 63 percentwith neighbor set rout ing alone. At best, neighbor setrouting can create indep endent routes, since all paths willconverge at the destin ation. If the destination node iscompromised, no amoun t of route diversity can increase theprobability of lookup su ccess.

    Nonetheless, the adde d route diversity that neighbor setrouting provides can benefit M AXDISJOINT placement,especially with a large fraction of compromised nodes.With 50 percent of node s compromised, the probability of lookup success of using MAXDISJOINT placement improvesfrom 52 to 84 percent with neighbor set routing.

    5.6 Parallel Queries and Response TimeFinally, since replica placement seems to be a reasonablemethod for improving routing robustness, it is natural to

    430 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011

    Fig. 13. Cumulative distribution of disjoint routes created for M AXDIS-JOINT and random placement in Pastry ( N 220 ; n 8;192 ; r 8).

    Fig. 14. Graphical depiction of neighbor set routing.

    Fig. 12. Worst-case performance of M AXDISJOINT and random replicaplacement in Pastry. The minimum number of disjoint routes createdover all lookups is shown ( N 220 ; n 8;192 ).

    Fig. 15. Probability of routing success with neither replication norneighbor set routing (None), neighbor set routing only (NBR),MAXDISJOINT placement only (MD), and both neighbor set routingand M AXDISJOINT placement together (NBR+MD) ( N 228 , n 8;192 ,B 16, r 8).

    h t t p : / / w

    w w . i e e e

    x p l o r

    e p r o j

    e c t s . b l o

    g s p o

    t . c o m

    http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/http://www.ieeexploreprojects.blogspot.com/h