
Computer Communications 34 (2011) 862–874

Contents lists available at ScienceDirect

Computer Communications

journal homepage: www.elsevier.com/locate/comcom

A new and effective hierarchical overlay structure for Peer-to-Peer networks

Ming Xu a,b, Shuigeng Zhou a,b,⇑, Jihong Guan c

a School of Computer Science, Fudan University, Shanghai 200433, China
b Shanghai Key Lab of Intelligent Information Processing, Shanghai 200433, China
c Department of Computer Science & Technology, Tongji University, Shanghai 201804, China

⇑ Corresponding author at: School of Computer Science, Fudan University, Shanghai 200433, China. Tel.: +86 21 55664298; fax: +86 21 55664298.

E-mail addresses: [email protected] (M. Xu), [email protected] (S. Zhou), [email protected] (J. Guan).

Article info

Article history:
Received 9 February 2010
Received in revised form 24 July 2010
Accepted 15 October 2010
Available online 21 October 2010

Keywords:
Peer-to-Peer networks
Hierarchical architecture
Query processing

Abstract

The tremendous growth of public interest in Peer-to-Peer (P2P) networks in recent years has initiated a lot of research work on how to design efficient overlay structures for P2P systems. Scalable overlay networks such as Chord, CAN, Pastry, and Tapestry provide no control over where data is stored, and the location of the peers and resources is determined by the hash values of their identifiers and keys respectively. As a result, these overlays cannot directly support range queries and other proximity-aware complex queries.

In this paper, we present a hierarchical P2P overlay network called SkipCluster, which is capable of supporting both exact-match and multi-dimensional range queries efficiently without consumption of extra memory space. SkipCluster is derived from skip graphs and SkipNet, but it has a two-tier hierarchical architecture. In both tiers, peers are connected in sequence according to the order of their peer IDs, and related resources are stored near each other without hashing of their resource keys. We design a novel data structure called Triple Linked List (TLL) to store each super-peer’s pointers in the higher tier, which can be used to find the longest prefix and speed up inter-cluster query routing. In the lower tier, each intra-cluster peer’s routing table contains pointers with exponentially increasing distances. Experimental results show that SkipCluster can speed up both exact-match and range queries in different network sizes.

© 2010 Elsevier B.V. All rights reserved.

0140-3664/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.comcom.2010.10.005

1. Introduction

In the past decade, Peer-to-Peer (P2P) networks have rapidly evolved and have become an important part of the Internet. P2P systems are application layer overlay networks that enable users to share resources in a distributed manner. There are mainly two classes of overlays for P2P networks: structured and unstructured [1].

Unstructured P2P networks are composed of peers that join the network under some loose rules, without any prior knowledge of the topology. There is no correlation between a peer and the objects managed by it in unstructured P2P networks. If a peer is looking for rare objects shared by only a few peers, the queries may not always be successful [2–5]. Structured P2P networks have been developed to improve the performance of data discovery: each object/node is identified by a key, peers are organized into a structured graph, and objects are distributed over the nodes according to a certain mapping scheme.

Structured overlay networks, such as Chord [6], CAN [7], Pastry [8] and Tapestry [9], have emerged as flexible infrastructures for building large P2P systems based on various distributed hash tables (DHTs), which allow data to be uniformly spread over all the participants. Although they have excellent load balancing properties, these structures cannot directly support range queries. DHT’s multi-hop approach has been proposed for lookup optimization in truly vast and very dynamic peer networks with tight latency guarantees, where latency becomes dominated by the underlying network, especially the number of hops and processing nodes [10]. Mesh-based overlay has become an increasingly popular approach for P2P streaming due to its potential scalability in content delivery. The mesh-based approach consistently exhibits superior performance over the tree-based approach, especially in minimizing delay and bandwidth consumption [11–13]. Hypercube-based topologies have low node degree and small network diameter, which allow efficient routing and search [14]. In addition, a hypercubic topology called PeerCube can minimize the impact of performance penalties caused by collusion and churn at the expense of O(logN) latency and O(logN) messages for each lookup, put, join and leave operation [15]. The resilience and proximity properties of different DHT-based topologies are thoroughly compared in [16].

Fig. 1. Two SkipNets with similar name IDs but different numeric IDs.


Skip graphs [17] and SkipNet [18] have similar distributed data structures based on skip lists [19], where nodes are connected as a collection of circular lists, constituting a series of overlapping skip lists at different levels. Each node requires only logarithmic state to store information about its neighbors. The level-0 list consists of all nodes in sequence. Ordered overlays such as skip graphs and SkipNet have clear practical advantages over DHTs. First, they can take advantage of locality properties while processing search requests. Second, it is easy to perform range queries on ordered data.

In this paper we present a novel hierarchical overlay structure called SkipCluster for speeding up both exact-match and range queries without consumption of extra memory space. SkipCluster is a non-trivial extension of the skip graphs and SkipNet structure, built on a two-tier hierarchical architecture. In SkipCluster, all peers in the network are assigned to clusters and each cluster has a super-peer. The higher tier consists only of super-peers representing different clusters and the lower tier consists of the peers in each cluster. We design a novel data structure called Triple Linked List (TLL for short) to store each super-peer’s pointers in the higher tier, which is used to find the longest prefix and speed up query routing from one cluster to another. In the lower tier, all intra-cluster peers are arranged in sequence according to their peer ids, and adjacent peers are connected to each other to form a closed ring. From the view of an intra-cluster peer, SkipCluster divides the peers in the same cluster into several segments in two directions (clockwise and counter-clockwise). The lengths of these segments grow exponentially. Each intra-cluster peer selects one peer from each segment in its cluster as a neighbor. As a whole, the pointers in the routing table cover an exponentially increasing number of peers with high probability.

SkipCluster preserves the order of peer ids and stored keys in both the higher tier and the lower tier, which allows it to efficiently support range queries over one-dimensional data. To deal with multi-dimensional data, we use Z-order [20] to map multi-dimensional data to one dimension and store them in sequence in SkipCluster’s clusters.
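To make the Z-order mapping concrete, the following is a minimal sketch of bit interleaving for two-dimensional integer coordinates. The function name `z_order` and the 16-bit coordinate width are assumptions made for illustration; the paper does not specify how the mapping is implemented.

```python
def z_order(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # bits of x go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)    # bits of y go to odd positions
    return key


# Hypothetical 2-D keys; sorting by the Z-order key yields a one-dimensional
# sequence that could then be stored in order across clusters.
points = [(2, 0), (1, 1), (0, 1), (1, 0), (0, 0)]
ordered = sorted(points, key=lambda p: z_order(*p))
```

Points that are close in the two-dimensional space tend to receive nearby keys, which is why an order-preserving overlay can answer a multi-dimensional range query by scanning a small number of key intervals.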

As we show in this paper, SkipCluster exhibits the following merits over SkipNet:

• In the higher tier, TLL can be used to find the longest prefix and thus can speed up query routing from one cluster to another without consuming extra memory space;
• In the lower tier, each segment-h pointer traverses nearly $2^h$ peers unless all peers in a segment leave the network simultaneously, so a SkipCluster pointer spans $2^h$ peers with a higher probability than a SkipNet pointer does.

The rest of this paper is organized as follows. Section 2 surveys the related work. Section 3 presents the design of SkipCluster. Performance evaluation is described in Section 4. Finally, we conclude the paper in Section 5.

2. Related work

Before introducing our new overlay structure, we survey the related work.

Skip lists, first described by Pugh [19], are a probabilistic alternative to balanced trees. The simplicity of skip list algorithms makes them easier to implement and provides a significant constant-factor speed improvement over balanced tree and self-adjusting tree algorithms. Skip lists maintain a list with all the keys at the bottom level, and build increasingly sparse lists on upper levels by promoting each key on the level immediately below with probability p. In doubly-linked skip lists, each node stores a predecessor pointer and a successor pointer for each list in which it appears. Searching for a node with a particular key takes an expected time of $O(\log_{1/p} N)$. A possible drawback of skip lists is the selection of p, which incurs a tradeoff between search time and space requirement. Although a small p decreases the space requirement, it also increases the search time. Pugh suggested a value of 1/4 for p unless search time is the primary concern, in which case p should be 1/2. Modifying p at runtime is an option, but the cost is high: all nodes have to reconstruct their pointers at every level except the lowest.
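The role of p can be illustrated with a small, self-contained skip list. This is a minimal sketch rather than Pugh's reference implementation: it is singly linked for brevity (the doubly-linked variant described above would add predecessor pointers), and the names `SkipList`, `max_level` and `seed` are choices made for this example.

```python
import random


class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] is the successor in the level-i list


class SkipList:
    def __init__(self, p=0.5, max_level=16, seed=None):
        self.p = p
        self.max_level = max_level
        self.rng = random.Random(seed)
        self.head = Node(None, max_level)  # sentinel that appears in every list

    def _random_level(self):
        # A key is promoted to the next level with probability p.
        level = 1
        while level < self.max_level and self.rng.random() < self.p:
            level += 1
        return level

    def _predecessors(self, key):
        # Walk from the sparsest list down, recording the last node before `key`.
        update = [self.head] * self.max_level
        node = self.head
        for i in reversed(range(self.max_level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def insert(self, key):
        update = self._predecessors(key)
        new = Node(key, self._random_level())
        for i in range(len(new.forward)):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        candidate = self._predecessors(key)[0].forward[0]
        return candidate is not None and candidate.key == key
```

A smaller p gives each node fewer forward pointers on average, which saves space but leaves fewer long-range links for the search to use.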

Aspnes and Shah [17] proposed skip graphs as a generalization of skip lists. As in a skip list, each of the N nodes in a skip graph belongs to multiple linked lists. What distinguishes a skip graph from a skip list is that there may be $2^i$ lists at level i, and every node participates in one of these lists, until the nodes are split into singletons after O(logN) levels on average. Skip graphs can be constructed without knowing the total number of nodes in advance. In contrast, P2P systems based on DHTs require a priori knowledge about the network size and key space.

SkipNet, which is very similar to skip graphs, was independently developed by Harvey et al. [18]. SkipNet organizes nodes into a circular distributed data structure that concurrently supports two separate, but related, address spaces. In one space, nodes belong to multiple rings where ring members are lexicographically ordered according to the nodes’ name IDs. In the other space, nodes are labeled with uniformly distributed numeric IDs. These numeric IDs define which rings a node belongs to in the first space. Node names and content identifier strings are mapped directly into the name ID space, while hashes of the node names and content identifiers are mapped into the numeric ID space. The combination of the two spaces enables SkipNet to provide efficient message routing as well as support several important locality properties. In a “perfect” SkipNet, each level-h pointer traverses exactly $2^h$ nodes. In practice, however, randomly selected numeric IDs together with frequently joining and leaving nodes rarely produce perfect routing tables. As a result, some queries may be forwarded more than O(logN) hops during the query routing process. SkipCluster can partly counteract this effect as long as not all peers in a segment of a cluster leave the network. Fig. 1 shows two SkipNets with the same name IDs and different numeric IDs. The blue arrowhead lines denote node A’s pointers. Each binary number beside a node is the corresponding numeric ID. In a “perfect” SkipNet as shown in Fig. 1(a), each level-h pointer traverses exactly $2^h$ nodes; but in a “defective” SkipNet as shown in Fig. 1(b), each level-h pointer may traverse only h nodes.

SkipMard is a multi-attribute P2P resource discovery approach that extends the skip graph structure to support multi-attribute queries [21]. The authors introduced the concepts of “layer” and “crossing layer nearest neighbor” into the data structure. The concept of “layer” in SkipMard is different from that of “tier” in SkipCluster. SkipMard has $p^i$ ($p \ge 2$) doubly-linked lists at level i, each of which is called a “layer”. In order to speed up query routing, each node in SkipMard contains additional routing information, called “crossing layer nearest neighbor” pointers, to different layers at each level. Each node has O(m*l) neighbors for a SkipMard with m layers and l levels in total. Thus the expected size of a routing table in SkipMard is larger than in skip graphs or SkipCluster. The higher tier of SkipCluster has $2^i$ rings (instead of linked lists) at level i and there are no pointers that cross the rings. In the lower tier of SkipCluster, each peer only has pointers that refer to peers in the same cluster, and the organization of these pointers is totally different from that of SkipMard, as described in Section 3.1. The major contribution of SkipMard is that it extends the structure of skip graphs to support multi-attribute queries, while SkipCluster focuses on speeding up query routing of exact-match and one-attribute/dimension range queries without consumption of extra memory space. SkipCluster can nevertheless be extended to process multi-attribute/dimension range queries by (1) first transforming multi-attribute/dimension data into one-attribute or one-dimension data (using a space-filling curve, etc.) and then directly employing SkipCluster; or (2) using one SkipCluster for each attribute or dimension. The structure of an m-layer SkipMard is very similar to that of a modified SkipNet with a sparse R-Table of parameter k (m = k). In a 2-layer SkipMard, the “crossing layer nearest neighbor” method still cannot overcome the latent defect of SkipNet.

Chord# evolved from Chord, but it has a proven logarithmic lookup performance rather than the O(logN) “with high probability” of Chord [22]. Moreover, Chord# substitutes a key-order preserving function for the hash function in Chord, which enables it to support more complex queries. Chord# uses a recursive finger placement algorithm to build its routing tables and it only needs O(1) hops for updating an entry in the routing table instead of O(logN) in Chord. However, the update cost for the join or departure of a node is still O(logN) in Chord# because there are O(logN) other nodes that point to this node. In addition, the routing table of each node must be periodically validated and updated. Chord# is the underlying structure of Scalaris, a distributed key/value store with symmetric data replication and transactional ACID properties [23]. In order to support multi-dimensional data and multi-attribute range queries, an enhanced version of Chord# called SONAR (Structured Overlay Network with Arbitrary Range-queries) is proposed in [24], which directly maps the multi-dimensional data space to a d-dimensional torus, where each dimension of the torus is responsible for one attribute domain.

SkipTree is a distributed data structure for storing data with multi-dimensional keys in a P2P network [25]. It uses a distributed partition tree to split the space into smaller regions. Each peer in the network becomes a leaf node of that tree and is responsible for one of the regions. The partition tree is used for defining the order among the regions, whereas the routing mechanism is similar to that of SkipNet. For a SkipTree in its ideal form, each peer stores 2logN − 1 pointers to other peers.

Skip tree graphs constitute an extension of skip graphs based on skip trees [26]. In order to improve query efficiency, each level h of a skip tree graph contains the same information as both level h and level h-1 in a skip graph. Each node stores the information of its neighbor nodes and a set of conjugate nodes at each level. A skip tree graph is therefore not a space-efficient overlay structure compared with skip graphs and SkipNet.

P-Grid is a P2P lookup system based on a virtual binary search tree [27]. It partitions the single-dimensional space among the peers in the network. Each peer represents a leaf of the tree and its position is determined by its path (a binary bit string from the root to the leaf, which represents the subset of the tree’s overall information). P-Grid’s prefix-routing infrastructure enables it to retain logarithmic search cost for a sufficiently randomized selection of links to other peers in the routing tables. In the worst case (degenerate data key distributions), however, the tree shape no longer provides an upper bound for search cost, as its depth might be up to linear in the network size. For fault tolerance, multiple peers can be responsible for the same path.

G-Grid is a solution proposed for multi-dimensional data, which organizes a set of objects across any number of peers in a network [28]. G-Grid partitions the space of objects, based on the attribute values, into regions and structures these regions into a binary tree. A node of the tree represents a region in the multi-dimensional data space. One or more regions are assigned to one peer. As a result, one or more nodes of the G-Grid tree structure are associated with a peer in the network. Due to its learning capability, which reduces the distances in the system by creating new links, the search cost in G-Grid is less than O(logN). In the best case, the average search cost is equal to or less than 2 hops. However, as a peer interacts with other peers, its local routing table grows in order to find the most efficient route to the target objects. Therefore, the memory requirement might grow linearly in the worst case.

The main features of the P2P overlays mentioned above are compared in Table 1, where N is the number of peers in the overlay network, K is the number of clusters in SkipCluster and m is the total number of layers in SkipMard. All of these overlays support range queries, but their basic topologies, routing hops and memory requirements differ. Traditional overlays (such as skip lists, SkipNet, etc.) achieve O(logN) routing hops with high probability, depending on the distribution of membership vectors. G-Grid notably improves the search performance to less than 2 hops in the best case and O(logN) in the worst case. As a tradeoff, it consumes O(logN) memory only in the best case. In the worst case, the memory requirement grows linearly with the network size. Routing in SkipCluster takes fewer than O(logN) hops. To what extent the routing cost is reduced depends on the usage of TLL in finding the longest prefix match. Unlike G-Grid, SkipCluster is a space-efficient overlay that only consumes O(logN) memory in the worst case, when all peers are super-peers in every cluster. In a standard two-tier SkipCluster, memory consumption is less than O(logN) because most intra-cluster peers consume only $O\left(\log \frac{N}{K}\right)$ memory space.

3. SkipCluster structure

In this section we present SkipCluster, a novel hierarchical overlay structure for P2P networks that preserves the order of peer ids and stored keys without consumption of extra memory space, and efficiently supports range queries.

3.1. Overlay network architecture

The SkipCluster overlay network is built on a two-tier hierarchical architecture, where the higher tier consists of super-peers representing different clusters and the lower tier consists of the peers in the clusters. Each cluster has a super-peer, which is the peer with the smallest peer id in the cluster. The super-peer of a cluster can forward queries from one cluster to another or inside the local cluster. In a cluster, the cluster id is the prefix of the corresponding super-peer’s peer id and the cluster size is the maximum number of peers in a cluster, which is determined by the length of the suffix of the corresponding super-peer’s peer id. If the cluster size decreases to 1, SkipCluster becomes a one-tier P2P network similar to SkipNet. All clusters have the same cluster size, but different peer ids may have different lengths in a SkipCluster, so some cluster ids may be prefixes of others. Given a SkipCluster with K clusters and $2^d$ peers in each cluster, the total number of peers N in the overlay network is $K \times 2^d$, where d is the length of the suffix that determines the cluster size. Each cluster’s random choice of ring memberships can be encoded as a unique binary number, which we refer to as the cluster’s membership vector. SkipCluster does not require the use of hashing (such as SHA-1) to generate the clusters’ membership vectors; we only require that membership vectors are random and unique. The distribution of membership vectors need not be uniform, but a uniform distribution will provide better routing performance, as in SkipNet.

Table 1. The comparison of different P2P overlays that can support range queries.

| P2P overlays | Basic topologies | Multi-dimensional data support | Multi-attribute data support | Routing hops | Memory requirement |
|---|---|---|---|---|---|
| Chord# (SONAR) | Ring (multi-dimensional torus) | No (Yes) | No (Yes) | Proven O(logN) | O(logN) |
| G-Grid | Binary tree | Yes | Yes | O(logN) in the worst case; <2 in the best case | O(logN) in the best case; linear growth with network size in the worst case |
| P-Grid | Virtual binary search tree | No | No | O(logN) in the best case; linear growth with network size in the worst case | O(logN) |
| SkipCluster | Two-tier rings | Yes | Yes | Less than O(logN) | O(logN) for super-peer; O(log(N/K)) for intra-cluster peer |
| Skip graphs | Doubly-linked lists | No | No | O(logN) with high probability | O(logN) |
| Skip lists | Doubly-linked lists | No | No | O(logN) with high probability | O(logN) |
| SkipMard | Doubly-linked lists | No | Yes | O(logN) with high probability | O(mlogN) |
| SkipNet | Doubly-linked rings | No | No | O(logN) with high probability | O(logN) |
| SkipTree | Distributed partition tree | Yes | No | O(logN) | O(logN · loglogN) on average; O(log²N) in the worst case |
| Skip tree graphs | Distributed search tree with conjugate nodes | No | No | O(logN) on average | More than O(logN) |

Fig. 2. Peer organization within a cluster (cluster size = 16).

To reason about this structure formally, we will need some notation. Let $\Sigma$ be a finite alphabet, and let $\Sigma^*$ be the set of all finite words consisting of characters in $\Sigma$. We assume that $P = \{p_1, p_2, \ldots, p_N\}$ is the set of all peer ids and $C = \{c_1, c_2, \ldots, c_K\}$ is the set of all cluster ids in SkipCluster ($p_i \ne p_j$ and $c_i \ne c_j$ if $i \ne j$). Let us represent individual characters of a word w with subscripts. Then $w = \ldots w_2 w_1 w_0$, where $w_0$ is the lowest bit in w. The length of w is denoted by $|w|$.

Definition 1. Given a SkipCluster with cluster size of $2^d$, two peers $p_i$ and $p_j$ belong to the same cluster if and only if $|p_i| = |p_j|$ and $p_i^m = p_j^m$ for all $m \ge d \ge 0$. $p_i$ will become a super-peer if and only if $p_i^{d-1} \ldots p_i^1 p_i^0$ is the smallest suffix in a cluster.

As described in Definition 1, each SkipCluster has a predefined cluster size of $2^d$. Given a peer id, the lowest d bits are used to discriminate this peer from other peers in the same cluster. The remaining higher bits become the cluster id of this peer. The first peer to join a cluster becomes the super-peer of this cluster. After that, each joining peer compares its lowest d bits with those of the current super-peer. The peer with the smaller suffix becomes the new super-peer of that cluster.

Fig. 2 shows an example of peer organization within a cluster. Each peer is denoted by a rectangle and the label value in a rectangle denotes the peer id of that peer. The cluster size is 16 (= $2^4$), which means that at most 16 peers can be contained in this cluster, but only 10 peers are illustrated in this figure. Peer 100000 is the super-peer because the lowest 4 bits of 100000 form the smallest suffix in this cluster (equivalently, peer 100000 has the smallest peer id). All peers have the same prefix 10, which is the cluster id of this cluster. Routing table and query routing details will be described in the last part of Section 3.2.
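The rule in Definition 1 can be sketched in a few lines: split every peer id into cluster id and d-bit suffix, group by cluster id, and pick the smallest suffix in each group. The function names and the sample peer ids other than 100000 are hypothetical and only serve this illustration.

```python
def split_peer_id(peer_id: str, d: int) -> tuple[str, str]:
    """Return (cluster_id, suffix), where the suffix is the lowest d bits."""
    if len(peer_id) < d:
        raise ValueError("peer id is shorter than the suffix length d")
    cut = len(peer_id) - d
    return peer_id[:cut], peer_id[cut:]


def elect_super_peers(peer_ids: list[str], d: int) -> dict[str, str]:
    """Map each cluster id to the peer id with the smallest d-bit suffix."""
    clusters: dict[str, list[str]] = {}
    for pid in peer_ids:
        cluster_id, _ = split_peer_id(pid, d)
        clusters.setdefault(cluster_id, []).append(pid)
    # Suffixes all have length d, so string order equals numeric order.
    return {
        cid: min(members, key=lambda p: split_peer_id(p, d)[1])
        for cid, members in clusters.items()
    }


# Cluster size 16 (d = 4), as in Fig. 2; peer ids other than 100000 are made up.
peers = ["100101", "100000", "100011", "110001", "110110"]
super_peers = elect_super_peers(peers, d=4)
```

In a running overlay the election would happen incrementally as peers join, as the text describes; the batch version above only shows the comparison that decides the outcome.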

Definition 2. The membership vector $m(c_i)$ is a finite random word over $\Sigma$ with $|c_i| = |m(c_i)|$. If $c_i \ne c_j$, then $m(c_i) \ne m(c_j)$, where $1 \le i \le K$ and $1 \le j \le K$.

Fig. 3 illustrates the higher tier infrastructure of a SkipCluster with 8 clusters (cluster size = 8). All participants are super-peers in this tier. The label value on the upper left of a peer denotes the cluster id of that cluster and the label value on the upper right of a peer denotes the membership vector of that cluster. The key idea we take from SkipNet is the notion of maintaining a series of overlapping rings at different levels. All super-peers are connected in sequence according to their cluster ids at level 0. To simplify the presentation, we assume $\Sigma = \{0, 1\}$ in the rest of this paper.

The prefix of a membership vector decides the position of the corresponding cluster at different levels. The split rule of SkipCluster is slightly different from that of SkipNet because different membership vectors may have different lengths in SkipCluster. If a ring at level h has two or more super-peers, it must be split into two rings at level h+1. All super-peers are therefore split into singletons after O(logK) levels on average, and each super-peer stores 2logK pointers in the higher tier, where K is the number of clusters in the overlay network.
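The split rule can be sketched as repeated grouping by membership-vector prefix. This is only an illustration under stated assumptions: the function name `build_rings`, the dictionary representation and the sample vectors are hypothetical, and the handling of vectors with different lengths (a shorter vector is simply used in full) is a guess, since the exact rule is not spelled out in this section.

```python
from collections import defaultdict


def build_rings(vectors: dict[str, str]) -> list[dict[str, list[str]]]:
    """vectors maps cluster id -> membership vector (binary string).

    Returns one dict per level; each dict maps a vector prefix to the ring
    (cluster ids in sorted order) whose members share that prefix.
    """
    if len(set(vectors.values())) != len(vectors):
        raise ValueError("membership vectors must be unique")
    levels = [{"": sorted(vectors)}]  # level 0: a single ring with every super-peer
    while any(len(ring) > 1 for ring in levels[-1].values()):
        h = len(levels)
        split = defaultdict(list)
        for ring in levels[-1].values():
            if len(ring) < 2:
                continue  # singleton rings are not split any further
            for cluster_id in ring:
                split[vectors[cluster_id][:h]].append(cluster_id)
        levels.append(dict(split))
    return levels


# Hypothetical cluster ids and membership vectors (not the ones in Fig. 3).
vectors = {"000": "010", "001": "110", "010": "001", "011": "100"}
levels = build_rings(vectors)
```

With four clusters and well-spread vectors the rings become singletons after two splits, in line with the O(logK) levels stated above.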

3.2. Routing table and query routing

Each peer’s set of pointers is called its routing table, since the pointers are used to route messages between peers. Different overlay networks may have different routing tables and use different routing algorithms because of their diversity in structure.

Fig. 3. The higher tier infrastructure of a SkipCluster with 8 clusters (cluster size = 8).

Fig. 4. A binary trie structure for storing five strings.

Fig. 5. The data structure of a TLL node and the data organization of a TLL.

In this paper, we design a novel data structure called Triple Linked List (TLL for short) to store each super-peer’s pointers in the higher tier, which allows for speeding up query routing in SkipCluster. Specifically, TLL is used to find the longest prefix in our query routing algorithm.

Longest Prefix Matching (LPM) is a fundamental part of IP address lookup. It is also a major bottleneck in high performance routers because of increasing routing table sizes, higher speed links and the imminent migration to 128-bit IPv6 addresses. A large number of research papers have addressed this problem in the last decade and evaluated their algorithms on metrics such as lookup time, update time and memory usage.

A trie is an ordered tree data structure for storing strings [29]. In a trie, a branch in each node corresponds to a character in $\Sigma$. A string is represented by a path from the root to a leaf node or an inner node. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Fig. 4 shows a simple binary trie structure for storing five strings, which are represented by black circles. To find the LPM of a given search string, the trie is traversed from the root. According to the value of the next bit in the search string, either the left or the right branch is followed. The most recently visited prefix string is remembered. When the search string is exhausted or a nonexistent branch is selected, the remembered prefix string is returned as the best match. The main problem with the trie structure is that it needs to keep a large number of extra nodes in the index tree (represented by white circles in the figure). In addition, the search depth from the root to a leaf may be long. In a trie, IP address lookup can be performed in O(W) time using a memory space of O(MW) with O(W) update time, where M is the number of prefixes in the routing table, and W is the length of an IP address. The trie is not a space-efficient structure for LPM because backbone routers today can have databases of up to tens of thousands or even hundreds of thousands of prefixes.
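The trie traversal described above can be sketched as follows. The class and function names are chosen for this example, and the stored prefixes are hypothetical rather than the five strings of Fig. 4.

```python
class TrieNode:
    def __init__(self):
        self.children = {}       # maps "0" / "1" to the child TrieNode
        self.is_prefix = False   # True for stored strings (black circles)


def trie_insert(root: TrieNode, prefix: str) -> None:
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())  # may create inner nodes
    node.is_prefix = True


def longest_prefix_match(root: TrieNode, query: str):
    """Return the longest stored prefix of `query`, or None if there is none."""
    node = root
    best = "" if root.is_prefix else None
    for depth, bit in enumerate(query, start=1):
        node = node.children.get(bit)
        if node is None:          # nonexistent branch: stop and keep the best so far
            break
        if node.is_prefix:        # remember the most recently visited stored prefix
            best = query[:depth]
    return best


root = TrieNode()
for p in ["0", "01", "011", "10", "111"]:
    trie_insert(root, p)
```

Inserting "111" creates unmarked inner nodes for "1" and "11", which is the extra-node overhead the paragraph refers to; the lookup visits at most one node per bit of the query, matching the O(W) bound.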

Several other algorithms with attractive properties are not based on tries. Lampson et al. presented the Multiway and Multicolumn Search (MMS), which requires O(W + log M) time and O(2M) memory [30]. Sangireddy et al. [31] proposed two algorithms for fast lookups in large routing tables: the Elevator-Stairs algorithm and the logW-Elevators algorithm. The Elevator-Stairs algorithm provides a lookup time of $O(W/k + k)$ and an update time of $O(W/k + k + 2^k)$, and requires only O(M) memory, where M is the number of prefixes in the routing table, W is the IP address length and k (1 ≤ k ≤ W) is an adjustable parameter to be optimized. The logW-Elevators algorithm performs lookup in O(log W) time and update in O(M) time, and requires O(M log W) memory.

Our TLL-based LPM algorithm requires O(M) memory space, O(M) update time and O(M) lookup time in the worst case. Although O(M) lookup time is unacceptable for modern IP address lookup, it is acceptable in SkipCluster, where M equals $2\log\frac{N}{2^d}$ and W can be assigned a large value, which helps to avoid cluster id collisions.

Definition 3. Assume there are two binary strings $s_i = s_i^1 s_i^2 \ldots s_i^m$ and $s_j = s_j^1 s_j^2 \ldots s_j^n$, where $s_i \in \Sigma^*$ and $s_j \in \Sigma^*$. Then,

1. if $s_i^1 = s_j^1, s_i^2 = s_j^2, \ldots, s_i^{k-1} = s_j^{k-1}, s_i^k < s_j^k$, where $1 \le k \le m$ and $1 \le k \le n$, then $s_i \prec s_j$; if $s_i = \phi$, then for all $s_j \ne \phi$, $s_i \prec s_j$;

2. if $m = n$ and $s_i^1 = s_j^1, s_i^2 = s_j^2, \ldots, s_i^m = s_j^n$, then $s_i \equiv s_j$;

3. if $m < n$ and $s_i^1 = s_j^1, s_i^2 = s_j^2, \ldots, s_i^m = s_j^m, s_j^{m+1} = 0$, then $s_i \vdash s_j$;

4. if $m < n$ and $s_i^1 = s_j^1, s_i^2 = s_j^2, \ldots, s_i^m = s_j^m, s_j^{m+1} = 1$, then $s_i \dashv s_j$;

5. if $s_i \vdash s_j$ or $s_i \dashv s_j$, then $s_i$ is a prefix of $s_j$.
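The relations of Definition 3 can be expressed as a small classification function. The sketch below is illustrative only: the enum `Relation`, the method `relate`, and the extra value `NONE` (returned for pairs that Definition 3 does not cover, such as a string that follows the other in order) are names introduced here, and binary strings are assumed to be Java strings over `'0'` and `'1'`.

```java
final class PrefixRelationSketch {
    enum Relation { PRECEDES, EQUIVALENT, LEFT_PREFIX, RIGHT_PREFIX, NONE }

    /** Classifies how binary string a relates to binary string b under Definition 3. */
    static Relation relate(String a, String b) {
        if (a.equals(b)) {
            return Relation.EQUIVALENT;               // case 2: same length, same bits
        }
        if (a.isEmpty()) {
            return Relation.PRECEDES;                 // case 1: the empty string precedes all others
        }
        if (b.startsWith(a)) {                        // a is a proper prefix of b
            return b.charAt(a.length()) == '0'
                    ? Relation.LEFT_PREFIX            // case 3: next bit of b is 0
                    : Relation.RIGHT_PREFIX;          // case 4: next bit of b is 1
        }
        if (a.startsWith(b)) {
            return Relation.NONE;                     // b is a proper prefix of a: not covered
        }
        // Neither string is a prefix of the other, so they differ at some position k;
        // lexicographic order then agrees with case 1.
        return a.compareTo(b) < 0 ? Relation.PRECEDES : Relation.NONE;
    }
}
```

For instance, `relate("01", "10")` yields `PRECEDES`, `relate("10", "100")` yields `LEFT_PREFIX` and `relate("10", "1010")` yields `RIGHT_PREFIX`.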

Fig. 5 illustrates the data structure of a TLL node and the data organization of a TLL corresponding to Fig. 4. A TLL consists of one or more TLL nodes. There are four fields in a TLL node: the data field stores a cluster id, and the next, leftdown and rightdown fields refer to other TLL nodes. Each super-peer's routing table is organized into a TLL, where each TLL node corresponds to a pointer in a traditional routing table.

Definition 4. Assume there are two TLL nodes n1 and n2 in the overlay network. Then,

1. if n1.data ≺ n2.data, then n1.next = n2;

2. if n1.data ⊢ n2.data, then n1.leftdown = n2;

3. if n1.data ⊣ n2.data, then n1.rightdown = n2.

M. Xu et al. / Computer Communications 34 (2011) 862–874 867

Algorithm 1 TLL-based LPM Algorithm

1: public class TripleLinkedList {
2:   protected TLLNode root = new TLLNode(); // head node
3:   protected TLLNode prevCursor, nextCursor, lpmMarker;
4:   public TLLNode findLPM(TLLNode mNode) {
5:     if (root.next == null) then
6:       return null;
7:     else
8:       prevCursor = root;
9:       nextCursor = root.next;
10:      while (nextCursor ≠ null) do
11:        if (nextCursor.data ≺ mNode.data) then
12:          Record nextCursor's position in prevCursor;
13:          Move nextCursor to the next direction (→);
14:        else if (nextCursor.data ⊢ mNode.data) then
15:          Record nextCursor's position in prevCursor and lpmMarker;
16:          Move nextCursor to the leftdown direction (↙);
17:        else if (nextCursor.data ⊣ mNode.data) then
18:          Record nextCursor's position in prevCursor and lpmMarker;
19:          Move nextCursor to the rightdown direction (↘);
20:        else
21:          break;
22:        end if
23:      end while
24:      return lpmMarker;
25:    end if
26:  }
27: }

Algorithm 1 presents our TLL-based LPM algorithm in detail. The information of a pointer is stored in the data field of a TLL node. The root of a TLL is a head node that facilitates basic data operations such as node searching, insertion and deletion; it does not store the information of any pointer. prevCursor and nextCursor are used to traverse the TLL. The formal parameter mNode defined in line 4 takes its value from the given node to be processed. The candidate node holding the longest prefix of the given node found so far is recorded by lpmMarker. If the TLL is not empty, nextCursor points to the first node of the TLL. After that, the node id in nextCursor's data field is compared with the node id in the given node's data field to determine the traveling direction (next, leftdown or rightdown), as described in lines 10 to 23, until the target node is found. All operations on the TLL are local. The TLL is not responsible for finding the neighbors of each peer, which is a remote operation. It only stores the existing pointers in a triple linked list for finding the longest prefix of a given search string.

Algorithm 2 describes the node insertion algorithm. The formal parameter newNode defined in line 1 takes its value from the given node to be inserted into a TLL. According to Definitions 3 and 4, if newNode is the first node in the TLL, the next field of the root node will point to it, as described in lines 2 to 4. Otherwise, prevCursor and nextCursor, initialized in lines 6 and 7, are used to traverse the TLL and find the right position for insertion, as described in lines 9 to 19. Altogether, there are three cases of node insertion, distinguished by newNode's direction to nextCursor. Fig. 6 illustrates the three cases of node insertion corresponding to lines 9 to 19 of Algorithm 2. A TLL is built from a root node. At each step, only one TLL node is inserted into the TLL, and its position in the TLL is decided by the data field of the node. Therefore, all nodes can be inserted into the right positions regardless of the input order.

Algorithm 2 Node Insertion Algorithm

1: public boolean insertNode(TLLNode newNode) {
2:   if (root.next == null) then
3:     root.next = newNode;
4:     return true;
5:   else
6:     prevCursor = root;
7:     nextCursor = root.next;
8:     Use prevCursor and nextCursor to travel the TLL and locate newNode's position; // Get newNode's direction to nextCursor
9:     switch (direction)
10:      case (next)
11:        newNode.next = nextCursor;
12:        prevCursor.next = newNode;
13:      case (leftdown)
14:        newNode.leftdown = nextCursor;
15:        prevCursor.leftdown = newNode;
16:      case (rightdown)
17:        newNode.rightdown = nextCursor;
18:        prevCursor.rightdown = newNode;
19:    end switch
20:    return true;
21:  end if
22: }
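To make the interplay of Definitions 3 and 4 with lookup and insertion concrete, the following Java sketch builds a small TLL and searches it. It is one possible reading of the algorithms, not the authors' code. The assumptions are: cluster ids are non-empty strings over `'0'` and `'1'`; an exact match is reported as the longest prefix (Algorithm 1 stops without recording it); and when a new node is a prefix of several existing siblings, all of them are moved below it, a detail the pseudocode leaves to the traversal step. The names `TllSketch`, `insert`, `insertInto` and `findLpm` are hypothetical.

```java
final class TllSketch {
    static final class Node {
        final String data;               // cluster id as a binary string
        Node next, leftdown, rightdown;
        Node(String data) { this.data = data; }
    }

    final Node root = new Node("");      // head node; stores no pointer

    /** Inserts a cluster id; an id that is already present is ignored. */
    void insert(String id) {
        root.next = insertInto(root.next, new Node(id));
    }

    /** Inserts n into the chain starting at head and returns the new first node of that chain. */
    private static Node insertInto(Node head, Node n) {
        if (head == null) return n;
        String h = head.data, d = n.data;
        if (h.equals(d)) return head;                       // duplicate id
        if (d.startsWith(h)) {                              // head is a proper prefix of n: descend
            if (d.charAt(h.length()) == '0') head.leftdown = insertInto(head.leftdown, n);
            else head.rightdown = insertInto(head.rightdown, n);
            return head;
        }
        if (h.startsWith(d)) {                              // n is a proper prefix of head:
            Node cur = head, prev = null;                   // n adopts every sibling that extends it
            while (cur != null && cur.data.startsWith(d)) {
                if (cur.data.charAt(d.length()) == '0') {
                    if (n.leftdown == null) n.leftdown = cur;
                } else if (n.rightdown == null) {
                    n.rightdown = cur;
                    if (prev != null) prev.next = null;     // close the leftdown chain
                }
                prev = cur;
                cur = cur.next;
            }
            prev.next = null;                               // detach the adopted run from the rest
            n.next = cur;
            return n;
        }
        if (h.compareTo(d) < 0) {                           // head precedes n: keep walking
            head.next = insertInto(head.next, n);
            return head;
        }
        n.next = head;                                      // n precedes head: splice in front
        return n;
    }

    /** Returns the longest stored prefix of id (possibly id itself), or null if there is none. */
    String findLpm(String id) {
        Node cur = root.next, lpm = null;
        while (cur != null) {
            String h = cur.data;
            if (id.startsWith(h)) {                         // h is a prefix of id
                lpm = cur;
                if (h.length() == id.length()) break;       // exact match
                cur = id.charAt(h.length()) == '0' ? cur.leftdown : cur.rightdown;
            } else if (h.compareTo(id) < 0) {
                cur = cur.next;                             // h precedes id
            } else {
                break;                                      // no further candidate in this chain
            }
        }
        return lpm == null ? null : lpm.data;
    }
}
```

Inserting the hypothetical ids 01, 1010, 110, 10 and 100 in that order leaves the top-level chain 01 → 10 → 110, with 100 as the leftdown child and 1010 as the rightdown child of node 10. A lookup for 1101 then returns 110, and a lookup for 1011 returns 10.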

Algorithm 3 Node Deletion Algorithm

1: public boolean deleteNode(TLLNode dNode) {
2: if (root.next == null)
3:   return false;
4: else
5:   prevCursor = root;
6:   nextCursor = root.next;
7:   Use prevCursor and nextCursor to travel the TLL until dNode's position is located by nextCursor;
8:   Get prevCursor's direction to nextCursor;
9:   if (nextCursor.data ≡ dNode.data) then
10:    if (prevCursor.next == nextCursor) then
11:      if (nextCursor.leftdown ≠ null) then
12:        prevCursor.next = nextCursor.leftdown;
13:      end if
14:    else if (prevCursor.leftdown == nextCursor) then
15:      if (nextCursor.leftdown ≠ null) then
16:        prevCursor.leftdown = nextCursor.leftdown;
17:      end if
18:    else if (prevCursor.rightdown == nextCursor) then
19:      if (nextCursor.leftdown ≠ null) then
20:        prevCursor.rightdown = nextCursor.leftdown;
21:      end if
22:      prevCursor = nextCursor.leftdown;
23:      if (nextCursor.rightdown ≠ null) then
24:        prevCursor.next = nextCursor.rightdown;
25:        prevCursor = nextCursor.rightdown;
26:      end if
27:      prevCursor.next = nextCursor.next;
28:      return true;
29:    end if
30:  end if
31: end if
32: }

Algorithm 3 presents the node deletion algorithm. The formal parameter dNode defined in line 1 takes its value from the given node to be deleted from a TLL. If the TLL is empty, the algorithm returns false. Otherwise, prevCursor and nextCursor, initialized in lines 5 and 6, are used to traverse the TLL to find the target node to be deleted. If the target node is found, it is deleted from the TLL as described in lines 9 to 30. There are also three cases of node deletion according to prevCursor's direction to nextCursor, as illustrated in Fig. 7. After the target node is located by nextCursor, the deletion operation processes the leftdown, rightdown and next fields of nextCursor in that order if they are not null. After that, all pointers denoted by dashed arrows can be disposed of safely.

Fig. 6. Three cases of node insertion.

Fig. 7. Three cases of node deletion.

Fig. 8 shows the TLL after the insertion of two nodes, 10 and 100, into the TLL in Fig. 5. For example, when node 10 arrives, its data field is compared with that of node 01. As "01" ≺ "10", the data field of node 10 is then compared with that of node 1010. As "10" ⊣ "1010", the rightdown pointer of node 10 will refer to node 1010. Likewise, the leftdown pointer of node 10 will refer to node 100 after its arrival because "10" ⊢ "100". Fig. 9 shows the TLL after the deletion of node 10 from the TLL in Fig. 8. Firstly, prevCursor and nextCursor traverse the TLL and locate node 10. Secondly, node 100 and node 1010 are relocated to their new positions in the TLL. Finally, node 10 is removed from the TLL safely.

Fig. 8. The TLL after the insertions of two nodes 10 and 100.

Fig. 9. The TLL after the deletion of node 10.

TLL can speed up message routing in the higher tier of SkipCluster. For example, suppose a query request for peer 1000010 is sent to peer 10000 as shown in Fig. 3 (the underlined part of a peer id denotes a cluster id). We assume that the last three bits of a peer id are used to decide the cluster size and that m(1000) = 1101 and m(10) = 01. Then peer 10000 will try to find the longest prefix of node 1101 in its routing table, which is organized in a TLL as shown in Fig. 5. As "110" is the longest prefix of "1101", the query message is forwarded to peer 100010, whose membership vector is 110. After that, peer 100010 sends the query message to its neighbor at level 2, which is the destination peer 1000010, whose membership vector is 1101. By comparison, the routing operation in SkipNet begins by examining peers in the level 0 ring until a peer is found whose membership vector matches the destination membership vector (1101) in the first digit. Then the routing operation examines peers in the level 1 ring until a peer is found whose membership vector matches 1101 in the second digit. This procedure repeats until no more progress can be made. Since each peer has a clockwise neighbor and a counter-clockwise neighbor at each level of the ring, the query message may not be sent to the peer whose membership vector is closest to the destination membership vector. The number of message hops is O(log N) with high probability in SkipNet, where N is the number of peers. With the help of the TLL stored at each super-peer, SkipCluster can find a more suitable neighbor if one exists and skip some levels during the query routing process.

TLL is an organized routing table that contains the same number of pointers in the higher tier as SkipNet or skip graphs do if their membership vectors have the same length. Therefore, TLL does not consume extra memory space, but it can speed up query routing because of its capability of finding the longest prefix of a given string. The additional lookup time and update time for maintaining a TLL are negligible compared with the time needed to transmit query messages in the network. Here, lookup time and update time refer only to the data operations on a TLL. Specifically, lookup time refers to finding the longest prefix of a given search string in the TLL of a peer (not in the neighbors of this peer), and update time refers to inserting a node into or deleting a node from the TLL after the content of this node (corresponding to a pointer in a common routing table) is confirmed. Of course, peers joining and leaving the network incur much more time than data operations on a TLL, because the former require remote operations such as message transmission from one peer to another, while the latter require only local operations within a peer.

Query routing tables at the lower tier of a SkipCluster are totally different because all peers in a cluster have the same cluster id. As shown in Fig. 2, there are 16 rectangles arranged in a closed ring, and each rectangle corresponds to a position that is occupied (with peer id) or to be occupied (without peer id) by a peer. All peers are arranged in sequence according to their peer ids. There are two kinds of pointers, forward pointers and backward pointers, which are denoted by arrowhead lines. Each forward pointer of a peer points only to a peer that has a bigger peer id than the local peer id, and each backward pointer of a peer points only to a peer that has a smaller peer id than the local peer id. The super-peer of a cluster contains only forward pointers (denoted by blue arrowhead lines) because it has the smallest peer id, and the peer with the biggest peer id contains only backward pointers (denoted by green arrowhead lines). Other internal peers contain both kinds of pointers. A peer's forward pointers are obtained in this way: firstly, it divides the peers that have bigger peer ids into several segments of exponentially growing size; secondly, the peer with the biggest peer id in each segment is selected as its neighbor. For example, the super-peer 100000 divides the 15 qualifying peers in the cluster into 4 segments, which are denoted by different colors in Fig. 2. The first segment has only 1 peer, the second segment can have at most 2 peers, the third at most 4 peers, and the fourth at most 8 peers. Likewise, we can obtain the backward pointers of a peer by choosing the peers that have the smallest peer ids in each segment. As a result, each intra-cluster peer has a backward pointer that points to the super-peer and a forward pointer that points to the peer with the biggest peer id of the local cluster. Each segment-h pointer traverses nearly $2^h$ peers, provided that not all peers in the segment have left the network.

After a query message is sent to a super-peer, the super-peer forwards it through a forward pointer to the intra-cluster peer whose peer id is closest to the destination without exceeding it. The peer that receives the query message forwards it to another peer in the same way until the destination is reached. On the other hand, if an intra-cluster peer wants to send a query message to a peer in the same cluster that has a smaller peer id than itself, it uses backward pointers. To find a peer that belongs to another cluster, the query message must first be sent to the super-peer of the local cluster. After that, the super-peer executes query routing at the higher tier as discussed before.
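The forward-pointer construction and the forwarding rule can be sketched as follows. The sketch models a cluster as positions 0 to clusterSize − 1 (the suffix of the peer id) and assumes that segments are counted in positions relative to the local peer, so segment h covers offsets $2^h$ to $2^{h+1} - 1$; this reading, the class `IntraClusterSketch`, and its method names are assumptions made for illustration. Backward pointers would be obtained symmetrically and are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class IntraClusterSketch {
    /** Forward pointers of the peer at position p: the largest occupied position in each segment. */
    static List<Integer> forwardPointers(TreeSet<Integer> occupied, int p, int clusterSize) {
        List<Integer> pointers = new ArrayList<>();
        for (int len = 1; p + len < clusterSize; len *= 2) {
            int lo = p + len;                                    // segment covers positions lo..hi
            int hi = Math.min(p + 2 * len - 1, clusterSize - 1);
            Integer best = occupied.floor(hi);                   // largest occupied position <= hi
            if (best != null && best >= lo) {
                pointers.add(best);                              // segment is not empty
            }
        }
        return pointers;
    }

    /** Positions visited when forwarding from src towards dst (dst need not be occupied). */
    static List<Integer> route(TreeSet<Integer> occupied, int src, int dst, int clusterSize) {
        List<Integer> path = new ArrayList<>();
        path.add(src);
        int cur = src;
        while (cur < dst) {
            int nextHop = -1;
            for (int ptr : forwardPointers(occupied, cur, clusterSize)) {
                if (ptr <= dst) {
                    nextHop = Math.max(nextHop, ptr);            // closest id not exceeding dst
                }
            }
            if (nextHop < 0) {
                break;                                           // every pointer overshoots dst
            }
            path.add(nextHop);
            cur = nextHop;
        }
        return path;
    }
}
```

With a hypothetical cluster of 16 positions in which positions 0, 3, 6, 8, 10 and 15 are occupied, the super-peer at position 0 obtains the forward pointers 3, 6 and 15, and a message routed from position 0 towards position 9 visits positions 0, 6 and 8 before stopping, because every forward pointer of position 8 exceeds 9.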

The super-peers in SkipCluster play a less important role than super-peers usually do. They need neither statistical information about other peers nor any special maintenance. The difference between super-peers and the other intra-cluster peers in SkipCluster lies in the pointers of their routing tables. Super-peers have pointers in both the higher tier and the lower tier, while the other intra-cluster peers have pointers only in the lower tier for routing within local clusters.

3.3. Peer joining and leaving

To join a SkipCluster, a new peer must execute query routing at the higher tier according to its membership vector. If it is the first peer of the cluster it belongs to, the joining process is the same as that of SkipNet and this peer becomes the super-peer of that cluster. If the new peer is not the first peer but its peer id is smaller than those of the other peers in the same cluster, the peer with the second smallest peer id transfers its routing table of the higher tier to the new peer, and the new peer becomes the super-peer of that cluster. After that, the super-peer updates all intra-cluster peers' routing tables. If the new peer is an internal peer of a cluster, it first finds its cluster, and then the super-peer of its cluster updates all intra-cluster peers' routing tables. For example, suppose peer 101001 is to be inserted into a cluster of the SkipCluster shown in Fig. 2. The process of finding cluster 10 in SkipCluster is the same as finding a node with numeric ID 10 in SkipNet. After super-peer 100000 is found, it forwards the insertion message to peer 100110, which has the peer id closest to 101001 without exceeding it. In the same way, peer 100110 forwards the insertion message to peer 101000. Peer 101000 then concludes that peer 101001 should be its new neighbor, because even its nearest forward pointer, which points to peer 101010, leads to a peer id bigger than 101001. Of course, peer 101010 becomes another neighbor of peer 101001. After that, all peers in cluster 10 check their routing tables for possible updates.

If an existing peer wants to leave a SkipCluster and it is the only peer of a cluster, the leaving process is the same as that of SkipNet. If the leaving peer is the super-peer of a cluster and there are two or more peers in that cluster, it transfers its routing table of the higher tier to the neighbor peer with the second smallest peer id in the cluster, which becomes the new super-peer. After that, the new super-peer updates all intra-cluster peers' routing tables. If the leaving peer is an internal peer of a cluster, the super-peer of that cluster updates all intra-cluster peers' routing tables.

If the leaving peer crashes and it is not the super-peer of a cluster, the leaving process is suspended until the crashed peer has been detected by one of its neighbors. For instance, the crashed peer may be detected by a neighbor that sends or forwards a query message to it. After that, this neighbor notifies the super-peer of that cluster to update the routing tables of the corresponding peers. In this case, routing tables are updated only for peers in the local cluster. Otherwise, if the leaving peer crashes and it is the super-peer of a cluster, the peer with the second smallest peer id in this cluster becomes the new super-peer and builds its routing table in both the higher tier and the lower tier, i.e., it searches for its neighbors in the local cluster as well as in other clusters. Specifically, if the crashed super-peer is detected by a neighbor from the same cluster, this neighbor searches for the peer with the second smallest peer id in that cluster using backward pointers. During the recovery process, the new super-peer candidate searches for its neighbors both in the local cluster and in other clusters. If the new super-peer candidate crashes while the recovery is under way, the whole recovery process has to stop until the crash has been detected by a neighbor from the same cluster. Likewise, this neighbor searches for the peer with the third smallest peer id in that cluster, and this peer initiates a new recovery process. At the same time, the former recovery process is discarded.

In order to join or leave a network, SkipNet takes O(log N) lookup time and O(2 log N) update time (two neighbors update their routing tables at each level of the ring in the worst case), while SkipCluster takes O(log K + d) lookup time and O(2d) update time for an internal peer or O(2 log K) update time for the first peer of the cluster, where $N = K \cdot 2^d$ as defined in Section 3.1. If the super-peer of a cluster fails, recovery takes O(log K + d) lookup time and O(2 log K + 2d) update time.
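As a quick consistency check on the lookup bounds, the relation between N, K and d can be applied directly. The calculation below is only an illustration; it assumes base-2 logarithms and that every cluster is full, so that the relation between N, K and d holds exactly.

```latex
\log_2 N = \log_2\left(K \cdot 2^d\right) = \log_2 K + d,
\qquad
\text{e.g. } N = 65536,\ d = 4 \;\Rightarrow\; K = 4096,\quad \log_2 K + d = 12 + 4 = 16 = \log_2 N .
```

Under these assumptions the two lookup bounds, O(log K + d) and O(log N), are of the same order.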

3.4. Range queries processing

Compared to exact-match queries, range queries are more complex and more popular in practice. In fact, an exact-match query is a special form of range query, and range query processing is the basis for processing more complex queries such as kNN. In what follows, we present the method of range query processing in SkipCluster.

Considering that SkipCluster preserves the order of peer ids and stored keys in both the higher tier and the lower tier, it is straightforward to carry out one-dimensional range queries in SkipCluster. Here, we omit the details of one-dimensional range query processing and focus instead on multi-dimensional range query processing.

In order to deal with multi-dimensional data, we use Z-order (a space-filling curve first proposed by Morton [20]) to map multi-dimensional data to one dimension because of its good locality-preserving behavior. The Z-value of a point in multi-dimensional space is simply calculated by bitwise interleaving of its coordinate values.

Schrack [32] introduced algorithms for direct arithmetic on dilated integers. Adams and Wise [33] presented masked integers that allow Morton order to support row and column traversal of elements. Raman and Wise [34] proposed efficient casting conversions for converting ordinary integers to and from dilated integers for two-dimensional and three-dimensional arrays. The following definitions are derived from [33–35] with a small modification (from N-order to Z-order) for two-dimensional arrays.

Definition 5. Let $m = \sum_{k=0}^{x-1} m_k 2^k$ be interpreted as a constant mask of an x-bit computer word; a kth bit whose $m_k = 0$ is excluded from the mask, and a kth bit whose $m_k = 1$ is included in the mask.

Definition 6. The even-dilated representation of $j = \sum_{k=0}^{x-1} j_k 2^k$ is $\overrightarrow{j} = \sum_{k=0}^{x-1} j_k 4^k$, and the odd-dilated representation of $i = \sum_{k=0}^{x-1} i_k 2^k$ is $\overleftarrow{i} = \sum_{k=0}^{x-1} 2 i_k 4^k$.

The arrows suggest the justification of the meaningful bits in the even/odd dilated representation. For x = 32, 0x55555555 is the mask that covers the even bits and 0xaaaaaaaa is the mask that covers the odd bits. Then the Z-value for the $\langle i, j \rangle$th element of a matrix is calculated as $\overleftarrow{i} \vee \overrightarrow{j}$, or $\overleftarrow{i} + \overrightarrow{j}$ [35]. For example, if the row index is $i = \sum_{k=0}^{x/2-1} i_k 2^k$ and the column index is $j = \sum_{k=0}^{x/2-1} j_k 2^k$ in a matrix, then the corresponding Z-value of $A_{i,j}$ is $\sum_{k=0}^{x/2-1} (2 i_k 4^k + j_k 4^k)$.

Fig. 10 illustrates how to convert two-dimensional integers to one-dimensional integers and map matrices into SkipCluster's clusters. The two integer coordinates are represented in binary, and Z-values are represented in decimal for easy recognition. Each square with dashed lines denotes a matrix containing continuous integers that can be mapped into a cluster in SkipCluster. Fig. 10(a) shows a two-dimensional space of Z-order with integer coordinates 0 ≤ i ≤ 7 and 0 ≤ j ≤ 7 that consists of 2 × 2 matrices. The Z-value of ⟨7, 4⟩ can be calculated as:

$$\overleftarrow{111} + \overrightarrow{100} = 101010_2 + 010000_2 = 111010_2 = 58,$$

where the subscript 2 denotes a binary number. Fig. 10(b) shows the row-major order of 4 × 4 matrices with different masked integers derived from Z-order.

Likewise, we can also map d-dimensional data to one dimension with d-dilation.
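The dilation and interleaving of Definitions 5 and 6 can be sketched in Java for the two-dimensional case. This is a simple loop-based illustration, not the optimized casting conversions of [34]; it assumes 16-bit row and column indices packed into a 32-bit `int`, and the class and method names (`ZOrderSketch`, `dilate`, `undilate`, `zValue`, `row`, `col`) are hypothetical.

```java
final class ZOrderSketch {
    static final int EVEN_MASK = 0x55555555; // covers the even bit positions
    static final int ODD_MASK  = 0xAAAAAAAA; // covers the odd bit positions

    /** Even-dilates the low 16 bits of v: bit k of v moves to bit 2k. */
    static int dilate(int v) {
        int r = 0;
        for (int k = 0; k < 16; k++) {
            r |= ((v >>> k) & 1) << (2 * k);
        }
        return r;
    }

    /** Inverse of dilate: collects the even-position bits of z into the low 16 bits. */
    static int undilate(int z) {
        int r = 0;
        for (int k = 0; k < 16; k++) {
            r |= ((z >>> (2 * k)) & 1) << k;
        }
        return r;
    }

    /** Z-value of element <i, j>: row bits on odd positions, column bits on even positions. */
    static int zValue(int i, int j) {
        return (dilate(i) << 1) | dilate(j);
    }

    /** Row index recovered from a Z-value using the odd-bit mask. */
    static int row(int z) {
        return undilate((z & ODD_MASK) >>> 1);
    }

    /** Column index recovered from a Z-value using the even-bit mask. */
    static int col(int z) {
        return undilate(z & EVEN_MASK);
    }
}
```

For the example in the text, `zValue(7, 4)` combines the odd-dilated row 101010 (binary) with the even-dilated column 010000 (binary) and returns 58; `row(58)` and `col(58)` recover 7 and 4.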

4. Performance evaluation

We use PeerSim [36] with its event-driven model as the simulation framework to evaluate our approach. PeerSim is an open-source, Java-based P2P simulation framework developed for large-scale and dynamic environments.

4.1. Experimental results

Fig. 10. Converting two-dimensional integers to one-dimensional integers and mapping matrices into SkipCluster's clusters: (a) Z-order of 2 × 2 matrices; (b) row-major order of 4 × 4 matrices.

We evaluated SkipClusters with different cluster sizes (1, 4, 16 and 64) and compared them with SkipNet. The number of peers in the network was varied from 1024 to 65536. We used the SHA-1 hash function to generate random and unique membership vectors.
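One way to derive a membership vector from SHA-1 is sketched below. The paper does not state which input was hashed or how many bits were kept, so hashing a peer name and taking the leading bits of the digest are assumptions, as are the class and method names. Truncating the digest does not by itself guarantee uniqueness, so a real system would still need to handle collisions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class MembershipVectorSketch {
    /** Returns the first `bits` bits (at most 160) of SHA-1(peerName) as a binary string. */
    static String membershipVector(String peerName, int bits) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-1")
                    .digest(peerName.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
        StringBuilder sb = new StringBuilder(bits);
        for (int k = 0; k < bits; k++) {
            int bit = (digest[k / 8] >> (7 - k % 8)) & 1;  // bit k, most significant bit first
            sb.append(bit);
        }
        return sb.toString();
    }
}
```

The result is deterministic for a given name, so the same peer always obtains the same membership vector.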

In the first set of experiments, we evaluated the performance of five LPM algorithms on SkipCluster-1 (a SkipCluster with only one peer in each cluster) in networks of different sizes. Fig. 11 compares the memory cost, average lookup time and average update time of the five LPM algorithms. We used the number of visits to the nodes of the corresponding data structures in the different LPM algorithms to represent the lookup time and update time. Thus, it is not the real time cost, but an indirect measure of time cost. As shown in Fig. 11(a), the trie consumed the most memory space among the five LPM algorithms. The TLL-based LPM algorithm and the Elevator-Stairs algorithm consumed the least memory space. Fig. 11(b) illustrates the lookup time of the five LPM algorithms. MMS performed worst among the five LPM algorithms. The TLL-based LPM algorithm showed intermediate performance but was sensitive to increases in network size. Fig. 11(c) illustrates the update time of the five LPM algorithms. The update time of the Elevator-Stairs algorithm was not affected by variations in network size, and it performed better than the others in large networks.

Fig. 11. The performance of five LPM algorithms in networks of different sizes.

In the second set of experiments, we compared the performance of SkipClusters with SkipNet under different conditions. Fig. 12 plots the total routing table size against the number of peers. SkipCluster-1 contained the same number of pointers as SkipNet. SkipCluster-16 had the smallest routing table among them. When the network size reached 65536, the total routing table size of SkipCluster-16 was 16.7% of that of SkipNet.

Fig. 12. The total size of routing table vs. the number of peers.

In order to evaluate the performance of exact-match queries, 10000 query messages were sent from random sources to random destinations in each experiment. The total number of message hops was used to evaluate the overall query cost and the overlay network building cost. Fig. 13 illustrates the query cost of exact-match queries in networks of different sizes. In SkipCluster-1, the number of message hops was reduced by 7.8% on average compared with SkipNet. As the network size increased, the advantage of SkipCluster-1 over SkipNet grew. SkipCluster-16 reduced the query cost by 20.8% relative to SkipNet, and SkipCluster-64 performed best in all five networks, with a reduction of 23.9% in query cost relative to SkipNet.

Fig. 14. The cost of building different overlay networks of different sizes.

Fig. 14 plots the cost of building different overlay networks of different sizes. SkipCluster-1 outperformed the other overlay networks, with SkipNet closely behind it. Compared with SkipNet, SkipCluster-4 incurred 22.8% more building cost on average, and SkipCluster-16 incurred 30.3% more to build full-size networks. SkipCluster-64 cost far more than the others, especially for large networks. When the network size reached 65536, the building cost of SkipCluster-64 was approximately 3.5 times that of SkipNet.

Fig. 13. Query cost of exact-match queries in networks of different sizes.

In order to evaluate the repair cost under peer failure, the number of failed peers was set to roughly 15% of the total peers in a network of 65536 peers. Fig. 15 plots the repair cost against the fraction of super-peers among the failed peers. The repair cost of SkipCluster-1 was equal to that of SkipNet, and it was not affected by the fraction of super-peers because all peers were super-peers in SkipCluster-1. As the fraction of super-peers increased, SkipCluster-4, SkipCluster-16 and SkipCluster-64 needed more and more repair cost. SkipCluster-4 required the lowest repair cost among all of them. SkipCluster-64 needed about 2.7 times the repair cost of SkipCluster-4.

Fig. 15. Repair cost vs. the fraction of super-peers.

Fig. 16. Side length of range query square vs. the average number of visited clusters.

Since SkipNet and SkipCluster preserve the order of peer ids and stored keys, performing range queries in them is equivalent to using exact-match queries to look up the boundary values and then executing query routing along the corresponding ring segments. The performance of range queries in SkipNet and SkipCluster is therefore mainly determined by exact-match queries. To evaluate the inter-cluster routing cost of range queries, we generated uniformly distributed square regions in a two-dimensional Z-order space. The network size was set to 16384 and the side length of the range query square was varied from 2 to 64. Altogether, 100 range queries were issued from random sources to random destinations in each experiment. Fig. 16 plots the average number of visited clusters against the side length of the range query square. Each peer was taken as a cluster in SkipNet. SkipCluster-64 performed best, as we expected, because it has the biggest cluster size and each cluster contains peers with continuous peer ids mapped from a matrix in two-dimensional Z-order space. For example, when the side length of the range query square was 32, SkipCluster-16 visited 42.2% and SkipCluster-64 only 19.1% of the number of clusters visited by SkipNet. In real P2P systems, we can use the Time-To-Live (TTL) value to calculate physical hops and put physically proximate peers in the same cluster.
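The effect of cluster size on the number of clusters touched by a square range query can be illustrated with a short Java sketch. It assumes that a key with Z-value z belongs to the cluster identified by z with its lowest clusterBits bits removed (so a cluster holds 2^clusterBits consecutive Z-values), and it simply enumerates the cells of the square. It is not the simulation code used in the experiments, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class RangeClusterSketch {
    /** Z-value of cell <i, j>: row bits on odd positions, column bits on even positions. */
    static int zValue(int i, int j) {
        int z = 0;
        for (int k = 0; k < 15; k++) {
            z |= ((j >>> k) & 1) << (2 * k);        // column bit k -> position 2k
            z |= ((i >>> k) & 1) << (2 * k + 1);    // row bit k    -> position 2k + 1
        }
        return z;
    }

    /** Number of distinct clusters touched by the square whose top-left cell is (row, col). */
    static int clustersVisited(int row, int col, int side, int clusterBits) {
        Set<Integer> clusters = new HashSet<>();
        for (int i = row; i < row + side; i++) {
            for (int j = col; j < col + side; j++) {
                clusters.add(zValue(i, j) >>> clusterBits);  // drop the intra-cluster suffix
            }
        }
        return clusters.size();
    }
}
```

For a 4 × 4 square anchored at the origin, this yields 16 clusters when every peer is its own cluster (clusterBits = 0), 4 clusters of size 4 (clusterBits = 2) and a single cluster of size 16 (clusterBits = 4). A 2 × 2 square that is not aligned with the Z-order blocks, such as the one starting at cell (1, 1), touches 4 clusters of size 4 instead of 1.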

4.2. Discussions

The performance of SkipCluster varies with cluster size. Compared with SkipNet, SkipCluster-1 cuts the query cost by 7.8% with the same routing table size and incurs 93.4% of the building cost on average, so it is suitable for a highly dynamic network. SkipCluster-4 and SkipCluster-16 reduce the query cost more than SkipCluster-1 does, at the expense of increased building cost. Moreover, SkipCluster-16 is the most space-efficient overlay structure. It reduces the routing table size by 83% and the query cost by 20.8%, at the expense of 30.3% extra building cost on average. Accordingly, SkipCluster-16 is a good choice for a less dynamic network. SkipCluster-64 reduces the query cost the most, but it also incurs the highest building and repair costs. Therefore, it is most suitable for relatively static environments where peers do not join and leave the network frequently.

Only two-dimensional data was generated and converted to one-dimensional data in evaluating the performance of range queries. Likewise, three-dimensional or other multi-dimensional data can also be converted to one-dimensional data in Z-order. The only prerequisite is to keep the data in sequence after mapping them into SkipCluster's clusters.

In SkipCluster, only the peer with the smallest peer id in a cluster is chosen as the super-peer. A peer's capability and load are not considered as factors in choosing a super-peer. Moreover, the cluster size (the maximum number of peers in a cluster) is predefined by the length of the suffix of the peer id. Therefore, the cluster size is a fixed value during the runtime of a SkipCluster system.

The works most closely related to our approach include skip graphs, SkipNet and SkipMard. However, in this paper we experimentally compared the performance of SkipCluster only with that of SkipNet, for the following reasons:

• Skip graphs and SkipNet have similar basic data structures, which were proposed independently. Skip graphs focus on the formal characterization of invariants and on theoretical results, while SkipNet is concerned with the implementation of real P2P systems. So comparing with SkipNet is equivalent to comparing with skip graphs.

• SkipMard was proposed to answer multi-attribute queries with multiple layers. A 2-layer SkipMard (note that a SkipMard consists of at least two layers) is actually an instance of skip graphs. SkipCluster is proposed for dealing with one-attribute/dimension range queries without consuming extra memory space. If we were to compare SkipCluster with SkipMard experimentally, the experiments would have to be conducted over one-attribute datasets. However, when dealing with one-attribute data, SkipMard (i.e. 2-layer SkipMard) is essentially an instance of skip graphs, whose structure is essentially similar to that of SkipNet. So an experimental comparison between SkipCluster and SkipMard is not really necessary after we have extensively compared SkipCluster with SkipNet.

5. Conclusion

In this paper, we proposed a hierarchical P2P overlay network called SkipCluster, which is capable of supporting both exact-match and range queries efficiently without consuming extra memory space. SkipCluster is derived from skip graphs and SkipNet, but it has a two-tier hierarchical architecture. We also designed a novel data structure called Triple Linked List to store each super-peer's pointers in the higher tier, which can be used to find the longest prefix and speed up inter-cluster query routing. In the lower tier, the pointers in each intra-cluster peer's routing table span an exponentially increasing number of peers with high probability. Experimental results show that the performance of a SkipCluster system varies with its cluster size. For a SkipCluster of any cluster size, its query cost is lower than that of a SkipNet because of the TLL's capability of finding the longest matching prefix, and its routing table size is equal to or less than that of a SkipNet. A SkipCluster system with a bigger cluster size has lower query cost but higher building/repair cost than a SkipCluster system with a smaller cluster size.

Acknowledgements

The authors thank the anonymous reviewers for their insightful and constructive comments on the manuscript. This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 60873040 and 60873070, the 863 Program under Grant No. 2009AA01Z135, and the Open Funding Program of Shanghai Key Laboratory of Intelligent Information Processing under Grant No. IIPL-09-010. Jihong Guan was also supported by the “Shu Guang” Program of Shanghai Municipal Education Commission and Shanghai Education Development Foundation.

References

[1] E.K. Lua, J. Crowcroft, M. Pias, R. Sharma, S. Lim, A survey and comparison of peer-to-peer network schemes, IEEE Communications Tutorials and Surveys 7 (2) (2005) 72–93.

[2] I. Clarke, O. Sandberg, B. Wiley, T.W. Hong, Freenet: a distributed anonymous information storage and retrieval system, in: Proceedings of Workshop on Design Issues in Anonymity and Unobservability, 2000, pp. 311–320.

[3] J. Frankel, T. Pepper, Gnutella, 2000. Available from: <http://www.gnutella.com>.

[4] A. Heinla, P. Kasesalu, J. Tallinn, KaZaA, 2001. Available from: <http://www.kazaa.com>.

[5] A. Heinla, P. Kasesalu, J. Tallinn, Skype, 2003. Available from: <http://www.skype.com>.

[6] I. Stoica, R. Morris, D. Karger, F. Kaashoek, H. Balakrishnan, Chord: A scalable peer-to-peer lookup service for Internet applications, in: Proceedings of ACM SIGCOMM, 2001, pp. 149–160.

[7] S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, A scalable content-addressable network, in: Proceedings of ACM SIGCOMM, 2001, pp. 161–172.

[8] A. Rowstron, P. Druschel, Pastry: scalable, distributed object location and routing for large-scale peer-to-peer systems, in: Proceedings of IFIP/ACM International Conference on Distributed Systems Platforms, 2001, pp. 329–350.

874 M. Xu et al. / Computer Communications 34 (2011) 862–874

[9] B.Y. Zhao, J. Kubiatowicz, A. Joseph, Tapestry: An Infrastructure for Fault-Tolerant Wide-Area Location and Routing, U.C. Berkeley Technical Report, UCB/CSD-01-1141, 2001.

[10] R. Rodrigues, C. Blake, When multi-hop peer-to-peer lookup matters, in: Proceedings of 3rd International Workshop on Peer-to-Peer Systems (IPTPS’04) and LNCS 3279, 2004, pp. 112–122.

[11] N. Magharei, R. Rejaie, Y. Guo, Mesh or multiple-tree: a comparative study of live P2P streaming approaches, in: Proceedings of 26th IEEE International Conference on Computer Communications, 2007, pp. 1424–1432.

[12] N. Magharei, R. Rejaie, PRIME: peer-to-peer receiver-driven mesh-based streaming, IEEE/ACM Transactions on Networking 17 (4) (2009) 1052–1065.

[13] D. Ren, Y.T.H. Li, S.-H. Chan, Fast-Mesh: a low-delay high-bandwidth mesh for peer-to-peer live streaming, IEEE Transactions on Multimedia 11 (8) (2009) 1446–1456.

[14] M. Schlosser, M. Sintek, S. Decker, W. Nejdl, Hypercup-hypercubes, ontologies and efficient search on p2p networks, in: Proceedings of International Workshop on Agents and Peer-to-Peer Computing, 2002, pp. 112–124.

[15] E. Anceaume, R. Ludinard, A. Ravoaja, F. Brasileiro, PeerCube: a hypercube-based P2P overlay robust against collusion and churn, in: Proceedings of 2nd IEEE International Conference on Self-Adaptive and Self-Organizing Systems, 2008, pp. 15–24.

[16] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, I. Stoica, The impact of DHT routing geometry on resilience and proximity, in: Proceedings of ACM SIGCOMM, 2003, pp. 381–394.

[17] J. Aspnes, G. Shah, Skip graphs, in: Proceedings of 14th ACM-SIAM Symposium on Discrete Algorithms, 2003, pp. 384–393.

[18] N.J.A. Harvey, M.B. Jones, S. Saroiu, M. Theimer, A. Wolman, SkipNet: a scalable overlay network with practical locality properties, in: Proceedings of 4th USENIX Symposium on Internet Technologies and Systems, 2003, pp. 113–126.

[19] W. Pugh, Skip lists: a probabilistic alternative to balanced trees, Communications of the ACM 33 (6) (1990) 668–676.

[20] G.M. Morton, A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing, Technical Report, IBM Ltd., Ottawa, Ontario, March 1, 1966.

[21] T. He, J. Ni, A.M. Segre, S. Wang, B.M. Knosp, SkipMard: a multi-attribute peer-to-peer resource discovery approach, in: Proceedings of Second International Multi-Symposiums on Computer and Computational Sciences, 2007, pp. 266–273.

[22] T. Schütt, F. Schintke, A. Reinefeld, Structured overlay without consistent hashing: empirical results, in: Proceedings of Sixth IEEE International Symposium on Cluster Computing and the Grid Workshops, 2006, p. 8.

[23] T. Schütt, F. Schintke, A. Reinefeld, Scalaris: reliable transactional P2P key/value store, in: Proceedings of Seventh ACM SIGPLAN Workshop on Erlang, 2008, pp. 41–48.

[24] T. Schütt, F. Schintke, A. Reinefeld, A structured overlay for multi-dimensional range queries, in: Proceedings of 13th International Euro-Par Conference on Parallel Processing, LNCS 4641, 2007, pp. 503–513.

[25] S. Alaei, M. Ghodsi, M. Toossi, SkipTree: A new scalable distributed data structure on multidimensional data supporting range-queries, Computer Communications 33 (1) (2010) 73–82.

[26] A.G. Beltrán, P. Sage, P. Milligan, Skip tree graph: a distributed and balanced search tree for peer-to-peer networks, in: Proceedings of IEEE International Conference on Communications, 2007, pp. 1881–1886.

[27] K. Aberer, P. Cudré-Mauroux, A. Datta, Z. Despotovic, M. Hauswirth, M. Punceva, R. Schmidt, P-Grid: a self-organizing structured P2P system, SIGMOD Record 32 (3) (2003) 29–33.

[28] A.M. Ouksel, G. Moro, G-Grid: A class of scalable and self-organizing data structures for multi-dimensional querying and content routing in P2P networks, LNAI 2872 (2004) 123–137.

[29] E. Fredkin, Trie memory, Communications of the ACM 3 (1960) 490–500.

[30] B. Lampson, V. Srinivasan, G. Varghese, IP lookups using multiway and multicolumn search, IEEE/ACM Transactions on Networking 7 (3) (1999) 324–334.

[31] R. Sangireddy, N. Futamura, S. Aluru, A.K. Somani, Scalable, memory efficient, high-speed IP lookup algorithms, IEEE/ACM Transactions on Networking 13 (4) (2005) 802–812.

[32] G. Schrack, Finding neighbors of equal size in linear quadtrees and octrees in constant time, Computer Vision, Graphics, and Image Processing: Image Understanding 55 (3) (1992) 221–230.

[33] M.D. Adams, D.S. Wise, Fast additions on masked integers, SIGPLAN Notices 41 (5) (2006) 39–45.

[34] R. Raman, D.S. Wise, Converting to and from dilated integers, IEEE Transactions on Computers 57 (4) (2008) 567–573.

[35] D.S. Wise, Ahnentafel indexing into morton-ordered arrays, or matrix locality for free, in: Proceedings of EUROPAR 2000: Parallel Processing, LNCS 1900, 2000, pp. 774–783.

[36] G.P. Jesi, PeerSim HOWTO: build a new protocol for the PeerSim 1.0 simulator, 2003. Available from: <http://peersim.sourceforge.net>.