Hierarchical architectures in structured peer-to-peer overlay networks

Peer-to-Peer Netw. Appl. DOI 10.1007/s12083-013-0200-z


Dmitry Korzun · Andrei Gurtov

Received: 20 September 2011 / Accepted: 11 February 2013 © Springer Science+Business Media New York 2013

Abstract Distributed Hash Tables (DHTs) are presently used in several large-scale systems in the Internet and envisaged as a key mechanism to provide identifier-locator separation for mobile hosts in the Future Internet. Such P2P-based systems become increasingly complex, serving popular social networking and resource sharing applications as well as Internet-scale infrastructures. Hierarchy is a standard mechanism for coping with heterogeneity and scalability in distributed systems. To address the shortcomings of flat DHT designs, many hierarchical P2P designs have been proposed over recent years. The latest generation is hierarchical DHTs (HDHTs), where nodes are organized into layers and groups. This article discusses hierarchical architectures applied in structured P2P overlay networks, focusing on HDHT designs. We introduce a framework consisting of conceptual models of network hierarchy, multi-layer hierarchical DHT architectures, principles affecting the design choices, and cost models for system tradeoff analysis, performance evaluation, and scalability estimation. Based on the framework, we provide a taxonomy and survey more than 20 HDHT proposals.

D. Korzun · A. Gurtov (✉) Helsinki Institute for Information Technology HIIT, PO Box 19800, 00076 Aalto, Finland e-mail: [email protected]

D. Korzun e-mail: [email protected]

D. Korzun Department of Computer Science, Petrozavodsk State University, Petrozavodsk, Russia e-mail: [email protected]

Keywords P2P · DHT · Hierarchy · Heterogeneity · Scalability · Performance · Survey

1 Introduction

The field of structured P2P systems has seen fast growth since the introduction of Distributed Hash Tables (DHTs) in the early 2000s. The first proposals, including Chord [1], CAN [2], and Pastry [3], were gradually improved to cope with scalability, locality, and security issues. By utilizing resources of end-users, the P2P approach enables high performance of data distribution, which is hard to achieve with client-server architectures. P2P computing is also actively used for distributing software updates over the Internet, P2PSIP VoIP, video-on-demand, and distributed backups. The recent introduction of the identifier-locator split proposal for future Internet architectures opens another important application area of structured P2P systems, namely mapping between a host’s permanent identity and its changing IP address. More discussion and references on the P2P approach, its practical importance, real-world challenges, and popular P2P-based applications can be found in surveys [4–12] and in books and book chapters [13–17].

Designs of distributed Internet-scale systems evolve towards extremely complicated environments with a multitude of heterogeneous and dynamic participants. This challenge appears in ubiquitous computing, the Internet of Things, and other computing paradigms for the present and future Internet. Such popular web services as Amazon (Dynamo), Facebook (Cassandra), and LinkedIn (Voldemort) operate using algorithms derived from P2P systems. The growing complexity and scale require the introduction of hierarchy and more intelligence in routing. In this case, “flat” approaches lack efficiency. System designs have to apply various differentiation and decomposition techniques. As a result, hierarchical network architectures for organizing the participants are applied; see the general discussion in [18–21].

The efficiency of structured P2P systems is due to maintenance of a rigorous network topology structure: the topology graph belongs to a class with well-defined invariant properties of connectivity [5, 7–9, 11, 22–24]. Resources, functionality, and other types of responsibility are uniformly spread over all the nodes. This ensures low node degree and small network diameter, leading to modest node state and high routing performance. In this case, however, a P2P node knows only a part of the entire network. To approximate the state of the rest of the network, the node extrapolates its local state based on invariant structural properties of the network topology. Nevertheless, it is not clear which topology is “best” in terms of a tradeoff between routing performance, state cost, churn, robustness to failure, node heterogeneity, and underlay network properties [22, 24–28].

The heterogeneity can be tackled with differentiation of nodes and their functions in a P2P system. Some nodes are able to maintain a bigger local state, becoming high-degree nodes and reducing the network diameter. On the other hand, there can be low-capacity nodes that are not so stable and reliable; their state and degree are minimal, decreasing the efficiency of network connectivity. A P2P system design adapts to the heterogeneity by arranging the nodes such that they are subject to different responsibilities. Differentiation on the local level (individually by a node) leads to pre-hierarchical P2P designs [11], each of which augments a flat P2P network with many locally-conducted hierarchies.

Differentiation on the global level leads to hierarchical P2P networks. The architecture of such a network considers an additional kind of entity: groups. Each group consists of nodes of similar responsibility. An example of group formation is clusters of nodes that are close in distance, an important P2P design principle pointed out in [29] and then actively elaborated further [11, 20, 30–32]. Other examples are virtual servers [33–35], when distinct P2P nodes reside at the same machine, and semantic clustering [12, 36–39], when a group consists of nodes with semantically close content.

Layering is a further conceptualization step: nodes and their groups are arranged onto multiple layers [20, 21, 40–42]. The resultant stack of layers is a decomposition of the network architecture along the vertical dimension. A layer consists of “close” groups, and groups of the same layer are further structured along the horizontal dimension. Node role and responsibility depend on both the group and the layer to which the node belongs.

Layering supports traditional tree-based hierarchies with the “divide-and-conquer” decomposition. Leaf vertices correspond to P2P nodes, non-leaf vertices correspond to groups, and levels in the tree correspond to network architecture layers. A typical example is an administrative domain hierarchy, where a node belongs to an organization and the organization, in turn, belongs to a region [43].

More general group-based hierarchies are also possible when relations between groups on different layers are not purely ancestor-descendant; some groups may overlap or even be nested. For example, the same node belongs to different groups, each group is responsible for its own service, and the groups together define a service pool [37, 44, 45].

This survey continues our study started in [11]. The latter showed that local strategies with limited knowledge per node can provide a high scalability level subject to reasonable performance and security constraints. Although the strategies are local, their efficiency is due to elements of hierarchical organization, which appear in many DHT designs that are traditionally considered flat. In this article, we derive a set of conceptual models to describe global hierarchy in a structured P2P network. They cover possible classes of hierarchies, lead to certain design principles, and support multi-layer hierarchical architectures. In sum, they provide a general framework that unifies the wide spectrum of existing hierarchical P2P designs. It can be used as a design guideline in new HDHT proposals for large-scale P2P systems of the present and future Internet.

The survey consists of six sections, including the introduction. Section 2 provides background on structured P2P systems and introduces the problems of local knowledge and heterogeneity, which in practice become an issue even for theoretically elegant P2P designs. We describe a generic arrangement approach that allows taking the heterogeneity into account. Section 3 introduces network hierarchy models and hierarchical architectures. We propose six generic principles affecting the design choices and derive a classification of hierarchical P2P architectures. Section 4 presents a taxonomy of the best-known hierarchical P2P systems. We show the realization of the generic design principles in more than 20 hierarchical DHT proposals. Section 5 considers cost models for tradeoff analysis and performance evaluation of hierarchical P2P architectures. We qualitatively study the main benefits of hierarchical architectures. Section 6 concludes the survey.

2 Structured P2P networks

This section provides background on structured P2P overlay networks, including the distributed hash table (DHT). Although many P2P designs are theoretically elegant and efficient in idealized settings, real-world practice brings a complex set of challenges. The local knowledge problem is one of the core concerns here. An individual node of a large-scale P2P system cannot know the whole network. The node has to query others when its knowledge is not enough to determine the target node. The indirection degrades performance, security, and other system characteristics. Nevertheless, a rigorous network structure allows effective extrapolation of local knowledge, and the network reaches a reasonable operation state. We introduce basic arrangement models to support constructing advanced network structures in the subsequent sections.

2.1 Mathematical preliminaries

The set of real numbers is denoted $\mathbb{R}$. Its non-negative part is $\mathbb{R}_+$. Let $f(x)$ and $g(x)$ be two functions defined on some upper-unbounded subset of $\mathbb{R}$ and taking values in $\mathbb{R}_+$. We write $f = o(g)$ (i.e., $f$ is dominated by $g$) if for any $\varepsilon > 0$ there is $x_0 \in \mathbb{R}$ such that $f(x) \le \varepsilon g(x)$ for all $x > x_0$. Similarly, $f = O(g)$ (i.e., $f$ is bounded above by $g$, up to a constant factor) if there are $C \in \mathbb{R}_+$, $C \ne 0$, and $x_0 \in \mathbb{R}$ such that $f(x) \le C g(x)$ for all $x > x_0$. The notation $f = \Theta(g)$ (i.e., $f$ grows similarly to $g$, within constant bounds) means that there are $C_1, C_2 \in \mathbb{R}_+$, $C_1 \ne 0$ and $C_2 \ne 0$, and $x_0 \in \mathbb{R}$ such that $C_1 g(x) \le f(x) \le C_2 g(x)$ for all $x > x_0$.

The following notation is convenient in our model assumptions. Given $x, y \in \mathbb{R}$, we write $x \ll y$ (resp. $x \gg y$) if the values of $x$ and $y$ differ significantly in the sense of the problem domain. A possible formalization is exponent-based: assuming $x \ge 0$ and $y > 1$, then $x \ll y$ if $x < y^{\alpha}$ for some $0 < \alpha < 1$. Similarly, we use $x \approx y$ when the problem domain provides meaning for the closeness.

2.2 Distributed hash tables

Consider a P2P network of $N$ nodes. Let $N$ also denote the set of them when it does not lead to confusion. Node IDs are assigned from a space $S$ with a distance metric $\rho$. A widespread assumption is a uniform distribution of nodes in $S$. The term “node” refers to both the node and its ID. Table 1 summarizes our basic notation.

A P2P network typically has the form of an overlay network over the underlying IP network. Each node $u$ maintains its local routing table $T_u$ of entries $(v, \mathrm{IP}_v)$, where $v$ is a neighbor and $\mathrm{IP}_v$ is its IP address. Nodes keep their routing tables up to date by carefully selecting neighbors. The number of neighbors $|T_u|$ is the node degree. In large networks, $|T_u| \ll N$ is assumed, i.e., only local knowledge about the network is available at any individual node.

The overlay network topology is modeled with a strongly connected directed graph $G = (N, L)$. The set of arcs $L$ is formed by IP-level links $(u, v)$ for all $u \in N$ and $v \in T_u$. The local knowledge restriction means that the connectivity of $G$ is sparse, with $|L| \ll N^2$. Nevertheless, the topology must guarantee the basic P2P property: $v$ is always reachable from $u$ for any $u, v \in N$.

In a structured P2P network, the overlay topology is tightly controlled via neighbor selection rules. The reason is the efficiency of distributed assignment of a resource $k \in D$ to a node $d \in N$ and of locating it there. A popular substrate for the lookup service is the DHT; it preserves the efficiency despite the local knowledge restriction. Examples of DHT designs are Chord [1], CAN [2], Pastry [3], Tapestry [46], Kademlia [47], Viceroy [48], Symphony [49], and Broose [50].

Such DHTs are flat. They employ the same space for resource keys and node IDs and assume that the distribution is uniform. This assumption can be achieved with hashing $h : D \to S$, even if the resource distribution in $D$ is semantics-aware or has other forms of advanced structure. Resources are assigned to nodes deterministically: the closest¹ node $d$ becomes responsible for $k \in S$:

$$d = \arg\min_{u \in N} \rho(u, k). \tag{1}$$

That is, $d$ is responsible for a bucket of keys

$$S(d) = \{k \in S \mid \rho(d, k) < \rho(u, k) \;\; \forall u \in N,\ u \ne d\}, \tag{2}$$

and the entire space $S$ is partitioned into responsibility zones $S(d)$ among all nodes $d \in N$. This implements so-called consistent hashing, a special kind of hashing originally devised by Karger et al. [51].
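The assignment rule (1) and the responsibility zones (2) can be illustrated with a minimal Python sketch. It is not taken from any particular DHT: the tiny 8-bit ID space, the SHA-1-based hash `h`, the symmetric circular distance `rho`, and the node IDs are all assumptions made for illustration.

```python
import hashlib

ID_BITS = 8                 # assumed (tiny) ID space S = {0, ..., 255} for illustration
SPACE = 1 << ID_BITS


def h(name):
    """Hash an application-level resource name from D into the ID space S."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % SPACE


def rho(u, k):
    """Circular distance on the ring S (a symmetric, illustrative choice of metric)."""
    d = abs(u - k)
    return min(d, SPACE - d)


def responsible(nodes, k):
    """Eq. (1): the node closest to key k (ties are assumed not to occur)."""
    return min(nodes, key=lambda u: rho(u, k))


def zone(nodes, d):
    """Eq. (2): all keys strictly closer to d than to any other node."""
    return [k for k in range(SPACE)
            if all(rho(d, k) < rho(u, k) for u in nodes if u != d)]


nodes = [10, 100, 200]                # hypothetical node IDs
key = h("some-resource")              # hypothetical resource name
owner = responsible(nodes, key)
```

Because (2) uses a strict inequality, a key that is equidistant from two nodes falls into no zone in this sketch; this is exactly the situation that the no-ties assumption rules out.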

A lookup for $k$, addressed to its responsible node $d$, starts from an arbitrary node $u$. Since $T_u$ does not contain all nodes, the lookup needs multi-hop overlay routing from $u$ to $d$, with $d$ unknown beforehand. Each intermediate node $w$ finds a next-hop node $v \in T_w$ to forward the lookup, thus forming the one-hop path $w \to v$ in the overlay network. Eventually, an $l$-hop path $u \to^{+} d$ is constructed for $l = |u \to^{+} d| \ge 1$,

$$u \to w_1 \to w_2 \to \cdots \to w_{l-1} \to d. \tag{3}$$

Some P2P routing algorithms allow several next-hop nodes per lookup. This leads to multi-path routing, where nodes duplicate lookup messages. The redundancy aims at routing dependability. A lookup is successful if at least one path has reached the responsible node.

In contrast to the space distance $\rho(u, d)$, the routing distance $\tau(u, d)$ measures the efficiency of (3). For example, $\tau(u, d) = |u \to^{+} d|$. More accurate metrics take into account the sojourn time, since one hop in the overlay consists of one or more hops in the underlying network. In this case, $\tau(u, d)$ can be the lookup latency.

¹ For simplicity we assume in (1) and (2) that there is no $u \in N$, $u \ne d$, such that $\rho(d, k) = \rho(u, k)$.


Table 1 Symbol notation

| Notation | Description |
|---|---|
| $D$ | Resource key space (application-specific). Resource distribution can be non-uniform and with semantic structure. |
| $S$ | Node ID space. Typically, IDs are numeric (scalar or vector). Let $u$, $v$, and $w$ stand for P2P nodes and their IDs. |
| $\rho(u, v)$ | Distance metric in $S$, which satisfies (i) $\rho(u, v) > 0$ $\forall u, v \in S$, $u \ne v$, (ii) $\rho(u, u) = 0$ $\forall u \in S$. The symmetry property and the triangle inequality are optional. |
| $N$ | The number of alive nodes in the overlay. In some contexts, $N$ also denotes the full set of alive nodes. |
| $T_u$ | Routing table (neighbors) of a node $u \in N$. Although it consists of pairs $(v, \mathrm{IP}_v)$, writing $v \in T_u$ does not lead to confusion. |
| $u \to^{+} v$ | Multi-hop overlay path $u \to w_1 \to w_2 \to \cdots \to w_{l-1} \to v$. |
| $\tau(u, v)$ | Distance metric in the network, e.g., the number of overlay hops or the total latency of $u \to^{+} v$ in the underlay network. |
| $M$ | The number of levels in the network hierarchy. Typically, $i = 1$ is bottom-most, $i = M$ is top-most. |
| $N_i$ | The total number of nodes on layer $i = 1, \ldots, M$ (or the set of all those nodes). |
| $C$ | Group of nodes, a building element of the network hierarchy. |
| $T_u^{(i)}$ | Routing table of a node $u$ on layer $i$. |
| $m_u$ | The number of layers to which $u$ belongs. |

2.3 Local knowledge and network structure

The local knowledge restriction challenges P2P designers. Even assuming the topology graph $G$ is always strongly connected, the problem of constructing (3) is non-trivial [26, 52]. Unstructured P2P designs apply flooding, where each node aggressively duplicates lookup messages [7, 53]. This non-scalable solution requires expensive graph traversal. The performance² is low due to the possibility of lengthy paths in (3). Many intermediate nodes are redundantly loaded. Malicious nodes gain access to many lookups; this vulnerability allows them to harm lookups as well as to learn about the system.

Let the space distance metric ρ satisfy the property:

If ρ(v, w) ≤ ρ(u, w) then ρ(u, v) ≤ ρ(u, w) ∀u, v, w ∈ S.

It supports progressive routing [11]: $u$ selects a next-hop $v$ such that $\rho(u, v) > \rho(v, k)$. Although $d$ is unknown beforehand, each hop shortens the space distance to $d$, and $\rho(d, k)$ is smallest due to (1). Note that $\rho$ is not required to satisfy the symmetry property $\rho(u, v) = \rho(v, u)$ or the triangle inequality $\rho(u, v) \le \rho(u, w) + \rho(w, v)$. Chord [1] and Symphony [49] provide examples of asymmetric and non-triangle distance.
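A minimal sketch of progressive routing on a ring follows. It reads "each hop shortens the space distance" as the condition $\rho(v, k) < \rho(w, k)$ at every intermediate node $w$; this reading, the clockwise (asymmetric) distance, and the Chord-like table construction with power-of-two offsets are assumptions made for illustration, not the exact rules of any cited design.

```python
def rho(u, k, space):
    """Clockwise (asymmetric) distance from u to k on a ring of the given size."""
    return (k - u) % space


def build_table(u, nodes, space):
    """Chord-like table: for each offset 2^i, the first node at or after u + 2^i."""
    table = set()
    step = 1
    while step < space:
        target = (u + step) % space
        succ = min(nodes, key=lambda v: rho(target, v, space))
        if succ != u:
            table.add(succ)
        step *= 2
    return table


def lookup(start, k, tables, space):
    """Greedy progressive routing: every hop strictly reduces rho(., k)."""
    path = [start]
    w = start
    while True:
        closer = [v for v in tables[w] if rho(v, k, space) < rho(w, k, space)]
        if not closer:
            return path        # no neighbor is closer, so w minimizes rho(., k)
        w = min(closer, key=lambda v: rho(v, k, space))
        path.append(w)


space = 64                                       # hypothetical ID space size
nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # hypothetical node IDs
tables = {u: build_table(u, nodes, space) for u in nodes}
path = lookup(1, 54, tables, space)              # ends at the node minimizing rho(u, 54)
```

The loop terminates at the responsible node because each table contains the immediate clockwise successor (offset $2^0$), so a progressive hop exists whenever the current node is not yet the closest one.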

Progressive routing requires every $u$ to have an appropriate $v \in T_u$ for any destination. Consequently, routing table maintenance becomes subject to this additional constraint. Furthermore, a lookup path can be long when its hops shorten the distance with low progress. This leads to large $l$ in (3), up to $l \approx N$. To prevent this degradation, $u$ selects neighbors such that progressive hops are likely to be available for any destination. In addition, $u$’s neighbor selection algorithm can aim at reducing the routing distance $\tau(u, \cdot)$.

² Resource replication, large routing tables, and some other methods exist to improve the performance, but the cost is a higher load on nodes; see the discussion in [11] and references therein.

The performance problem for large networks also relates to the scalability issue. The latter requires the routing distance to grow slower than the network size, i.e., $\tau(u, v) = o(N)$ for any $u$ and $v$. Similarly, the local knowledge cannot expand fast, and the bound $|T_u| = o(N)$ for any $u$ is a typical requirement used in P2P designs.
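As a brief worked illustration, assume the well-known logarithmic bounds of Chord [1] (a property of that design, not something derived in this section). Both scalability requirements are then met, because a logarithm is dominated by any linear function:

$$\tau(u, v) = O(\log N), \quad |T_u| = O(\log N), \quad \lim_{N \to \infty} \frac{\log N}{N} = 0 \;\Longrightarrow\; \tau(u, v) = o(N), \quad |T_u| = o(N).$$

In contrast, a plain ring in which each node knows only its successor has $|T_u| = 1$ but needs up to $N - 1$ hops, so its routing distance is not $o(N)$.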

Routing table maintenance adapts the network topology to dynamic changes. Nodes join and leave the network during its lifetime, which is known as the churn problem [28]. With the proactive strategy, a node periodically calls maintenance procedures in the background. For instance, every neighbor is checked regularly, or candidates to replace failed ones are collected in advance. The reactive strategy allows lazier algorithms; a maintenance procedure is called when the node discovers a network fault. For instance, a repair procedure starts for a neighbor that has stopped serving lookups correctly or that explicitly announces its departure.

In a structured P2P network, routing table maintenance follows strict rules common to all nodes. The rules aim at preserving the rigorous network structure (a class of topology graphs permissible by the design). Using the network structure, a node $u$ extrapolates its local knowledge for routing. In particular, $u$ selects the next hop by estimating the cost of the rest of the path. The network structure does not provide $u$ with concrete paths; instead, $u$ knows that for a given key good paths are available beyond its neighbors.

Most P2P designs today assume a certain structure [7, 11]. Topology graphs have low node degrees, and every pair $(u, v)$ is connected by many short paths [23, 52]. Flat DHT designs provide instances of highly structured systems: a ring in Chord [1], a multidimensional torus in CAN [2], prefix trees in Pastry [3] and Tapestry [46], a butterfly network in Viceroy [48], and a De Bruijn network in Broose [50].


Neighbor selection rules in flat DHTs are relatively simple. They are based on the space distance $\rho$. Some designs are also aware of the underlying network topology and additionally employ the routing distance $\tau$. In semantic-based P2P systems, the resource distribution in $D$ is essential. For example, nodes with thematically close resources become neighbors. Routing table maintenance becomes more complicated and takes into account semantic relations between the resource sets $D(u)$ and $D(v)$ stored at $u$ and $v$, respectively.

Maintenance of $T_u$ can involve additional aspects, which are not directly related to the routing efficiency. Open environments assume diversity and heterogeneity of participants, including unreliable, selfish, and even malicious nodes. In this case, the neighbor selection rules may allow $u$ to choose a neighbor among several candidates. In particular, $u$ improves routing dependability and security when the neighbor selection takes into account the reputation of each candidate, its trust and reliability estimates, and their extrapolation to the paths beyond the candidate [24, 25, 54].

2.4 Arrangement models

There are three simple arrangement models that are applicable in P2P systems for taking the heterogeneity of nodes into account [11]. Let $X$ be a set that represents some knowledge about the system. For example, $X = N$ is the set of all nodes in the system, or $X = N_u$ consists of all neighbors of some $u \in N$. In the general case, $u$’s knowledge of $X$ may differ from the knowledge of other nodes $v \ne u$.

• Ordering: A node $u$ uses a binary relation $\prec$ such that for any $x, y \in X$ either $x \prec y$, $y \prec x$, or $x = y$. In other words, $u$ can arrange the elements of $X$ in accordance with some “preference”. The following two models are extensions (continuous and discrete) of the ordering model.

• Ranking: There is a rank function $r : X \to \mathbb{R}$, and $u$ computes a real numerical value $r(x)$ for each $x \in X$. Thus, the elements of $X$ are ordered on the real line $\mathbb{R}$. The important additional information is the value $|r(x) - r(y)|$, which is the preference level for $u$ to compare $x$ and $y$.

• Classifying: The elements of $X$ are categorized into groups or levels $i = 1, 2, \ldots, M$ according to the preference. Although this model has less precision than the ranking model, it allows tradeoffs between complexity and accuracy.
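The three arrangement models can be contrasted on one small data set. The sketch below is only illustrative: the candidate neighbors, their latencies, and the classification thresholds are hypothetical values, and using measured latency as the preference is an assumption.

```python
from functools import cmp_to_key

# Hypothetical local knowledge X of a node u: candidate neighbors and latency (ms).
X = {"a": 12.0, "b": 95.0, "c": 40.0, "d": 310.0}


def precedes(x, y):
    """Ordering: comparator for the binary relation; lower latency is preferred."""
    return (X[x] > X[y]) - (X[x] < X[y])


def rank(x):
    """Ranking: r : X -> R; here the rank is simply the measured latency."""
    return X[x]


def classify(x, bounds=(50.0, 200.0)):
    """Classifying: map x to a level i = 1..M via thresholds (M = len(bounds) + 1)."""
    return 1 + sum(rank(x) >= b for b in bounds)


ordered = sorted(X, key=cmp_to_key(precedes))   # ordering only: who is preferred
gap = abs(rank("a") - rank("c"))                # ranking adds: by how much
levels = {x: classify(x) for x in X}            # classifying: coarse levels
```

Ordering tells $u$ only which candidate is preferred, ranking additionally quantifies the difference, and classifying trades that precision for a small number of levels.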

In [11] we have shown the application of these models to flat DHTs. Each node locally arranges its knowledge about the global network. In particular, path construction (3) follows a hierarchical scheme. In this survey, we apply the arrangement models to describe global hierarchical structures in P2P networks.

3 Conceptual models

This section introduces conceptual models that cover possible hierarchies in decentralized networks. The models are supported with a set of design principles for constructing hierarchical P2P architectures. Together they provide a general framework that unifies the wide spectrum of existing HDHT designs with hierarchical multi-layer architectures.

3.1 Network hierarchy models

The fundamental work of Kleinberg [26] considered three conceptual models for decentralized networks: grids, hierarchies, and set systems. Kleinberg aimed at network models for the analysis of decentralized search algorithms. The arrangement approach allows modification of the models to cluster-based hierarchies, tree-based hierarchies, and group-based hierarchies, respectively. We apply Kleinberg’s results to describe system-level hierarchies appropriate for structured P2P networks. Each model defines a design principle, which helps in understanding how an advanced network structure is constructed under the local knowledge restriction.

3.1.1 Cluster-based model

The small-world phenomenon (connectivity by short chains of acquaintances) has become the subject of computer and social network analysis [52, 55], including P2P networks [5, 49]. The analytical survey [11] showed that many conventional DHT designs follow small-world models in constructing local routing tables. This results in geometrically progressive routing, where the hop length $|u \to^{+} v|$ is exponentially shorter than the space distance $\rho(u, v)$.

The basic scheme of neighbor selection states that a node $u$ divides its neighbors into local and long-range ones (classifying). Local neighbors of $u$ are equal in terms of the space distance. Long-range neighbors are structured in $u$’s vicinity (ordering). The space distance metric provides ranks $\rho(u, v)$ for $u$ to arrange other nodes $v$ (ranking).

Kleinberg’s grid-based model is a simple example [26, 52]. The network is embedded in a two-dimensional $n \times n$ grid graph, defining all local neighbors for all nodes. Then a long-range link is established between nodes $u$ and $v$ with probability proportional to $[\rho(u, v)]^{-\alpha}$, where $\alpha \ge 0$ is a model parameter. Symphony [49] is the first DHT design that adapted this model for practical P2P settings.
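The long-range link rule of the grid-based model can be sketched as weighted sampling. The code below is a minimal illustration that assumes the lattice (Manhattan) distance as $\rho$ and normalizes the weights implicitly through `random.choices`; the function names are hypothetical.

```python
import random


def manhattan(u, v):
    """Lattice distance between grid points u = (i, j) and v = (i', j')."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def long_range_weights(u, n, alpha):
    """Unnormalized probabilities [rho(u, v)]^(-alpha) for every v != u in the n x n grid."""
    return {(i, j): manhattan(u, (i, j)) ** (-alpha)
            for i in range(n) for j in range(n) if (i, j) != u}


def sample_long_range(u, n, alpha, rng):
    """Draw one long-range contact of u according to the weights above."""
    weights = long_range_weights(u, n, alpha)
    candidates = list(weights)
    return rng.choices(candidates, weights=[weights[v] for v in candidates], k=1)[0]


rng = random.Random(0)                      # fixed seed only for reproducibility
contact = sample_long_range((0, 0), 4, 2.0, rng)
```

With $\alpha = 0$ all nodes are equally likely to be chosen as a long-range contact, while larger $\alpha$ concentrates the links on nearby nodes.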

The idea of distance has further evolved in DHT designs. Instead of the space metric $\rho$, composite distance metrics, such as the routing distance $\tau$, can be applied to reflect proximity in the underlying IP network, semantic closeness of resources kept at nodes, node service reputation, etc. The small-world phenomenon is then interpreted [26] as embedding a network into an underlying space with distance $\tau$. Nodes tend to know their close neighbors in this space as well as to have contacts that span long distances. Establishing a link with a closer node is more likely.

This distance-aware selection leads to hierarchical structures in the network, as was shown in [11]. The clustering method evolves the distance idea further. The method states that the routing distance $\tau$ must positively correlate with the space distance $\rho$. Consequently, consider two groups of nodes

$$C_{xr} = \{u \mid \rho(x, u) \le r\}, \qquad C_{yR} = \{u \mid \rho(y, u) \le R\}$$

for some landmarks $x, y \in S$ and positive scalars $r < R$. Then intra-group routing in $C_{xr}$ is more efficient than in $C_{yR}$. In other words, taking arbitrary $u, v \in C_{xr}$ and $u', v' \in C_{yR}$, we expect that $\tau(u, v) < \tau(u', v')$ is likely.

Applying the classifying model results in P2P clusters. Close nodes become densely connected and form a discrete entity: the cluster with explicit space bounds; see Fig. 1 for intuition. Within a cluster each node can reach another in a few hops. Therefore, a global structure of interconnected groups of close nodes appears, forming a two-layer hierarchy. In intra-cluster communication the routing distance $\tau$ is shorter than in inter-cluster communication.

P2P designs that follow the clustering principle were considered in [8, 11, 21]. They are called “pre-hierarchical” in [11], “hybrid” in [8], or “planar group-based” in [21]. Initial designs have no mechanism for a cluster to be a decision-making entity, and nodes benefit only from the topological properties of the cluster-based network structure. In this survey we focus on the designs where each group of nodes becomes a peer entity that can make its own distinct decisions and take its own actions in the network. We call such designs and their further generalization hierarchical; they are also known [21] as “partially centralized”, “hybrid”, “layered”, or “multi-tier”.

Principle 1 (Cluster-based model) The hierarchy is two-layer with participant nodes on the bottom layer and clusters on the top layer. Nodes of the same cluster are densely connected; the inter-cluster connectivity is sparser.

Fig. 1 The network with two clusters, where each cluster defines a community of 256 nodes. Nodes are densely connected within their community, in contrast to the poor inter-community connectivity. As a result, $\rho(u, v)$ is smaller within a cluster compared to the case when $u$ and $v$ belong to different clusters. The picture is from [56]

The model defines a hierarchy as a set of $n$ interconnected clusters. It embeds the network such that the network structure ensures the following routing distance property:

$$\left(\{C_s\}_{s=1}^{n}, \tau\right), \qquad \tau(u, v) \ll \tau(u, w) \ \text{ for } u, v \in C,\ w \in C',\ C \ne C'. \tag{4}$$

Each particular model defines its own concretization of “$\ll$”. It must support the intuition that clusters are disjoint or have low overlap, which is a consequence of the classification model. When clusters intersect substantially, the intuition behind “$\ll$” suffers; see the two examples in Fig. 2.

Note that (4) captures the crucial role of hierarchy for P2P routing dependability, e.g., see [57]. When $u, v \in C$ (i.e., they are within one domain), then even if a node $w$ of another domain $C'$ has failed or if $C$ is disconnected from the system, the nodes $u$ and $v$ are still able to communicate.

A simple formal model of cluster construction is a set of balls in $S$, e.g., in a grid [26]. There are landmarks $\{c_s\}_{s=1}^{n}$; they are points or dedicated nodes in $S$. Cluster $C_s$ consists of all nodes $u$ that satisfy $\rho(c_s, u) \le R$ for a given radius $R$. The construction can be easily enhanced to construct disjoint clusters: if $u$ belongs to several balls, then a decision-making procedure assigns $u$ to exactly one cluster. In fact, the decision is about which $c_s$ to associate a node with.
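The ball-based construction with disjoint clusters can be sketched as follows. The decision-making procedure is not specified in the text; assigning a node to its nearest covering landmark is an assumption made here, as are the one-dimensional ID space and the sample values.

```python
def build_clusters(nodes, landmarks, radius, rho):
    """Assign each node to at most one ball of the given radius around a landmark.

    A node covered by several balls goes to its nearest landmark (an assumed
    decision rule); nodes covered by no ball stay unassigned.
    """
    clusters = {c: [] for c in landmarks}
    unassigned = []
    for u in nodes:
        covering = [c for c in landmarks if rho(c, u) <= radius]
        if covering:
            clusters[min(covering, key=lambda c: rho(c, u))].append(u)
        else:
            unassigned.append(u)
    return clusters, unassigned


# Hypothetical one-dimensional ID space with absolute difference as rho.
clusters, unassigned = build_clusters(
    nodes=[1, 4, 9, 12, 30], landmarks=[3, 10], radius=6,
    rho=lambda a, b: abs(a - b))
```

In this example nodes 4 and 9 lie in both balls, and the nearest-landmark rule resolves the overlap so that the resulting clusters are disjoint.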

3.1.2 Tree-based model

The clustering principle allows constructing global two-layer hierarchies based on a distance metric. Another approach is to define a hierarchy in $S$ explicitly. In turn, the hierarchy induces the space distance $\rho$ in terms of the hierarchy tree. Then the hierarchy embeds the network, forming the hierarchy-aware connectivity structure with the resultant routing distance $\tau$. This abstraction arises from classical hierarchical methods, also known as “divide and conquer”.

Fig. 2 When clusters $C_1$ and $C_2$ are not disjoint, there can be nodes $u$, $v$, and $w$ for which the distance relations contradict (4): (a) partially overlapped clusters, (b) nested clusters


For example, in location and administrative hierarchies nodes are categorized into lowest-level groups depending on which local area network they belong to. The upper-level groups are defined in accordance with their scale, e.g., city, region, and state. Figure 3 depicts a fragment of a university hierarchy where participant nodes belong to different administrative entities.

Kleinberg [26] described a formal network model based on a complete $b$-ary tree $T = T(b, N)$ with $N$ leaves (hence $T$ is of height $M = \log_b N$). Given leaves $u$ and $v$, the hierarchy-induced distance $h(u, v)$ is the height of their lowest common ancestor in $T$. A network of $N$ nodes is constructed such that the probability of establishing a link $u \to v$ is proportional to $b^{-\alpha h(u, v)}$, where $\alpha \ge 0$ is a model parameter. As in the cluster-based model, shorter paths (routing distance $\tau$) exist between nodes of the same group.
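The hierarchy-induced distance and the resulting link weights can be computed directly when the leaves of the complete $b$-ary tree are numbered $0, \ldots, N - 1$ from left to right. This numbering and the function names are assumptions of the sketch below.

```python
def lca_height(u, v, b):
    """Height of the lowest common ancestor of leaves u and v (0 if u == v)."""
    height = 0
    while u != v:
        u //= b          # move both leaves one level up the tree
        v //= b
        height += 1
    return height


def link_weights(u, num_leaves, b, alpha):
    """Unnormalized probabilities b^(-alpha * h(u, v)) of a link u -> v."""
    return {v: b ** (-alpha * lca_height(u, v, b))
            for v in range(num_leaves) if v != u}


# Binary tree with N = 8 leaves (height M = 3), alpha = 1.
weights = link_weights(0, 8, 2, 1.0)
```

For leaf 0 the sibling leaf 1 gets weight 0.5, leaves 2 and 3 get 0.25, and leaves in the other half of the tree get 0.125, so links inside small subtrees are the most likely ones.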

In general, an arbitrary tree $T$ defines a hierarchy where the $N$ leaves correspond to nodes and other vertices are groups consisting of descendant groups and nodes. The distance metric is the tree distance as in the above model. In fact, it introduces additional layers to the two-layer cluster-based model (4); groups (non-leaf vertices) correspond to clusters, and some of them are nested, $C_i \subset C_j$, in accordance with the predefined hierarchy.

Principle 2 (Tree-based model) The hierarchy is $M$-layer. The bottom layer $i = 1$ consists of $N$ nodes. On the upper layer $i + 1$ each group consists of all nodes from its descendant groups on layer $i$. Groups of the same layer are node-disjoint. On each layer the inter-group connectivity is sparser than the intra-group connectivity.

The tree-based model can be thought of as an iterative application of the cluster-based model. Each iteration results in the next layer, reflecting a higher scale level. Layer $i + 1$ consists of “clusters” for layer $i$. If $\{C_{ij}\}_{j=1}^{m}$ are all groups of a fixed layer $i$, then they satisfy (4). The model preserves the property of dense node connectivity within a group compared with the connectivity to nodes of other groups.

Fig. 3 A hierarchy of nodes at a university. Circles stand for P2P nodes. Each faculty consists of several departments. Each department has research groups with their own nodes. (Labels in the figure include the Faculty of Science, the departments Math, CS, and EE, and research groups such as Networking and Algorithms.)

A further generalization is that there can be several distinct trees $\{T_k\}_{k=1}^{m}$, reflecting that the network simultaneously takes into account several “proximity” characteristics. For example, nodes in Fig. 3 can be distributed over many geographically different locations, and we should consider an additional area-location tree. Each $T_k$ defines its own distance metric $\rho_k$. The resultant connectivity with routing distance $\tau$ must follow (4) on any layer of any $T_k$. The set $\{T_k\}_{k=1}^{m}$ defines $m$ superlayers in the hierarchy.

3.1.3 Group-based model

Each of the above two models defines specific rules of group formation. Kleinberg et al. [26, 53] considered a generalization where groups of the same network may be formed by means of different models, including at least cluster- and tree-based ones. Each node can belong to several groups. The rules of group formation are arbitrary. As in the cluster- and tree-based models, nodes are likely to be connected if they belong to the same group.

For example, a node u can belong to group $C_1$ (Helsinki Institute for Information Technology, an administrative entity), group $C_2$ (powerful machines, a performance level), and group $C_3$ (Europe, a location).

Kleinberg’s group-based network model establishes the following properties using parameters $0 < \lambda < 1$ and $\mu > 1$.

(i) The full set N of all nodes is a group.

(ii) If C is a group of size $|C| \ge 2$ and $u \in C$, then there is a group $C' \subset C$ such that

$$C' \ne C, \qquad u \in C', \qquad \lambda |C| \le |C'| < |C|.$$

(iii) For any set of groups $\{C_i\}$ with a common node u,

$$\Bigl|\bigcup_i C_i\Bigr| \le \mu\sigma, \qquad \text{where } \sigma = \max_i |C_i|.$$

Property (i) ensures that for any subset of nodes there is a group that includes the whole subset. Property (ii) is a type of “hierarchy balance” requirement, where a group consists of subgroups of proportional size. For example, the tree-based model with a complete b-ary tree defines groups of b node-disjoint subgroups each, hence λ ∼ 1/b. Property (iii) is a type of “bounded size growth” requirement; if groups have a common node then they are close in a certain sense, so they cannot contain many distinct nodes. For example, the cluster-based model forms node-disjoint groups, hence a set in property (iii) always consists of one group.
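As a hedged illustration of properties (i)–(iii), the following Python sketch builds the groups induced by a complete b-ary tree and checks the three properties by brute force. The function names and the exhaustive checking strategy are assumptions made for this illustration only; they are not taken from Kleinberg's model or from any surveyed design, and the check is feasible only for small examples.

```python
from itertools import combinations


def tree_groups(b, height):
    """Groups of a complete b-ary tree (b >= 2): one set of leaves per vertex."""
    n = b ** height
    groups = []
    size = 1
    while size <= n:
        for start in range(0, n, size):
            groups.append(frozenset(range(start, start + size)))
        size *= b
    return groups


def property_i(groups, nodes):
    """(i) The full node set is itself a group."""
    return frozenset(nodes) in groups


def property_ii(groups, lam):
    """(ii) Each node of a group C (|C| >= 2) lies in a proper subgroup C'
    with lam * |C| <= |C'| < |C|."""
    for c in groups:
        if len(c) < 2:
            continue
        for u in c:
            if not any(u in d and d < c and lam * len(c) <= len(d)
                       for d in groups):
                return False
    return True


def property_iii(groups, nodes, mu):
    """(iii) Any collection of groups sharing a node u has a union of size
    at most mu times the largest group of the collection."""
    for u in nodes:
        own = [g for g in groups if u in g]
        for k in range(1, len(own) + 1):
            for subset in combinations(own, k):
                union = frozenset().union(*subset)
                sigma = max(len(g) for g in subset)
                if len(union) > mu * sigma:
                    return False
    return True
```

For a binary tree of height 3, `property_ii(tree_groups(2, 3), 0.5)` holds, which matches λ ∼ 1/b, while a larger λ such as 0.6 fails because each child group has exactly half the size of its parent.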

For two nodes u and v, the induced space distance ρ(u, v) is the minimum size of a group containing both u and v. Similarly to the previous models, it allows embedding the network into the hierarchy such that the connectivity structure positively correlates the routing distance τ with ρ. In Kleinberg’s network model, the probability of establishing a link u → v is proportional to $[\rho(u, v)]^{-\alpha}$ for $\alpha \ge 0$.
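The induced distance and the link rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the group family `GROUPS` is a hypothetical four-node binary tree, and normalizing the weights into probabilities over all other nodes is an illustrative choice, not a detail given in the text.

```python
def rho(groups, u, v):
    """Induced distance: minimum size of a group containing both u and v."""
    return min(len(g) for g in groups if u in g and v in g)


def link_probabilities(groups, nodes, u, alpha):
    """Probabilities of a link u -> v, proportional to rho(u, v) ** -alpha."""
    weights = {v: rho(groups, u, v) ** (-alpha) for v in nodes if v != u}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


# Hypothetical example: groups of a binary tree over the nodes 0..3.
GROUPS = [frozenset(g)
          for g in ({0}, {1}, {2}, {3}, {0, 1}, {2, 3}, {0, 1, 2, 3})]
```

With α = 1, node 0 prefers its sibling 1 (smallest common group of size 2) over nodes 2 and 3 (smallest common group of size 4). With α = 0 the distance is ignored and all candidates are equally likely.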

Properties (ii) and (iii) motivate explicit separation of two dimensions in network hierarchy: vertical and horizontal. The vertical dimension defines layers and rules for nesting groups. Any hierarchy contains at least one chain of nested groups for any node u:

$$u \in C_1 \subset C_2 \subset \dots \subset C_m = N, \tag{5}$$

where each $C_i$ belongs to a distinct layer and there is no C such that $C_i \subset C \subset C_{i+1}$. Hence, the tree-based model is purely vertical, with only one chain of nested groups in (5).

In general, there can be several chains. The following requirement keeps the vertical structure “approximately nested”. If $\{C_i\}$ are arbitrary groups having a common node and no two of them belong to the same layer then

$$\Bigl|\bigcup_i C_i\Bigr| \le \mu\bigl(\max_i |C_i|\bigr), \tag{6}$$

where μ(·) is a monotone increasing function, defined by each particular model.

The horizontal dimension defines classification into groups on the same layer i:

$$\bigcup_j C_{ij} = N_i, \qquad |C_{ij} \cap C_{ik}| \ll |C_{ij} \,\triangle\, C_{ik}| \quad \forall j \ne k, \tag{7}$$

where $\triangle$ is the symmetric difference. In this sense, the cluster-based model is purely horizontal; its group distribution is always a partition of N, thus $C_{ij} \cap C_{ik} = \emptyset$.

In the general case, groups on the same layer may overlap, e.g., different research groups share some nodes as shown in Fig. 3. Requirement (7) keeps groups “low overlapped”, where each particular model defines its concretization of the relation “$\ll$”. Note that alternative chains in (5) can appear because of overlapping.
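Requirement (7) can be checked mechanically once the relation “much smaller than” is made concrete. The sketch below assumes one possible concretization, a threshold `eps` on the ratio between the intersection and the symmetric difference; this threshold and the function names are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def covers_layer(groups, layer_nodes):
    """First part of (7): the groups of a layer cover all its nodes."""
    return set().union(*groups) == set(layer_nodes)


def low_overlapped(groups, eps):
    """Second part of (7), with the assumed concretization
    |A & B| <= eps * |A ^ B| for every pair of distinct groups."""
    for a, b in combinations(groups, 2):
        if len(a & b) > eps * len(a ^ b):
            return False
    return True
```

A partition, as in the cluster-based model, passes the check even for `eps = 0`, since all pairwise intersections are empty.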

A node’s neighborhoods on layer i are an example of overlapping groups $C_{iu} = \{u\} \cup T_u^{(i)}$, where $T_u^{(i)}$ is u’s routing table on layer i. If groups $C_{iu}$ and $C_{iv}$ are highly overlapped then, from the interconnectivity point of view, they can be replaced with the cluster $C_{i,uv} = C_{iu} \cup C_{iv}$.

The following principle summarizes the unified conceptual view on hierarchy construction.

Principle 3 (Group-based model) A hierarchy is a set of interconnected groups (4). It spans both vertical and horizontal dimensions satisfying (5), (6), and (7).

Lloret et al. [21] introduced several group-based topologies for centralized and decentralized networks. That work is focused on the topology graph and its role in improving network efficiency. They considered the case of disjointed groups, a particular instance of the general model of Principle 3. An essential finding of that work is the importance of layering for scalable network structures. The optimization techniques for group formation algorithms and group distribution among network layers are discussed in [20].

3.2 Layering principle

Let us discuss the layering principle, which can be combined with the conceptual models to form vertical and horizontal dimensions in network structures. Although pre-hierarchical P2P designs utilize the clustering principle [11], an individual node does not necessarily identify itself as a member of a certain global group, and a group does not act as a decision-making entity in the network.

The clustering principle fixes no global group set a priori. Each node u is able to compute the metric ρ, which numerically reflects the global network hierarchy. Values ρ(u, ·) control u’s connectivity to the network. Other nodes apply the same rules. As a result, clusters appear at the global network level, approximating the given hierarchy in the actual overlay network topology.

Pre-hierarchical P2P designs follow the entire overlay agreement approach of flat P2P. All participants must globally agree on a set of protocols and parameter settings like the routing rules, size of routing tables, synchronization intervals, and replication strategy [43]. Optimal settings for these parameters depend on dynamic factors like churn rate, node failure probabilities, and fault correlation. They can be difficult to assess or estimate for a large overlay.

In the hierarchical approach, the overlay is a priori partitioned into discrete layers, thus forming the vertical dimension of network hierarchy. Many problems can be solved within a layer. The problem size and complexity are reduced compared with the entire overlay agreement approach.

In contrast to Kleinberg’s hierarchy models, we emphasize the importance of the vertical dimension. This requirement comes directly from practice; multi-layer architecture is a distinct characteristic property of any hierarchical P2P design, including HDHT designs [9, 42].

Principle 4 (Layering) There is a priori partitioning into fixed discrete layers. Each node u identifies itself with a concrete layer when u performs its functions in the P2P system.

The layering principle benefits from simplicity in the relation between the global network hierarchy and the actual overlay topology. A network layer can correspond to a domain in the global hierarchy. The maintenance cost is reduced by distributing the load among levels. Protocol and parameter settings are done within each domain. The principle first appeared in unstructured P2P systems [29, 40], then it was adopted in structured P2P designs.

A simple hierarchy is ordered two-layer (basic supernode model): the top layer of supernodes and the bottom layer of networks of regular nodes. This hierarchy is an instance of the cluster-based model, see Principle 1. Each supernode is assigned to a network of regular nodes and connects this network by proxying lookups on behalf of its regular nodes. Supernodes are more available and powerful overlay nodes. For regular nodes the hierarchy reduces the impact of their short online times on the P2P system. Figure 4 shows an example where nodes form clusters on the bottom layer; then each cluster maintains its own overlay and selects its supernodes to represent the cluster in the top layer overlay.
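The basic supernode model can be sketched as follows. This is a simplified illustration under explicit assumptions: each cluster elects a single supernode by a hypothetical numeric `capacity` score, and each overlay is abstracted to a single hop, so the path lists only the role changes (regular node, own supernode, remote supernode, target) rather than real DHT hops.

```python
def elect_supernodes(clusters, capacity):
    """Pick, for each cluster, the member with the highest (assumed) capacity."""
    return {cid: max(members, key=lambda n: capacity[n])
            for cid, members in clusters.items()}


def lookup_path(src, dst, cluster_of, supernode_of):
    """Nodes traversed by a lookup from src to dst in the two-layer model.

    Intra-cluster lookups stay on the bottom layer; inter-cluster lookups
    are proxied by the supernodes through the top-layer overlay.
    """
    path = [src]
    c_src, c_dst = cluster_of[src], cluster_of[dst]
    if c_src == c_dst:
        if dst != src:
            path.append(dst)
        return path
    for hop in (supernode_of[c_src], supernode_of[c_dst], dst):
        if path[-1] != hop:  # a supernode may itself be the source or target
            path.append(hop)
    return path
```

In this sketch a regular node never contacts a remote cluster directly, which mirrors the proxying role that supernodes play on behalf of their regular nodes.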

The layering principle is applicable to an arbitrary finite number of layers to provide efficient scaling and organization. For instance, M ≥ 2 layers can be used to enable distinction between nodes with different capabilities, domains, and other global characteristics. The result is a hierarchy with multiple layers ordered along the vertical dimension.

Similarly to clustering in pre-hierarchical P2P designs, a multi-layer hierarchy influences the network topology, but the impact is higher. Nodes of the same domain form their own network. Hybrid P2P systems are possible [8, 58, 59] that combine unstructured and structured topologies since a network on each layer runs its own P2P protocol. The anonymity is also improved (compared with flat designs); a network on a given layer is a black box for an external overlay node.

Fig. 4 Example of ordered two-layer architecture. Nodes are organized into clusters C1, ..., C5 on the bottom layer (i = 1). They appoint supernodes A, B, ..., H to form their own overlay on the top layer (i = 2). Each A, B, ..., H is physically one node acting as two virtual nodes, one per layer

Layering assumes that a supernode can participate in any overlay. The assumption is not always true in IP networks since they have connectivity restrictions, e.g., due to NATs or firewalls.

3.3 Network hierarchical architectures

The layering principle introduces a stack of interconnected layers, structuring the network vertically. The horizontal dimension is for the intra-layer connection structure. A concrete design with composition of the vertical and horizontal dimensions defines a hierarchical architecture. Artigas et al. [60] distinguished vertical and horizontal approaches for hierarchical architectures. The original definition is focused on the inter-overlay connection structure, which results from the multiple routing tables that nodes maintain to participate in several overlays. We consider another criterion accenting the role of layering and the relation between layers and their overlays. A group of nodes defines a role of its nodes, and layers and overlays are instances of the notion “group”. Group-aware structures can be constructed applying two principles: disjoining and nesting.

3.3.1 Vertical approach

The characteristic property is that every layer is a self-contained P2P overlay network. Vertical architecture is a set (ordered or not) of M overlays. Each layer-i overlay follows its own protocol. Layers may consist of different numbers of nodes, $N_i \le N$. Any supernode u participates in several overlays, i.e., u is a gateway connecting them. Thus u has to maintain multiple routing tables,

$$u \in N_i \cap N_j, \qquad T_u^{(i)} \cup T_u^{(j)} \subset T_u \quad \text{for some } i \ne j.$$

In routing, u decides which of its overlays to use for a lookup. A regular node participates in one overlay and forwards lookups to supernodes for inter-layer (global) routing.

The intuition behind vertical architecture is that the system is partitioned along the vertical dimension only. Each layer contains one overlay network, thus there is no specific horizontal partitioning. The division into supernodes and regular nodes serves as vertical glue between layers.

Vertical architecture benefits from simplicity and straightforward implementation. For example, the ordered two-layer architecture is similar to that shown in Fig. 4 earlier. All nodes participate in the bottom layer ($N_1 = N$), forming one big overlay network (e.g., flat DHT). The most responsible nodes become supernodes ($N_2 \le N$), forming an additional overlay on the top layer (e.g., another flat DHT).

Increasing the number of layers provides more design opportunities, see Fig. 5 for illustration. Supernodes in Fig. 5a have two or three independent routing tables: F connects layers 1 and 2; C and G connect layers 1 and 3; A, D, H connect layers 2 and 3; B and E connect all three layers. When layers use essentially different P2P protocols (e.g., a mix of structured and unstructured protocols) the hierarchical architecture is hybrid.

In terms of the group-based model (Principle 3), each layer i defines its own group $N_i$ of nodes. The non-empty pair-wise intersection is due to supernodes,

$$N = \bigcup_{i=1}^{M} N_i, \qquad N_i \cap N_j = S_{ij} \ne \emptyset \quad \text{for } i \ne j,$$

where $S_{ij}$ consists of all supernodes participating in the overlays of layers i and j.

There are several alternatives for a supernode in (5). Requirement (6) can be satisfied, e.g., with a moderate number of layers $M \ll N$ and then either approximately equal-sized layers or a few layers that cover the majority of nodes.

Layers in vertical architecture can be ordered or non-ordered. Ordered architecture applies layering to define classes of the node responsibility or functional role in the system. For instance, in the two-layer supernode model (see Fig. 4) the supernode overlay assures more efficient routing than the bottom overlay. A typical case is a decreasing number of nodes for higher responsibility:

$$N_1 > N_2 > \dots > N_M, \tag{8}$$

which is natural for nesting requirement (5). The pictorial idea is shown in Fig. 5d.

Fig. 5 Vertical architecture for M = 3 layers. Panels: (a) vertical (non-ordered); (b) fully vertical (non-ordered); (c) fully vertical with regular nodes (non-ordered); (d) partially vertical (ordered, pure vertical). A regular node (unfilled circle) belongs to exactly one layer. A supernode (filled circle) connects two or more layers. Linked circles represent one physical node acting as several virtual nodes due to layering. a Supernodes connect any combination of layers. b Any node is a supernode for every layer. c Regular node appears on one layer; supernode appears on all layers. d Tree- or group-based hierarchy; grey ovals are clusters of layer i overlay, which provides the P2P connectivity for all its nodes in spite of their clustering structure

Non-ordered architecture is suitable for federated P2P systems, when several domain-independent overlays are combined. It is illustrated in Fig. 5a, where a layer might represent a distinct administrative domain. A few supernodes belong to multiple domains. Layers are equal in their participation in the system. The design must define rules that determine which layers a supernode is associated with. Nesting requirement (5) becomes minor due to its reduction to $u \in N_i \subset N$ (for some i). In a certain sense, non-ordered vertical architecture is close to two-tier horizontal architecture (Section 3.3.2): the bottom tier of regular nodes and the top tier of supernodes. Nevertheless, this architecture is vertical, and the distinctive property is that supernodes do not organize themselves into their own overlay or a set of overlays.

The vertical approach allows multi-hierarchical architectures. Since every layer is a self-contained P2P overlay, the same procedure can be further applied to any layer to construct its own hierarchical architecture. For instance, for a multi-tree hierarchy (Section 3.1.2), each tree $T_i$ corresponds to layer i, where the hierarchy is applied to all nodes $N = N_i$ and any node is a supernode for all M layers, see Fig. 5b. For federated architecture this recursive approach leaves much design freedom for the overlays that participate in the federation. Note that multi-hierarchy may combine layers with ordered and non-ordered architectures.

Ou et al. [42] further divided the vertical architecture class into fully vertical and partially vertical. In fully vertical architecture all N nodes appear on all M layers, as illustrated in Fig. 5b. Therefore, there is no need for dedicated gateway nodes. Since each layer is a full N-node overlay, a lookup can be successfully routed within any layer. Nevertheless, for routing efficiency a node selects an appropriate layer i to use for a given lookup.

Fully vertical architecture can be extended with regular nodes, as shown in Fig. 5c. Each supernode still must participate on all M layers, hence being a “universal gateway”. Each regular node is present within its domain (layer) only. In practice this can happen because the domain security policy does not allow such a node to be accessible directly from other domains, the node capacity is too low for maintaining multiple routing tables, or NATs and other IP-level restrictions prevent direct connectivity.

We use the term pure vertical for the partially vertical architecture since it inherits the original two-layer supernode model. The idea is depicted in Fig. 5d. The order of layers is essential. A node u participates in the overlays on layers $i = 1, 2, \dots, m_u$, thus u does not appear on layers $i > m_u$. This solution is appropriate for highly heterogeneous environments. For example, a mobile device, which cannot maintain many routing tables, is present only on the bottom layer.

The obvious merit of pure vertical architecture is that it naturally admits the cluster-based and tree-based hierarchy models with iterative layer construction. Nodes of the layer-i overlay are organized into clusters $C_{ik}$. A cluster assigns one or more of its nodes to be supernodes for layer i + 1. If all supernodes of $C_{ik}$ belong to exactly one cluster $C_{i+1,j}$ on the next layer, then the hierarchy is tree-based. If supernodes of $C_{ik}$ participate in different clusters on layer i + 1, then the hierarchy is group-based. Moreover, some nodes on layer i may ignore clustering and supernode assignment as long as the layer-i overlay preserves its connectivity.
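The distinction between the tree-based and group-based cases can be expressed as a small check over the supernode assignment between two adjacent layers. The data layout below (two dictionaries with hypothetical names) is an assumption of this sketch; clusters that appoint no supernodes are skipped, in line with the remark that some nodes may ignore clustering.

```python
def classify_hierarchy(supernodes_of, upper_clusters_of):
    """Classify the relation between layer i and layer i + 1.

    supernodes_of:     lower-layer cluster id -> set of its supernodes
    upper_clusters_of: supernode -> set of layer-(i + 1) cluster ids it joins
    """
    for supers in supernodes_of.values():
        targets = set()
        for s in supers:
            targets |= upper_clusters_of[s]
        if len(targets) > 1:
            # Supernodes of one cluster spread over several upper clusters.
            return "group-based"
    return "tree-based"
```

If every lower-layer cluster maps into a single upper-layer cluster, the function reports a tree-based hierarchy; a single cluster whose supernodes join different upper-layer clusters is enough to make it group-based.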

The known drawback of straightforward designs of vertical architecture is the overhead that supernodes incur in their routing table maintenance. The number of network connections can become large for large N and M, worsening the scalability. For instance, consider a network where each layer-i overlay consists of $N_i = N/M$ nodes and uses a flat DHT like Chord. A node has a routing table $T^{(i)}$ of $\log(N/M)$ entries. Since a supernode belongs to m layers it keeps $\sum_{i=1}^{m} |T^{(i)}| = m \log(N/M)$ entries. If the N-node network were implemented with a single flat DHT then every node would have a routing table of size $\log N$. Let the concretization of $M \ll N$ be $M^{\alpha} < N$ for $\alpha > 1$. Then the ratio

$$\frac{m}{\alpha} < \frac{m \log(N/M)}{\log N} < m \tag{9}$$

shows that the state overhead of a supernode is proportional to the number of layers the node participates in.
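A small numeric sketch may help to read (9). The functions below only evaluate the ratio for the equal-sized-layer example; the function names and the sample values are illustrative assumptions. As a side observation, $M^{\alpha} < N$ implies $\log(N/M) > (1 - 1/\alpha)\log N$, so the ratio exceeds $m(1 - 1/\alpha)$, which is at least $m/\alpha$ whenever $\alpha \ge 2$.

```python
import math


def supernode_state_ratio(n, num_layers, m):
    """Ratio m * log(N/M) / log N: supernode state relative to a flat DHT."""
    return m * math.log2(n / num_layers) / math.log2(n)


def ratio_lower_bound(m, alpha):
    """Bound m * (1 - 1/alpha), which follows from M ** alpha < N."""
    return m * (1.0 - 1.0 / alpha)
```

For example, with N = 2^20 nodes, M = 2^4 layers (so that M^4 < N) and a supernode on m = 3 layers, `supernode_state_ratio(2**20, 2**4, 3)` evaluates to about 2.4, that is, the supernode keeps roughly 2.4 times the state of a node in a single flat DHT.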

Note that this high overhead happens when the supernode maintains its routing tables independently. Contrary to the widespread opinion, vertical architecture does not always prevent lower overhead maintenance when routing entries are effectively reusable on different layers, as it first appeared in horizontal architecture designs (see Section 3.3.2). In networks with low-overlapped layers, like federated P2P networks, the routing table maintenance, however, cannot efficiently benefit from the reusability of routing links.

3.3.2 Horizontal approach

The characteristic property is that, in addition to layering along the vertical dimension, every layer is divided into several disjointed overlays, see Fig. 6. In this sense vertical architecture is a particular case of horizontal architecture with one overlay per layer. The term “horizontal” supports the intuition that the overlays of each layer spread out along the horizontal dimension. In fact, partitioning a layer into overlays can be treated as an application of the layering principle to the horizontal dimension.

As in vertical architecture, the horizontal approach allows non-ordered variants, see Fig. 6a. They are, however, not popular in P2P designs since the maintenance of the inter-layer connectivity between all overlays is complicated. Another property of the horizontal approach is that any layer i can cover all N nodes in sum ($N_i = N$), which is similar to fully vertical architecture.

Fig. 6 Horizontal architecture for M = 3 layers. Panels: (a) horizontal (non-ordered, $N_i \le N$); (b) pure horizontal (ordered, $N_i = N$). a An inter-layer connected combination of overlays; there is no intra-layer connectivity between overlays. b Pure horizontal. Bottom layer i = 1 consists of the smallest overlays $C_{11}, C_{12}, \dots, C_{19}$. Layer i = 2 merges them into the medium-sized overlays $C_{21} = C_{11} + C_{12}$, $C_{22} = C_{13} + \dots + C_{16}$, $C_{23} = C_{17} + C_{18} + C_{19}$. Top-most overlay $C_3$ includes all N nodes

Consider the pure variant of horizontal architecture: an ordered stack of layers with $N_i = N$. In contrast to pure vertical architecture (Fig. 5d), all N nodes are partitioned into many small overlays on the bottom layer i = 1 (Fig. 6b). The number of overlays is then reduced on every next layer i + 1 by merging layer i overlays, while keeping the same total number of nodes on the layer. Finally, a single global overlay of N nodes appears on the top layer i = M.

This construction directly reflects the tree-based hierarchy (Principle 2), thus the pure horizontal architecture is appropriate for systems with explicit domain hierarchy, like that shown in Fig. 3. Moreover, it allows keeping the total routing table size moderate. Instead of maintaining independent routing tables, a node u reuses its routing table entries in overlays on upper layers, e.g., with the telescoping scheme:

$$T_u^{(1)} \subset \dots \subset T_u^{(i)} \subset T_u^{(i+1)} \subset \dots \subset T_u^{(M)}. \tag{10}$$

The routing table size grows with i = 1, ..., M. Any layer i + 1 overlay inherits and then extends the link structure of its layer i overlays. Additional links $u_i = T_u^{(i+1)} \setminus T_u^{(i)}$ are chosen such that the size of every routing table remains comparable with a flat DHT, e.g., $|T_u^{(i)}| = O(\log N)$.

Further generalization allows partial inheritance, i.e., some entries of $T_u^{(i)}$ are not in $T_u^{(i+1)}$. For instance, if layers have unequal sets of nodes ($N_i \ne N_j$), then some neighbors from $T_u^{(i)}$ are absent on layer i + 1, thus they cannot be in $T_u^{(i+1)}$. The reuse requirement is reduced to keeping the symmetric difference $T_u^{(i)} \,\triangle\, T_u^{(i+1)}$ small. This way is also appropriate for non-ordered horizontal architectures, where $T_u^{(i)} \,\triangle\, T_u^{(j)}$ should be kept small for $i \ne j$.
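The telescoping scheme (10) and the relaxed reuse requirement can be sketched as follows. This is a minimal sketch, not a real neighbor selection rule: the candidate sets per layer and the deterministic choice of the first `extra_per_layer` unused candidates are assumptions made for illustration, and the nesting test accepts equal tables as well as strictly growing ones.

```python
def build_nested_tables(layer_candidates, extra_per_layer):
    """Build routing tables T(1), ..., T(M) of one node so that each table
    inherits the previous one and adds a few new links from its own layer."""
    tables = []
    current = set()
    for candidates in layer_candidates:
        new_links = [v for v in sorted(candidates) if v not in current]
        current = current | set(new_links[:extra_per_layer])
        tables.append(frozenset(current))
    return tables


def is_telescoping(tables):
    """Check the nesting of (10), allowing equality between adjacent tables."""
    return all(a <= b for a, b in zip(tables, tables[1:]))


def reuse_gap(t_i, t_j):
    """Size of the symmetric difference between two routing tables."""
    return len(t_i ^ t_j)
```

With nested tables the node stores only the entries of the top-layer table, whereas independent tables of the same sizes would require storing the sum of all table sizes.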

Contrary to the widely accepted opinion, routing entry reusability is not a distinct property of horizontal architecture. For instance, fully vertical architecture designs can also benefit from supernodes that rationally maintain routing tables in a reusable manner. Actually, the reuse requirement is an instance of the generic group-based principle (6) with groups $C_{iu} = \{u\} \cup T_u^{(i)}$, leading to highly overlapped neighborhoods of a node on different layers.

Since the requirement $N_i = N$ or at least $N_i \approx N$ is typical in horizontal architecture, the already mentioned disadvantage is that IP connectivity restrictions or high responsibility requirements can prevent participation of certain nodes in some layers. A more specific disadvantage is the unclear distinction among the loads that different nodes take in the system. Node differentiation becomes difficult when almost every node participates in all layers.

One solution to mitigate the above disadvantages can be borrowed from pure vertical architecture. Each layer states the node responsibility level. A node u maintains its (nested) routing tables up to layer $m_u < M$, cf. (10). Communication with layers $m_u < i \le M$ is through those nodes (supernodes for u) that maintain routing tables up to layer i at least. Hence, the node distribution among layers satisfies (8).

3.3.3 Disjoining and nesting principles

The group-based model describes the essence of any hierarchical architecture. A concrete architecture design further clarifies which relations exist among the groups, e.g., some groups are disjointed and some are nested. Several instances of such relations were considered above. Their generalization is two design principles formulated in Table 2.

Each principle captures an ideal case. Concrete hierarchical architecture designs are always subject to tradeoffs. The terms low- and high-overlapped groups show the possibility of deviation from the ideal cases. In fact, for a given architecture we can only speak of a tendency: it follows either the disjoining principle or the nesting one more closely.

Both principles state that a hierarchical architecture design reflects the node role differentiation and similarity by the construction of groups. The principles are opposite since they result in decreasing and increasing the group overlap, respectively. Nevertheless, they are applicable in a composition for both vertical and horizontal dimensions. They provide more understanding of the propositions we introduced in the group-based model and the layering principle.

The layering principle can be considered as an instance of the disjoining principle for constructing the vertical dimension. Along this dimension the groups can also follow the nesting principle and form a nested structure as it was formalized earlier in (5). Moreover, if groups from different layers are overlapped then their intersection is essential and the total size is bounded with (6), an evidence of the nesting principle. The horizontal architecture approach utilizes the disjoining principle: all nodes of the same layer are partitioned into independent overlays, defining groups that follow (7).

The disjoining and nesting principles allow introducing another classification of hierarchical architectures into disjointed and nested. This classification reflects the use of the vertical stack of layers to capture the node responsibility level. The classification criterion is combinative; a given architecture cannot be purely disjointed or nested. Instead, it should be considered rather disjointed than nested or vice versa.

Disjointed architecture is based on conceptual separation of groups of different layers. Inter-layer overlapping is kept low. A node participating in several layers performs different roles on each layer. The node responsibility is primarily measured with the number 1 ≤ m ≤ M of layers that the node connects (m-responsible node). Regular nodes are least responsible. The population size of m-responsible nodes decreases rapidly with growing m (e.g., exponentially).

Table 2 Disjoining and nesting principles for hierarchical P2P architecture designs

| | Principle 5 (disjoining) | Principle 6 (nesting) |
|---|---|---|
| Statement | Nodes with different roles belong to disjointed or low-overlapped groups. | Nodes with similar roles belong to nested or high-overlapped groups. |
| Properties: model | Classifying model: groups are separated in accordance with their node population, responsibility, or functional role. | Ordering model: higher layer groups inherit partially the node population and increment the responsibility and functional role. |
| Properties: overlap | Low-overlapped groups: If u, v ∈ G and u ∉ G′ then likely v ∉ G′. This tendency correlates with (7). | High-overlapped groups: If u, v ∈ G and u ∈ G′ then likely v ∈ G′. This tendency correlates with (6). |
| Properties: responsibility granularity | Coarse responsibility granularity: The number of m-responsible nodes decreases rapidly with m = 1, 2, ..., M. The number of layers M is typically small, hence providing few responsibility levels. Low-responsible nodes prevail, otherwise the hierarchy degenerates to a flat structure. | Fine responsibility granularity: The number of m-responsible nodes decreases slowly with m = 1, 2, ..., M. The number of layers M can be large, hence allowing many responsibility levels and non-trivial nested structure (5). There are many high-responsible nodes, up to an all-supernode population. |
| Typical evidence in architectural design: layering | Layers represent distinct domains; the overlap is low due to a small set of supernodes (compared with the major population). Popular in non-ordered architectures. | Every next layer increments the node responsibility. An $m_u$-responsible node u participates in layers $i = 1, 2, \dots, m_u \le M$. Popular in ordered architectures. |
| Typical evidence in architectural design: clustering | Construction of the horizontal dimension. Each layer is composed of disjointed or low-overlapped groups by clustering similar nodes. | Construction of the vertical dimension. Some nodes of each cluster also appear on upper layers. |
| Typical evidence in architectural design: multiple routing tables | A node maintains layer-independent routing tables, leaving freedom for the intra-layer connectivity. Typical for vertical architecture. | A node maintains nested routing tables, allowing reuse of routing entries on different layers. Typical for horizontal architecture. |
| Typical evidence in architectural design: node heterogeneity in load distribution | A group acts as a collective entity. A high-responsible node affords essential capacity to represent the group on behalf of all its nodes. | A node increments its responsibility up to its own individually appropriate level. |


The characteristic property is: the major population consists of low-responsible nodes. A common case is when many supernodes connect two layers only. Since m is not large for the majority of nodes, the responsibility scale has coarse granularity; the simplest case is binary classification: low-performance and powerful nodes. Note that if the major population is high-responsible supernodes then the node differentiation disappears, degenerating the hierarchy.

The architecture is suitable for non-ordered layers, where groups of different layers represent essentially diverse domains. Each layer consists of many regular nodes of its domain, which separates the groups of different layers. Few supernodes are gateways, providing sparse inter-layer connectivity. Protocols on each layer are domain-aware, and this diversification separates the roles that the same supernode plays in overlay routing and maintenance on different layers.

In the ordered case the responsibility may correlate with the layer index i = 1, 2, ..., M when nodes of layer i + 1 provide more capacity than nodes on layer i. Although it is similar to the nested case, disjointed architecture requires rapid layer size reduction in (8). Furthermore, some nodes of layer i may be not present in layer i − 1. Importantly, a higher layer provides more efficient routing mainly due to smaller network size and only partially due to higher node responsibility. The latter, in fact, is distributed among m layers, usually with no specific prioritization.

Even in the ordered case, groups of different layers are conceptually disjointed. A supernode on layer i + 1 acts on behalf of its group from layer i. In other words, a layer i + 1 supernode aims at group-aware decisions and actions; it can be replaced with another node of the group. Actually, it is an evolution of the “virtual nodes” concept [33, 35] when a physical node runs a group of virtual overlay nodes.

Layers are sparsely connected by a few supernodes; this makes disjointed architecture close to the vertical one. However, disjointed architecture also allows the horizontal approach. See example schemes in Figs. 5a, c, d and 6a.

Nested architecture primarily arranges the overlay functionality among layers i = 1, 2, ..., M, where M can be large, allowing fine granularity of the node responsibility. For simplicity we assume the case of ordered hierarchical architecture. The bottom layer contains all N nodes and implements the basic function that every node must perform. Every next layer enhances the function with advanced mechanisms that more responsible nodes apply for better performance or for other improvements of the overlay operability.

Each node u appears on layers $i = 1, 2, \dots, m_u \le M$, participating in $m_u$ networks in total:

$$u \in \bigcap_{i=1}^{m_u} N_i, \qquad u \notin \bigcup_{i=m_u+1}^{M} N_i, \qquad \forall u \in N,$$

where $m_u$ depends on u’s available capacity. Consequently, the following “nested” structure appears:

$$N_M \subset \dots \subset N_{i+1} \subset N_i,$$

which generalizes (8). Each node can be intuitively thought of as “a column” in the pyramid of layers. In contrast to disjointed architecture, the size reduction between layers may be low.
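The nested layer structure follows directly from the per-node levels $m_u$, which the short sketch below makes explicit. The mapping `level_of` from a node to its level is a hypothetical input of this illustration; how a real design derives $m_u$ from node capacity is not specified here.

```python
def layers_from_levels(level_of, num_layers):
    """Layer i (1-based) contains every node u with m_u >= i."""
    return [frozenset(u for u, m in level_of.items() if m >= i)
            for i in range(1, num_layers + 1)]


def is_nested(layers):
    """Check that each upper layer is contained in the layer below it."""
    return all(upper <= lower for lower, upper in zip(layers, layers[1:]))
```

Because layer i + 1 is defined by a stricter condition on the same levels, the nesting holds by construction, and the layer sizes are non-increasing as in (8).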

The layer differentiation is due to routing criteria, not dueto the population. A popular solution is that each layer aimsat own routing scale, and higher layers require longer-rangelinks for better global routing. They form expressways forlower layers. Actually, it is evolution of the “large routingtable” concept when every node varies its routing table sizedepending on the capacity, see such flat DHT designs asSmartBoa [61], EpiChord [62], and Accordion [63].

The characteristic property is: the major population con-sists of high responsible nodes. It is a counterpart of dis-jointed architecture. The number of high responsible nodescan be big and even comparable with the size of the wholepopulation. This property does not degenerate the hierar-chy since efficient routing applies an appropriate criterionwithin its layer depending on the current lookup state.

Nested architecture includes ordered variants of vertical and horizontal architectures. An obvious instance is fully vertical architecture (Fig. 5b) with diverse routing criteria on different layers. Also, pure vertical architecture (Fig. 5d) can be considered nested if each cluster may delegate many supernodes to the next layer.

Similarly, in horizontal architecture (Fig. 6b) the bottom layer consists of small-scale overlays. The node responsibility is restricted within a small overlay. Overlays on upper layers scale up in size, and the node responsibility level grows appropriately. This case is an example in which any node participates in every layer.

3.4 Hierarchy and DHT

Let us consider how the introduced hierarchy models and architectures are combined with the DHT concept. DHT topology is tightly controlled via neighbor selection rules. The considered design principles define generic restrictions on neighbor selection in hierarchical DHTs.

Principle 1 (cluster-based model) shows that a node maintains two types of neighbors: intra- and inter-cluster links. Different DHT protocols can be used for these two types of connectivity. A cluster typically represents a self-contained DHT network with a fast lookup service. Intra-cluster links are shortcuts for global routing.

Principle 2 (tree-based model) leads to neighbor selection that embeds a given global hierarchy into the DHT network. In addition to the underlying flat DHT rules, link establishment must follow the ancestor-descendant relation. The benefit is path locality: routing keeps paths within a domain whenever possible. In particular, a path between two nodes of the same domain is also within this domain.

Principle 3 (group-based model) generalizes Principles 1 and 2, embedding into a DHT network an arbitrary structure of inter-connected groups. Neighbor selection takes into account the membership property: each node must know neighbors from every group the node belongs to. As a result, the DHT network topology becomes consistent with the group structure, introducing advanced features such as non-uniform resource distribution, semantic and range queries, and node specialization depending on node preferences.

Principle 4 (layering) provides a generic scalability mechanism for DHT designs. It decouples the entire problem set into smaller layer-specific subsets and inter-layer issues, thus reducing the problem size and complexity. Typically, layered DHT topologies use the higher layer to organize nodes of the current layer and help to establish links between nodes from lower layers. The layering principle plays a fundamental role in the majority of existing HDHT designs. Neighbor selection must operate along two dimensions: horizontal (intra-layer) and vertical (inter-layer).

Principles 5 (disjoining) and 6 (nesting) are dual. They specify decomposition of hierarchical DHT architecture. In disjointed architecture, inter-layer overlapping is kept low, and very few nodes belong to several layers. In nested architecture, inter-layer connectivity is high since a node likely belongs to many layers. Applying both principles, P2P designers achieve different layered DHT topologies with advanced intra- and inter-layer connectivity.

4 Hierarchical DHT taxonomy

This section overviews particular proposals of HDHT architectures. The contributed HDHT taxonomy is based on our classification into disjointed and nested architectures, which was introduced in the previous section. We first list designs where the disjoining principle prevails: nodes are partitioned with low overlapping among multiple layers. Then we consider designs with prevalence of the nesting principle: each node can be associated with many layers.

The order in the lists is chronological (to the best of our knowledge). Since a multitude of proposals has appeared in the literature, we cannot present all of them here. We expect that the lists are representative, and that a proposal not in the list is close to one we describe. The differences are not fundamental in architectural terms, e.g., the ID space realization or the communication protocol details.

The taxonomy demonstrates how the generic design principles can be applied for improving concrete P2P system properties, such as performance, fault-tolerance, and security in comparison with flat DHTs. It contains cumulative experience of existing proposals, and solutions from these examples can be adopted in future designs.

4.1 Disjointed hierarchical architectures

Disjointed architecture differentiates the node responsibility by arranging heterogeneous nodes along the vertical dimension. Typically, a coarse granularity scale is used, and the number of highly responsible nodes is minor compared with the total number of nodes from lower responsibility groups. Groups of different layers are separated: their node population, responsibility, or functional role are diverse. In particular, a two-layer design assigns a small fraction of nodes to the top layer where they act on behalf of their groups. The majority of two-layer designs follow the basic ordered two-layer architecture shown in Fig. 4 from Section 3.2.

4.1.1 Hierarchical systems by Garces-Erice et al. [64]

The design adopts the two-layer supernode model of unstructured P2P systems [29, 40]. Groups of the bottom layer consist of nodes that are close in network proximity. Each group forms an independent overlay, applying the horizontal architecture approach. Any overlay may use its own protocol for intra-group routing, supporting hybrid architectures. On the top layer, all groups form a single overlay for global routing.

Overlays on the bottom layer operate autonomously. Within each group, one or more supernodes are selected to represent the group on the top layer. Supernodes are the most powerful nodes. A supernode maintains an additional independent routing table for the top-layer overlay. The top overlay aims at efficient global routing due to its small size and good underlying network coverage.

When a node u joins the system, it must know its group ID on the bottom layer. Hence u contacts any node existing in the system to locate a supernode of the group. If the group exists then u joins using the group overlay protocol. Otherwise, a new group is created with u as its only (super)node.

The essential point is that each group acts as a virtual node in the top overlay; the group’s supernodes are only representatives and can be reassigned. Although the design allows most or even all nodes of a group to become supernodes, it degenerates when the bottom-layer overlays disappear, moving all routing functionality to the top layer.

This construction can be generalized to M > 2. The higher the layer, the larger-scale routing its overlays provide. The top layer i = M is a single global overlay. Each lower layer consists of many groups, and a group is a “node” in the overlay of the current layer. In a group of layer 1 ≤ i < M, nodes are classified into many regular nodes and a few supernodes. Supernodes represent the group on layer i + 1, maintaining the routing table for their group ID. Each regular node must know at least one supernode of its group. On layer i + 1, a supernode for a layer i group either becomes a regular node or acts again as a supernode to represent its higher-level group on layer i + 2.

Starting at a regular node on layer i = 1, a lookup for key k sequentially goes to supernodes of groups on layers i = 2, ..., M and eventually reaches the top layer. Then the lookup sequentially visits the groups responsible for k on layers i = M, ..., 1. Finally, on the bottom layer, the lookup is delivered to the destination node.

The hierarchy reduces the length of lookup paths compared with a flat DHT of O(log N) routing complexity. For instance, in the M = 2 hierarchy with N2 nodes in the top overlay, the reduction factor is log N / log N2 if the top overlay and bottom-layer overlays provide O(log N2) and O(1) routing, respectively. This motivates the requirement N2 ≪ N.

Most of the routing and hierarchy maintenance load is pushed to supernodes. A certain mechanism is required for selecting and maintaining supernodes. Since several supernodes represent a single entity (group) in the higher-layer overlay, the latter needs to modify appropriately a conventional flat DHT protocol for the overlay.
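Returning to the M = 2 path-length estimate, the reduction factor log N / log N2 can be evaluated numerically. The sketch below is only an illustration: the function name and the system sizes are hypothetical, and all constants hidden by the O-notation are assumed to be equal.

```python
import math


def reduction_factor(n_total, n_top):
    """Ratio log N / log N_2 of flat to two-layer lookup path length.

    Assumes O(log N) flat routing, O(log N_2) top-layer routing and O(1)
    bottom-layer routing, with all hidden constants taken as equal.
    """
    if not 1 < n_top <= n_total:
        raise ValueError("need 1 < N_2 <= N")
    return math.log(n_total) / math.log(n_top)


# Hypothetical sizes: one million nodes, one thousand top-layer groups.
factor = reduction_factor(10**6, 10**3)
```

With these sizes the estimated lookup path is about half as long as in the flat case; when N2 approaches N the factor approaches 1 and the hierarchy brings no path-length benefit.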

The original M = 2 design was analyzed for the case of the Chord DHT on the top layer. A similar two-layer design was exploited in [65] for a hierarchical small-world system where the top layer is a single Symphony-based overlay and the bottom layer consists of groups organized as independent Chord rings. This basic design was also used in mDHT [66], which further emphasized one of the key postulates: an entire group forms a node in the top-layer overlay, regardless of which bottom-layer nodes of the group currently serve as its supernodes.

4.1.2 Kelips by Gupta et al. [67]

As in the hierarchical systems of Garces-Erice et al., the design employs two layers: the bottom layer is for N1 = N nodes and the top layer is conceptual; it is for N2 ≤ N node groups. Groups use their own ID space S2 = {0, 1, ..., N2 − 1}, i.e., the total number of groups should be fixed a priori. A node associates itself with a group by hashing the node ID, so that every group contains about N/N2 nodes.

The group connectivity structure is closer to unstructured P2P topology: a node u knows most of the nodes from its own group, leading to O(N/N2) entries in u’s routing table. For each other group, u additionally stores a constant-sized set of group nodes: O(N2) entries. Let R be the total number of resources in the system. Then u indexes R/N2 resources, storing for resource r a pair (k_r, IP_r) with the resource key and the responsible node’s IP address.

On one hand, large values of N2 reduce the memory for entries of own group nodes and the memory for resource indexes. On the other hand, the number of entries for other groups grows with N2. The optimal tradeoff value is N2 = Θ(√(N + R)). Assuming R = O(N), the optimal value leads to a routing table with O(√N) entries.

In a lookup, a querying node u hashes the resource name to the group ID and sends the lookup to the closest node v that u knows for that group. Then v resolves the lookup by searching its index and returns the IP address of the responsible node. When v fails to resolve the lookup, multi-hop (and multi-try) routing is enabled. The probability of long lookup paths is low since the routing state at nodes is highly redundant. As a result, the average number of hops stays within O(1).
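The memory tradeoff behind the choice of N2 can be checked numerically. The sketch below assumes a simplified per-node state model (own-group entries, c contacts for each other group, and the resource index share); the constant c = 2 and the system sizes are hypothetical values chosen for illustration.

```python
import math


def state_size(n, r, n2, c=2):
    """Approximate per-node entries: own group + foreign contacts + index."""
    return n / n2 + c * (n2 - 1) + r / n2


def optimal_groups(n, r, c=2):
    """Continuous minimizer of state_size over n2: sqrt((n + r) / c)."""
    return math.sqrt((n + r) / c)


n, r = 10_000, 10_000          # hypothetical sizes with R = O(N)
best = optimal_groups(n, r)    # number of groups minimizing the model
```

Setting the derivative of (N + R)/N2 + c·N2 to zero gives N2 = √((N + R)/c), which is consistent with the Θ(√(N + R)) value stated above under this simplified model.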

Kelips is loosely structured and requires expensive maintenance (large routing tables and resource index) that uses gossip protocols for information dissemination, similarly to unstructured P2P overlays [7, 53]. It leads to O(√N · polylog N) cost, e.g., the expected convergence time for an event is O(√N · log³ N). Nevertheless, the √N-property of the Kelips design can be used in more structured networks.

4.1.3 Structured superpeers by Mizrak et al. [68]

Similarly to Kelips, each node maintains O(√N) local state to achieve O(1) routing. The hierarchy is two-layer and follows the horizontal approach. Both layers use the same circular ID space. All N nodes are placed on the outer ring. Among them N2 = Θ(√N) high-capacity nodes are chosen to be supernodes; they create an additional ring (the inner ring) to provide fast global routing.

The outer ring is uniformly split into N2 arcs such that the nodes of each arc form their own overlay on the bottom layer and assign a supernode for the top layer. A bottom-layer overlay has the star topology: the supernode knows all nodes in its outer ring arc (O(√N) entries). In addition, any supernode maintains routing entries for all other supernodes (Θ(√N) entries). As a result, the inner ring on the top layer is a single overlay with the fully-connected topology.

In a lookup for k, its initiator sends the request directly to its supernode. If its arc includes k then the supernode locates the successor of k in the local routing table and returns the result (local routing). Otherwise, it forwards the lookup to the supernode that is responsible for the arc enclosing k (global routing). That supernode locates the successor node in its own routing table and returns the result. Even in the worst case, routing has constant cost O(1).
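The constant-cost lookup can be sketched in a few lines of Python. This is an illustrative model and not the authors' implementation: the 16-bit ring, the class and function names, and the rule that the successor search wraps inside an arc are all assumptions made for the example.

```python
import bisect

RING_BITS = 16                     # hypothetical ID length
RING_SIZE = 1 << RING_BITS


class Supernode:
    """Holds the full node list of one outer-ring arc (star topology)."""

    def __init__(self, arc_index, node_ids):
        self.arc_index = arc_index
        self.node_ids = sorted(node_ids)

    def local_successor(self, key):
        """First node ID >= key in this arc; wraps inside the arc (assumption)."""
        pos = bisect.bisect_left(self.node_ids, key)
        return self.node_ids[pos % len(self.node_ids)]


def arc_of(key, num_arcs):
    """Index of the equal-sized arc that encloses key."""
    return key * num_arcs // RING_SIZE


def lookup(key, home, supernodes):
    """Resolve key starting at the initiator's supernode `home`.

    Returns (responsible node ID, number of overlay hops).
    """
    hops = 1                                  # initiator -> its supernode
    target = supernodes[arc_of(key, len(supernodes))]
    if target is not home:
        hops += 1                             # inner-ring forward (global routing)
    return target.local_successor(key), hops


# Hypothetical system: 4 arcs of 16384 IDs each, a few nodes per arc.
supernodes = [
    Supernode(0, [100, 9000]),
    Supernode(1, [17000, 30000]),
    Supernode(2, [40000]),
    Supernode(3, [50000, 65000]),
]
```

In this model a lookup never takes more than two overlay hops before the answer is known, which corresponds to the O(1) worst case stated above.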

In bootstrapping, the system constructs an initial set of supernodes. When the system evolves, an extra mechanism is needed for selecting new supernodes and keeping their number equal to Θ(√N). A topology change of the inner ring is disseminated to all supernodes, leading to Θ(√N) traffic.

This two-layer design can be thought of as a hybrid of hierarchical systems [64] and Kelips [67]. The architecture is disjointed since the difference between a regular node and a supernode is essential. In particular, a bottom-layer overlay cannot function without its supernode.

4.1.4 OneHop by Gupta et al. [69, 70]

The OneHop ID space is circular. All N nodes form a fully-connected overlay on layer i = 1, leading to one-hop routing at the cost of Θ(N) node state. The memory cost allows a routing table containing millions of entries per node. Each node controls its immediate successor and predecessor using the Chord algorithm with periodic keep-alive messages. Also, membership changes are detected with lookups. To maintain complete membership routing tables, notifications of membership change events must reach every node within reasonable time. The network bandwidth usage is reduced with efficient membership information dissemination. OneHop partitions the update function among two additional layers (three-layer architecture).

Similarly to Mizrak’s structured superpeers, the ID space is divided into N3 ≪ N equal contiguous arcs (slices). A supernode (slice supernode) is assigned to each arc deterministically as the node that immediately succeeds the midpoint of the arc. Any regular node knows its slice supernode. When a new node has an ID closer to the slice midpoint, that node becomes the slice supernode. The slice supernode overlay on the top layer i = 3 has fully-connected topology: each supernode knows all other slice supernodes.

In turn, each slice is divided into N2 ≪ N/N3 equal-sized units. For each unit its slice supernode assigns a unit supernode. The assignment is again deterministic: the successor of the unit midpoint. Since a slice supernode knows all its unit supernodes, they form a star-topology cluster on layer i = 2. Any entry in a slice supernode routing table is marked either ‘regular node’, ‘unit supernode’, or ‘slice supernode’.
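The deterministic slice and unit assignment can be illustrated as follows. This is a sketch under stated assumptions: a hypothetical 16-bit ring, slice and unit counts that divide the ring evenly, and helper names invented for the example.

```python
import bisect

RING_BITS = 16                     # hypothetical ID length
RING_SIZE = 1 << RING_BITS


def slice_and_unit(node_id, num_slices, units_per_slice):
    """Return (slice index, unit index within the slice) for an ID."""
    slice_len = RING_SIZE // num_slices
    unit_len = slice_len // units_per_slice
    return node_id // slice_len, (node_id % slice_len) // unit_len


def successor(sorted_ids, point):
    """First node ID >= point, wrapping around the ring."""
    pos = bisect.bisect_left(sorted_ids, point)
    return sorted_ids[pos % len(sorted_ids)]


def slice_supernode(sorted_ids, s, num_slices):
    """Node that succeeds the midpoint of slice s."""
    slice_len = RING_SIZE // num_slices
    return successor(sorted_ids, s * slice_len + slice_len // 2)


def unit_supernode(sorted_ids, s, u, num_slices, units_per_slice):
    """Node that succeeds the midpoint of unit u inside slice s."""
    slice_len = RING_SIZE // num_slices
    unit_len = slice_len // units_per_slice
    return successor(sorted_ids, s * slice_len + u * unit_len + unit_len // 2)
```

In a sparsely populated example the successor of a midpoint may lie outside the slice or unit itself; in a large system with many nodes per unit this is unlikely, so the sketch does not handle it specially.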

When a node detects a membership change, it notifies its slice supernode. The latter collects all notifications over a given time interval and sends them to the other slice supernodes. A slice supernode aggregates the information from slice supernodes and periodically sends the aggregate message to its unit supernodes. A unit supernode piggybacks the changes on keep-alive messages to its immediate successor and predecessor on the bottom ring. A regular node u propagates the updates in one direction: from the predecessor to the successor or vice versa. This approach imposes event dissemination trees with low redundancy; duplication of an update message may occur but is rare.

OneHop is attractive in small or low-churn systems with up to a few million nodes. OneHop does not differentiate node capacity: any node should be ready to afford additional responsibility if it happens to be assigned as a supernode. The load imbalance between slice supernodes and regular nodes is high [28, 71], which is inappropriate for systems with low-capacity nodes.

One-hop DHTs (in general, O(1)-hop DHTs) are superior to multi-hop DHTs in stable and high-capacity environments due to efficient lookup bandwidth utilization. Risson et al. [72] proposed two hierarchical designs based on one-hop DHTs. The 1HS design (One Hop Sites) uses independent one-hop DHT overlays that can cooperate across protected sites and recover from network partitions between those sites. This architecture is appropriate for a federation of data centers. The 1HF design (One Hop Federation) arranges one-hop DHT overlays into a tree-based system of regional hierarchies. Regional overlays have subtended organizational rings and organizational overlays have subtended site overlays. This architecture is beneficial for global applications like name resolution or Internet telephony.

4.1.5 Rings of unstructured clouds by Singh and Liu [58]

This two-layer design combines 1) structured P2P overlays (DHT rings) on the top layer for efficient routing and 2) unstructured P2P overlays (clouds) on the bottom layer for anonymity. The global ring is a flat DHT where all N nodes participate. A cloud is a small Gnutella-like network. To create a cloud, its name is hashed using multiple hash functions to find several nodes in the global ring. They connect to each other, forming the cloud. Each becomes a rendezvous node (supernode). When it leaves the system, a new node is found using the global ring DHT protocol.

Each node caches cloud names it sees in lookups. A new node gets a list of active clouds when it joins the system. Then the node hashes the cloud name to obtain a rendezvous node. The latter bootstraps the node into the cloud. Nodes control the cloud size using a distance vector (distance from the rendezvous nodes); a cloud stops accepting new members if its distance limit is reached. The number of rendezvous nodes per cloud is kept small (compared with the typical cloud size), resulting in disjointed architecture.

The dynamic relation between a cloud and the services that its nodes provide uses rendezvous rings (R-rings). An R-ring is an independent DHT overlay consisting of one rendezvous node from each cloud. A rendezvous node uses its cloud ID in the R-ring. Creating an R-ring follows the same DHT protocol as the global ring. There are multiple R-rings, one for each rendezvous node of every cloud. The R-ring structure introduces an additional disjointed sub-hierarchy.

A lookup originates in a cloud and uses a random walk within the cloud. The originator sets a random TTL and the lookup is forwarded to random neighbors until TTL = 0. The last node becomes a crossover node: a random node that communicates on behalf of the originator, thus preserving its anonymity. If a lookup is tagged with a cloud name then the global DHT ring is used to locate a node of the target cloud. If the relation between clouds and services is dynamic, the crossover node finds a rendezvous node of its cloud. Then the lookup runs over the R-ring to locate a rendezvous node of the target cloud. The rendezvous node broadcasts the lookup in its cloud. Then a responsible node replies to the crossover node, which forwards the reply back to the originator. A random walk can also be applied on the target cloud side for service provider anonymity.
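The selection of a crossover node can be sketched as a TTL-bounded random walk. The code below is a minimal illustration, assuming a hypothetical adjacency map `neighbors` for one cloud and a uniformly chosen TTL between 1 and `max_ttl`; neither the TTL distribution nor the names come from the original design.

```python
import random


def pick_crossover(origin, neighbors, max_ttl, rng):
    """Random walk inside a cloud; the node holding the lookup when the
    TTL reaches 0 becomes the crossover node."""
    ttl = rng.randint(1, max_ttl)      # originator sets a random TTL
    current = origin
    path = [origin]
    while ttl > 0:
        current = rng.choice(neighbors[current])   # forward to a random neighbor
        path.append(current)
        ttl -= 1
    return current, path
```

For example, `pick_crossover("a", {"a": ["b"], "b": ["c"], "c": ["a"]}, 5, random.Random(7))` returns the final node together with the visited path. The path is returned here only for inspection; in an anonymity-preserving system intermediate nodes would not expose it.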

Random walks decrease the routing performance. Each rendezvous node has to maintain an additional routing table for its R-ring. The load due to node rendezvous and crossover responsibility is uniformly distributed among the N nodes, regardless of their heterogeneity.

The rendezvous mechanism is extremely important for hierarchical architectures. It protects from object mobility when a node or resource moves to another overlay. Risson et al. [57] proposed a rendezvous abstraction for a wide class of hierarchical P2P architectures. Overlays store additional records implementing a location information plane to manage globally unique, persistent, semantic-free identifiers.

4.1.6 Chordella by Zoels et al. [41, 73]

This generic two-layer horizontal hierarchical architecture requires every bottom-layer overlay to delegate exactly one supernode to the top layer. The top layer uses a flat DHT to form the global ring of supernodes; the analysis applies the Chord DHT for the reference case. On the bottom layer each supernode manages its own overlay of regular nodes, which use the supernode as a proxy.

A unique l-bit ID is associated with each node, either a supernode or a regular one. The resource ID space reflects the two-layer hierarchy: a resource item has a 2l-bit key whose first l bits are independent of the second l bits. The first part identifies the responsible group and the second part determines the responsible node within that group. Routing is two-phase: global and local. First, a lookup for k resolves the responsible group, finding the supernode in the top-layer overlay by a DHT lookup for the first l bits of k. Second, the lookup determines the responsible node within that group using the second l bits of k.
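The two-phase resolution of a 2l-bit key can be sketched as follows. The example assumes l = 16 and a Chord-like successor rule in both phases; the `groups` mapping and the function names are hypothetical and serve only to illustrate the key split.

```python
import bisect

L_BITS = 16  # hypothetical ID length l


def split_key(key, l_bits=L_BITS):
    """Split a 2l-bit resource key into (group part, node part)."""
    if not 0 <= key < 1 << (2 * l_bits):
        raise ValueError("key does not fit into 2l bits")
    return key >> l_bits, key & ((1 << l_bits) - 1)


def successor(sorted_ids, point):
    """First ID >= point, wrapping around the ring."""
    pos = bisect.bisect_left(sorted_ids, point)
    return sorted_ids[pos % len(sorted_ids)]


def two_phase_lookup(key, groups, l_bits=L_BITS):
    """groups maps supernode ID -> sorted member IDs (supernode included)."""
    group_part, node_part = split_key(key, l_bits)
    supernode = successor(sorted(groups), group_part)            # global phase
    return supernode, successor(groups[supernode], node_part)    # local phase
```

Because the two halves of the key are independent, the global phase never needs to inspect the node part, and the local phase can use any intra-group structure that maps the node part to a member.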

There are three alternative connectivity structures for overlays of regular nodes: fully-meshed (FuMe) when every node is connected to all other nodes of its overlay, single-connection (SiCo) when every regular node is connected to its supernode only (i.e., star topology), and DHT-based when a flat DHT forms each overlay. Note that the analysis was limited to the pure case when all bottom-layer overlays follow the same connectivity structure. In the FuMe and SiCo structures, any group’s supernode knows all members of its group, and local routing is one-hop. In the DHT-based structure, local routing is O(log(N/N2)) if the Chord DHT is used. Note that a node always contacts its supernode, even if the responsible node is within the same group.

The evaluation showed that the SiCo structure is superior to the FuMe and DHT ones, in terms of the tradeoff between minimizing the total network traffic and avoiding overload of the most heavily loaded supernodes, see Section 5.4. Further, work [74] presented a distributed algorithm that allows dynamically achieving and maintaining cost-optimal operation in a two-layer HDHT network. All decisions taken by the nodes are based on their local knowledge of a set of system parameters describing the current system state.

The design was extended in [75] with a load balancing algorithm for supernodes. It aims at assigning an appropriate number of regular nodes (or other types of load) to every supernode. Every supernode keeps information on the load level of some O(log N) other supernodes. A new node is assigned to a supernode that currently has a low load level. When a supernode leaves, its regular nodes have to rejoin the system. In fact, it is a step towards a self-adaptable hierarchy with a fixed number of layers.

SA-Chord [76] is an instantiation of the Chordella design. The goal is to move the routing load entirely to a small set of the most capable nodes. These N2 supernodes are “routing nodes”; only they may forward lookups. The N1 = N − N2 regular nodes are “non-routing nodes”; they are lookup sources and destinations only. The top layer (routing ring) is a modified Chord DHT and the bottom layer follows the SiCo connectivity structure. A regular node always asks its supernode to perform a lookup. Then the routing ring resolves the lookup to the responsible supernode. The latter finally forwards the lookup to the destination regular node.

SA-Chord uses the flat Chord space. There is no need for hierarchical IDs since regular nodes do not perform routing. A supernode knows all regular nodes on the arc clockwise from itself to the closest supernode. The routing ring allows k fingers per power-of-two interval, where k ≥ 1 is a system parameter. This redundancy improves the routing ring performance to O(log_{2k} N2) hops per lookup compared with O(log_2 N2) of the basic Chord. Additionally, proximity neighbor selection is applied to reduce routing latency.

4.1.7 HONet by Tian et al. [59]

The basic two-layer hierarchy is used to construct a hybrid P2P system (cf. Section 4.1.5). The bottom layer consists of many topologically-based clusters of small size; each maintains its own overlay with its supernode (cluster root node). All cluster roots form a single global overlay (the core network) on the top layer. Any overlay is DHT-based and has its own independent ID space. The De Bruijn network is used for the reference case. A node (resp. resource) is identified by its cluster root ID in the core network and its node ID (resp. resource key) in the local overlay. Hence a full ID is a pair (c, k).

A cluster root is chosen as the most stable member of the cluster. HONet uses a coordinate system with a group of well-known landmark nodes. The coordinates of a cluster are the coordinates of its root; they are stored in the core network DHT. Space-filling curves, which preserve locality, are used to map multidimensional coordinates to the one-dimensional core network ID space (inspired by work [31]).

When a node joins, it searches the core network for a nearby cluster using its own coordinates mapped to the core network ID space. If no cluster root is found within a predefined cluster radius, then the node becomes the root of a new cluster. Otherwise, the node joins the existing cluster.

Hierarchical routing is two-phase. If a lookup for (c, k) is started in a cluster c′ ≠ c then global routing is performed in the core network, and the lookup moves from one cluster to another. The lookup reaches a root node of the target cluster using c. Then the lookup continues on the bottom layer, i.e., in the local DHT overlay using the key k.

For better routing efficiency the design employs elements of unstructured P2P topology. Each node, in addition to its DHT-based routing table, creates random links to nodes of other clusters if the node capacity allows. The link construction uses a random walk algorithm in which a node initiates a message with a finite TTL. A node u forwards random walk messages to a neighbor v with probability proportional to f_v, where f_v is a generic fitness metric that characterizes v’s available capacity and the performance of link u → v. When a node receives a random walk message with TTL = 0, it can provide the inter-cluster link.
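Fitness-proportional forwarding can be sketched with weighted random selection. In the following illustration the `fitness` table, the `links` adjacency map, and the function names are hypothetical; for simplicity the fitness is stored per node, whereas the metric described above may also depend on the link u → v.

```python
import random


def next_hop(neighbors, fitness, rng):
    """Pick neighbor v with probability proportional to fitness[v]."""
    weights = [fitness[v] for v in neighbors]
    return rng.choices(neighbors, weights=weights, k=1)[0]


def random_walk(start, links, fitness, ttl, rng):
    """Forward a message until its TTL reaches 0; return the final node."""
    current = start
    while ttl > 0:
        current = next_hop(links[current], fitness, rng)
        ttl -= 1
    return current
```

The node returned by `random_walk` is the candidate endpoint for a new inter-cluster link. Neighbors with zero fitness are never selected, so at least one neighbor of every visited node must have positive fitness.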

A node publishes information about its inter-cluster links in its local cluster DHT using cluster IDs as the keys. If a node u is responsible for a key c in the cluster c′ ≠ c then u knows all nodes in its cluster c′ that keep random links to members of the cluster c. Hence u can serve as a local reflector to c to accelerate global routing of lookups for (c, k). If a node v ∈ c′ does not know a random link to a node of c then v forwards the lookup to the local reflector u based on local DHT routing. The reflector forwards the lookup either to a local node that knows a random link to c or to the cluster root to perform hierarchical routing. When the lookup enters another cluster using hierarchical routing, it can try fast inter-cluster routing again.

Note that this solution applies the small-world principle, in which a routing table consists of short-range and long-range neighbors [11, 52]. The same global routing acceleration method was proposed in Cyclone [60, 77] (see Section 4.2.10). Although the design applies the same cluster-based model as the hierarchical systems of Garces-Erice et al., the architecture is closer to the vertical one; all clusters tend to form a global overlay connected using random links.

4.1.8 Content-based hierarchy by Zoels et al. [78]

The design applies content-based hierarchy to support efficient topic-based search. The hierarchy reflects a content category tree, thus following the tree-based hierarchy model. The architecture is ordered horizontally, set up by many Chord rings as topic-spaces (Fig. 7). Each ring corresponds to a vertex in the tree and stands for an individual topic, divided into subtopics with their own rings on the lower layer. The root ring is the only overlay on the top layer.

A ring is initiated and managed by its “ringmaster” (supernode). When initiating a new ring, its supernode informs the parent ring supernode and remains a regular node in the parent ring. A supernode is always aware of its parent ring and all of its own child rings. Most ring nodes are regular; they belong to that ring only. This is a characteristic tendency of disjointed architecture. The design is much inspired by the multi-ring hierarchy [43] (see Section 4.2.8). The latter, however, is closer to nested architecture; many nodes can participate in multiple rings.

Fig. 7 Example of content-based multi-ring hierarchy [78]. The figure shows the root ring and the rings for the topics ‘Germany’, ‘Finland’, ‘Munich’, and ‘Airport’. Filled circles correspond to supernodes; each maintains its own ring for a subtopic and acts as a regular node (unfilled circle) in the parent ring. Topic content is distributed over all ring nodes. Path A → B → C corresponds to qualified name ‘/root/germany/munich/airport’

There are two types of resource names: qualified and unqualified. A qualified name refers to the exact path in the content category tree (e.g., ‘/root/germany/munich/airport’) and identifies the corresponding rings. An unqualified name is a keyword used to look up the resource in a Chord ring (keywords are hashed to the ring ID space).
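The two kinds of names can be handled as in the sketch below. It is only an illustration: the 32-bit ring ID space, the use of SHA-1, and the case-insensitive treatment of keywords are assumptions of the example and are not prescribed by the design.

```python
import hashlib

RING_BITS = 32  # hypothetical size of each ring's ID space


def ring_path(qualified_name):
    """Split a qualified name into the sequence of ring topics it traverses."""
    parts = [p for p in qualified_name.split("/") if p]
    if not parts or parts[0] != "root":
        raise ValueError("a qualified name must start at the root ring")
    return parts


def keyword_id(keyword, bits=RING_BITS):
    """Hash an unqualified name (keyword) into a ring's ID space."""
    digest = hashlib.sha1(keyword.lower().encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)
```

For instance, `ring_path("/root/germany/munich/airport")` yields the ring topics `root`, `germany`, `munich`, and `airport` in traversal order, while `keyword_id("airport")` gives the key to be looked up inside whichever ring the user is currently working with.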

The design allows dynamic structure adaptation when users create and remove their own topic spaces. It requires, however, some discipline among users; they should work with the ring corresponding to a given topic. Alternatively, an extra mechanism can be employed, e.g., a bootstrap server that is aware of the global system structure.

4.1.9 Hierarchy-adaptive topology by Zhang et al. [79]

The design is a dynamically adaptable implementation of pure vertical architecture. The number of layers M is changed according to the current value of N. The tree-based hierarchy model is applied for bottom-up construction of the layers.

The ID space is a d-dimensional torus with Cartesian coordinates, the same as in CAN [2, 30]. A node ID determines a zone (hyper-rectangle) in the space, and the node is responsible for all keys that are points in its zone. Neighbors are nodes whose zones overlap along d − 1 dimensions and abut along one dimension. Zones can be split and merged using the CAN space operations.
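The neighbor relation between zones can be tested directly from the zone coordinates. The following sketch represents a zone as a list of half-open intervals, one per dimension; this representation is an assumption of the example, and torus wraparound is deliberately ignored to keep it short.

```python
def are_neighbors(zone_a, zone_b):
    """zone = list of (low, high) half-open intervals, one per dimension.

    True if the zones abut along exactly one dimension and overlap along
    all the others. Torus wraparound is ignored in this sketch.
    """
    abut = 0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(zone_a, zone_b):
        if a_hi == b_lo or b_hi == a_lo:
            abut += 1                 # the zones touch in this dimension
        elif not (a_lo < b_hi and b_lo < a_hi):
            return False              # disjoint and not touching
    return abut == 1
```

Two zones that share only a corner abut along two dimensions and are therefore not neighbors, which the check `abut == 1` captures.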

The layer construction uses bottom-up and top-down directions. The former is cluster-based and takes into account the underlying network proximity. The latter introduces tree-based partition of the ID space. The construction depends on the current value of N. There are certain parameters and rules that control the size of clusters, and hence the number of nodes on each layer and the number of layers.

In the bottom-up direction, layer i nodes are grouped into clusters according to proximity in the underlying IP network. Each cluster selects its supernode among the most powerful nodes; it will represent the cluster on layer i + 1. Any node (child node) has a direct link to its supernode (parent node). Furthermore, the design requires a node to keep links to all its ancestor nodes on upper layers (ancestor routing table). Each supernode also maintains links to all its immediate children (child routing table).

In the top-down direction, the ID space is partitioned among all the layer i + 1 nodes. The common layer i + 1 overlay is formed using neighbor connections. The zone that a node u maintains on layer i + 1 is further divided between u’s children, which form a proximity cluster on layer i.

To join the system a new node u finds the closest node in terms of proximity and becomes a member of the corresponding cluster on the bottom layer. If all other nodes are far from u then u forms its own cluster. Being a member of a cluster, u may become a supernode by some election procedure. After that u joins clusters on upper layers. When u leaves the system its role is delegated to another node.

If a lookup key k is within the current zone of a node u, then the lookup has already reached the destination. Otherwise, u forwards the lookup to the ancestor node whose zone is the smallest one that covers k (bottom-up routing first). Then the lookup iterates through the child nodes whose zones cover k until it reaches the destination node. Routing to neighbors on the same layer is used when ancestor nodes have failed or are busy.
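The bottom-up then top-down routing can be sketched on a simplified model. The example below makes several assumptions that are not part of the original design: zones are one-dimensional half-open intervals instead of d-dimensional hyper-rectangles, every tree vertex is a separate object (in the design a supernode is the same physical node as one of its cluster members), and failures are not modeled.

```python
class Node:
    """A tree vertex with a 1-D zone [lo, hi) and links to parent and children."""

    def __init__(self, name, lo, hi, parent=None):
        self.name, self.lo, self.hi = name, lo, hi
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def covers(self, key):
        return self.lo <= key < self.hi


def route(start, key):
    """Return the names of the nodes visited when resolving key from start."""
    path = [start.name]
    node = start
    if not node.covers(key):
        # Bottom-up: the ancestor routing table gives a direct link to the
        # lowest ancestor whose zone covers key, so this is a single hop.
        ancestor = node.parent
        while ancestor is not None and not ancestor.covers(key):
            ancestor = ancestor.parent
        if ancestor is None:
            raise ValueError("key outside the ID space")
        node = ancestor
        path.append(node.name)
    # Top-down: descend through the children whose zones cover key.
    while node.children:
        node = next(c for c in node.children if c.covers(key))
        path.append(node.name)
    return path
```

The loop over `parent` links only models a scan of the local ancestor routing table; it does not correspond to overlay hops, which is why a single entry is appended to the path for the bottom-up step.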

Nodes on higher layers have more responsibilities since they handle more lookup requests. A node can maintain fast inter-layer links (shortcuts) that reduce the load of higher-layer nodes and accelerate global routing. The idea is similar to HONet, but instead of random walks the design applies the expressway method of eCAN [80, 81] (Section 4.2.2). A node u creates an additional shortcut to a grandchild of a neighbor of u’s ancestor on some layer.

The key points of this hierarchical design are 1) proximity neighbor selection and 2) longer hops in the ID space on higher layers. The first is due to clustering, when nearby nodes in the underlying network become neighbors in the overlay. The second is due to the geometrical reduction of the number of nodes with i = 1, 2, . . . , M, so that hops between neighbors become larger distance leaps.

4.1.10 Chord2 by Joung and Wang [82]

The design is a variation of the basic ordered two-layer architecture. The aim is to reduce the overlay maintenance overhead for regular nodes, which compose the major population. A Chord ring of all N nodes is on the bottom layer (the regular ring) and a Chord ring of N2 ≪ N supernodes is on the top layer (the conduct ring). The costly finger maintenance is pushed to the conduct ring, since supernodes are assumed to be the most powerful and stable nodes.

A regular node knows at least one supernode. Regular nodes proactively (periodical checks) maintain their successor and predecessor links. When a node u detects a node joining or leaving, u reports to a supernode, and the conduct ring becomes responsible for the notification of all affected nodes. Supernodes receive join/leave information and reactively (footnote 3) send the notifications. On receiving an update notification, a regular node entirely updates its routing table (all finger links). Therefore, regular nodes do not perform the costly proactive finger maintenance of the basic Chord.

The system starts from its regular ring. Nodes collect the uptime information (or another metric) for supernode selection. Each node estimates the uptimes of other nodes and recommends the most stable nodes to be supernodes. (Various reputation-based election mechanisms can also be introduced.) Once a node is promoted to be a supernode, it constructs an additional ID for the conduct ring and joins this ring following the basic Chord protocol. A variant in which the IDs in both rings coincide is possible.

Footnote 3: In basic Chord, every node periodically calls the stabilization and fixfingers procedures, which is a proactive strategy.

The conduct ring DHT stores information about the regular ring topology. Each regular node u inserts a link object (u, usucc), where u is used as a key and usucc is u’s successor. Similarly, u inserts a finger object (v, u) for every v ∈ Tu. Now, whenever a node w leaves the regular ring, a supernode knows which nodes have neighbor links to w and notifies them to replace w with w’s successor in their routing tables.
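The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the protocol of [82]: the conduct ring DHT is replaced by a single in-memory table, and the class and method names (`ConductIndex`, `publish`, `on_leave`) are hypothetical.

```python
from collections import defaultdict


class ConductIndex:
    """Toy stand-in for the conduct ring DHT (one in-memory table, an assumption)."""

    def __init__(self):
        self.successor = {}                  # link objects: u -> successor of u
        self.pointed_by = defaultdict(set)   # finger objects: v -> nodes u with v in T_u

    def publish(self, u, u_succ, fingers):
        """Regular node u inserts its link object and one finger object per finger."""
        self.successor[u] = u_succ
        for v in fingers:
            self.pointed_by[v].add(u)

    def on_leave(self, w):
        """Return {node: (old_neighbor, replacement)} for every node that pointed to w."""
        replacement = self.successor.pop(w)
        affected = self.pointed_by.pop(w, set())
        affected.discard(w)
        # w no longer holds fingers to anyone.
        for holders in self.pointed_by.values():
            holders.discard(w)
        # Affected nodes now point to w's successor instead of w.
        for u in affected:
            self.pointed_by[replacement].add(u)
        # Repair link objects that named w as successor.
        for u, succ in self.successor.items():
            if succ == w:
                self.successor[u] = replacement
        return {u: (w, replacement) for u in affected}
```

In this sketch the returned dictionary plays the role of the notifications that a supernode sends to the affected regular nodes.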

When a node joins the regular ring, its routing table of size O(log N) can be constructed based on data from the conduct ring; the total cost is O(log N2 · log N) compared with O(log² N) in basic Chord. When a node leaves the regular ring, either it informs a supernode or its predecessor detects the departure. It takes O(log N) messages to update the routing tables in affected nodes. In basic Chord, the cost of node leaving is included in the maintenance cost.
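As a rough illustration of the join cost comparison, take the hypothetical sizes N = 2^20 and N2 = 2^10 (assumed here for the example only), use base-2 logarithms, and ignore the constants hidden by the O-notation:

```latex
\log N_2 \cdot \log N = 10 \cdot 20 = 200,
\qquad
\log^{2} N = 20^{2} = 400 .
```

Under these assumed sizes the asymptotic expression for the join cost is halved; the gain grows as N2 becomes smaller relative to N.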

A regular node periodically checks only its successors and predecessors; each round takes O(1) messages compared with O(log N) in basic Chord, where a round affects all neighbors. The conduct ring maintenance is the same as in basic Chord: the costs of node joining and maintenance are equal, each taking O(log² N2) messages per operation. Since supernodes are stable, the maintenance cost can be reduced by increasing the interval of the periodical checks.

Additionally, each supernode stores O(N/N2) link objects and O((N log N)/N2) finger objects.

Recall that Chordella also allows Chord rings on both layers. It follows, however, the horizontal architecture approach with many rings on the bottom layer. In contrast, Chord2 is a vertical architecture design with a single Chord ring per layer. From this point of view, Chord2 is closer to HONet, but the latter has a less structured topology on the bottom layer.

Tanta-ngai and McAllister [83] applied a similar idea of separating local and global routing onto their own layers (cf. Brocade and eCAN in Section 4.2). The bottom layer is a flat Chord ring in S = [0, 2^n). On the top layer, an auxiliary supernode ring (the expressway) uses a finer arc-partition granularity than the Chord power-of-two intervals. A supernode u has a link to the closest supernode in every arc

\[ [u + a p^{i},\; u + (a + 1) p^{i}), \quad a = 1, 2, \ldots, p - 1, \quad i = 0, 1, \ldots, \lceil \log_p 2^{n} \rceil - 1, \]

where p ≥ 2 is an integer parameter (expressway forwarding power). More capable nodes become more strongly connected via the expressway ring, aiming at better global routing. Instead of Chord periodic updates, reactive event-based notification is used for maintaining the expressway ring. The maintenance is cheaper since supernodes are relatively stable.
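The arc partition can be enumerated with a short sketch. It assumes that the number of levels is the smallest integer L with p^L ≥ 2^n (an assumption about the rounding in the formula) and that arc endpoints wrap around modulo 2^n; the function name `expressway_arcs` is hypothetical.

```python
def expressway_arcs(u, p, n):
    """List (i, a, start, end) for the arcs [u + a*p^i, u + (a+1)*p^i) on a ring of 2^n IDs."""
    size = 2 ** n
    # Number of levels: smallest L with p**L >= 2**n (assumed rounding), in integer arithmetic.
    levels = 0
    while p ** levels < size:
        levels += 1
    arcs = []
    for i in range(levels):
        for a in range(1, p):
            start = (u + a * p ** i) % size
            end = (u + (a + 1) * p ** i) % size
            arcs.append((i, a, start, end))
    return arcs
```

For p = 2 the arcs coincide with the Chord finger intervals; for example, `expressway_arcs(0, 2, 4)` yields the arcs [1, 2), [2, 4), [4, 8), and [8, 0) on a ring of 16 IDs. For larger p the sketch produces p − 1 arcs per level, which is the finer granularity mentioned above.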

4.1.11 Hierarchical DHT-based overlay networks by Martinez-Yelmo et al. [84–86]

It is a variant of the basic ordered two-layer hierarchical architecture with the focus on global multimedia applications where many inter-operating domains exist for users or content. The particular reference case is P2PSIP, where the SIP service is provided in a P2P manner and P2P signaling traffic is carried by SIP messages.

The two-layer hierarchy directly reflects the application domain division, following horizontal architecture. A domain corresponds to an exclusive group of nodes (multimedia participants); they form an independent overlay network (domain overlay) using a flat DHT. Each group is represented by at least one supernode on the top layer. The basic design uses one supernode per domain. Supernodes are dedicated entities, allowing regular nodes (e.g., devices like a mobile phone) to carry a lower load. These N2 ≪ N supernodes form the global interconnection overlay using a flat DHT.

The interconnection overlay acts like a directory service for different domains. Each supernode publishes its location information and information about its domain overlay. There is a common packet format to assure interoperability between different domains. The supernode selection and update mechanism must be integrated in the maintenance protocol of each DHT, e.g., using the strategy of [87].

ID prefixes and suffixes are used for identification of different layers. A prefix ID (lp bits hashed from the node domain name, e.g., “example.com”) identifies the domain, and a suffix ID (ls bits hashed from the full name, e.g., “[email protected]”) is for intra-domain communication. For each resource, a node stores a resource tuple containing the resource hierarchical ID, the resource full name, and the resource itself.

In a lookup for a hierarchical resource key k, a node u compares its own prefix with k’s prefix. If they are different, then u forwards the lookup to its supernode. Otherwise, intra-domain routing starts. As in Chordella and the hierarchical systems of Garces-Erice et al., the reduction factor of regular node routing state is log N/ log N2 if DHTs with O(log N) state are used in domain overlays. Supernodes have to support N/N2 times more bandwidth and queries. In contrast, a regular node forwards lookups to its supernode only when inter-domain communication is needed.
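The prefix comparison can be sketched as follows. This is an illustration under stated assumptions, not the scheme of [84–86]: the prefix is assumed to be hashed from the domain part of a name and the suffix from the full name, SHA-1 is an arbitrary hash choice, and the bit lengths, function names, and example names are hypothetical.

```python
import hashlib

LP_BITS = 64  # hypothetical prefix length l_p
LS_BITS = 64  # hypothetical suffix length l_s


def _hash_bits(text, bits):
    """Return the top `bits` bits of the SHA-1 digest of `text` as an integer."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def hierarchical_id(full_name):
    """Build a (prefix, suffix) pair from a name of the assumed form user@domain."""
    domain = full_name.rsplit("@", 1)[-1]
    return (_hash_bits(domain, LP_BITS), _hash_bits(full_name, LS_BITS))


def next_step(node_id, key_id):
    """Decide between intra-domain routing and forwarding to the supernode."""
    if node_id[0] == key_id[0]:
        return "intra-domain"
    return "to-supernode"
```

Two names in the same domain share the prefix and are resolved inside the domain overlay; a key with a different prefix is handed to the supernode, which continues in the interconnection overlay.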

Another hierarchical P2PSIP design was introduced by Le and Kuo [88]; it does not use domain-based classification and is primarily oriented to node physical capacity. There are M ≥ 2 layers, where M is fixed in advance assuming that N does not vary significantly. Each layer forms its own Chord-based overlay, following vertical architecture. The design clusters N heterogeneous nodes according to their capacity and forms the population of each layer such that the number of nodes is geometrically reduced with i = 1, 2, . . . , M.

A model of M-layer hierarchical P2P architecture for Internet telephony was introduced in [89]. The horizontal approach is applied with the system-level parameters C > 1 and K > 1. On layer i, the nodes form non-overlapping groups of size C using an arbitrary clustering mechanism. A group is an overlay network based on any P2P protocol. Each group delegates one supernode to layer i + 1 and controls at most K groups on layer i − 1. Notably, a group may include nodes that do not belong to any lower layer, which is an additional sign of disjointed architecture. Although each group can be a DHT-based overlay with O(log C) intra-group routing, the inter-group routing algorithm is closer to unstructured P2P networks: a lookup is duplicated in up to K + 1 directions, namely K or fewer groups on the lower layer and the supernode’s group on the upper layer.

The model assumes that nodes join the system according to a non-stationary Poisson process with arrival rate λ(t) and stay active for an i.i.d. random time; hence the problem is reduced to an M_t/G/∞ queue. The average population size can be estimated for a sinusoidal arrival rate and exponential sojourn time (typical assumptions in telephony). In the steady state it yields periodic behavior of N = N(t) as well as of the lookup performance. Although the model can be applied to the two-layer design of Martinez-Yelmo et al. and the M-layer design of Le and Kuo, the assumption of immediate hierarchy reorganization with varying N and the flooding-like routing algorithm make the model less realistic.
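As a worked illustration of such an estimate (a sketch under assumed notation, not a formula quoted from [89]), let the arrival rate be λ(t) = λ̄ + β sin(γt) with 0 ≤ β ≤ λ̄, and let the sojourn time be exponential with mean 1/μ. The symbols λ̄, β, γ, and μ are introduced here for illustration only. In the periodic steady state of the infinite-server queue, the population N(t) is Poisson distributed with mean m(t):

```latex
\begin{aligned}
m(t) &= \int_{-\infty}^{t} \lambda(s)\, e^{-\mu (t - s)}\, ds \\
     &= \frac{\bar{\lambda}}{\mu}
        + \frac{\beta}{\mu^{2} + \gamma^{2}}
          \bigl( \mu \sin \gamma t - \gamma \cos \gamma t \bigr).
\end{aligned}
```

The mean population oscillates around λ̄/μ with the period 2π/γ of the arrival rate and with amplitude β/√(μ² + γ²), which is the periodic behavior referred to in the text.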

4.1.12 HSN by Guisheng et al. [90]

The two-layer hierarchy has the small-world ring of supernodes on the top and many clusters on the bottom, hence following the horizontal architecture approach. The basic idea is close to the structured superpeers of Mizrak et al. All N nodes participate in the global ring on the bottom layer. It is partitioned into several clusters, each of which is an independent overlay. A cluster assigns a supernode (cluster head), and the N2 supernodes form the top layer ring overlay.

In the global circular ID space, a cluster covers an arc in between two consecutive supernode IDs. The start (clockwise) node u is the cluster supernode; the subsequent nodes on the arc are regular, forming the set Cu. A supernode is a predecessor for all its cluster nodes. Every node in a cluster knows its cluster arc, so a local decision can be made whether inter- or intra-cluster routing is needed.
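The local decision between intra- and inter-cluster routing reduces to an arc membership test on the ring. The sketch below assumes integer IDs on a ring of `size` positions and a half-open arc that starts at the cluster supernode and ends just before the next supernode; the function names are hypothetical.

```python
def in_arc(x, start, end, size):
    """True if x lies on the clockwise half-open arc [start, end) of a ring with `size` IDs."""
    if start == end:
        # A single cluster covers the whole ring.
        return True
    return (x - start) % size < (end - start) % size


def routing_scope(key, own_supernode, next_supernode, size):
    """Classify a lookup key as handled inside the cluster or outside of it."""
    if in_arc(key, own_supernode, next_supernode, size):
        return "intra-cluster"
    return "inter-cluster"
```

The modular arithmetic handles arcs that wrap around the zero point of the ID space, e.g., the arc from 60 to 5 on a ring of 64 IDs.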

Short links connect regular nodes of a cluster. Each node keeps short bidirectional links to its nshort consecutive successors and therefore knows its nshort consecutive predecessors. In addition, a short link connects a regular node with its supernode. Intra-cluster communication uses greedy routing, which is similar to the basic Chord algorithm. Since the arc size is kept small, a mechanism like Chord fingers is not essential in the local routing and the simple use of consecutive successors is adequate. Moreover, the bidirectional property allows anti-clockwise forwarding, improving the greedy routing performance.

When inter-cluster routing is needed, the regular node forwards the lookup to the supernode. The latter decides and fixes the routing direction (clockwise or anticlockwise) for subsequent intra-cluster routing. After that, the intermediate supernodes on the path do not change the routing direction.

Each supernode u keeps nlong long bidirectional links (in addition to its |Cu| short links). The neighbor selection algorithm follows the small-world principle: neighbors should appear on all distance scales; the probability of closer neighbors is higher. Actually, the design uses an algorithm similar to Chord fingers: the ring is partitioned into power-of-two intervals and one long link is selected from each interval. For nlong = Θ(log N2) or higher it leads to O(log N2) routing in the supernode ring. For lower nlong the estimate of O(polylog N2) hops is valid [26, 55].

A cluster split and merge mechanism keeps every cluster within the given size bounds. It maintains Gmin ≤ |Cu| ≤ Gmax for system-level constants Gmin and Gmax. The reactive routing maintenance is on the top layer; when a message passes through a supernode, the supernode piggybacks its node ID and IP address onto the message. The proactive routing maintenance is on the bottom layer; every regular node periodically sends heartbeat messages to its neighbors.

The connectivity structure of the supernode ring is a discrete implementation of the small-world principle, leading to logarithmic routing [11]. The routing improvement over the flat Chord DHT shown in the experiments is achieved partially because of the bidirectional routing. There exist several bidirectional modifications of Chord [91, 92] with a similar improvement over basic Chord.

4.1.13 GTPP by Ou et al. [42]

GTPP (General Truncated Pyramid P2P) architecture is a design scheme for tree-based P2P hierarchies. On layers i = 1, 2, . . . , M − 1, nodes are grouped into several disjointed overlays with their own P2P protocols. Each overlay assigns one supernode to be a gateway to the next layer. Finally, a single overlay appears on the top layer i = M.

From the vertical perspective, the overlay structure consists of multiple trees rooted at the nodes of the top overlay, which makes the hierarchy look like a truncated pyramid (see also Fig. 5d in Section 3.3). The number of nodes on each layer decreases exponentially with i = 1, 2, . . . , M, assuming that the majority of nodes have relatively weak capability and the much more powerful nodes are in the minority.

A lookup is sequentially forwarded bottom-up until it reaches the top overlay. The latter delivers the lookup to the closest supernode. Then the lookup sequentially goes top-down until the responsible node is found.

The analysis showed that GTPP decreases the expected routing latency, though the path hop length can be higher than in a flat DHT. The GTPP layering provides certain rules for load allocation to heterogeneous nodes: the higher the layer, the more load its nodes have. GTPP architectures with M = 2, 3 layers are most reasonable in practical settings.

4.2 Nested hierarchical architectures

Nested architecture arranges the system functions along the vertical dimension, e.g., routing on different network scales. Typically, the arrangement is incremental, and a node selects its “function stack” with the basic function on the bottom and subsequent enhancements on upper layers. When processing an operation, a node decides which layer suits best. In particular, a two-layer design often employs the top and bottom layers for global and local routing, respectively. Although nested architectures are less popular compared with the disjointed ones, one of the main benefits is easier operation with the number of layers: adding or removing a layer does not affect the functionality and changes only non-functional properties, such as performance.

4.2.1 Brocade by Zhao et al. [93]

Brocade is known as the earliest HDHT design. It is based on vertical two-layer architecture. The primary overlay of all N = N1 nodes is on the bottom layer and the secondary overlay of N2 ≤ N supernodes is on the top layer. Tapestry is the flat DHT used to implement both overlays, although the design does not prevent the use of other flat DHTs. Although the overlays use the same ID space, they are constructed independently, and a supernode has two independent IDs.

On the bottom layer, proximity close nodes are grouped together and get assigned a supernode for the secondary overlay. A supernode provides shortcuts across distant domains in the underlying IP network. The key evidence of nested architecture is that shortcuts form an additional mechanism for better global routing. The secondary overlay is optional, and global routing, although less efficient, is possible by means of the primary overlay only.

If N2 ≈ 0 then the overlay degenerates to a flat DHT network. If N2 ≈ N then almost all nodes diversify routing: local links in the primary overlay and long-range links in the secondary one. Both overlays can be mixed in a lookup path depending on which routing criterion, local or global, suits the next hop.

A supernode must have significant processing power and high-bandwidth outgoing links. Preferably, it is a network access point such as a gateway or router. The final choice is resolved by an election algorithm or by the responsible ISP. As a result, supernodes are distributed over the underlying network such that each domain is represented by its supernode. Consequently, supernodes become landmarks for IP network domains.

In the naive solution, each supernode maintains a list of all nodes in its group. When a lookup reaches a supernode, the latter determines whether the lookup is destined to a node in a local group or global routing is needed. Ideally, supernodes are endpoints of a tunnel through the secondary overlay that allows transferring a message directly from its current domain to the destination domain. Hence a supernode can forward inter-domain lookups using the secondary overlay for more efficient routing. Also, a regular node can directly forward lookups to its supernode to activate efficient global routing. In any case, a lookup path can mix supernodes and regular nodes, jumping between the overlays.

A supernode has to maintain an additional routing table. Although the location and capacity of supernodes improve the performance, routing in both overlays is still proximity-unaware in the original proposal. An overlay hop can incur many hops in the underlying IP network.

4.2.2 eCAN by Xu et al. [80, 81]

eCAN is a hierarchical extension of flat CAN [2, 30]. The design primarily aims at improving the routing performance from O(√N) overlay hops to O(log N). The basic CAN design uses only local neighbors: d nodes, one per dimension in the CAN d-torus. To improve routing, eCAN augments routing tables with long-range links.

The same idea is utilized in many flat DHT designs, leading to pre-hierarchical schemes where each node locally selects additional long-range neighbors [11]. In contrast, eCAN constructs a global overlay (expressway) for each distance scale. An expressway overlay forms its own layer i ≥ 2. Its topology is formed with links of the corresponding span; the higher i, the longer the span. The bottom layer is a flat CAN DHT overlay that performs the basic routing function; all upper overlays are auxiliary, and the routing function is incrementally arranged along the vertical dimension.

The eCAN expressway mechanism is an evolution of the Brocade shortcut mechanism to an arbitrary number of layers. Instead of the coarse-grained routing scale “local (i = 1) vs. global (i = 2)”, the range i = 1, 2, . . . , M is used for finer distance granularity. The span scale grows exponentially with i, which agrees with the small-world principle. It leads to paths where the closer the destination, the shorter the hop span (see Fig. 8), which is analogous to the geometrically progressive routing paths of flat DHTs.
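The effect of exponentially growing spans can be illustrated with a one-dimensional toy model. The sketch below is a simplification and not the eCAN algorithm: it replaces the CAN d-torus by a line, assumes that layer i offers hops of span base^(i − 1), and greedily takes the longest hop that does not overshoot; the function name `greedy_spans` and the parameter `base` are hypothetical.

```python
def greedy_spans(distance, layers, base=2):
    """Return the spans of the hops used to cover `distance` on a line, longest hops first."""
    # Layer i = 1..layers is assumed to provide hops of span base**(i - 1).
    spans = [base ** (i - 1) for i in range(1, layers + 1)]
    hops = []
    remaining = distance
    for span in reversed(spans):
        while remaining >= span:
            hops.append(span)
            remaining -= span
    return hops
```

With six layers a distance of 45 is covered by hops of span 32, 8, 4, and 1, i.e., large leaps first and shorter hops near the destination; with the bottom layer alone the same distance takes 45 unit hops.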

As in Brocade, an expressway link preferably has high bandwidth, and more capable nodes take more load being active expressway nodes. In contrast to Brocade, this node responsibility differentiation is implicit; it does not require specific registration in expressways.

Recall that a Brocade shortcut spans a long distance in the underlying IP network. In contrast, an eCAN expressway is primarily for long leaps in the overlay ID space. Nevertheless, eCAN employs proximity neighbor selection for long-range links, reducing the routing stretch. Additionally, topology-aware CAN overlay construction [30] allows adapting the node distribution in the ID space to the node location distribution in the underlying IP network.

In subsequent work, Xu et al. [31, 94] developed the expressway method further. They relaxed the strict DHT protocol rules in expressway construction; the distance directly reflects the underlying network topology, leading to better adaptation and lower routing stretch. Registration of high-responsible nodes and publishing of proximity-related information in expressways are mandatory.

4.2.3 Super-peer based lookup by Zhu et al. [95]

The design is close to the Brocade architecture. The primary overlay and secondary overlay are for local and global routing, respectively. They use any flat DHT protocol. Supernodes are elected or selected based on the node capability: network bandwidth, storage capacity, and processing power. The secondary overlay uses its own ID space, and a supernode has two independent routing tables.

Additionally, a supernode acts as a centralized server to a set of regular nodes. It maintains an index over the resources available at any regular node of this supernode. Hence, centralized clusters with the star topology appear on the bottom layer, introducing elements of horizontal architecture.

Fig. 8 Large hops are at the beginning; then the span is reduced exponentially

When v joins the system, it associates itself with a nearby supernode u and becomes a regular member of the centralized cluster. Also, v constructs its routing table for the bottom layer overlay and behaves according to the flat DHT protocol. While participating in the system, v notifies u about all local resources (for any insertion and removal). When v leaves the system, u updates the index appropriately.

In contrast to Brocade, where routing can mix paths from both overlays, this design states that a lookup always tries global routing first, using the secondary overlay. A regular node forwards a lookup to its supernode. Then the lookup runs in the secondary overlay until the responsible supernode is found. This supernode forwards the lookup to the destination regular node (local one-hop routing based on the local index). If a lookup cannot continue in the secondary overlay, then the lookup takes the primary overlay.

The design has no clear prevalence of nested vs. disjointed architectures. The nested property becomes apparent when the secondary overlay topology is formed with long-range inter-domain links in the IP network. The partition into centralized clusters makes the architecture closer to disjointed. The design balances these two approaches between two points of view on the routing mechanics: either the top DHT is auxiliary and boosts the global routing performance, or the bottom DHT is auxiliary and preserves the local routing dependability when the top DHT and centralized clusters cannot resolve a lookup.

Note that resource update notifications within each centralized cluster lead to overhead for regular nodes, not only for supernodes. It can be inappropriate for environments with very low-performance nodes and high update rates.

4.2.4 Coral by Freedman et al. [96]

The design supports any number M > 1 of layers and applies pure horizontal architecture. Each layer consists of clusters of nodes with similar RTTs. The node ID space is the same for all layers. A node belongs to one cluster at each layer, which is the extreme property of nested architecture.

The recommended value M = 3 has the following rationale. There are many fast clusters with regional coverage (low-level overlays for i = 3, a reasonable RTT threshold is 30 msec), a few clusters with continental coverage (middle-level overlays for i = 2, RTT is up to 100 msec), and one planet-wide cluster (the global overlay for i = 1, RTT is unlimited). A node joins an acceptable cluster, e.g., one in which the latency to 90 % of nodes is within the cluster diameter. If a node cannot find such a cluster, it forms its own.

A lookup for k first starts in the cluster on layer i = 1. If the current cluster on layer i does not contain a responsible node, the lookup reaches the closest node u to k in this cluster. Then the lookup continues on the next layer i + 1 in the higher-level cluster that u belongs to. The process continues until a responsible node is found.

Coral routing guarantees that lookups at the beginning are fast (see Fig. 9) since (i) the small cluster size leads to few overlay hops and (ii) proximity-awareness leads to low routing latency. An additional replication mechanism complements this routing strategy. Resources (pointers or indexes) are replicated to nodes along paths to the destination node, so many lookups are likely to be resolved locally.

When a lookup reaches layer i, it first continues on this layer, i.e., within the cluster. Thus a node needs information about which layers incoming lookups must use. Coral uses cluster IDs and implements a cluster management mechanism: joining a cluster, merging and splitting clusters. It increases the maintenance overhead compared to flat DHTs.

The Coral design does not take the heterogeneity of individual node responsibility into account. The hierarchy specifically arranges the routing functionality along vertical layers to achieve higher performance than in flat DHTs. In this case, deploying a few high-responsible nodes is not a radical change; all nodes must afford their capacity non-exclusively. Moreover, if a node becomes an active participant then its nearby nodes also have to be active due to the locality property of Coral. Therefore, the design might be appropriate in environments with domain heterogeneity.

4.2.5 HIERAS by Xu et al. [97]

The design is similar to Coral. The key difference is in the clustering mechanism: Coral uses ping-pong probes for RTT estimation and HIERAS uses distributed binning. Each of the M layers consists of several ring overlays, following the horizontal architecture approach. A ring contains a subset of nearby nodes of the underlying IP network; they take equal responsibilities for the workload within the ring. Values M = 2, 3 are recommended for a tradeoff between the routing performance and ring maintenance overhead.

Fig. 9 At the beginning hops are fast in the underlying network. For a distant destination hops have higher routing latency

All N nodes are partitioned into several rings on each layer i = 1, 2, . . . , M. The top layer i = M has one ring only. A node belongs to exactly one ring on every layer. The lower the layer, the smaller the latency between nodes in the ring. Nodes partition themselves into disjointed rings using the distributed binning scheme [30]. It requires landmark nodes, a well-known set of machines spread across the Internet.

A lookup first runs on the bottom-layer ring where the lookup originator is located. It moves up and eventually reaches the top-layer ring. Since each node belongs to M rings in total, it has to maintain M routing tables. There are also ring tables for maintaining information about different rings. A ring table is stored at the node whose ID is numerically closest to the ring ID and is also duplicated to several other nodes for fault tolerance. Additional operations at a node are needed, e.g., calculating ring information and requesting the ring table when a new node joins the system.

Park et al. [98] proposed P3ON (Proximity based P2P Overlay Network), a two-layer design that is conceptually close to HIERAS and Coral. The global overlay of all nodes is on the top and many local overlays are on the bottom. A local overlay is a Chord ring that connects all nodes in a single autonomous system (AS). Dividing IDs into the prefix and suffix parts allows node IDs of the same AS to be close in the node ID space. Resources are initially stored in the global overlay; then popular resources are replicated in local overlays. Routing is performed first in a local overlay, reducing the latency.

Xu and Jin [99] proposed the Uinta and SW-Uinta (small-world) designs based on the same two-layer cluster-based architecture as in Coral, HIERAS and P3ON. Uinta considers underlying network proximity and data semantics in cluster formation on the bottom layer. SW-Uinta changes the deterministic Chord-like neighbor selection of Uinta to the stochastic small-world strategy, resulting in reduced maintenance cost and improved routing performance.

4.2.6 TOPLUS by Garces-Erice et al. [100]

It is an extreme variant of proximity-aware overlay construction. The underlying network topology is the key factor that influences the overlay hierarchy. A TOPLUS overlay straightforwardly simulates the IPv4 subnetwork hierarchy. As a result, a lookup path follows the router-level shortest-distance path, and the routing stretch becomes close to 1.

The node ID space S is the set of all IP addresses. Groups are virtual entities identified with IP network prefixes. The latter are obtained from BGP tables. Proximity close nodes are organized into groups (subnets in ASes). Groups are organized into supergroups (ASes). Supergroups merge into hypergroups (aggregations of ASes). Therefore, a typical case is M = 3 layers, similarly to Coral and HIERAS.

The XOR metric, a refinement of longest-prefix matching, supports the above group construction. Assuming close nodes in the IP network have similar IP address prefixes, the XOR distance is small for proximity close nodes.
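The relation between the XOR distance and the common address prefix can be checked with a small sketch. It uses the Python standard `ipaddress` module; the function names and the documentation-range example addresses are chosen here for illustration only.

```python
import ipaddress


def xor_distance(a, b):
    """XOR distance between two IPv4 addresses given as dotted strings."""
    return int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))


def common_prefix_len(a, b):
    """Length in bits of the longest common prefix of two IPv4 addresses."""
    return 32 - xor_distance(a, b).bit_length()
```

Addresses in the same subnet, such as 192.0.2.10 and 192.0.2.20, share a 27-bit prefix and have XOR distance 30, whereas 192.0.2.10 and 198.51.100.1 share only 5 bits and are much farther apart in the XOR metric.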

This hierarchy follows the tree-based model with tree T. Its root corresponds to S. Nodes are leaves of T. A group is a non-leaf vertex and contains all nodes from its descendants. Any pair of groups is either node-disjointed, or one group is properly nested in the other. The tree is typically irregular and imbalanced: groups on the same layer (even siblings) can be heterogeneous in size and in number of subgroups.

Let m_u be the length of the path in T from the root to a node u. Then u belongs to m_u groups nested along the path:

\[ C_{u,1} \subset C_{u,2} \subset \ldots \subset C_{u,m_u} = S. \]

That is, u is contained in an assembly of telescoping groups. The local links of u are to all v ∈ C_{u,1}, i.e., u knows all participants of the inner-most IP network (closest neighbors). Any C_{u,i}, except the root (i = m_u), has one or more siblings in T. Node u keeps at least one neighbor from every sibling group of C_{u,i} for i < m_u. Therefore, u knows long-range neighbors on all distance scales. Routing is greedy: a lookup is forwarded to the neighbor closest to the key.

When u joins the system, u takes the routing table of the node closest to u. First, u notifies all its local neighbors to update their routing tables. Second, u asks each long-range neighbor v for a random node w in v’s group, and then w replaces v in u’s routing table. This provides the diversity property: nodes of the same group likely know different nodes from other groups. Formally, if u1, u2 ∈ C, u1 ≠ u2, u1 has a long-range link to v1 ∈ C′ and u2 has a long-range link to v2 ∈ C′, then v1 ≠ v2 with high probability.

IP network prefixes can reflect proximity inaccurately: nodes with contiguous prefixes might not be adjacent in the IP network. Node capacity does not influence the overlay load balance, and the load of a node directly depends on the position of the inner-most IP network in the hierarchy.

4.2.7 Canon by Ganesan et al. [101]

Canon provides a generic method for accurately reflecting, in the overlay topology, the real-world hierarchical organization of nodes in the underlying network. Such node organization is due to the global hierarchy of network domains, which follows the tree-based model. An example was shown in Fig. 3 in Section 3.1.2. The Canon method requires each node to know its own position in the global hierarchy, and any two nodes are able to compute their lowest common ancestor. The hierarchy reflection aims at reducing the routing latency due to lower routing stretch. The overlay path length remains the same as in flat DHTs.
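The lowest common ancestor computation can be sketched under the assumption that a node position is given as a DNS-like domain name, with the most general domain on the right. This encoding and the function names are hypothetical; the source only requires that each node knows its position in the hierarchy.

```python
def domain_path(name):
    """Path from the root of the hierarchy down to the given domain."""
    return tuple(reversed(name.split(".")))


def lowest_common_ancestor(a, b):
    """Deepest domain that contains both positions; the empty string denotes the root."""
    common = []
    for x, y in zip(domain_path(a), domain_path(b)):
        if x != y:
            break
        common.append(x)
    return ".".join(reversed(common))
```

For example, nodes in db.cs.example.edu and ai.cs.example.edu meet in cs.example.edu, so under the path locality property a lookup between them would be expected to stay inside that domain.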

Let T be the tree of the global hierarchy with M layers, where overlay nodes are leaves. Let m_u be the height of node u in T. Non-leaf vertices are domains consisting of all nodes from lower-level domains. The Canon method ensures that the nodes in any domain form their own DHT-based overlay by themselves; it is the case of horizontal architecture. Since a node u belongs to exactly one domain on each of m_u layers, u participates in m_u overlays in total.

The Canon hierarchy construction is bottom-up. Given a flat DHT, each set of nodes at a leaf domain forms a DHT-based overlay. At each internal domain, the overlay containing all nodes in that domain is constructed by merging all its children overlays. At each merging step, some links are added to routing tables. The top-level (global) overlay is eventually produced; it contains all N nodes. Links from a lower-level overlay are inherited in higher-level overlays, following telescoping scheme (10). The number of additional links is moderate. The link addition rules depend on the underlying flat DHT. Some nodes may have no additional links at all. As a result, the total number of links per node remains comparable with a flat DHT overlay.

Importantly, the higher the layer, the longer the spans of its domain overlays. Routing tries short-range links first, see Fig. 9 above. On each layer, a lookup for $k$ reaches the node $u$ closest to $k$ in the current domain. Then $u$ is responsible for switching to the next higher layer, where the lookup continues. This leads to (i) the path locality property: a path between two nodes of the same domain does not leave this domain, and (ii) the path convergence property: paths from different nodes of a domain to the same outside destination exit the domain through a common node.

When a new node $u$ joins the system, it must know at least one existing node in its lowest-level domain ($i = 1$). This knowledge requires an extra mechanism. Then $u$ sequentially joins the nested overlays on layers $i = 1, 2, \ldots, m_u$ using the flat DHT protocol.

The Canon method can be applied to transform many flat DHT designs into their hierarchical versions. In particular, [101] describes such transformations for Chord (Crescendo), Symphony (Cacophony), Pastry/Kademlia (Kandy), and CAN (Can-Can).

4.2.8 Multi-ring hierarchy by Mislove and Druschel [43]

This is a generic method for constructing a hierarchy of DHT overlays (rings). The real-world administrative organization is reflected by a tree of rings. The top layer (the root of the tree) consists of a single global ring. All nodes participate in the global ring unless they are behind a firewall or a NAT.


Each overlay may apply its own flat DHT. A node is not required to participate in every nested ring on the path from the leaf to the root of the tree. Lower-layer rings are connected to their parent ring through gateway nodes (supernodes). Although the latter feature comes from disjointed architecture, the design is still closer to nested architecture (see also Section 4.1.8 and Fig. 7). If a node has low capacity or limited connectivity in the underlying network, then it participates only in lower-layer rings. Nevertheless, in the extreme case all nodes can become gateways.

Consider the case $M = 2$. The bottom layer consists of independent organizational rings. Each ring has a globally unique ring ID known to all members of the ring. The global ring has an ID of all zeros. Any node must know in advance its position in the global hierarchy to join the ring of the given organization. Some nodes become members of more than one ring; they are gateway nodes. A gateway node acts as multiple virtual overlay nodes, one in each ring, but uses the same node ID in each ring. Gateway nodes announce themselves to the other members of the rings in which they participate by subscribing to an anycast group in each of the rings. A group ID is equal to the associated ring ID.

A lookup also carries the ID of the target ring $r$. If the current node $u$ belongs to $r$, then the lookup continues in accordance with the overlay protocol of $r$. Otherwise, $u$ locates a gateway node to forward the lookup to $r$. If $u$ is a member of the global ring, then $u$ forwards to the anycast group identified with $r$, and the lookup will be delivered to a gateway node to $r$. Otherwise, $u$ anycasts the lookup into the global ring group.

For $M > 2$, a ring ID is a digit string in base $b$. Each layer appends a digit to the parent ring ID. A ring can dynamically create new rings. In a lookup, if a node $u$ does not belong to the target ring $r$, then $u$, as a member of multiple rings, selects the ring whose ID shares the longest prefix with the ID of $r$. If several rings share the longest prefix, $u$ uses the ring with the shortest ID. Routing has two phases. First, bottom-up routing is performed until a ring is found whose ID is a prefix of the destination ring ID. Second, top-down routing continues the lookup towards the destination ring.
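The ring selection rule can be sketched in Python as follows. The sketch assumes that ring IDs are represented as digit strings and that the global ring is the empty string, so every child ID extends its parent ID by one digit; this encoding and the function names are illustrative assumptions, not the representation used in [43].

```python
from typing import Iterable


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two ring IDs."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def select_ring(my_rings: Iterable[str], target: str) -> str:
    """Pick the ring to forward a lookup into.

    Preference: longest prefix shared with the target ring ID;
    ties are broken in favour of the shortest ring ID.
    """
    return min(my_rings, key=lambda r: (-shared_prefix_len(r, target), len(r)))


def is_ancestor_or_self(ring: str, target: str) -> bool:
    """True once bottom-up routing can stop and top-down routing can start."""
    return target.startswith(ring)


# A gateway that is a member of rings "12" and "124" forwards a lookup for
# ring "123" into "12": both share two digits, and "12" is the shorter ID.
print(select_ring(["12", "124"], "123"))
```

Under this encoding the tie-break towards the shortest ID moves the lookup up to the ring that is already an ancestor of the destination, which is where the top-down phase begins.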

The principal maintenance overhead is due to the requirement that nodes join multiple rings and thus require additional control traffic to maintain the routing state in each ring. The design relies on a group anycast mechanism, which requires maintenance of spanning trees consisting of the overlay nodes on the paths from group members towards the overlay node that is responsible for the group ID.

4.2.9 Diminished Chord by Karger and Ruhl [44]

The nested architectures above take the global domain hierarchy into account. Overlay paths become close to underlying network paths, which improves the routing properties. However, these designs do not directly allow a node to adapt its participation efficiently to its needs. The position of a node in the hierarchy is predetermined, and the node has to provide the responsibility level in accordance with that position.

In service-oriented applications, a group corresponds to a service that is collectively provided by the participating nodes. For instance, a group is responsible for content on a given topic. A node may individually select which groups it wants to belong to. In contrast to domain hierarchies, such a group structure can be dynamic. A straightforward way is to implement a separate DHT overlay for each group. In dynamic environments, this becomes too expensive because of the maintenance overhead.

Diminished Chord applies the group-based model on top of the Chord DHT for grouping any subset of nodes. Nodes of the same group jointly offer a service without forming their own overlay. Instead of the additional total state $O(|C| \log |C|)$ for a group $C$ with its own Chord ring, Diminished Chord introduces additional $O(|C|)$ state, uniformly distributed among DHT nodes that need not be members of $C$ themselves.

The architecture is two-layer. The bottom layer is the primary Chord ring where all nodes participate. It provides the basic lookup operation: lookup($k$) returns the node $d$ that minimizes $\rho(d, k)$ over all nodes. The top layer is for arbitrary groups. A group $C$ has a group ID in the Chord space. A node participates in zero or more groups according to its individual interests. The possibility of group overlapping makes the design closer to nested architecture.

An additional group lookup service is provided to locate the node responsible for a key in a specified group $C$. In particular, the group operation lookup($k$, $C$) returns

$$d = \arg\min_{u \in C} \rho(u, k),$$

which is a group-restricted version of (1) from Section 2.

For each group $C$ a directory tree is embedded into the primary Chord ring. Vertices are points in the Chord space; each maps to the closest DHT node, so vertex and node can be used interchangeably. The root corresponds to the group ID. The tree is binary and of height $O(\log N)$. All nodes are ordered such that a left descendant precedes a right descendant in terms of the Chord space distance. Edges are links from a child to its parent. An edge $u \to v$ is either a standard Chord finger ($v$ immediately succeeds $u + 2^i$ for some $i$) or a prefinger ($v$ immediately precedes $u + 2^i$ for some $i$).

The key property is that, for any node $u$, its right subtree includes the node $v \in C$ closest to $u$. The node $u$ either keeps the link to $v$ itself or delegates it to the parent $w$ if $v$ is also the closest node to $w$. This property ensures that a path from a leaf to the root goes through a node that knows the closest node from $C$ for this leaf node. The scheme requires storing one additional link at some overlay node for every group node, leading to the $O(|C|)$ state overhead.


To resolve lookup($k$, $C$), the responsible node $d' \in N$ is first found in the primary ring (Chord lookup, $O(\log N)$ hops). Then the responsible node $d \in C$ is located by traversing a path in the directory tree from $d'$ towards the root. The traversal takes $O(\log N)$ hops. Therefore, the group lookup has the same routing complexity as basic Chord.
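The intended result of the two operations can be stated as a brute-force reference in Python. This is only a specification-style sketch: it scans all candidates instead of using the directory tree, and it assumes that $\rho(u, k)$ is the clockwise distance from key $k$ to node $u$ on a ring of $2^{\text{bits}}$ identifiers (the usual Chord successor rule). That choice of $\rho$, the parameter `bits`, and the sample identifiers are illustrative assumptions.

```python
from typing import Iterable


def rho(u: int, k: int, bits: int) -> int:
    """Assumed distance: clockwise steps from key k to node u on the ring."""
    return (u - k) % (1 << bits)


def lookup(k: int, nodes: Iterable[int], bits: int) -> int:
    """Basic lookup: the node minimizing rho over all overlay nodes."""
    return min(nodes, key=lambda u: rho(u, k, bits))


def group_lookup(k: int, group: Iterable[int], bits: int) -> int:
    """Group lookup: the same minimization restricted to members of C."""
    return min(group, key=lambda u: rho(u, k, bits))


# Hypothetical 6-bit ring with ten nodes, three of which form group C.
nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
group_c = [8, 32, 51]
print(lookup(10, nodes, 6), group_lookup(10, group_c, 6))
```

A real deployment would obtain the same answers in $O(\log N)$ hops; the scan is useful mainly as a test oracle for such an implementation.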

For a given group, the design employs “special” nodes to keep additional information about the group. They have to serve additional group lookups even when they are not members of the group. The number of groups a node participates in is not an accurate metric of the node's responsibility, and individual adaptation of the desired participation level becomes complicated. When the number of groups increases, the overhead is distributed uniformly among all $N$ nodes, regardless of their heterogeneity.

4.2.10 Cyclone by Artigas et al. [60, 77]

Cyclone evolves the Canon method (Section 4.2.7), capturing the real-world hierarchical organization of the underlying network. The Cyclone method aims at better load balancing and scalability when the system is partitioned into autonomous domains and complete knowledge of the global hierarchy is not available at individual nodes. The reference case is Whirl, the Cyclone version of Chord. In general, the method allows different P2P protocols for the overlays (clusters) in the hierarchy, including hybrid P2P systems.

The $n$-bit circular node ID space $S = [0, 2^n)$ with the XOR distance metric is common to all nodes. The partition scheme follows the tree-based model and employs ID suffixes for cluster identification. A node ID consists of two parts: a prefix of $n - p$ bits and a suffix of $p$ bits. In a node ID, the suffix identifies the cluster where the node resides, whereas the prefix is an intra-cluster identity. In particular, a $p$-bit suffix is a hash of the full name of the domain that the cluster represents. Domains may be partitioned further by taking the next $1 \le l_i \le p$ bits leftwards to construct up to $2^{l_i}$ branches. Such $l_i$-bit strings label the subdomains at each subsequent layer and identify their clusters.

Lower-layer clusters are merged to form a network on the next layer, leading to nested architecture as in TOPLUS and Canon with telescoping scheme (10) for routing tables. The prefix-suffix structure of IDs supports link reuse, since any two nodes $u$ and $v$ of a cluster have the same $l$-bit suffix, hence $u \equiv v \pmod{2^l}$. In Whirl, if a cluster $C$ with an $l$-bit ID contains a node $u$, then $u$'s immediate successor $v$ is at least $2^l$ away in the ID space, and thus the link $u \to v$ is reused in any higher-level cluster of $C$. The total number of links per node remains comparable with basic Chord.
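The suffix argument can be checked with a small Python sketch. It assumes node IDs are plain integers whose low-order bits hold the cluster suffix; the helper names and the sample values are hypothetical, and the hashing of domain names is left out.

```python
def make_id(prefix: int, suffix: int, suffix_bits: int) -> int:
    """Compose a node ID from an intra-cluster prefix and a cluster suffix."""
    return (prefix << suffix_bits) | suffix


def cluster_suffix(node_id: int, l: int) -> int:
    """Extract the l low-order bits that identify the node's cluster."""
    return node_id & ((1 << l) - 1)


def same_cluster(u: int, v: int, l: int) -> bool:
    """Two nodes share an l-bit cluster exactly when u = v (mod 2**l)."""
    return (u - v) % (1 << l) == 0


# Three members of the hypothetical cluster with 3-bit suffix 0b101.
members = sorted(make_id(prefix, 0b101, 3) for prefix in (2, 4, 7))
gaps = [b - a for a, b in zip(members, members[1:])]
print(members, gaps)
```

Every gap between distinct members is a positive multiple of $2^l$, which is why a successor link inside the cluster spans at least $2^l$ identifiers.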

Similarly to Canon, Cyclone uses bottom-up routing to take advantage of network locality. Similarly to HONet and TOPLUS, link augmentation provides faster inter-cluster routing. A node maintains additional links to its sibling clusters at different layers. Many of these links are already in its routing table due to Cyclone link reuse. If a lookup cannot be resolved in the current cluster, it is forwarded directly to the cluster closest to the key.

Inter-cluster links are optional, and only a node with enough capacity maintains them. Such nodes act as supernodes, turning heterogeneity into an advantage for global routing. This optional feature is, in fact, closer to disjointed architecture, where a supernode is a representative of its cluster. Inter-cluster links can also be used to construct link-disjoint Hamiltonian cycles connecting sibling clusters on each layer. This ability supports multipath routing with improved security and reliability [77].

The Cyclone architecture has the same typical disadvantages as Canon. First, the assumption that a node can participate in overlays on all layers, especially on higher layers, is often violated because of IP connectivity restrictions. Second, there is no clear distinction of the load that nodes of different capacity should take.

4.2.11 G-Tap by Zhang et al. [45, 102]

G-Tap (Grouped Tapestry) supports various group structures, similarly to Diminished Chord. A node may belong to several groups simultaneously. The key feature is group-aware routing, in which (i) a lookup terminates at the most responsible node in a specified group, and (ii) a lookup path is constrained within a specified group. Group-aware routing complements basic Tapestry routing.

The Tapestry space consists of $n$-digit integers in base $b$. On the top layer, the global Tapestry overlay consists of all $N$ nodes. If a node $u$ is a member of a group $C$, then $u$ maintains additional neighbors from $C$ to form a group Tapestry overlay.

In the global overlay, any node $u = u_{n-1} \cdots u_i \cdots u_0$, for every $i = 0, 1, \ldots, n - 1$, selects $b - 1$ long-range neighbors $v(j) \in N$ such that $v(j)$ resolves the $i$th digit to $j$ in prefix-matching Tapestry routing, $j = 0, \ldots, u_i - 1, u_i + 1, \ldots, b - 1$. G-Tap additionally requires that a primary neighbor $v(j)$ is numerically closest to $u_{n-1} \cdots u_{i+1}\, j\, u_{i-1} \cdots u_0$. The diversity of candidates for $v(j)$ is utilized for constructing group overlays on the bottom layer. In a group-$C$ overlay, any node $u$ stores a neighbor $v_C(j) \in C$ in addition to its primary neighbor $v(j)$. Similarly, for the Tapestry leaf set, $u$ maintains up to 2m0 primary leaves from $N$ (global overlay) and up to 2m0 leaves from $C$ (group overlay).

When a lookup is group-unaware, nodes call basic Tapestry routing to find the closest responsible node. When a lookup is constrained within a group $C$ (path-constrained routing), only group-$C$ neighbors are used.


When a lookup is constrained with a destination group $C$ (destination-specified routing), the lookup (i) is routed to a node $v \in C$ (group discovery), and then (ii) path-constrained routing within $C$ completes the lookup.

For group discovery, G-Tap uses group rendezvous. A group name $C$ is hashed to its key $c_{n-1} \cdots c_0$ in the Tapestry space. The node $c$ responsible for this key in the global overlay is $C$'s rendezvous, and any node may query $c$ for a node from $C$. To find a rendezvous, $u$ simply initiates a lookup for the group key in the global overlay.

A rendezvous can be found faster if a group-$C$ node is nearby. G-Tap distributes the rendezvous load using group-$C$ membership rendezvous (GRM) trees. Such a tree has $b^i$ nodes at levels $i = 0, 1, \ldots, n - 1$, namely the nodes responsible for keys $x_{n-1} \cdots x_{n-i} c_{n-i-1} \cdots c_0$. If $i < n - 1$, such a node has $b$ children at level $i + 1$, namely the nodes responsible for keys $x_{n-1} \cdots x_{n-i}\, j\, c_{n-i-2} \cdots c_0$ with $0 \le j < b$. A path in the group-$C$ tree from a leaf to the root follows suffix-matching routing in resolving a lookup for the group key.

A G-Tap node $u$ maintains a finger set of nodes that belong to other groups. When $u$ is queried for a node $v \in C$, either $u$ resolves the query if it knows some $v \in C$, or $u$ forwards the query (a tree hop) to its parent in the group-$C$ tree. In the former case, $u$ is involved in the group-$C$ tree. In the latter case, routing in the global overlay is used as follows. Node $u$ finds the largest $i$ such that $u$ is responsible for the key $u_{n-1} \cdots u_{n-i} c_{n-i-1} \cdots c_0$. If $i = 0$, then $u$ is the root. Otherwise, $u$ looks up the key $u_{n-1} \cdots u_{n-i+1} c_{n-i} \cdots c_0$ to find the responsible node, which is the parent. Since the primary neighbors are closest to $u$ among all available prefix-matching nodes, the number of hops in the last lookup is small.
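The key construction for GRM trees can be sketched in Python. The sketch assumes that identifiers are digit strings written most significant digit first (so index 0 holds digit $n - 1$) and that the base does not exceed 10; the function names and sample identifiers are hypothetical.

```python
from itertools import product
from typing import List, Set


def grm_key(u: str, c: str, level: int) -> str:
    """Key of the level-`level` tree vertex on u's path.

    The key keeps the `level` leading digits of the node ID u and fills
    the remaining positions with the trailing digits of the group key c.
    """
    if len(u) != len(c):
        raise ValueError("node ID and group key must have the same length")
    if not 0 <= level < len(c):
        raise ValueError("level must be in 0..n-1")
    return u[:level] + c[level:]


def keys_to_root(u: str, c: str, level: int) -> List[str]:
    """Keys visited by successive tree hops from `level` up to the root."""
    return [grm_key(u, c, i) for i in range(level, -1, -1)]


def level_keys(c: str, base: int, level: int) -> Set[str]:
    """All vertex keys at one level; there are base**level of them."""
    digits = "0123456789"[:base]
    return {"".join(p) + c[level:] for p in product(digits, repeat=level)}


# Node 3120 starting at level 2 of the tree for group key 0213 (base 4).
print(keys_to_root("3120", "0213", 2))
```

Each hop replaces one more leading digit of the node ID with the corresponding digit of the group key, so the last key in the list is the group key itself, held by the rendezvous.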

When a new node joins, it first joins the global overlay following the Tapestry protocol. Then it joins any existing groups or creates a new one. To join a group $C$, a node $u$ finds a node $v \in C$ using the group discovery procedure above. Then $u$ can find a node $w$, the current root of the group-$C$ tree, and complete joining $C$.

If $u$ belongs to several groups, then $u$ maintains multiple Tapestry routing tables. The routing state and maintenance increase proportionally. In addition, $u$ maintains its finger set. Since group names are hashed, the probability that $u$ is involved in many GRM trees is small, and the expected size of the finger set is negligible compared with other routing information. G-Tap provides $O(\log N)$ routing in the global overlay and $O(\log N_2)$ routing within a group of $N_2$ nodes. Group discovery takes $O(\log N)$ hops in the worst case, since the height of a GRM tree is $O(\log N)$.

The G-Tap design allows any relationship structure between groups. If the structure is a tree-based hierarchy, then optimization is possible, and the corresponding design, called H-Tap (Hierarchical Tapestry), was introduced in [102]. This hierarchy states that a group of layer $i$ is a union of non-overlapping groups of layer $i - 1$. It is similar to the GTPP architecture from Section 4.1.13. This tree-based model indicates that the H-Tap design is closer to disjointed architecture.

5 Performance models

Hierarchical architecture allows node and function differentiation, which is especially important in heterogeneous environments. In this section, we consider the problem of the optimal number of layers and the population size on each layer. On the one hand, more layers allow more differentiation. On the other hand, a high hierarchy degree can be expensive. We study cost models for this tradeoff: the cost of local and total state (routing table sizes), routing (path length in overlay hops), and traffic (lookup and maintenance). The models lead to qualitative conclusions on the benefits of hierarchical architectures in structured P2P overlay networks.

5.1 Local state cost

In hierarchical architecture, nodes can participate in multiple layers. This immediately results in overhead compared with the flat DHT case. Recall that (9) from Section 3.3 showed that if $M \ll N$, then the local state cost overhead is proportional to the number of layers the node belongs to. From this viewpoint, an HDHT design should keep the number of layers and the number of multi-layer nodes small. The simplest case is two-layer architectures with supernodes.

Consider a disjointed two-layer hierarchical architecture with $N_1 = N$ nodes forming a flat DHT overlay at the bottom layer. Among them, $N_2$ nodes are selected to be supernodes, forming a self-contained overlay at the top layer. The same ID space is used on both layers. Each supernode belongs to two overlays, one for each layer. Since a supernode maintains two routing tables, the total state cost becomes higher compared with the flat DHT case.

Under these assumptions, the basic model for the optimal value of $N_2$ was introduced in [68]. A supernode keeps a routing record of $c_2$ bytes for each supernode, which gives up to a fully-meshed topology in the worst case. A supernode also hosts $N/N_2$ regular nodes on the bottom layer (star topology), and each corresponding record consumes $c_1$ bytes. Then the local storage at each supernode is upper bounded by

$$\mathrm{StateCost}(N_2) = c_2 N_2 + \frac{c_1 N}{N_2}. \tag{11}$$

The cost function is convex with the minimum $2\sqrt{c_1 c_2 N}$ attained at $N_2 = \sqrt{c_1 N / c_2}$, where the total population size $N$ is a free parameter. The model leads to the requirement $N_2 = \Theta(\sqrt{N})$ in such HDHT designs as Structured superpeers from Section 4.1.3.
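The optimum of model (11) can be checked numerically with a short Python sketch. Setting the derivative $c_2 - c_1 N / N_2^2$ to zero gives $N_2 = \sqrt{c_1 N / c_2}$ and the minimum value $2\sqrt{c_1 c_2 N}$; the record sizes and the population size used below are arbitrary illustrative values, not figures from [68].

```python
import math


def state_cost(n2: float, n: float, c1: float, c2: float) -> float:
    """Model (11): c2 bytes per known supernode plus c1 bytes per hosted node."""
    return c2 * n2 + c1 * n / n2


def optimal_n2(n: float, c1: float, c2: float) -> float:
    """Stationary point of (11), obtained from c2 - c1 * n / n2**2 = 0."""
    return math.sqrt(c1 * n / c2)


# Illustrative parameters (assumptions): 10**6 nodes, 20- and 80-byte records.
n, c1, c2 = 10**6, 20, 80
best = optimal_n2(n, c1, c2)
print(best, state_cost(best, n, c1, c2), 2 * math.sqrt(c1 * c2 * n))
```

For equal record sizes the optimum reduces to $N_2 = \sqrt{N}$; unequal sizes shift the optimum by a constant factor without changing the $\Theta(\sqrt{N})$ order.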

A similar state cost model for a loosely structured two-layer hierarchy appeared in [67], aiming at the optimal value of $N_2$ depending on the total population size $N$ and the total amount of resources $R$ kept in the system. The cost function is

$$\mathrm{StateCost}(N_2) = \frac{N}{N_2} + \frac{R}{N_2} + c(N_2 - 1),$$

where $R$ is the total number of resources and $c$ is the number of contacts per foreign group at the bottom layer. Note that this model does not introduce size coefficients like $c_1$ and $c_2$ in (11), since they do not affect the resulting estimates written in big-O notation. The cost is minimized at $N_2 = \sqrt{(N + R)/c}$. The Kelips design uses a fixed $c$ with the optimal tradeoff $N_2 = \Theta(\sqrt{N + R})$. Assuming that $R = O(N)$ (i.e., the total number of resources is proportional to $N$ in the worst case), the design requires $N_2 = \Theta(\sqrt{N})$ groups on the top layer.

When a conventional flat DHT with $O(\log N)$ local state is used, the state cost at a supernode is

$$\mathrm{StateCost}(N_2) = \log N_2 + \log(N/N_2) = \log N.$$

Hence, the supernode state cost is the same for hierarchical and flat designs. However, an important performance benefit is introduced for regular nodes: the local state cost per regular node is reduced from $\log N$ in the flat DHT case to $\log(N/N_2)$ in the HDHT case.

In these local state optimization models the cost ratio between supernodes and regular nodes is constant: a supernode's cost is twice that of a regular node. As we shall see later, higher differentiation is provided by global optimization, when the total state cost is minimized.

5.2 Total state cost

Let us follow a modeling approach close to [18, 20]. For simplicity we consider the continuous case. The total state cost is the sum of local state costs over all nodes. Consider the ordered two-layer disjointed architecture with $N_2$ supernodes, each of which maintains its own non-overlapping overlay of $N/N_2$ nodes on the bottom layer. Any supernode has a local state of $N/N_2 + N_2 - 2$ entries due to maintenance of two routing tables of size $N/N_2 - 1$ (all nodes of the supernode's overlay) and $N_2 - 1$ (all other supernodes), respectively. These two routing tables of a supernode may both contain the same entry. The cost models below do not account for this duplication.

In accordance with [20], assume the extreme case when every node is ready to be a supernode (full redundancy) and every overlay in the hierarchy has a fully-meshed topology. Consequently, any node needs to maintain $N/N_2 + N_2 - 2$ entries, and the total state cost is

$$\mathrm{StateCost}(N_2) = N \left( \frac{N}{N_2} + N_2 - 2 \right). \tag{12}$$

Analyzing its derivative, we conclude that the only minimum is at $N_2 = \sqrt{N}$. Therefore, the model leads to the same result as minimizing the local state cost.

Now consider the assumption of no redundancy. Every node maintains a routing table of size $N/N_2 - 1$ to participate in its overlay on the bottom layer. In addition, every supernode maintains a routing table of size $N_2 - 1$. The total state cost becomes

$$\mathrm{StateCost}(N_2) = N_2(N_2 - 1) + N(N/N_2 - 1). \tag{13}$$

Taking the derivative and setting it to zero leads to the following cubic equation with respect to $N_2$:

$$2N_2^3 - N_2^2 - N^2 = 0.$$

Based on Cardano's method one can prove that there exists a unique positive real root $N_2 = \Theta(N^{2/3})$ for large $N$. Although model (13) yields a lower total state cost than (12), more supernodes are required ($N^{2/3} \gg N^{1/2}$ for large $N$). The benefit for node differentiation is that the local state cost ratio between supernodes and regular nodes is proportional to $N^{1/3}$.
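The root of the cubic can also be located numerically instead of with Cardano's method. The Python sketch below uses plain bisection, which is a substitute technique chosen here for brevity; it relies on the cubic being negative at $N_2 = 1$ and positive at $N_2 = N$ for $N \ge 2$, and increasing in between. The function names are hypothetical.

```python
def state_cost_13(n2: float, n: float) -> float:
    """Model (13): full-mesh supernode tables plus full-mesh bottom-layer tables."""
    return n2 * (n2 - 1) + n * (n / n2 - 1)


def optimal_supernodes(n: float, iterations: int = 200) -> float:
    """Positive root of 2*x**3 - x**2 - n**2 = 0, found by bisection on [1, n].

    Assumes n >= 2, so that the cubic changes sign on the interval.
    """
    def g(x: float) -> float:
        return 2 * x**3 - x**2 - n**2

    lo, hi = 1.0, float(n)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 10**6
root = optimal_supernodes(n)
# For large n the x**2 term is negligible, so root is close to (n**2 / 2)**(1/3).
print(root, root / n ** (2 / 3))
```

The printed ratio approaches $2^{-1/3} \approx 0.79$ as $N$ grows, which is the constant hidden in $N_2 = \Theta(N^{2/3})$.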

Let every overlay in the network be DHT-based with logarithmic-size routing tables. Similarly to (13), the total state cost is

$$\mathrm{StateCost}(N_2) = N_2 \log N_2 + N \log(N/N_2). \tag{14}$$

Taking the derivative and setting it to zero, we obtain the following transcendental equation with respect to $N_2$:

$$N_2 = \frac{N}{1 + \ln N_2}.$$

Since $\ln N_2 + 1 < N_2$ for large $N_2$, then $N_2 \ge N/N_2$ and the upper bound $N_2 = O(\sqrt{N})$ is true for the optimal solution. If we approximate $N_2 = N^{1/2 - \varepsilon}$ with a small $\varepsilon > 0$ (possibly $\varepsilon = \varepsilon(N)$), then the local state cost is $(1/2 + \varepsilon) \log N$ and $\log N$ for a regular node and a supernode, respectively. The local state cost ratio between supernodes and regular nodes becomes equal to $2/(1 + 2\varepsilon)$.

Now consider the generalization in which the hierarchical architecture allows overlapping overlays on the bottom layer. Let $y = y(N_2)$ denote a function such that every layer-1 overlay has $Ny$ nodes. An additional constraint is $1/N_2 \le y(N_2) < 1$. The total state cost is

$$\mathrm{StateCost}(N_2) = N_2(N_2 - 1) + N(Ny - 1). \tag{15}$$


For instance, a generic $(r, \alpha)$-family of such functions is

$$y(N_2) = (1 + r) N_2^{\alpha - 1} \quad \text{for } r \ge 0 \text{ and } 0 \le \alpha < 1,$$

where $r$ is an overlap factor and $\alpha$ is a size parameter. If $r = \alpha = 0$, then (15) reduces to (13). If $\alpha$ is close to 1, then we consider only large values of $N_2$ to keep $y(N_2) < 1$.

In the fully-meshed topology settings of (15) and with the $(r, \alpha)$-family, the optimal value of $N_2$ is a root of the equation

$$2N_2^3 - N_2^2 - (1 + r)(1 - \alpha) N^2 N_2^{\alpha} = 0.$$

Similarly, the logarithmic DHT settings lead to the state cost

$$\mathrm{StateCost}(N_2) = N_2 \log N_2 + N \log(Ny). \tag{16}$$

The optimal $N_2$ satisfies the equation

$$N_2 = \frac{(1 - \alpha) N}{1 + \ln N_2}.$$

Generalization to $M > 2$ is based on mathematical induction on the number of layers. The case of model (12) was considered in [20]. Given $N$, the optimal number of layers is $M = \Theta(\log N)$. Given $N$ and $M$, the optimal number of supernodes on the top layer is $N_M = \Theta(N^{1/M})$.

Given an $M$-layer hierarchy, set $N_1 = N$ and $N_{M+1} = 1$. The total state cost is given by the formula

$$\mathrm{StateCost} = \sum_{i=1}^{M} N_i \delta_i \quad \text{for fixed } M, \tag{17}$$

where $N_i$ is the number of nodes on layer $i$ and $\delta_i$ is the routing table size in any layer-$i$ overlay. With $\delta_i = N_i/N_{i+1} - 1$ and $\delta_i = \log(N_i/N_{i+1})$ we obtain generalizations of (13) and (14), respectively. Recall that $N_i$ must satisfy constraint (8).
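Formula (17) translates directly into a few lines of Python. The sketch below takes the layer populations $N_1, \ldots, N_M$ and a rule for $\delta_i$, appends the sentinel $N_{M+1} = 1$, and sums the terms. The function names are hypothetical, and the logarithm base 2 is an assumption, since the base only changes a constant factor.

```python
import math
from typing import Callable, Sequence


def total_state_cost(sizes: Sequence[float],
                     delta: Callable[[float, float], float]) -> float:
    """Formula (17): sum of N_i * delta_i over layers i = 1..M.

    `sizes` holds N_1..N_M; the sentinel N_{M+1} = 1 is appended here.
    """
    extended = list(sizes) + [1]
    return sum(extended[i] * delta(extended[i], extended[i + 1])
               for i in range(len(sizes)))


def full_mesh(n_i: float, n_next: float) -> float:
    """Routing table size in a fully-meshed layer-i overlay."""
    return n_i / n_next - 1


def log_dht(n_i: float, n_next: float) -> float:
    """Routing table size in a logarithmic DHT overlay (base 2 assumed)."""
    return math.log2(n_i / n_next)


# Two layers with N = 10**4 and N_2 = 100 reproduce models (13) and (14).
print(total_state_cost([10**4, 100], full_mesh))
print(total_state_cost([10**4, 100], log_dht))
```

Passing a longer list of layer sizes evaluates the same sum for $M > 2$, which makes it easy to compare candidate layer populations numerically.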

When we move from an $M$-layer hierarchy to an $(M + 1)$-layer one, both $N_i$ and $\delta_i$ change because of node redistribution among layers. The redistribution essentially depends on the applied clustering algorithm: nodes on layer $i$ form groups and elect their representatives for layer $i + 1$; e.g., see the analytical clustering frameworks in [20, 32]. Layers $i = 1, 2, \ldots, M$ become less populated ($N_i$ is reduced) and overlays on these layers become smaller ($\delta_i$ is reduced), which decreases the total state cost. On the other hand, the new top layer $M + 1$ leads to a higher state cost for $N_M$ nodes. The balance of these two effects determines whether the total cost decreases or increases.

In fully-meshed topology settings, increasing $M$ up to $\log N$ allows reducing the cost to $\Theta(N \log N)$. Increasing $M$ further does not provide a better result than (12). A similar result can be achieved if a multi-layer hierarchy is constructed from $O(1)$-state networks such as rings [103]. In logarithmic DHT settings, any flat $N$-node DHT network already has the $\Theta(N \log N)$ total state cost, and increasing $M$ is not essential for cost improvement. Table 3 summarizes the asymptotic cost behavior for large $N$.

5.3 Routing cost

A routing cost model for the optimal value of $N_2$ was considered in [84, 86] for the disjointed two-layer CAN-based hierarchy. Let $d$ be the dimension of the CAN ID space. Starting from a regular node, the number of hops needed to find the destination cluster supernode on the top layer (a CAN overlay of $N_2$ nodes) is $1 + dN_2^{1/d}$ on average. Then $d(N/N_2)^{1/d}$ hops are needed on average to find the responsible node on the bottom layer (a CAN overlay of $N/N_2$ nodes). In total, the routing cost in overlay hops is

$$\mathrm{RoutingCost} = f(N_2) = 1 + dN_2^{1/d} + d\left(\frac{N}{N_2}\right)^{1/d}.$$

The first derivative with respect to $N_2$ is

$$f'(N_2) = -\frac{1}{N_2}\left(\frac{N}{N_2}\right)^{1/d} + N_2^{1/d - 1}.$$

It is equal to 0 when $N_2 = \sqrt{N}$. This point is a minimum since the second derivative $f''(\sqrt{N}) > 0$ for all $d \in (1, +\infty)$. The optimal value of $N_2$ is independent of $d$. When $N_2 = \sqrt{N}$, the routing cost is minimal for $d = \ln N_2$. This is half the optimal $d_{\mathrm{flat}} = \ln N = \ln N_2^2 = 2\ln N_2$ in a flat CAN network. That is, a regular node has half the local state. A supernode has the same state as in a flat CAN network due to maintenance of two routing tables with $d$ entries each ($d + d = d_{\mathrm{flat}}$).

The hierarchical CAN has the optimal routing cost

$$\mathrm{RoutingCost} = 1 + N^{1/\ln N} \ln N,$$

which is one hop greater than the optimal routing cost in a flat CAN network with $N$ nodes and $d_{\mathrm{flat}} = \ln N$.
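The optimum of this routing model can be verified numerically with a brief Python sketch. It evaluates $f(N_2)$ as defined above, scans integer values of $N_2$ for one fixed dimension, and then plugs in $d = \ln N_2$. The flat-network cost $d N^{1/d}$ is an assumption taken from the per-overlay term of the same model, and the population size is an arbitrary illustrative value.

```python
import math


def routing_cost(n2: float, n: float, d: float) -> float:
    """Two-layer CAN model: one hop to the supernode plus two CAN traversals."""
    return 1 + d * n2 ** (1 / d) + d * (n / n2) ** (1 / d)


def flat_routing_cost(n: float, d: float) -> float:
    """Assumed flat CAN cost, matching the per-overlay term d * size**(1/d)."""
    return d * n ** (1 / d)


n = 10**4
# For a fixed dimension the best integer N_2 is sqrt(N) = 100.
best_n2 = min(range(1, n + 1), key=lambda x: routing_cost(x, n, 3))
# With N_2 = sqrt(N) and d = ln N_2 the cost equals 1 + e * ln N.
hier = routing_cost(math.sqrt(n), n, math.log(math.sqrt(n)))
flat = flat_routing_cost(n, math.log(n))
print(best_n2, hier, flat)
```

Since $N^{1/\ln N} = e$, both optimal costs are multiples of $\ln N$, and the hierarchical one exceeds the flat one by exactly the single hop to the supernode.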

Table 3 Asymptotes of the total state cost in P2P architectures

Architecture                                              Total state cost, big-O asymptote
Flat logarithmic DHT                                      $N \log N$
Flat fully-meshed                                         $N(N - 1)$
M = 2, full-redundancy, fully-meshed, see (12)            $2N(\sqrt{N} - 1)$
M = 2, non-redundant, fully-meshed, see (13)              $N(2N^{1/3} - 1 - N^{-1/3})$
M = 2, logarithmic DHTs, see (14)                         $\frac{N + \sqrt{N}}{2} \log N$
M = $\Theta(\log N)$, fully-meshed                        $N \log N$


Applying the same technique when every overlay is a flat DHT with $O(\log N)$ routing, we obtain

$$\mathrm{RoutingCost} = 1 + \log N_2 + \log\frac{N}{N_2} = 1 + \log N,$$

and the routing performance is again almost equal for hierarchical and flat designs.

In the Crescendo hierarchical architecture (see the Canon method in Section 4.2.7) a similar result holds irrespective of the number of layers. In particular, [101] proved that in Crescendo the degree of any node (local state cost) and the number of routing hops between any two nodes (routing cost) are $O(\log N)$ with high probability.

5.4 Network traffic cost

Consider the two-layer hierarchy model proposed in [41, 73] for optimizing the number of supernodes. The model describes the traffic sent and received by nodes on average. Three alternative intra-group connectivity structures are studied: fully-meshed (FuMe), single-connection (SiCo), and DHT, see Section 4.1.6.

The key model parameter is $\alpha = N_2/N$, the fraction of supernodes in the system. The case $\alpha$ = 100 % corresponds to a flat overlay (all nodes become supernodes). Traffic cost is evaluated in terms of the number of transmitted messages and is a function of $\alpha$. The model focuses on $\alpha \le$ 25 %, i.e., groups on the bottom layer consist of 4 or more nodes.

The network cost is formed by lookup traffic ($\lambda$), maintenance traffic ($\mu$), and republish traffic ($\rho$). Traffic generation is differentiated for supernodes and regular nodes. The total traffic cost is the sum of traffic costs over all nodes. In the SiCo topology, a centralized overlay network with only one supernode generates the lowest network traffic, because only lookup and ping/pong maintenance messages are exchanged between the supernode and its regular nodes. The network traffic cost typically increases as $\alpha$ grows, mostly because of maintenance traffic in the top overlay of supernodes. In the FuMe topology, low values of $\alpha$ correspond to high total traffic because of large groups on the bottom layer. In the DHT topology, the extreme cases $\alpha \to 0$ and $\alpha$ = 100 % correspond to the highest traffic cost, and there exists a tradeoff value of $\alpha$ in between. The exact analytical expressions are not important for our discussion; an interested reader can find them in [41, Table 1].

On the other hand, the load on a supernode can be high for some $\alpha$. The optimal operation point is achieved for $\alpha$ such that the total network traffic is minimized subject to a non-overload constraint for every supernode. To find the minimal necessary $\alpha$, a load factor $L_s$ is defined for every supernode $s$ as the ratio between the traffic $b_s$ for $s$ at a specific time and the bandwidth limit $B_s$, i.e., $L_s(\alpha) = b_s/B_s$. A supernode is overloaded if its load factor exceeds 100 %.

Another model was analyzed in [69]. It accounts for the traffic costs in a three-layer hierarchy with $N_1$ regular nodes, $N_2$ unit leaders (layer 2), and $N_3$ slice leaders (layer 3); see the OneHop design in Section 4.1.4. The most bandwidth-consuming nodes are slice leaders. For them, the upstream bandwidth is likely to be the dominating and limiting term. Therefore, the model minimizes the upstream bandwidth utilization of slice leaders.

The cost function depends on two independent variables, $N_3$ and $n_2 = N_2/N_3$, where $n_2$ is the number of unit leaders per slice. The minimum is achieved for the values

$$N_3 = C_3\sqrt{N}, \qquad n_2 = C_2\sqrt{N},$$

where the constants depend on expected membership rates, message sizes, and delays of information propagation. Consequently, $N_2 = n_2 N_3 = C_2 C_3 N = \Theta(N)$, and the number of unit leaders becomes comparable with the system population size $N$ in the extreme case, i.e., the three-layer architecture degenerates to a two-layer one.

A detailed comparative study of two classes of HDHT designs was done in [104]. In our terminology these classes correspond to nested and disjointed architectures, respectively. Recall that nested architecture does not aim at node differentiation, and its basic state and routing costs are similar to those of flat DHTs (in terms of big-O notation). The study analyzed the total cost of maintenance plus routing. The cost was derived analytically, and then a numerical comparison was performed by varying some parameters. The main conclusion is that disjointed architecture is better when the probability of inter-cluster communication is high or the average node lifetime is low.

5.5 Benefits of hierarchical architecture

The most common hierarchical architectures are two-layer. They keep the total routing cost and the supernode local state the same as in flat alternatives. The performance improvement is for regular nodes, which can keep lower local state. This benefit is essential in heterogeneous environments where P2P designs have to differentiate nodes according to their capabilities.

In the general case $M \ge 2$, the routing cost is also the same as in flat DHTs. In disjointed architecture, the local state cost is proportional to the number of layers the node belongs to. As a result, effective node differentiation is possible, which, in particular, substantially reduces the maintenance cost at the many nodes located on low layers. This is a major performance gain compared to flat networks.

The same routing cost (in terms of the number of overlay hops) and the same total state cost for flat DHT and HDHT are expected facts. Flat DHT designs, in spite of their flatness, also exploit hierarchical routing schemes to achieve efficient logarithmic scalability, as [11] explains. Beyond that, an explicit hierarchy offers additional advantages, which we discussed in detail for each particular HDHT proposal in Section 4. The advantages include adaptation to the underlying physical network, path locality for fault isolation and security, effective caching and bandwidth utilization, hierarchical resource distribution, and hierarchical access control. For instance, proximity-based or location-aware groups allow a routing speedup due to faster links to overlay neighbors. Since nodes of higher layers are more stable, this load skewness makes the system more reliable.

In nested architecture, the local state is kept similar to the flat case, irrespective of the number of layers. Such architectures are used for function differentiation instead of node differentiation. This approach leads to lower constants inside the big-O notation for the complexity bounds. Recent practical confirmation of the performance benefits is provided by MDHT [105]. The MDHT hierarchy is nested: each node participates in its DHT network of the bottom layer and, typically, in DHT networks on some or all higher layers. For a world-wide name resolution system with $10^{15}$ objects, an MDHT system with $10^{6}$ nodes is required. This is only about 1/10th of the 12 million DNS nodes in today’s Internet. Furthermore, $10^{6}$ MDHT nodes can handle almost $10^{10}$ users.
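The scale of these figures is easier to see as per-node averages. The sketch below only divides the totals quoted above (10^15 objects, 10^6 MDHT nodes, 12 million DNS nodes, and the upper figure of 10^10 users). The per-node values are plain averages computed here, not results reported in [105], and the variable names are illustrative.

```python
from fractions import Fraction

OBJECTS = 10**15          # objects in the name resolution system
MDHT_NODES = 10**6        # MDHT nodes required
DNS_NODES = 12 * 10**6    # DNS nodes in today's Internet
USERS = 10**10            # upper figure: "almost" this many users

objects_per_node = OBJECTS // MDHT_NODES       # average objects per MDHT node
users_per_node = USERS // MDHT_NODES           # average users per MDHT node
node_ratio = Fraction(MDHT_NODES, DNS_NODES)   # MDHT nodes relative to DNS nodes

print(objects_per_node, users_per_node, node_ratio)
```

On average each MDHT node is responsible for 10^9 objects and serves up to 10^4 users, and the node ratio of 1/12 matches the "about 1/10th" stated in the text.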

6 Conclusion

This survey has discussed hierarchical architectures applicable in structured P2P designs. We introduced three basic arrangement models that allow taking the heterogeneity of nodes into account. The models provide arrangements of participating nodes. Based on an arrangement, the nodes can be structured into groups. Recursive application allows further structuring within groups. This is a crucial step in constructing the global network hierarchy.

We considered Kleinberg’s models of decentralized network hierarchy and showed that for a P2P system its global group structure is arranged along the vertical and horizontal dimensions. We derived three conceptual models of hierarchy to cover cluster-based, tree-based, and group-based constructions. All of them appear in various forms in existing hierarchical P2P designs.

A hierarchical architecture defines how node groups are formed in the P2P system and then inter-connected along the vertical and horizontal dimensions. We described and clarified the existing classification of hierarchical P2P architectures: vertical, horizontal, and their subclasses. Based on the conceptual models we analyzed these architectures and formulated certain design principles that affect design choices in a hierarchical P2P architecture. The principles make concrete the means of hierarchical decomposition of P2P nodes into inter-connected groups.

We overviewed more than 20 hierarchical P2P proposals, clarifying how our design principles are applied in particular cases and what concrete benefits the hierarchy provides. We selected the most representative designs based on their popularity and on our personal point of view. Most of the designs are HDHTs, since they represent ‘pure structured’ P2P systems. The overview shows various design solutions for hierarchical architectures. It can be considered a state-of-the-art summary organized by our framework.

Finally, we introduced analytical cost models to analyze system tradeoffs and performance in multi-layer hierarchical P2P architectures. Although flat and hierarchical architectures differ little, or not at all, in terms of asymptotic big-O notation, the major gain is the possibility of node and function differentiation. In turn, this offers such advantages as adaptation to the underlying physical network, path locality for fault isolation and security, effective caching and bandwidth utilization, hierarchical resource distribution, and hierarchical access control.

References

1. Stoica I, Morris R, Karger D, Kaashoek MF, Balakrishnan H (2001) Chord: a scalable peer-to-peer lookup service for internet applications. In: Proc. of ACM SIGCOMM’01. ACM Press, San Diego, pp 149–160

2. Ratnasamy S, Francis P, Handley M, Karp R, Shenker S (2001) A scalable content-addressable network. In: Proc. of ACM SIGCOMM’01. ACM Press, San Diego, pp 161–172

3. Rowstron A, Druschel P (2001) Pastry: scalable, distributed object location and routing for large-scale peer-to-peer systems. In: Middleware’01: Proc. of IFIP/ACM Int’l Conf. on distributed systems platforms. Volume 2218 of lecture notes in computer science. Springer-Verlag, pp 329–350

4. Androutsellis-Theotokis S, Spinellis D (2004) A survey of peer-to-peer content distribution technologies. ACM Comput Surv 36(4):335–371

5. Lua EK, Crowcroft J, Pias M, Sharma R, Lim S (2005) A survey and comparison of peer-to-peer overlay network schemes. IEEE Commun Surv Tutorials 7(2):72–93

6. Marti S, Garcia-Molina H (2006) Taxonomy of trust: categorizing P2P reputation systems. Comput Netw 50(4):472–484

7. Risson J, Moors T (2006) Survey of research towards robust peer-to-peer networks: search methods. Comput Netw 50(17):3485–3521

8. Meshkova E, Riihijarvi J, Petrova M, Mahonen P (2008) A survey on resource discovery mechanisms, peer-to-peer and service discovery frameworks. Comput Netw 52(11):2097–2128

9. Karrels DR, Peterson GL, Mullins BE (2009) Structured P2P technologies for distributed command and control. Peer-to-Peer Netw Appl 2:311–333

10. Urdaneta G, Pierre G, Steen MV (2011) A survey of DHT security techniques. ACM Comput Surv 43(2):8:1–8:49

11. Korzun D, Gurtov A (2011) Survey on hierarchical routing schemes in “flat” distributed hash tables. Peer-to-Peer Netw Appl 4:346–375

12. Passarella A (2012) A survey on content-centric technologies for the current internet: CDN and P2P solutions. Comput Commun 35(1):1–32

13. Gotz S, Rieche S, Wehrle K (2005) Selected DHT algorithms. In: Steinmetz R, Wehrle K (eds) Peer-to-Peer systems and applications. Volume 3485 of lecture notes in computer science. Springer, Berlin / Heidelberg, pp 95–117

14. Birman KP (2005) Reliable distributed systems: technologies, web services, and applications. Springer-Verlag, New York

15. Buford JF, Yu H, Lua EK (2009) P2P networking and applications. Elsevier

16. Vu QH, Lupu M, Ooi BC (2010) Peer-to-Peer computing: principles and applications. Springer

17. Korzun D, Gurtov A (2013) Structured peer-to-peer systems: fundamentals of hierarchical organization, routing, scaling, and security. Springer

18. Kleinrock L, Kamoun F (1977) Hierarchical routing for large networks: performance evaluation and optimization. Comput Netw 1:155–174

19. Chiang M, Low SH, Calderbank AR, Doyle JC (2007) Layering as optimization decomposition: a mathematical theory of network architectures. Proc IEEE 95:255–312

20. Lian J, Naik K, Agnew GB (2007) A framework for evaluating the performance of cluster algorithms for hierarchical networks. IEEE/ACM Trans Netw 15:1478–1489

21. Lloret J, Palau C, Boronat F, Tomas J (2008) Improving networks using group-based topologies. Comput Commun 31:3438–3450

22. Xu J, Kumar A, Yu X (2004) On the fundamental tradeoffs between routing table size and network diameter in peer-to-peer networks. IEEE J Sel Areas Commun 22(1):151–163

23. Loguinov D, Kumar A, Rai V, Ganesh S (2005) Graph-theoretic analysis of structured peer-to-peer systems: routing distances and fault resilience. IEEE/ACM Trans Netw 13(5):1107–1120

24. Gummadi K, Gummadi R, Gribble S, Ratnasamy S, Shenker S, Stoica I (2003) The impact of DHT routing geometry on resilience and proximity. In: Proc. of ACM SIGCOMM’03. ACM Press, pp 381–394

25. Castro M, Druschel P, Ganesh A, Rowstron A, Wallach DS (2002) Secure routing for structured peer-to-peer overlay networks. In: Proc. 5th USENIX Symp. on operating system design and implementation (OSDI 2002). ACM Press, Boston, pp 299–314

26. Kleinberg JM (2006) Complex networks and decentralized search algorithms. In: Proc. Int’l congress of mathematicians (ICM 2006). European Mathematical Society

27. Keshav S (2006) Efficient and decentralized computation of approximate global state. SIGCOMM Comput Commun Rev 36(1):69–74

28. Li J, Stribling J, Morris R, Kaashoek MF, Gil TM (2005) A performance vs. cost framework for evaluating DHT design tradeoffs under churn. In: Proc. of IEEE INFOCOM’05. Volume 1. IEEE, pp 225–236

29. Krishnamurthy B, Wang J, Xie Y (2001) Early measurements of a cluster-based architecture for P2P systems. In: Proc. 1st ACM SIGCOMM workshop on internet measurement (IMW ’01), ACM, pp 105–109

30. Ratnasamy S, Handley M, Karp R, Shenker S (2002) Topologically-aware overlay construction and server selection. In: Proc. of IEEE INFOCOM’02

31. Xu Z, Mahalingam M, Karlsson M (2003) Turning heterogeneity into an advantage in overlay routing. In: Proc. of IEEE INFOCOM’03, pp 1499–1509

32. Sanchez-Artigas M, Garcia Lopez P (2010) Echo: a peer-to-peer clustering framework for improving communication in DHTs. J Parallel Distrib Comput 70:126–143

33. Dabek F, Kaashoek MF, Karger D, Morris R, Stoica I (2001) Wide-area cooperative storage with CFS. In: Proc. 18th ACM Symp. Operating systems principles (SOSP ’01). ACM Press, pp 202–215

34. Rufino J, Alves A, Exposto J, Pina A (2004) A cluster oriented model for dynamically balanced DHTs. In: IPDPS’04: Proc. 18th Int’l Symp. on parallel and distributed processing. IEEE Computer Society

35. Surana S, Godfrey B, Lakshminarayanan K, Karp R, Stoica I (2006) Load balancing in dynamic structured peer-to-peer systems. Perform Eval 63(3):217–240

36. Tang C, Xu Z, Dwarkadas S (2003) Peer-to-peer information retrieval using self-organizing semantic overlay networks. In: Proc. of ACM SIGCOMM’03. ACM Press, pp 175–186

37. Wan Y, Asaka T, Takahashi T (2008) A hybrid P2P overlay network for non-strictly hierarchically categorized contents. In: Proc. 8th IEEE Int’l Symp. on Cluster Computing and the Grid (CCGRID ’08). IEEE Computer Society, pp 41–48

38. Xu M, Zhou S, Guan J (2011) A new and effective hierarchical overlay structure for Peer-to-Peer networks. Comput Commun 34(7):862–874

39. Asiki A, Tsoumakos D, Koziris N (2010) Distributing and searching concept hierarchies: an adaptive DHT-based system. Clust Comput 13(3):257–276

40. Yang B, Garcia-Molina H (2003) Designing a super-peer network. In: Proc. 19th Int’l Conf. on data engineering (ICDE’03), pp 49–60

41. Zoels S, Despotovic Z, Kellerer W (2008) On hierarchical DHT systems – an analytical approach for optimal designs. Comput Commun 31(3):576–590

42. Ou Z, Harjula E, Koskela T, Ylianttila M (2010) GTPP: general truncated pyramid peer-to-peer architecture over structured DHT networks. Mob Netw Appl 15:729–749

43. Mislove A, Druschel P (2004) Providing administrative control and autonomy in structured peer-to-peer overlays. In: IPTPS ’04: Proc. 3rd Int’l workshop on peer-to-peer systems. Volume 3279 of lecture notes in computer science. Springer, pp 162–172

44. Karger DR, Ruhl M (2004) Diminished chord: a protocol for heterogeneous subgroup formation in peer-to-peer networks. In: IPTPS ’04: Proc. 3rd Int’l workshop on peer-to-peer systems. Volume 3279 of lecture notes in computer science. Springer, pp 288–297

45. Zhang Y, Li D, Chen L, Lu X (2008) Flexible routing in grouped DHTs. In: Proc. 8th IEEE Int’l Conf. on peer-to-peer computing (P2P ’08). IEEE Computer Society, pp 109–118

46. Zhao BY, Huang L, Stribling J, Rhea SC, Joseph AD, Kubiatowicz JD (2004) Tapestry: a resilient global-scale overlay for service deployment. IEEE J Sel Areas Commun 22(1):41–53

47. Maymounkov P, Mazieres D (2002) Kademlia: a peer-to-peer information system based on the XOR metric. In: IPTPS ’02: Proc. 1st Int’l workshop on peer-to-peer systems. Volume 2429 of lecture notes in computer science. Springer, pp 53–65

48. Malkhi D, Naor M, Ratajczak D (2002) Viceroy: a scalable and dynamic emulation of the butterfly. In: Proc. 21st Annual Symp. on principles of distributed computing (PODC ’02). ACM Press, pp 183–192


49. Manku GS, Bawa M, Raghavan P (2003) Symphony: distributed hashing in a small world. In: Proc. 4th USENIX Symp. on internet technologies and systems (USITS’03). USENIX Association, pp 127–140

50. Gai AT, Viennot L (2004) Broose: a practical distributed hashtable based on the De-Bruijn topology. In: Proc. IEEE 4th Int’l Conf. on peer-to-peer computing (P2P ’04). IEEE Computer Society, pp 167–164

51. Karger D, Lehman E, Leighton T, Panigrahy R, Levine M, Lewin D (1997) Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the world wide web. In: Proc. 29th Annual ACM Symp. on theory of computing (STOC ’97), ACM, pp 654–663

52. Kleinberg JM (2000) The small-world phenomenon: an algorithm perspective. In: Proc. 32nd Annual ACM Symp. theory of computing (STOC ’00). ACM Press, pp 163–170

53. Kempe D, Kleinberg J, Demers A (2004) Spatial gossip and resource location protocols. J ACM 51:943–967

54. Korzun D, Nechaev B, Gurtov A (2009) Cyclic routing: generalizing lookahead in peer-to-peer networks. In: Proc. 7th IEEE/ACS Int’l Conf. on computer systems and applications (AICCSA 2009). IEEE Computer Society, pp 697–704

55. Duchon P, Hanusse N, Lebhar E, Schabanel N (2006) Towards small world emergence. In: Proc. 18th Annual ACM Symp. on parallelism in algorithms and architectures (SPAA ’06), ACM, pp 225–232

56. Viswanath B, Post A, Gummadi KP, Mislove A (2010) An analysis of social network-based sybil defenses. In: Proc. ACM SIGCOMM 2010 Conf. applications, technologies, architectures, and protocols for computer communication, ACM, pp 363–374

57. Risson J, Qazi S, Moors T, Harwood A (2006) A dependable global location service using rendezvous on hierarchic distributed hash tables. In: Proc. Int’l Conf. networking, Int’l Conf. systems and Int’l Conf. mobile communications and learning technologies (ICN/ICONS/MCL ’06). IEEE Computer Society

58. Singh A, Liu L (2004) A hybrid topology architecture for P2P systems. In: Proc. 13th Int’l Conf. on computer communications and networks (ICCCN 2004), pp 475–480

59. Tian R, Xiong Y, Zhang Q, Li B, Zhao BY, Li X (2005) Hybrid overlay structure based on random walks. In: IPTPS ’05: Proc. 4th Int’l workshop on peer-to-peer systems. Volume 3640 of lecture notes in computer science. Springer, pp 152–162

60. Artigas MS, Lopez PG, Ahullo JP, Skarmeta AFG (2005) Cyclone: a novel design schema for hierarchical DHTs. In: Proc. 5th IEEE Int’l Conf. on peer-to-peer computing (P2P ’05). IEEE Computer Society, pp 49–56

61. Hu J, Li M, Zheng W, Wang D, Ning N, Dong H (2004) Smartboa: constructing P2P overlay network in the heterogeneous Internet using irregular routing tables. In: IPTPS ’04: Proc. 3rd Int’l workshop on peer-to-peer systems. Volume 3279 of lecture notes in computer science. Springer, pp 278–287

62. Leong B, Liskov B, Demaine E (2004) Epichord: parallelizing the Chord lookup algorithm with reactive routing state management. In: Proc. 12th Int’l Conf. on networks (ICON 2004), pp 270–276

63. Li J, Stribling J, Morris R, Kaashoek MF (2005) Bandwidth-efficient management of DHT routing tables. In: Proc. of the 2nd symposium on networked systems design and implementation (NSDI ’05), pp 99–114

64. Garces-Erice L, Biersack E, Felber PA, Ross KW, Urvoy-Keller G (2003) Hierarchical peer-to-peer systems. In: Proc. ACM/IFIP Int’l Conf. on parallel and distributed computing (Euro-Par 2003), pp 643–657

65. Li X, Wu J (2004) Hierarchical P2P systems in a small world. In: Proc. 2nd Latin American and Caribbean Conf. for engineering and technology (LACCEI’2004)

66. Lee JW, Schulzrinne H, Kellerer W, Despotovic Z (2009) mDHT: multicast-augmented DHT architecture for high availability and immunity to churn. In: Proc. 6th IEEE Conf. consumer communications and networking conference (CCNC’09), IEEE, pp 760–764

67. Gupta I, Birman K, Linga P, Demers A, van Renesse R (2003) Kelips: building an efficient and stable P2P DHT through increased memory and background overhead. In: IPTPS ’03: Proc. 2nd Int’l workshop on peer-to-peer systems. Volume 2735 of lecture notes in computer science. Springer, pp 160–169

68. Mizrak AT, Cheng Y, Kumar V, Savage S (2003) Structured superpeers: leveraging heterogeneity to provide constant-time lookup. In: Proc. 3rd IEEE workshop on internet applications (WIAPP 2003), pp 104–111

69. Gupta A, Liskov B, Rodrigues R (2004) Efficient routing for peer-to-peer overlays. In: Proc. 1st Symp. on networked systems design and implementation (NSDI ’04)

70. Fonseca P, Rodrigues R, Gupta A, Liskov B (2009) Full-information lookups for peer-to-peer overlays. IEEE Trans Parallel Distrib Syst 20(9):1339–1351

71. Monnerat LR, Amorim CL (2009) Peer-to-peer single hop distributed hash tables. In: Proc. of IEEE Globecom’09

72. Risson J, Harwood A, Moors T (2006) Stable high-capacity one-hop distributed hash tables. In: ISCC ’06: Proc. 11th IEEE Symp. on computers and communications. IEEE Computer Society, pp 687–694

73. Zoels S, Despotovic Z, Kellerer W (2006) Cost-based analysis of hierarchical DHT design. In: Proc. 6th IEEE Int’l Conf. on peer-to-peer computing (P2P ’06). IEEE Computer Society, pp 233–239

74. Zols S, Hofstatter Q, Despotovic Z, Kellerer W (2009) Achieving and maintaining cost-optimal operation of a hierarchical DHT system. In: Proc. 2009 IEEE Int’l Conf. on communications (ICC’09). IEEE Press, pp 2194–2199

75. Zoels S, Despotovic Z, Kellerer W (2007) Load balancing in a hierarchical DHT-based P2P system. In: Proc. 2007 Int’l Conf. collaborative computing: networking, applications and worksharing (COLCOM ’07). IEEE Computer Society, pp 353–361

76. Ren XJ, Gu ZM (2007) SA-Chord: a novel P2P system based on self-adaptive joining. In: Proc. 6th Int’l Conf. grid and cooperative computing (GCC 2007). IEEE Computer Society, pp 75–81

77. Artigas MS, Lopez PG, Skarmeta AFG (2005) A novel methodology for constructing secure multipath overlays. IEEE Internet Comput 9(6):50–57

78. Zoels S, Eichhorn M, Tarlano A, Kellerer W (2006) Content-based hierarchies in DHT-based peer-to-peer systems. In: Proc. Int’l Symp. applications and the internet workshops (SAINT Workshops 2006). IEEE Computer Society, pp 105–108

79. Zhang XM, Wang YJ, Li Z (2007) Research of routing algorithm in hierarchy-adaptive P2P systems. In: Proc. 5th Int’l Symp. parallel and distributed processing and applications (ISPA 2007). Volume 4742 of lecture notes in computer science. Springer, pp 728–739

80. Xu Z, Zhang Z (2002) Building low-maintenance expressways for P2P systems. Technical Report HPL-2002-41, HP Labs, Palo Alto

81. Zhang Z, Shi SM, Zhu J (2002) Self-balanced P2P expressway: when Marxism meets Confucian. Technical Report MSR-TR-2002-72, Microsoft Research Asia

82. Joung YJ, Wang JC (2007) Chord2: a two-layer chord for reducing maintenance overhead via heterogeneity. Comput Commun 51(3):712–731

83. Tanta-ngai H, McAllister M (2006) A peer-to-peer expressway over chord. Math Comput Model 44(7–8):659–677

84. Martinez-Yelmo I, Cuevas R, Guerrero C, Mauthe A (2008) Routing performance in a hierarchical DHT-based overlay network. In: Proc. 16th Euromicro Conf. parallel, distributed and network-based processing (PDP 2008). IEEE Computer Society, pp 508–515

85. Martinez-Yelmo I, Bikfalvi A, Guerrero C, Rumin RC, Mauthe A (2008) Enabling global multimedia distributed services based on hierarchical DHT overlay networks. Int J Internet Protocol Technol (IJIPT) 3(4):234–244

86. Martinez-Yelmo I, Guerrero C, Rumin RC, Mauthe A (2009) A hierarchical P2PSIP architecture to support skype-like services. In: Proc. 17th Euromicro Int’l Conf. parallel, distributed and network-based processing (PDP 2009). IEEE Computer Society, pp 316–322

87. Min SH, Holliday J, Cho DS (2006) Optimal super-peer selection for large-scale P2P system. In: Proc. 2006 Int’l Conf. hybrid information technology (ICHIT ’06). IEEE Computer Society, pp 588–593

88. Le L, Kuo GS (2007) Hierarchical and breathing peer-to-peer SIP system. In: Proc. IEEE Int’l Conf. communications (ICC 2007). IEEE, pp 1887–1892

89. Heristyo A, Masuyama H, Kasahara S, Takahashi Y (2009) User-search time analysis for hierarchical peer-to-peer overlay networks with time-dependent user-population process. In: Proc. 4th Int’l Conf. queueing theory and network applications (QTNA’09). ACM, pp 5:1–5:4

90. Guisheng Y, Jie S, Xianghui W (2008) Hierarchical small-world P2P networks. In: Proc. Int’l Conf. internet computing in science and engineering (ICICSE ’08). IEEE Computer Society, pp 452–458

91. Mesaros VA, Carton B, Roy PV (2003) S-Chord: using symmetry to improve lookup efficiency in chord. In: Proc. Int’l Conf. parallel and distributed processing techniques and applications (PDPTA’03)

92. Ganesan P, Manku GS (2004) Optimal routing in chord. In: Proc. 15th annual ACM-SIAM Symp. Discrete algorithms (SODA ’04). Society for Industrial and Applied Mathematics, pp 176–185

93. Zhao BY, Duan Y, Huang L, Joseph AD, Kubiatowicz JD (2002) Brocade: landmark routing on overlay networks. In: IPTPS ’02: Proc. 1st Int’l workshop on peer-to-peer systems. Volume 2429 of lecture notes in computer science. Springer, pp 34–44

94. Xu Z, Tang C, Zhang Z (2003) Building topology-aware overlays using global soft-state. In: Proc. 23rd Int’l Conf. distributed computing systems (ICDCS’03). IEEE Computer Society, pp 500–508

95. Zhu Y, Wang H, Hu Y (2003) A super-peer based lookup in structured peer-to-peer systems. In: Proc. ISCA 16th Int’l Conf. parallel and distributed computing systems (PDCS 2003), pp 465–470

96. Freedman MJ, Mazieres D (2003) Sloppy hashing and self-organizing clusters. In: IPTPS ’03: Proc. 2nd Int’l workshop on peer-to-peer systems. Volume 2735 of lecture notes in computer science. Springer, pp 45–55

97. Xu Z, Min R, Hu Y (2003) HIERAS: a DHT based hierarchical P2P routing algorithm. In: Proc. 32nd Int’l Conf. parallel processing (ICPP 2003). IEEE Computer Society, pp 187–194

98. Park K, Pack S, Kwon T (2008) Proximity based peer-to-peer overlay networks (P3ON) with load distribution. In: Proc. Int’l Conf. information networking (ICOIN 2007). Towards ubiquitous networking and services. Revised selected papers. Springer-Verlag, pp 234–243

99. Xu J, Jin H (2009) A structured P2P network based on the small world phenomenon. J Supercomput 48:264–285

100. Garces-Erice L, Ross KW, Biersack EW, Felber P, Urvoy-Keller G (2003) Topology-centric look-up service. In: Proc. 5th Int’l Conf. group communications and charges (NGC 2003), workshop on networked group communication. Volume 2816 of lecture notes in computer science. Springer, pp 58–69

101. Ganesan P, Gummadi K, Garcia-Molina H (2004) Canon in G major: designing DHTs with hierarchical structure. In: Proc. 24th Int’l Conf. distributed computing systems (ICDCS ’04). IEEE Computer Society, pp 263–272

102. Zhang Y, Chen L, Lu X, Li D (2010) Enabling routing control in a DHT. IEEE J Sel Areas Commun 28(1):28–38

103. Bermond JC, Choplin S, Prennes S (2003) Hierarchical ring network design. Theory Comput Syst 36:663–682

104. Artigas MS, Lopez PG, Skarmeta AF (2007) A comparative study of hierarchical DHT systems. In: Proc. 32nd IEEE Conf. on local computer networks (LCN ’07). IEEE Computer Society, pp 325–333

105. D’Ambrosio M, Dannewitz C, Karl H, Vercellone V (2011) MDHT: a hierarchical name resolution service for information-centric networks. In: Proc. ACM SIGCOMM workshop on Information-centric networking (ICN ’11). ACM, pp 7–12


Dmitry Korzun received his B.Sc. (1997) and M.Sc. (1999) degrees in Applied Mathematics and Computer Science from the Petrozavodsk State University (Russia). He received a Ph.D. degree in Physics and Mathematics from the St.-Petersburg State University (Russia) in 2002. He is an Associate Professor at the Department of Computer Science of Petrozavodsk State University PetrSU, Russia (since 2003) and a part-time Research Scientist at the Helsinki Institute for Information Technology HIIT, Aalto University, Finland (since 2005). Dmitry Korzun serves on TPC and editorial boards of a number of international conferences and journals. His research interests include analysis and evaluation of distributed systems, discrete modeling, ubiquitous computing in smart spaces, Internet of Things, software engineering, algorithm design and complexity, linear Diophantine analysis and its applications, theory of formal languages and parsing. His educational activity started in 1997 at the Faculty of Mathematics of PetrSU. Since that time he has taught more than 20 study courses on hot topics in Computer Science, Applied Mathematics, Information and Communication Technology. He is an author and co-author of more than 100 research and educational publications.

Andrei Gurtov received his M.Sc. (2000) and Ph.D. (2004) degrees in Computer Science from the University of Helsinki, Finland. He is presently a visiting scholar at the International Computer Science Institute (ICSI), Berkeley. He was a Professor at University of Oulu in the area of Wireless Internet in 2010–12. He is also a Principal Scientist leading the Networking Research group at the Helsinki Institute for Information Technology HIIT. Previously, he worked at TeliaSonera, Ericsson NomadicLab, and University of Helsinki. Dr. Gurtov is a co-author of over 130 publications including two books, research papers, patents, and IETF RFCs. He is a senior member of IEEE.
