Effective message routing in unstructured peer-to-peer overlays


M. Ciglarič

Abstract: Flooding-based unstructured peer-to-peer overlays, in which loosely coupled nodes require high local autonomy, lack efficiency. Two routing improvements based on answer caching are compared, where the cached metadata facilitate content-based routing of queries. Since peer nodes keep joining and leaving the overlay, a mechanism to keep the metadata valid is analysed. The problem area is reviewed, an overlay network model is described, and the related message routing issues and the simulation environment are explained. Simulation results confirm the expected traffic reduction, while the user experience does not deteriorate.

1 Introduction

By Kung’s definition [1], content networks are overlay networks that lie on top of IP networks and implement the content routing of messages. A peer-to-peer network [2, 3] is a case of a content network where all the nodes (end computers) have equal roles. Messages are routed among them on the basis of message contents, not destination addresses as on the network layer.

In unstructured peer-to-peer networks the nodes have equal functions and none carry special responsibilities. In two-layered systems some nodes are superior and act as proxies for their clients. Ordinary nodes connect to only one or a small number of supernodes, and these supernodes communicate with the rest of the overlay on their behalf. The subnetwork of supernodes can also be viewed as an unstructured peer-to-peer network, while the regular nodes have less functionality and can only send queries to ‘their’ supernodes. The basic mechanism for query message routing is usually flooding, which is robust and reliable but also highly redundant, and which creates a very high network load (as reported, for example, for the Gnutella file-sharing network [4, 5]).

Very efficient mechanisms for message routing in peer-to-peer networks have been proposed, but the most efficient ones place high demands on the peer nodes. Strict rules on content placement and on establishing connections among the nodes limit the local autonomy of the peers; for example, the end user may not choose the files to be stored on his disc. The rules also dictate the overlay structure. Such systems fall into the category of structured peer-to-peer systems, and today most of them are at the stage of experimental prototypes. In contrast, in the currently most widely adopted peer-to-peer file-sharing overlays, end users require high autonomy and total control over content placement, bandwidth consumption and peer connections, while they are usually unwilling to take part as subordinate participants in some kind of structured system. Therefore the efficient mechanisms are not applicable there.

This is why research on unstructured overlays may be valuable, although some researchers [3] see them as previous-generation peer-to-peer systems and as such unimportant for further exploration. It is believed that the results of our and related research can be applied immediately, and their efficiency verified, in existing unstructured as well as two-layered peer-to-peer systems. Grid systems could also use the suggested mechanisms for finding data, files or other content, especially in data-oriented applications.

The purpose of this paper is threefold: to point out the importance of further routing research in unstructured overlays with highly autonomous nodes, to suggest improvements to the basic flooding mechanism, and to analyse the behaviour and demonstrate the effectiveness of the suggested improvements by means of simulation. The main contributions are as follows:

• The model of an unstructured peer-to-peer system [6] is expanded and validated.

• The improvements suggested in our previous work [6] are formalised and extended with a metadata maintenance mechanism; they are also analysed and evaluated by simulation, so that the system behaviour can be predicted and the potential benefits in a live environment can be understood.

• The suggested improvements are simple enough to be included immediately in existing peer-to-peer routing mechanisms; they are also compatible with existing mechanisms, so that both can be used in parallel (some queries are flooded, others are routed; some nodes cache answers while others do not).

It is shown that, because queries for certain files are repeated more often than others, the total number of message hops in the overlay can be significantly reduced without degrading the end user’s experience of system performance. An efficient metadata maintenance mechanism is suggested, and the impact of too long and too short timeout values on overall traffic and delays is evaluated. The paper will be of interest to designers of routing protocols in peer-to-peer and similar architectural system forms (grid, cluster), as well as to those who take part in peer-to-peer networks or would just like to understand the general rules and principles within such systems.

The author is with the Faculty of Computer and Information Science, Computer Communications Laboratory, University of Ljubljana, Tržaška 25, 1000 Ljubljana, Slovenia

E-mail: [email protected]

© IEE, 2005

IEE Proceedings online no. 20045221

doi:10.1049/ip-com:20045221

Paper first received 28th September 2004 and in final revised form 1st March 2005

IEE Proc.-Commun., Vol. 152, No. 5, October 2005 673

2 Model

An internet-based peer-to-peer system is a highly dynamic structure, and in the simplified model it is represented as a collection of nodes: peer processes with open communication links, connecting them in the form of an overlay network with the small-world and power-law properties described in [7, 8]. In this paper the term topology always refers to the overlay network topology, and the term network refers to the peer-to-peer overlay network, not the underlying internet. Network dynamicity is modelled by one node departure and one node join every $T_d$ time-steps.

Each node maintains a collection of files accessible to other nodes. $M_i = \{m_1, m_2, \dots\}$ denotes the set of metadata about the files at node $i$. Each file $F_i$ is described by its metadata $m_i$, a set of metadata elements $m_i = \{k_1, k_2, \dots\}$: name, type, size and a few keywords. A query $Q_i$ is a message identified by a globally unique identifier and containing a subset of the available metadata elements, $Q_i = \{k_m, k_n, \dots\}$. The available files are not equally popular. A measure of file popularity $q_i$ [9] is defined as the percentage of queries looking for file $F_i$ (i.e. matching its metadata according to some matching function). File popularity follows a Zipf distribution (shown in [10] and also confirmed by our own experiments). The term repetitive query refers to subsequent queries asking about the same file, i.e. queries with a positive match to the same file. It does not necessarily mean that repetitive queries contain the same keywords or other metadata elements; it only means that the metadata in the query (the search condition) match the metadata of the same file according to the matching function.

The model deliberately ignores the process of matching the query contents with file properties and the contents of the matching function. It is simply assumed that there exists a mechanism giving explicit answers about whether a file $F_i$ matches a query $Q_j$ or not: a matching function $m$ is defined, returning a boolean value for each pair $(F, Q)$, where $F$ represents a file and $Q$ represents a valid query. A file $F$ matches the query $Q$ when $m(F, Q)$ returns True. The $j$th answer to $Q_i$ is $A_{ij} = (m_p, h)$, where $m(F_p, Q_i) = \text{True}$ and $h$ is the time from forwarding $Q_i$ to the receipt of $A_{ij}$. For example, the metadata elements of a query may be required to be exactly the same as the metadata elements of a file: $Q = m_p$. Other examples are $Q \subseteq m_p$, $m_p \subseteq Q$, or any logical expression over $Q$ and $m_p$.
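The model leaves the matching function abstract. As an illustration only, the three example rules can be written as predicates over sets of metadata elements; the set-of-strings representation and the function names are assumptions, not part of the model.

```python
from typing import FrozenSet

# Assumed representation: file metadata m_p and queries Q are sets of metadata elements.
Metadata = FrozenSet[str]
Query = FrozenSet[str]


def exact_match(m_p: Metadata, q: Query) -> bool:
    """Q = m_p: the query lists exactly the file's metadata elements."""
    return q == m_p


def query_subset_match(m_p: Metadata, q: Query) -> bool:
    """Q is a subset of m_p: every element of the query appears in the file's metadata."""
    return q <= m_p


def file_subset_match(m_p: Metadata, q: Query) -> bool:
    """m_p is a subset of Q: every metadata element of the file appears in the query."""
    return m_p <= q


song = frozenset({"mysong.mp3", "mp3", "rock"})
query = frozenset({"mp3", "rock"})
print(exact_match(song, query), query_subset_match(song, query), file_subset_match(song, query))
# prints: False True False
```

Any predicate that returns a boolean for a (file, query) pair fits the model; the routing mechanisms that follow do not depend on which one is chosen.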

Each node is able to generate query messages about files, receive and forward query messages from other nodes, generate answers, and receive and forward answer messages from other nodes. Each node is also able to generate and store metadata on received messages. Messages can only be passed to one neighbour or a subset of neighbours, chosen by a routing mechanism. Answers return to the query originator over the same path. When a matching file is found, the node generates an answer message containing the complete file metadata.

Since users obtain most of their files after querying for them, the more popular files also have more replicas in the system. File $F$ is more popular than file $G$ when there are more queries with $m(F, Q) = \text{True}$ than with $m(G, Q) = \text{True}$ within a certain period of time. The popularity $p_i$ of file $i$ is the number of query matches within a period of time, normalised over all files so that $\sum_i p_i = 1$. Thus the popular files have more queries looking for them.
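To make the popularity model concrete, the following sketch builds a Zipf-like popularity vector normalised so that the $p_i$ sum to 1, and picks the target file of a query with probability $p_i$, as the simulation in Section 4 does. The exponent alpha = 1 and the function names are assumptions; the paper does not state the Zipf parameter.

```python
import random
from typing import List


def zipf_popularity(num_files: int, alpha: float = 1.0) -> List[float]:
    """Popularity p_i proportional to 1 / rank**alpha, normalised so that sum(p_i) == 1.

    File index 0 is the most popular (rank 1). alpha = 1.0 is an assumed exponent.
    """
    weights = [1.0 / rank ** alpha for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def pick_file(popularity: List[float], rng: random.Random) -> int:
    """Choose a file index with probability equal to its popularity p_i."""
    return rng.choices(range(len(popularity)), weights=popularity, k=1)[0]


p = zipf_popularity(4)          # weights 1, 1/2, 1/3, 1/4 before normalisation
rng = random.Random(42)
counts = [0] * len(p)
for _ in range(10_000):
    counts[pick_file(p, rng)] += 1
print([round(x, 3) for x in p], counts)
# p is [0.48, 0.24, 0.16, 0.12]; counts vary with the seed but follow the same ordering on average
```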

3 Message routing principles

Flooding is a fundamental query routing mechanism, based on the assumption that the more nodes a query reaches, the more likely it is to find an answer. On receiving a query, each node passes it on to the rest of its neighbours. A node may receive the same query from several neighbours, but forwards it only once. Queries have a limited lifetime of TTL (time-to-live) hops. Answers are always passed back to the query origin over the same chain of nodes that delivered the query.
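A minimal sketch of flooding with duplicate suppression and a TTL limit, assuming the overlay is given as an adjacency list and all links have equal delay (so each node first receives the query along a shortest path, as in the discrete-time simulation). Every transmission counts as one hop, including duplicates that are dropped on arrival; the function name and return values are illustrative.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple

Graph = Dict[int, List[int]]


def flood(graph: Graph, origin: int, ttl: int) -> Tuple[int, Set[int]]:
    """Flood one query from origin; return (message hops, nodes reached)."""
    seen: Set[int] = {origin}     # nodes that already hold this query's GUID
    hops = 0                      # every transmission counts, duplicates included
    queue: Deque[Tuple[int, Optional[int], int]] = deque([(origin, None, ttl)])
    while queue:
        node, came_from, remaining = queue.popleft()
        if remaining == 0:
            continue              # lifetime used up: the query is not forwarded
        for nb in graph[node]:
            if nb == came_from:
                continue          # never send the query back to the sender
            hops += 1
            if nb not in seen:    # first receipt: nb will forward it once
                seen.add(nb)
                queue.append((nb, node, remaining - 1))
            # otherwise nb recognises the GUID and drops the duplicate
    return hops, seen - {origin}


g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(flood(g, origin=0, ttl=2))  # (5, {1, 2, 3})
```

In this small example two of the five hops are duplicates (1 to 2 and 2 to 1), which is the redundancy the improvements in this section try to avoid.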

Most of the known improvements of flooding attempt to reach the same set of nodes with far fewer query hops by forwarding the query to only some of the node’s neighbours. A benefit of flooding is its robustness in a dynamic environment; its weakness is its forgetfulness. A node may flood a query and receive an answer, but subsequent repetitive queries are still flooded. The same large set of nodes is visited, although the previous answer makes it obvious that only a few of them would be enough.

In [6] it was suggested that not every query has to visit the same large set of nodes in order to find the answer with high probability. If each node knew the direction in which the nearest copy is likely to be found, it could route queries to only one neighbour and consequently reduce the overall number of hops. The routing improvements are based on storing and exchanging answer metadata and are further refined here with a metadata maintenance mechanism.

3.1 Remembering the metadata

The intermediate nodes may cache the answer message together with the ID of the neighbour that passed it, to use it for routing later when a similar query is issued elsewhere in the overlay. However, the cached answers are never passed back again (as in the local indices technique [11]); they are merely used to route the queries. Subsequent queries still have to reach the target node, which in turn generates a new answer message if the files are still available.

The nodes cache only the fastest answers to a certain query, thus building up only the efficient routes. The following pseudocode clarifies the process of building the metadata store and routing queries.

ON receipt of a new query Q_k:
    // Only new queries; ignore duplicate receipts
    // Record the query ID and the neighbour N_i who passed it
    Save pair (k, i)
    // Check for a match in the local file store
    IF exists F_p such that m(F_p, Q_k) == True
        Generate and return a new answer A_jk = (m_p, 0)
    ELSE
        // Check for a match in the local metadata store
        IF exists (m_p, h) such that m(m_p, Q_k) == True
            // Can be more than one! Choose the earliest match
            Route Q_k to N_m who passed A_ij = (m_p, h) with min h
        ELSE // no idea about the file location
            Flood Q_k


ON receipt of an answer A_ij:
    // Check if we already have metadata from a more distant location
    IF exists same m_p with higher h
        // Faster route or closer location: update metadata
        Update neighbour in the metadata store
    ELSIF not exists same m_p
        // A new file was found: add new metadata
        Insert (m_p, h) into the metadata store
        Record the neighbour who passed A_ij
    END IF
    Pass A_ij back towards the query origin
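The two handlers above can be sketched in Python as follows. This is an illustrative reading, not the author's implementation: the class and method names are hypothetical, the handlers return an action instead of sending messages, and when a faster answer arrives the sketch stores the new h together with the new neighbour (the pseudocode only mentions updating the neighbour).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Metadata = FrozenSet[str]
Query = FrozenSet[str]


@dataclass
class RememberingNode:
    """Hypothetical node applying the answer-caching rules of Section 3.1."""
    files: List[Metadata]                     # metadata m_p of locally shared files
    match: Callable[[Metadata, Query], bool]  # matching function m(F, Q)
    seen: Dict[str, Optional[int]] = field(default_factory=dict)          # query ID -> neighbour that passed it
    store: Dict[Metadata, Tuple[int, int]] = field(default_factory=dict)  # m_p -> (h, neighbour)

    def on_query(self, qid: str, q: Query, from_nb: Optional[int]) -> Tuple[str, object]:
        if qid in self.seen:
            return ("ignore", None)           # duplicate receipt of a known query
        self.seen[qid] = from_nb              # save pair (k, i) for the return path
        for m_p in self.files:
            if self.match(m_p, q):
                return ("answer", (m_p, 0))   # new answer A = (m_p, 0)
        known = [(h, nb) for m_p, (h, nb) in self.store.items() if self.match(m_p, q)]
        if known:
            _, nb = min(known)                # earliest (smallest h) cached answer
            return ("route", nb)
        return ("flood", None)                # no idea about the file location

    def on_answer(self, qid: str, m_p: Metadata, h: int, from_nb: int) -> Optional[int]:
        cached = self.store.get(m_p)
        if cached is None or h < cached[0]:
            self.store[m_p] = (h, from_nb)    # new file, or a faster route to it
        return self.seen.get(qid)             # neighbour towards the query origin
```

For example, a node that has flooded query q1 and then receives an answer with h = 3 from neighbour 2 will route a later repetitive query to neighbour 2 instead of flooding it.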

3.2 Metadata exchange

Although routes are eventually established by answer caching, many queries still need to be flooded, because answer messages pass through at most TTL nodes and the configuration process is relatively slow. The next suggestion is the exchange of metadata among nodes, just as if they were saying to each other ‘I know a 3-hop way to mysong.mp3!’ The receiving node should record the message and, if no better routes are known, start routing the relevant queries over this neighbour.

The nodes exchange metadata regularly. They take care not to exceed the query lifetime limit (TTL): when a known path is as long as the query lifetime (maximum number of hops), not even one-hop-away neighbours would be able to use it, since their queries would be dropped before reaching the target node.
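The exchange rule resembles a distance-vector update. The following is a sketch under stated assumptions: routes are measured in hops (the cached h is a time, which in the simulation corresponds to hops), a node advertises a route of d hops only if d + 1 <= TTL so that neighbours can still use it, and the receiver adopts a route via the advertiser when it is new or shorter. The table layout and function names are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Metadata = FrozenSet[str]
RouteTable = Dict[Metadata, Tuple[int, int]]  # m_p -> (hops to the file, next-hop neighbour)


def advertise(table: RouteTable, ttl: int) -> Dict[Metadata, int]:
    """Entries worth announcing: a d-hop route is usable by a neighbour only if d + 1 <= ttl."""
    return {m_p: hops for m_p, (hops, _) in table.items() if hops + 1 <= ttl}


def merge(table: RouteTable, advert: Dict[Metadata, int], from_nb: int) -> int:
    """Adopt the route via from_nb when it is new or shorter; return how many entries changed."""
    changed = 0
    for m_p, hops in advert.items():
        via = hops + 1                      # one more hop to reach the advertiser
        known = table.get(m_p)
        if known is None or via < known[0]:
            table[m_p] = (via, from_nb)
            changed += 1
    return changed


song, other = frozenset({"mysong.mp3"}), frozenset({"other.mp3"})
x: RouteTable = {song: (3, 9), other: (5, 9)}
y: RouteTable = {}
merge(y, advertise(x, ttl=5), from_nb=1)
print(y)  # {frozenset({'mysong.mp3'}): (4, 1)}
```

With TTL = 5, the ‘3-hop way to mysong.mp3’ entry is advertised and becomes a 4-hop route at the neighbour, while the 5-hop entry is withheld because no neighbour could use it.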

The described principle originates from distributed routing protocols such as RIP. However, its implementation in peer-to-peer networks is not straightforward. One has to take into account the following issues:

• Routing refers to contents at unknown destination nodes.

• The number of prospective routing destinations is high; however, the routing tables should not grow excessively.

• Dynamicity: nodes keep leaving and joining, adding and withdrawing the contents they are willing to share.

Routing information could easily become obsolete, leading to routing in wrong directions. For these reasons the basic idea has to be extended with several mechanisms, as described in the following sections.

3.3 Metadata issues

The stored metadata represent content routing information. Each node is eventually likely to store as many metadata elements (corresponding to rows in a routing table) as there are different files accessible on the nodes within a query lifetime radius.

A rough estimate of the average quantity of metadata was made for a Gnutella-like file-sharing system, based on the estimates and experimental findings cited in [9, 11, 12] and on our own experimental data from 2003. Considering the average numbers of peer connections, shared files, keywords, file name length and query length, our estimate is that the relevant metadata should occupy no more than 10 MB of memory per peer node.

When peers are average personal computers, this amount is reasonable, while less capable devices such as the average handheld or mobile phone cannot use that much memory for metadata and should use flooding instead. But even on PCs, obsolete metadata (leading to nonexistent contents or over broken paths) has to be removed eventually. One cannot simply delete metadata after an average node lifetime has passed, since some nodes stay connected for a very long time while others connect only for a minute. Breaking an existing path causes more harm in terms of redundant message transfers (subsequent floods) than tolerating a doubtful path until it can be confirmed broken. The following section deals with methods of broken path detection and handling.

3.4 Dealing with newcoming and leaving nodes

The main problem in a dynamic environment is: what if the destination node or any of the intermediate nodes on a previously configured path leaves the overlay and becomes unavailable? Eventually the whole system might fall apart. The nodes should perceive that the answers are not coming back and flood the query to find either an alternative path or other matching contents. A delayed flood would reach the same or other target content later than originally expected, but at least the answer would come, provided that the content can be found within the radius of a query lifespan.

But how do the nodes detect a broken path, and who initiates a delayed flood? When storing answer metadata as routing information, the nodes should also keep track of the time needed from forwarding a query until the answer message came back. When routing the next query over the same route, a node should estimate when an answer should come back, allowing some extra time for unexpected delays. If the answer message does not arrive within that time, the path is considered broken. Eventually all the nodes on the active part of the path that did not flood the query detect a failure, since the answer would have reached them by that time had the path not been broken.

Afterwards, the delayed flood could be initiated in several ways: by the node that cannot reach its next neighbour, by the source node or the user, or by all the nodes on the broken path. A partial flood would reach fewer nodes, and the probability of finding an answer would be lower. Alternatively, with a prolonged query lifetime, it would find results farther from the source node while possibly overlooking those in its vicinity. Flooding from the source requires a new protocol message (a mandatory flood on all nodes), which creates an opportunity for misuse by eager users trying to receive more answers. We therefore chose a flood by all the nodes that routed a query but did not receive an answer afterwards. Since the nodes still keep track of query GUIDs, the flood is not multiplied at each node; each node still forwards the query only once. This way the query achieves the best possible response time and also reaches the same set of nodes as if it had been flooded from the beginning. No new message types are needed. The following pseudocode formally describes the detection and removal of broken paths.

ON receipt of a new query Q_k:
    ...
    // Check for a match in the local metadata store
    IF exists (m_p, h) such that m(m_p, Q_k) == True
        // Can be more than one: choose the match that was received first
        Route Q_k to the neighbour that passed A_ij with min h


        // Set and start a timer for the answer to Q_k
        // (the expected delay plus an extra time)
        t_k = h + T_xtra
        start t_k
    ELSE // no idea about the file location
        Flood Q_k

ON receipt of an answer A_ij:
    // Answer found; Q_i will not be flooded
    stop timer t_i
    ...

ON timeout of t_k:
    // An answer should be there by now
    // Update metadata: remove the broken path
    delete (m_p, h) that was used to route Q_k
    // Find an answer elsewhere
    Flood Q_k
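The timer logic can be sketched as per-node bookkeeping in discrete time-steps. Assumptions: the class and method names are hypothetical, time is an integer step counter, and the caller performs the actual forwarding. tick() only reports which queries timed out, so that the caller can flood them under their original GUID, and deletes the metadata entry that was used to route each of them.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

Metadata = FrozenSet[str]


@dataclass
class PathMonitor:
    """Hypothetical per-node state for broken-path detection in discrete time-steps."""
    t_extra: int                                                            # T_xtra: slack on top of h
    store: Dict[Metadata, Tuple[int, int]] = field(default_factory=dict)    # m_p -> (h, neighbour)
    pending: Dict[str, Tuple[int, Metadata]] = field(default_factory=dict)  # query ID -> (deadline, m_p)

    def route(self, qid: str, m_p: Metadata, now: int) -> int:
        """Route qid using entry m_p and start timer t_k = h + T_xtra; return the next hop."""
        h, neighbour = self.store[m_p]
        self.pending[qid] = (now + h + self.t_extra, m_p)
        return neighbour

    def answered(self, qid: str) -> None:
        """An answer arrived: stop the timer, so the query will not be flooded."""
        self.pending.pop(qid, None)

    def tick(self, now: int) -> List[str]:
        """Return the queries whose timer has expired by time `now`.

        The metadata entry used to route each of them is deleted (broken path);
        the caller is expected to flood these queries.
        """
        expired = [qid for qid, (deadline, _) in self.pending.items() if now >= deadline]
        for qid in expired:
            _, m_p = self.pending.pop(qid)
            self.store.pop(m_p, None)
        return expired
```

For instance, with h = 3 and T_xtra = 3, a query routed at step 10 is declared lost at step 16 if no answer has arrived by then.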

4 Simulation

Network topologies were generated with the GLP network generator [13]. The target size was around 1500 nodes and the average node degree from 2.5 to 3.5, which is enough for the chosen TTL (5 hops) and the query radius it produces. In each time-step, a randomly chosen node issues a query with a globally unique query identifier; the file the query searches for is chosen consistently with the file popularity distribution. All relevant messages are forwarded: freshly generated queries and answers as well as queries and answers received from other nodes. Queries with expired TTL are discarded. The simulation terminates when the required number of time-steps is completed. Every ten time-steps, one randomly chosen node is deleted and one new node with the same number of links is inserted. Although the generator is not incremental, this way the relevant topology properties stay within the desired range for the duration of the simulation. The number of files per node is selected randomly within a predefined range. The files are chosen according to their popularity: the probability of choosing a file F equals the popularity of F. Simulations were performed with several popularity distributions; however, only the results for the Zipf distribution, which is found in existing (Gnutella-like) systems, are presented here. When generating answers, in the event of several matches the best one is chosen. As a simplification, the query is answered with the first answer message; the case where the user can choose how many answers are needed is deliberately ignored. Although the system could be adapted to this requirement, it can be left out without loss of generality.
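The churn step (every ten time-steps one node leaves and a new node with the same number of links joins) could be sketched as below. How the newcomer chooses its neighbours is not specified; this sketch assumes uniformly random neighbours, the function name is hypothetical, and the GLP generator itself is not reproduced.

```python
import random
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected overlay: node -> set of neighbours


def churn_step(graph: Graph, new_id: int, rng: random.Random) -> int:
    """Remove one random node and insert node new_id with the same number of links.

    Returns the ID of the departed node.
    """
    victim = rng.choice(sorted(graph))
    degree = len(graph[victim])
    for nb in graph.pop(victim):          # the departing node's links disappear
        graph[nb].discard(victim)
    # Assumption: the newcomer links to uniformly chosen existing nodes.
    targets = rng.sample(sorted(graph), k=min(degree, len(graph)))
    graph[new_id] = set(targets)
    for nb in targets:
        graph[nb].add(new_id)
    return victim


TD = 10  # one departure and one join every T_d = 10 time-steps, as in the simulation
rng = random.Random(7)
overlay = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
next_id = 4
for step in range(1, 31):
    # ... one query per time-step would be issued and forwarded here ...
    if step % TD == 0:
        churn_step(overlay, next_id, rng)
        next_id += 1
print(sorted(overlay))
```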

Cold-start simulations are described: in the beginning all the nodes are unconfigured, and during the simulation new unconfigured nodes join while configured ones keep leaving. The simulations end after the system reaches its stationary state, when the average number of hops per query becomes more or less constant.

Time is measured in discrete intervals, and message delays correspond to the number of hops. Since this can be unrealistic in an internet-based scenario, the effect of too long or too short timeout intervals (for triggering a delayed flood) on overall traffic was further analysed.

4.1 Metrics

The main goal is to reduce the overall message traffic. The simplest metric is the total number of message hops (HT) over the whole simulation period (including cold start); more relevant, however, is the average value in the stationary state (HS). Routing efficiency can be evaluated from several viewpoints: R is the average number of nodes reached by a query, and M is the average node load per unit time, i.e. the number of forwarded queries.

A slightly modified definition of query price from [14] is $C = (\text{total query hops}) / (\text{nodes reached})$, while [9] defines the percentage of redundant hops as $P = (\mathrm{HT} - R)/\mathrm{HT}$ and the query efficiency as $D = (\text{all query hops}) / (\text{effective hops})$, where a hop is effective when it reaches a node with the matching file.
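The three ratios can be computed per query as below. Applying them to a single query's counters (rather than whole-simulation totals) and the field names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueryCounters:
    """Assumed per-query counters collected by a simulator."""
    hops: int            # all message hops caused by the query
    reached: int         # R: distinct nodes the query reached
    effective_hops: int  # hops that reached a node holding a matching file


def query_price(c: QueryCounters) -> float:
    """C = total query hops / nodes reached."""
    return c.hops / c.reached


def redundancy(c: QueryCounters) -> float:
    """P = (HT - R) / HT, with HT taken as this query's hop count."""
    return (c.hops - c.reached) / c.hops


def query_efficiency(c: QueryCounters) -> float:
    """D = all query hops / effective hops."""
    return c.hops / c.effective_hops


q = QueryCounters(hops=5, reached=3, effective_hops=1)
print(round(query_price(q), 2), redundancy(q), query_efficiency(q))
# prints: 1.67 0.4 5.0
```

Lower values are better for all three, which is why they appear among the metrics to be minimised.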

The user experience should not get worse in comparison with flooding. In the simulation environment, the user-related metrics are the number of time intervals before an answer is received (AT), the number of hops from the answering node to the source node (AH), and the share of answered queries (QA) among the queries for which an answer can be found within the TTL radius (flooding as well as both improvements should always find such an answer).

The aim is to minimise C, P, M, R, HS, HT and D while keeping AT, AH and QA at flooding level.

4.2 Results and discussion

Figure 1 compares the suggested improvements with flooding for all the metrics mentioned. The mechanisms clearly meet expectations: the user-related metrics stay at almost 100% of the flooding level, while the others are significantly decreased. AT is longer only when a delayed flood is initiated; however, the difference is not significant in the overall average. Even better response times should result from less traffic and less congestion at the network level. A query remains unanswered when the only node with the target file leaves during query execution or when the simulation is stopped; otherwise all three mechanisms always find a target if it is within the TTL radius. Both HS and HT, the most important metrics from the system viewpoint, are considerably reduced. To get a realistic picture, the number of metadata exchange messages should be added to the Exchange figures; after that, both suggested mechanisms are practically equivalent in M, HS and HT. The average load per node is reduced to 11–15% of flooding, while the maximum load per node (not shown) is 10.8% in Remembering and only 3.6% in Exchange. This is very important because the few heavily loaded nodes can carry as much as 30 to 50 times the average load, and the system response is much better when their load is significantly reduced.

[Figure: bar chart of AT, AH, QA, C, P, R, M, HT, HS and D for Remembering (rem), Exchange (exchg) and flooding (flood), y-axis 0 to 100%]

Fig. 1 Comparison of relevant metrics for Remembering and Exchange with baseline of flooding = 100% (metadata maintenance mechanism with optimal timeout value)

To evaluate the effect of too long or too short timeout values for initiating a delayed flood, a series of simulations was executed with the timeout varying from zero (the query is repeated in the next time-step) to ten (the query is repeated after waiting ten time-steps). The optimal value depends on the query properties and on the node that initiates the delayed flood, and lies in the range from 2 to 2·TTL time-steps; the average is slightly above three. Figure 2 shows the process of overlay configuration, with the number of hops per query decreasing from flood-like values to a lower stationary value. The lowest line represents optimal system behaviour, while the others are higher because of too short timeout values: delayed floods are initiated too soon, although the answer could still be found. The shorter the timeout, the more unnecessary floods happen and the higher the average number of hops per query in the stationary state. When the value is too high, however, no harm is done to the query hop count: only the required floods happen. Strictly speaking, the user experience is slightly degraded, since the answers might return later if the flood is delayed more than needed, but in the simulations the differences in the average AT were statistically insignificant.

In Fig. 3 one can see the average values of R (nodes reached), M (node load), P (redundancy) and QA (answered queries) in simulations with different timeout values. The first column (FL) shows the flooding values. The system behaviour is at its worst when the delayed flood is initiated immediately after the query is issued (1); it improves with longer timeouts and becomes stable at the average optimal value. The other metrics (HS, HT, D, C) show almost identical graphs to R, M and P, while AT and AH behave similarly to QA. Owing to space limitations they are not shown here.

From the figures one can conclude that a careful choice of timeout value is very important for the overall system response. It is essential to stay on the safe side: even small deviations in the wrong direction (i.e. a timeout that is too short) can double the average number of query hops. When the timeout value is too long, query hops are kept at the lowest level, while the average AT is slightly higher (though the difference is statistically insignificant). This is explained as follows: AT values are already widely dispersed, and delayed floods are quite infrequent. If the answer is not found, this does not affect the average AT value; the average is affected only by the cases in which an answer is found after a delayed flood. In the simulations, AT in such cases was one to four time units longer, while the average AT over all queries was 3.5 to 3.9 time units. Based on experience, a good rule of thumb is $t_k = 2h$ (see the pseudocode in Section 3.4).
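Read against the timer in the pseudocode of Section 3.4, where the timeout is the expected delay plus an extra time, the rule of thumb corresponds to choosing the extra time equal to the previously observed delay:

$$
t_k = h + T_{xtra}, \qquad T_{xtra} = h \;\Longrightarrow\; t_k = 2h
$$

For instance, if the cached answer previously took h = 3 time-steps to return, the path would be declared broken, and a delayed flood started, after t_k = 6 time-steps without an answer.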

5 Related work

Several researchers have studied routing in flooding-based networks. Adamic et al. [15] study random walks in power-law overlay topologies, where queries are forwarded only to high-degree nodes. Although otherwise effective, their strategy heavily loads high-degree nodes and thus creates new bottlenecks. Portmann [14] studies cost-effective epidemiological protocols (deterministic rumour mongering), where a message should preferably reach each node only once. A query is forwarded to only a few randomly chosen neighbours. The traffic reduction is significant; however, the response times are much higher than with flooding. Lv et al. [9] propose a strategy of multiple parallel random walks, simulated on several graph topology types. The method is superior to flooding in the number of message transfers and average node cost, but the message paths are too long to be considered optimal. Three interesting query routing techniques are suggested by Yang [11]. Iterative deepening considerably reduces the number of hops for queries that find answers near the source node, but generates huge amounts of traffic for the rest. Directed breadth-first search yields good results for average queries, but unusual ones are likely to remain unanswered. The local indices technique requires queries to be processed only at certain depths from the source node, where target nodes generate answers on behalf of their neighbour nodes. Local processing is reduced, while communication costs are high, since besides query transfers the nodes also transmit heartbeats (‘I’m still alive’) and announce their content changes. The total number of nodes reached is not reduced, and therefore only a limited improvement in effectiveness can be demonstrated.

Scalability problems of unstructured peer-to-peer systems are discussed in [16], where it is suggested that a few correct design choices could significantly improve them. The frequency distribution of recurring queries is experimentally shown to follow Zipf’s law in [10], while [17] deals with serving highly popular files. Our assumption from previous work [6] is that frequent queries may be used to reduce the number of nodes visited and thus improve scalability; in this paper, both previously suggested improvements are formalised, and the metadata maintenance mechanism is elaborated and analysed.

[Figure: two panels, hops per query for Exchange and for Remembering versus time (consecutive queries, 0 to 3,000), one curve per timeout setting (i) to (iv)]

Fig. 2 Process of overlay configuration for both suggested improvements and for several timeout values
(i) Initiating delayed flood in next time interval after query was issued (T)
(ii) For T+1
(iii) For T+2
(iv) For T+3
Since lines for T+3 and higher lie virtually on top of each other, higher values are not shown

6 Conclusions

Two previously suggested message routing improvements based on flooding have been formalised and analysed, and their effectiveness has been demonstrated by simulation. Several performance metrics were used for comparison with flooding. A metadata maintenance mechanism was formalised and the effect of different timeout settings analysed. Other improvements have been suggested previously, but the present ones differ in that a message needs to visit fewer nodes to find an answer with the same probability and in the same time, which can result in as much as a tenfold traffic reduction. The suggested improvements are applicable in environments with a high degree of repetitive queries, which is usual in today’s most widespread file-sharing systems. The main drawback is increased memory demand, acceptable for average PCs but too high for weaker mobile devices. Combining the proposed routing techniques with random walks, and storing multiple routing metadata entries for popular files to reduce broken paths, are topics for further research.

7 Acknowledgments

The author wishes to thank the anonymous reviewers for their valuable comments, which have contributed to significant improvements of the paper.

8 References

1 Kung, H.T., and Wu, C.H.: ‘Content networks: taxonomy and new approaches’ in Park, K., and Willinger, W. (Eds.): ‘The internet as a large-scale complex system’ (Oxford University Press, 2002)

2 Milojicic, D.S. et al.: ‘Peer-to-Peer Computing’. Technical Report HPL-2002-57, HP Laboratories, Palo Alto, 2002. Available at http://www.hpl.hp.com/techreports/2002/HPL-2002-57.html

3 Crowcroft, J., and Pratt, I.: ‘Peer-to-peer: peering into the future’. Presented at IFIP-TC6 Networks, 2002. Available at http://www.cl.cam.ac.uk/Research/SRG/netos/publications.html

4 Gnutella homepage, http://gnutella.wego.com, 2003

5 Rohrs, C.: ‘Query routing for the Gnutella network’, LimeWire LLC, 2002. Available at http://www.limewire.com/developer/query_routing/keyword%20routing.htm

6 Ciglarič, M.: ‘Content networks: distributed routing decisions in presence of repeated queries’, Intl. J. Found. Comput. Sci., 2004, 15, (3), pp. 555–566

7 Faloutsos, M., Faloutsos, P., and Faloutsos, C.: ‘On power-law relationships of the internet topology’. Presented at the ACM Conf. on Applications, Technologies, Architecture and Protocols for Computer Communication, 1999

8 Watts, D.J., and Strogatz, S.H.: ‘Collective dynamics of small-world networks’, Nature, 1998, p. 393

9 Lv, Q. et al.: ‘Search and replication in unstructured peer-to-peer networks’. Presented at 16th ACM Int. Conf. on Supercomputing ICS’02

10 Sripanidkulchai, K.: ‘The popularity of Gnutella queries and its implications on scalability’. Available at http://www-2.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html

11 Yang, B., and Garcia-Molina, H.: ‘Efficient search in peer-to-peer networks’. Presented at Conf. ICDCS 2002. Available at http://dbpubs.stanford.edu/pub/2001-47

12 Yang, B., and Garcia-Molina, H.: ‘Comparing hybrid peer-to-peer systems’. Presented at Conf. on Very Large Databases, VLDB 2001

13 Bu, T., and Towsley, D.: ‘On distinguishing between internet power-law topology generators’. Presented at INFOCOM 2002

14 Portmann, M., and Seneviratne, A.: ‘Cost-effective broadcast for fully decentralized peer-to-peer networks’, Comput. Commun., 2003, 26, (11), pp. 1159–1167

15 Adamic, L., Lukose, R., Puniyani, A., and Huberman, B.: ‘Search in power-law networks’, Phys. Rev. E, 2001, 64, p. 46135

16 Lv, Q., Ratnasamy, S., and Shenker, S.: ‘Can heterogeneity make Gnutella scalable?’ Presented at 1st Int. Workshop on Peer-to-Peer Systems IPTPS’02

17 Zerfiridis, K.G., and Karatza, H.D.: ‘File distribution using a peer-to-peer network: a simulation study’, J. Syst. Softw., 2004, 73, (1), pp. 31–44

[Figure: four panels plotted against timeout value (FL, then 0 to 10) for Exchange and Remembering: average R; average M per node per time; average redundancy P per query; answered queries (relative to flooding)]

Fig. 3 Comparison of R, M, P and QA over several timeout values (FL = flood; 0 … 10: waiting time before triggering delayed flood)