Cost-effective broadcast for fully decentralized peer-to-peer networks
Marius Portmanna,*, Aruna Seneviratneb
aSchool of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, AustraliabComputer and Networks Laboratory, Swiss Federal Institute of Technology ETH, CH-8092 Zurich, Switzerland
Received 9 October 2002; accepted 9 October 2002
Fully unstructured and decentralized peer-to-peer networks such as Gnutella are appealing for a variety of applications, among which file-
sharing is the most prominent one. The decentralized nature of these systems provides a high degree of robustness and the ability to cope with
a highly dynamic and transient network environment. However, the lack of centralized directory nodes makes the task of searching more
expensive and difficult. In completely unstructured peer-to-peer networks, searching can only be realized via application-layer broadcast,
where query messages are routed to every node in the network. Gnutella implements application-layer broadcast by using flooding as the
underlying message routing mechanism. Flooding creates a large amount of traffic and can quickly exhaust the resources of nodes in a large
network. In this paper, we explore Rumor mongering (also known as Gossip) as a more cost-effective and scalable alternative to flooding for
implementing services such as searching in decentralized peer-to-peer networks. We further present a new variant of the Rumor mongering
protocol, which exploits the power-law characteristics of typical peer-to-peer networks and achieves a significant further reduction in cost.
q 2002 Elsevier Science B.V. All rights reserved.
Keywords: Peer-to-peer networks; Broadcast; Rumor mongering
Recently, there has been a lot of activity in the area of
peer-to-peer networking, sparked by the popularity of file-
sharing applications such as Napster,1 Freenet  and
Gnutella.2 Although the exact meaning of the term peer-to-
peer is very debatable, these systems typically depend on a
number of voluntarily participating nodes contributing
resources to create some form of infrastructure. The most
well-known service that such a peer-to-peer infrastructure
can provide is certainly file-sharing, but other systems with
completely different applications such as SETI@home3 are
also labeled as peer-to-peer.
Even among the peer-to-peer systems aimed at file-
sharing, there exist big differences in terms of architecture
and implementation of key mechanisms such as searching.
Napster, for instance, relies on a central indexing server to
locate files that are shared. Gnutella implements searching
in a fully decentralized and distributed manner, without the
need for any special nodes with facilitating or adminis-
trative roles. The advantages of a decentralized architecture
are clear. It allows accommodating the highly dynamic and
transient nature of typical peer-to-peer networks, where
nodes frequently join and leave the network. Furthermore,
the absence of a central node that represents a single point of
failure, results in increased fault tolerance and robustness.
Gnutella provides two types of services via its overlay
network: Searching for files and peer discovery. These
services are implemented through broadcasting messages
across the network. This application-level broadcast is
implemented with flooding as the underlying routing
mechanism, where every node acts a router and forwarder
for messages sent by other nodes. The use of flooding
obviously raises the question of cost and scalability.
Individual nodes can be overloaded with messages to
forward, and might not have enough resources such as CPU
cycles, memory or network bandwidth required for the task.
This problem has also been observed in the real Gnutella
system,4 where the fact that nodes could not keep up with
0140-3664/03/$ - see front matter q 2002 Elsevier Science B.V. All rights reserved.
PII: S0 14 0 -3 66 4 (0 2) 00 2 50 -5
Computer Communications 26 (2003) 11591167
* Corresponding author.
E-mail addresses: email@example.com (M. Portmann), a.seneviratne@
unsw.edu.au (A. Seneviratne).1 The Napster home page, http://www.napster.com2 The Gnutella home page, http://gnutella.wego.com3 The SETI@home home page, http://setiathome.ssl.berkeley.edu
4 Distributed Search Solutions (DSS) group, Gnutella: To the Bandwidth
Barrier and Beyond, http://www.clip2.com/gnutella.html
the rate of messages to be forwarded has lead to a
fragmentation of the network. It is therefore important for
the successful deployment of decentralized peer-to-peer
networks that the problem of cost and scalability is
addressed, and more cost-effective methods for appli-
cation-level broadcast are explored.
As an alternative to flooding, we study the use of Rumor
mongering (also known as Gossip) for message routing.
Rumor mongering trades off reliability and speed for a
reduction in cost, by forwarding messages only to a randomly
selected subset of neighbors. By means of simulation, we
show that Rumor mongering can implement application-
layer broadcast at a considerably lower cost than flooding.
We further present a novel variant of the protocol, called
Deterministic rumor mongering, which is an even more
cost-effective method of broadcast message routing. Deter-
ministic rumor mongering makes a more intelligent decision
about message forwarding by making use of the fact that the
topology of a typical peer-to-peer network displays a power-
law distribution of the node degrees.5 In networks with a
power-law characteristic , few nodes have a very high
degree and many have a low degree, and the number of
nodes with a certain degree decreases with the degree
according to the following power law: fd / dk: The numberof nodes of a degree d, fd, is proportional to d to the power of
a negative constant k.
In Ref.  Adamic et al. addresses the problem of
searching in power-law networks. They suggest a random-
walk search strategy, in which the query messages are
directed towards nodes of a high degree, for which the paper
shows a superior scaling behavior in terms of cost.
However, their work does not consider that by directing
messages towards high-degree nodes, these nodes are very
likely to be overloaded, especially in peer-to-peer networks
where resources can be assumed to be distributed rather
uniformly among the nodes. This is in contrast to the
approach taken in this paper.
The rest of the paper is structured as follows. In Section
2, we introduce our simulation platform and we discuss
relevant parameters such as the metric of cost. Section 3
studies the cost of flooding-based broadcast, as it is
implemented in Gnutella. In Sections 4 and 5, we evaluate
two variants of Rumor mongering as an alternative to
flooding. Finally, Section 6 concludes the paper.
2. Simulationplatform and parameters
2.1. Flooding-based broadcast
We are going to study the cost of flooding-based
application-layer broadcast as it is implemented in Gnutella.
The corresponding rules for routing messages are as
In general, a message received by a node is forwarded toall its neighbors, except for the one, where the message
was received from.
Each message has a Time To Live (TTL) field which isdecremented by one at each visited node. If the TTL
reaches zero, the message is no longer forwarded.
Each message has an ID to uniquely identify it on thenetwork.Every nodekeeps a record of IDsofmessages that
it has seen in the recent past. If a node receives a message
with the same ID and type as one that it has received
before,6 the message is ignored and not forwarded.
2.2. Metric of cost
In order to evaluate the cost of application-level broad-
cast mechanisms, we first need to define a metric of cost. An
obvious choice for this is the average number of messages
that need to be forwarded per node reached in a single
broadcast. The number of messages forwarded by a node
relates directly to the amount of resources, such as
bandwidth and CPU cycles, that are consumed. As our
metric, we define the cost c as follows:
where r is the number of nodes reached by a broadcast, mi,
the number of messages forwarded by node i, N is the
number of nodes in the network.
To illustrate the cost metric defined above, Fig. 1 shows
Gnutellas broadcast mechanism for two simple topologies
which consist of three nodes. The first is a simple spanning
tree with only two bidirectional links. The second has the
same number of nodes but has an additional link between
node 2 and 3.
As the first step, the initiator of the broadcast (node 1)
sends messages to its immediate neighbors (2 and 3). In the
case of the first topology, the broadcast is completed. The
total number of messages generated is 2, one per node
reached. Therefore, according to our metric, the cost is 1. In
the second topology, additional messages are sent from
node 2 to 3 and vice versa, and the cost is 2, twice as high.
The theoretical minimal cost for the type of broadcast
considered here is one message per node to be reached, i.e.
c 1: This minimal cost can only be achieved for spanningtree topologies with no redundant links. It seems intuitively
clear that the more redundant links a topology contains, the
higher the cost of flooding-based broadcasting.
2.3. The simulator
To study the cost of application-layer broadcast mech-
anisms, we built a discrete event simulator in Java.
5 The degree of a node is defined as the number of immediate neighbors.
6 Although the Gnutella protocol specification in Footnote 8 does not say
for how long the message IDs need to be cached by the nodes, it is assumed
that it is at least a multiple of the typical live time of a message.
M. Portmann, A. Seneviratne / Computer Communications 26 (2003) 115911671160
This simulator takes as input a graph file, describing the
topology of the network to be simulated. Its modular design
allows us to easily simulate different kinds of broadcast
The simulation is based on synchronous rounds. In each
round, every node reads the messages from its input queue
and handles them according to the specified routing rules.
This means either to forward the message to one or several
of its neighbors or to discard it. Each node keeps track of the
number of messages it forwards during each round. A
simulation of a broadcast is initiated by having one node in
the network send a message to all its neighbors and the
simulation ends when no more messages are forwarded in a
round. The results of the simulation may depend on which
node was chosen to start the broadcast. We therefore
perform a large number7 of simulation runs, with the node
initiating the broadcast chosen uniformly at random, and
take the average over all the runs.
To establish a realistic cost of application layer broadcast
for peer-to-peer networks, it is necessary to generate a
network topology with the typical characteristics of peer-to-
peer networks. One such characteristic that has been
observed for peer-to-peer networks by measurement4 and
simulation , as for many other communication  and
social networks , is a power-law distribution in their node
A model proposed by Barabasi and Albert  enables the
generation of random topologies with a power-law charac-
teristic. This model suggests two possible causes for the
emergence of a power-law in the frequency of node degrees
in network topologies: incremental growth and preferential
connectivity. Incremental growth means that growing
networks are formed by the continual addition of new
nodes, and thus the gradual increase in the size of the
network. Preferential connectivity refers to the tendency of
a new node to connect to existing nodes that are highly
connected or popular. Both of these are typical character-
istics of peer-to-peer networks, which usually grow and
evolve in such an ad hoc fashion. We therefore believe that
the BarabasiAlbert model serves as a good basis to
generate random topologies with the typical characteristics
of peer-to-peer networks.
To generate the random topologies for our simulation,
we used BRITE , a topology generator developed at
Boston University. BRITE supports, among others, the
BarabasiAlbert model and allows exporting the graphs in
a standard format, which can be read by our simulation
tool. In the rest of the paper, we will refer to a topology
generated with the BarabasiAlbert model simply as a
For a broadcast in Gnutella, the number of nodes that can
be reached from a node depends on the TTL value of the
initial message. In our analysis and simulations of
Gnutellas flooding-based broadcast, we set the TTL to a
maximum value, which guarantees that a broadcast can
reach 100% of the network. To choose a smaller TTL value
would simply result in a broadcast in smaller sub-network.
Furthermore, this allows us to compare the results for
flooding with the ones obtained for alternative routing
methods that do not have the concept of a TTL value.
3. The cost of flooding-based broadcast
In this section, we try to find the cost of flooding-based
broadcast as it is implemented in Gnutella. Under the
assumption that the initial TTL value is high enough for the
broadcast to cover the entire network, the cost can be
calculated relatively easily. During a broadcast, every node
receives the message at least once. The first time the
message is received, the receiving node forwards it to all its
neighbors, except the node where the message came from.
Subsequent arrivals of the same message are ignored.
Therefore, the number of messages that each node has to
forward per broadcast is limited to the number of its
neighbors, minus one. Only the node initiating the broadcast
sends the message to all its neighbors, which results in one
additional message to be forwarded. Hence, the cost c of a
broadcast as defined in the previous section can be
calculated simply as a function of the number of nodes N
and the average node degree d, i.e. the average number of
neighbors per node. Flooding guarantees that all N nodes of
the network are being reached by a broadcast, which means
c 1 XN
i1di 2 1
N 1 Nd 2 1
N< d 2 1
where c is the cost in number of messages forwarded per
node reached, di; the degree of node i, d, the average node
degree, N is the number of nodes.
Fig. 1. Routing example to illustrate the metric of cost.
7 For practical reasons, we keep the number of simulation runs constant at
1000, irrespective of the size of the network to be simulated.
M. Portmann, A. Seneviratne / Computer Communications 26 (2003) 11591167 1161
For N sufficiently large, the cost can be approximated
with d 2 1: It is interesting to note that the cost c growslinearly with the average node degree and is virtually
independent of N.
To get an idea about the scalability of Gnutellas
broadcast based services we need to know the load in
terms of resource usage per node,...