Cost-effective broadcast for fully decentralized peer-to-peer networks

  • Published on
    02-Jul-2016

  • Cost-effective broadcast for fully decentralized peer-to-peer networks

    Marius Portmann a,*, Aruna Seneviratne b

    a School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, Australia

    b Computer and Networks Laboratory, Swiss Federal Institute of Technology ETH, CH-8092 Zurich, Switzerland

    Received 9 October 2002; accepted 9 October 2002

    Abstract

    Fully unstructured and decentralized peer-to-peer networks such as Gnutella are appealing for a variety of applications, among which file-

    sharing is the most prominent one. The decentralized nature of these systems provides a high degree of robustness and the ability to cope with

    a highly dynamic and transient network environment. However, the lack of centralized directory nodes makes the task of searching more

    expensive and difficult. In completely unstructured peer-to-peer networks, searching can only be realized via application-layer broadcast,

    where query messages are routed to every node in the network. Gnutella implements application-layer broadcast by using flooding as the

    underlying message routing mechanism. Flooding creates a large amount of traffic and can quickly exhaust the resources of nodes in a large

    network. In this paper, we explore Rumor mongering (also known as Gossip) as a more cost-effective and scalable alternative to flooding for

    implementing services such as searching in decentralized peer-to-peer networks. We further present a new variant of the Rumor mongering

    protocol, which exploits the power-law characteristics of typical peer-to-peer networks and achieves a significant further reduction in cost.

    © 2002 Elsevier Science B.V. All rights reserved.

    Keywords: Peer-to-peer networks; Broadcast; Rumor mongering

    1. Introduction

    Recently, there has been a lot of activity in the area of

    peer-to-peer networking, sparked by the popularity of file-

    sharing applications such as Napster,1 Freenet [12] and

    Gnutella.2 Although the exact meaning of the term peer-to-

    peer is very debatable, these systems typically depend on a

    number of voluntarily participating nodes contributing

    resources to create some form of infrastructure. The most

    well-known service that such a peer-to-peer infrastructure

    can provide is certainly file-sharing, but other systems with

    completely different applications such as SETI@home3 are

    also labeled as peer-to-peer.

    Even among the peer-to-peer systems aimed at file-

    sharing, there exist big differences in terms of architecture

    and implementation of key mechanisms such as searching.

    Napster, for instance, relies on a central indexing server to

    locate files that are shared. Gnutella implements searching

    in a fully decentralized and distributed manner, without the

    need for any special nodes with facilitating or adminis-

    trative roles. The advantages of a decentralized architecture

    are clear. It allows accommodating the highly dynamic and

    transient nature of typical peer-to-peer networks, where

    nodes frequently join and leave the network. Furthermore,

    the absence of a central node, which would be a single point of

    failure, results in increased fault tolerance and robustness.

    Gnutella provides two types of services via its overlay

    network: Searching for files and peer discovery. These

    services are implemented through broadcasting messages

    across the network. This application-level broadcast is

    implemented with flooding as the underlying routing

    mechanism, where every node acts as a router and forwarder

    for messages sent by other nodes. The use of flooding

    obviously raises the question of cost and scalability.

    Individual nodes can be overloaded with messages to

    forward, and might not have enough resources such as CPU

    cycles, memory or network bandwidth required for the task.

    This problem has also been observed in the real Gnutella

    system,4 where the fact that nodes could not keep up with

    the rate of messages to be forwarded has led to a

    fragmentation of the network. It is therefore important for

    the successful deployment of decentralized peer-to-peer

    networks that the problem of cost and scalability is

    addressed, and more cost-effective methods for application-

    level broadcast are explored.

    0140-3664/03/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.

    PII: S0140-3664(02)00250-5

    Computer Communications 26 (2003) 1159-1167

    www.elsevier.com/locate/comcom

    * Corresponding author.

    E-mail addresses: marius@ieee.org (M. Portmann), a.seneviratne@unsw.edu.au (A. Seneviratne).

    1 The Napster home page, http://www.napster.com

    2 The Gnutella home page, http://gnutella.wego.com

    3 The SETI@home home page, http://setiathome.ssl.berkeley.edu

    4 Distributed Search Solutions (DSS) group, Gnutella: To the Bandwidth Barrier and Beyond, http://www.clip2.com/gnutella.html

    As an alternative to flooding, we study the use of Rumor

    mongering (also known as Gossip) for message routing.

    Rumor mongering trades off reliability and speed for a

    reduction in cost, by forwarding messages only to a randomly

    selected subset of neighbors. By means of simulation, we

    show that Rumor mongering can implement application-

    layer broadcast at a considerably lower cost than flooding.

    We further present a novel variant of the protocol, called

    Deterministic rumor mongering, which is an even more

    cost-effective method of broadcast message routing. Deter-

    ministic rumor mongering makes a more intelligent decision

    about message forwarding by making use of the fact that the

    topology of a typical peer-to-peer network displays a power-

    law distribution of the node degrees.5 In networks with a

    power-law characteristic [3], few nodes have a very high

    degree and many have a low degree, and the number of

    nodes with a certain degree decreases with the degree

    according to the following power law: $f_d \propto d^{k}$. The number

    of nodes of degree d, $f_d$, is proportional to d to the power of

    a negative constant k.

    In Ref. [1], Adamic et al. address the problem of

    searching in power-law networks. They suggest a random-

    walk search strategy, in which the query messages are

    directed towards nodes of a high degree, for which the paper

    shows a superior scaling behavior in terms of cost.

    However, their work does not consider that by directing

    messages towards high-degree nodes, these nodes are very

    likely to be overloaded, especially in peer-to-peer networks

    where resources can be assumed to be distributed rather

    uniformly among the nodes. This is in contrast to the

    approach taken in this paper.

    The rest of the paper is structured as follows. In Section

    2, we introduce our simulation platform and we discuss

    relevant parameters such as the metric of cost. Section 3

    studies the cost of flooding-based broadcast, as it is

    implemented in Gnutella. In Sections 4 and 5, we evaluate

    two variants of Rumor mongering as an alternative to

    flooding. Finally, Section 6 concludes the paper.

    2. Simulation platform and parameters

    2.1. Flooding-based broadcast

    We are going to study the cost of flooding-based

    application-layer broadcast as it is implemented in Gnutella.

    The corresponding rules for routing messages are as

    follows:

    - In general, a message received by a node is forwarded to all its neighbors, except for the one from which the message was received.

    - Each message has a Time To Live (TTL) field, which is decremented by one at each visited node. If the TTL reaches zero, the message is no longer forwarded.

    - Each message has an ID to uniquely identify it on the network. Every node keeps a record of the IDs of messages that it has seen in the recent past. If a node receives a message with the same ID and type as one that it has received before,6 the message is ignored and not forwarded.
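The three routing rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Gnutella's implementation: the function name `flood`, the adjacency-dictionary representation, and the FIFO delivery order (all links equally fast) are choices made for this example. Because a single message is simulated, the per-node record of seen IDs is reduced to one shared set of nodes that already hold the message.

```python
from collections import deque

def flood(neighbors, source, ttl):
    """Flood one message from `source`; return (reached, forwarded).

    neighbors: dict mapping node -> list of neighbor nodes (undirected graph).
    reached:   set of nodes that received the message (initiator excluded).
    forwarded: dict mapping node -> number of message copies it sent.
    Assumes ttl >= 1.
    """
    seen = {source}                       # nodes that already hold this message ID
    forwarded = {n: 0 for n in neighbors}
    queue = deque()
    for n in neighbors[source]:           # the initiator sends to all its neighbors
        queue.append((n, source, ttl))
        forwarded[source] += 1
    reached = set()
    while queue:
        node, sender, msg_ttl = queue.popleft()
        if node in seen:
            continue                      # duplicate ID: ignore, do not forward
        seen.add(node)
        reached.add(node)
        msg_ttl -= 1                      # TTL is decremented at each visited node
        if msg_ttl <= 0:
            continue                      # TTL exhausted: stop forwarding
        for n in neighbors[node]:
            if n != sender:               # forward to everyone except the sender
                queue.append((n, node, msg_ttl))
                forwarded[node] += 1
    return reached, forwarded
```

Calling `flood({1: [2, 3], 2: [1, 3], 3: [1, 2]}, 1, ttl=7)` on a three-node triangle returns the per-node message counts from which a cost figure can be derived.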

    2.2. Metric of cost

    In order to evaluate the cost of application-level broad-

    cast mechanisms, we first need to define a metric of cost. An

    obvious choice for this is the average number of messages

    that need to be forwarded per node reached in a single

    broadcast. The number of messages forwarded by a node

    relates directly to the amount of resources, such as

    bandwidth and CPU cycles, that are consumed. As our

    metric, we define the cost c as follows:

    $$c = \frac{1}{r} \sum_{i=1}^{N} m_i$$

    where r is the number of nodes reached by a broadcast, $m_i$ is

    the number of messages forwarded by node i, and N is the

    number of nodes in the network.

    To illustrate the cost metric defined above, Fig. 1 shows

    Gnutella's broadcast mechanism for two simple topologies

    which consist of three nodes. The first is a simple spanning

    tree with only two bidirectional links. The second has the

    same number of nodes but has an additional link between

    nodes 2 and 3.

    As the first step, the initiator of the broadcast (node 1)

    sends messages to its immediate neighbors (2 and 3). In the

    case of the first topology, the broadcast is completed. The

    total number of messages generated is 2, one per node

    reached. Therefore, according to our metric, the cost is 1. In

    the second topology, additional messages are sent from

    node 2 to 3 and vice versa, and the cost is 2, twice as high.

    The theoretical minimal cost for the type of broadcast

    considered here is one message per node to be reached, i.e.

    $c = 1$. This minimal cost can only be achieved for spanning tree topologies with no redundant links. It seems intuitively

    clear that the more redundant links a topology contains, the

    higher the cost of flooding-based broadcasting.
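The arithmetic of the Fig. 1 example can be reproduced with a small helper. This is a sketch: the name `broadcast_cost` and the dictionaries of message counts are illustrative, with the counts read off the description in the text. Following that description, r counts the two nodes other than the initiator.

```python
def broadcast_cost(forwarded, reached):
    """Cost c = (1 / r) * sum of m_i over all nodes.

    forwarded: dict mapping node -> m_i, messages sent by node i.
    reached:   r, the number of nodes reached (assumed > 0).
    """
    return sum(forwarded.values()) / reached

# Message counts for the two three-node topologies; node 1 is the initiator.
tree_counts = {1: 2, 2: 0, 3: 0}       # links 1-2 and 1-3 only
triangle_counts = {1: 2, 2: 1, 3: 1}   # extra link 2-3 adds two messages

print(broadcast_cost(tree_counts, reached=2))      # 1.0
print(broadcast_cost(triangle_counts, reached=2))  # 2.0
```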

    2.3. The simulator

    To study the cost of application-layer broadcast mech-

    anisms, we built a discrete event simulator in Java.

    5 The degree of a node is defined as the number of immediate neighbors.

    6 Although the Gnutella protocol specification in Footnote 8 does not say

    for how long the message IDs need to be cached by the nodes, it is assumed

    that it is at least a multiple of the typical lifetime of a message.

  • This simulator takes as input a graph file, describing the

    topology of the network to be simulated. Its modular design

    allows us to easily simulate different kinds of broadcast

    routing protocols.

    The simulation is based on synchronous rounds. In each

    round, every node reads the messages from its input queue

    and handles them according to the specified routing rules.

    This means either to forward the message to one or several

    of its neighbors or to discard it. Each node keeps track of the

    number of messages it forwards during each round. A

    simulation of a broadcast is initiated by having one node in

    the network send a message to all its neighbors and the

    simulation ends when no more messages are forwarded in a

    round. The results of the simulation may depend on which

    node was chosen to start the broadcast. We therefore

    perform a large number7 of simulation runs, with the node

    initiating the broadcast chosen uniformly at random, and

    take the average over all the runs.
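The round-based procedure described here might look roughly as follows. The paper's simulator was written in Java and its code is not shown, so this Python sketch is an assumption throughout: the names `run_broadcast`, `average_cost` and `flooding_rule`, the pluggable `rule` callback, and the choice to count r without the initiator are all illustrative. It also assumes a connected graph with at least two nodes.

```python
import random

def flooding_rule(node, sender, nbrs, state):
    """Forward to all neighbors except the sender, but only on first receipt."""
    if state.get("seen"):
        return []
    state["seen"] = True
    return [n for n in nbrs if n != sender]

def run_broadcast(neighbors, source, rule):
    """Run one broadcast in synchronous rounds; return (forwarded, reached)."""
    forwarded = {n: 0 for n in neighbors}
    state = {n: {} for n in neighbors}      # per-node scratch space for the rule
    state[source]["seen"] = True            # the initiator already has the message
    inbox = {n: [] for n in neighbors}      # input queues, holding sender IDs
    for n in neighbors[source]:             # the initiator sends to all neighbors
        inbox[n].append(source)
    forwarded[source] = len(neighbors[source])
    reached = set()
    while any(inbox.values()):              # loop while messages are pending
        next_inbox = {n: [] for n in neighbors}
        for node, senders in inbox.items():
            for sender in senders:
                if node != source:
                    reached.add(node)
                for target in rule(node, sender, neighbors[node], state[node]):
                    next_inbox[target].append(node)
                    forwarded[node] += 1
        inbox = next_inbox
    return forwarded, reached

def average_cost(neighbors, rule, runs=1000, seed=0):
    """Average cost over `runs` broadcasts with uniformly random initiators."""
    rng = random.Random(seed)
    nodes = list(neighbors)
    total = 0.0
    for _ in range(runs):
        forwarded, reached = run_broadcast(neighbors, rng.choice(nodes), rule)
        total += sum(forwarded.values()) / len(reached)
    return total / runs
```

Passing the routing rule as a function mirrors the modular design mentioned above: a different protocol only needs a different `rule`, while the round loop and the averaging stay unchanged.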

    2.4. Topology

    To establish a realistic cost of application layer broadcast

    for peer-to-peer networks, it is necessary to generate a

    network topology with the typical characteristics of peer-to-

    peer networks. One such characteristic that has been

    observed for peer-to-peer networks by measurement4 and

    simulation [2], as for many other communication [3] and

    social networks [4], is a power-law distribution in their node

    degree.

    A model proposed by Barabasi and Albert [5] enables the

    generation of random topologies with a power-law charac-

    teristic. This model suggests two possible causes for the

    emergence of a power-law in the frequency of node degrees

    in network topologies: incremental growth and preferential

    connectivity. Incremental growth means that growing

    networks are formed by the continual addition of new

    nodes, and thus the gradual increase in the size of the

    network. Preferential connectivity refers to the tendency of

    a new node to connect to existing nodes that are highly

    connected or popular. Both of these are typical character-

    istics of peer-to-peer networks, which usually grow and

    evolve in such an ad hoc fashion. We therefore believe that

    the Barabasi-Albert model serves as a good basis to

    generate random topologies with the typical characteristics

    of peer-to-peer networks.
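Incremental growth and preferential connectivity can be illustrated with a short generator. This sketch is not BRITE and not necessarily the exact procedure used for the paper's topologies: the function name `barabasi_albert`, the parameter `m` (links added per new node), and the fully connected starting core of m + 1 nodes are assumptions made for the example.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow an undirected graph to n nodes by preferential attachment.

    Each new node links to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    Returns a dict mapping node -> set of neighbor nodes.
    """
    if m < 1 or n <= m:
        raise ValueError("need m >= 1 and n > m")
    rng = random.Random(seed)
    neighbors = {i: set() for i in range(n)}
    endpoints = []                    # each node appears once per incident link
    # Assumed starting point: a fully connected core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            neighbors[i].add(j)
            neighbors[j].add(i)
            endpoints += [i, j]
    # Incremental growth: add the remaining nodes one at a time.
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            # Picking a random link endpoint is degree-proportional sampling.
            targets.add(rng.choice(endpoints))
        for t in targets:
            neighbors[new].add(t)
            neighbors[t].add(new)
            endpoints += [new, t]
    return neighbors
```

For instance, `barabasi_albert(1000, 2, seed=1)` yields a connected graph in which a few early nodes accumulate many links while most nodes keep a degree close to `m`.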

    To generate the random topologies for our simulation,

    we used BRITE [6], a topology generator developed at

    Boston University. BRITE supports, among others, the

    Barabasi-Albert model and allows exporting the graphs in

    a standard format, which can be read by our simulation

    tool. In the rest of the paper, we will refer to a topology

    generated with the Barabasi-Albert model simply as a

    Barabasi topology.

    2.5. TTL

    For a broadcast in Gnutella, the number of nodes that can

    be reached from a node depends on the TTL value of the

    initial message. In our analysis and simulations of

    Gnutella's flooding-based broadcast, we set the TTL to a

    maximum value, which guarantees that a broadcast can

    reach 100% of the network. To choose a smaller TTL value

    would simply result in a broadcast in a smaller sub-network.

    Furthermore, this allows us to compare the results for

    flooding with the ones obtained for alternative routing

    methods that do not have the concept of a TTL value.

    3. The cost of flooding-based broadcast

    In this section, we try to find the cost of flooding-based

    broadcast as it is implemented in Gnutella. Under the

    assumption that the initial TTL value is high enough for the

    broadcast to cover the entire network, the cost can be

    calculated relatively easily. During a broadcast, every node

    receives the message at least once. The first time the

    message is received, the receiving node forwards it to all its

    neighbors, except the node where the message came from.

    Subsequent arrivals of the same message are ignored.

    Therefore, the number of messages that each node has to

    forward per broadcast is limited to the number of its

    neighbors, minus one. Only the node initiating the broadcast

    sends the message to all its neighbors, which results in one

    additional message to be forwarded. Hence, the cost c of a

    broadcast as defined in the previous section can be

    calculated simply as a function of the number of nodes N

    and the average node degree d, i.e. the average number of

    neighbors per node. Flooding guarantees that all N nodes of

    the network are being reached by a broadcast, which means

    $r = N$:

    $$c = \frac{1 + \sum_{i=1}^{N} (d_i - 1)}{N} = \frac{1 + N(d - 1)}{N} \approx d - 1$$

    where c is the cost in number of messages forwarded per

    node reached, $d_i$ is the degree of node i, d is the average node

    degree, and N is the number of nodes.

    Fig. 1. Routing example to illustrate the metric of cost.

    7 For practical reasons, we keep the number of simulation runs constant at

    1000, irrespective of the size of the network to be simulated.

  • For N sufficiently large, the cost can be approximated

    with $d - 1$. It is interesting to note that the cost c grows linearly with the average node degree and is virtually

    independent of N.
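A quick numerical check shows how close the approximation is. The sketch below takes r = N as in the derivation of this section; the function names and the degree list (1000 nodes, each of degree 3) are hypothetical values chosen only for illustration.

```python
def flooding_cost(degrees):
    """Exact flooding cost with r = N: (1 + sum(d_i - 1)) / N."""
    n = len(degrees)
    return (1 + sum(d - 1 for d in degrees)) / n

def approx_flooding_cost(degrees):
    """Large-N approximation: average node degree minus one."""
    return sum(degrees) / len(degrees) - 1

degrees = [3] * 1000   # hypothetical network: 1000 nodes, each of degree 3
print(flooding_cost(degrees))          # 2.001
print(approx_flooding_cost(degrees))   # 2.0
```

The two values differ by exactly 1/N, the share of the one extra message sent by the initiator, which is why the dependence on N vanishes for large networks.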

    To get an idea about the scalability of Gnutella's

    broadcast-based services, we need to know the load in

    terms of resource usage per node,...
