DNSR: Domain Name Suffix-based Routing
in Overlay Networks *
Demetrios Zeinalipour-Yazti
Dept. of Computer Science
University of California
Riverside, CA 92507, U.S.A.
Abstract. Overlay Peer-to-Peer (P2P) networks are application-layer networks which facilitate users
in performing distributed functions, such as keyword searches over the data of other users. An important
problem in such networks is that the connections among peers are arbitrary, leading to a topology that
does not match the underlying physical topology. This phenomenon causes excessive network resource
consumption in Wide Area Networks as well as a degraded user experience because of the incurred
network delays.
Most state-of-the-art research concentrates on structuring overlay networks so that query messages
can reach the appropriate nodes within some hop-count bound. These approaches do not take the
underlying network topology mismatch into account, which makes them inappropriate for wide-area
routing.
In this work we propose and evaluate DNSR (Domain Name Suffix-based Routing), a novel technique
for routing query messages in overlay networks based on the "domain closeness" of the node sending
the message. We describe DNSR and present simulation experiments performed over PeerWare, our
distributed infrastructure which runs on a network of 50 workstations. Our simulations are based on
real data gathered from one of the largest open P2P networks, namely Gnutella.
1 Introduction
The advancement of public networks in recent years has increased the demand for distributed
application-layer collaboration suites that can be used in the context of multicast [2], object location [8, 9],
ad-hoc collaboration [27] and information retrieval [5, 6, 7]. Moreover, the recent initial success of centralized
and distributed Peer-to-Peer systems such as Napster [29, 23] and Gnutella [26] has proven that distributed
applications are feasible and that they may dominate the client-server model in the coming years.
Overlay Peer-to-Peer (P2P) networks are application-layer networks which facilitate users in performing
distributed functions, such as keyword searches over the data of other users. In Unstructured P2P networks,
* Course Project for "CS202 - Advanced Operating Systems", with Vana Kalogeraki at the University of California
- Riverside, Department of Computer Science, April 2003. http://www.cs.ucr.edu/~csyiazti/cs202.html
[Figure 1: a log-log plot of the number of IPs contributed versus the rank of domains (based on IPs contributed to Gnutella), for domains observed in June 2002.]
Fig. 1. The number of IPs contributed per ISP or domain in Gnutella on the 1st of June 2002. The
total IP set includes approximately 300,000 IP addresses, and the figure shows that most of these IPs are
contributed by only a small number of ISPs or domains.
network hosts have neither global knowledge nor structure, and efficient query routing is based on routing
indices [3], heuristics [7] and caching [5]. In structured P2P networks [8, 9], on the other hand, network
hosts and objects are organized in such a way that object location can be guaranteed within some hop-count
bound. An important problem in both types of networks is that the connection between any two
peers is arbitrary, leading to a topology that does not match the underlying physical topology. This
phenomenon causes excessive network resource consumption in Wide Area Networks as well as a degraded
user experience because of the incurred network delays.
In this paper we describe DNSR, a simple routing algorithm for overlay networks which attempts
to route messages to nodes whose domain name is closest to the domain name of the node that needs
to forward a message. Such a protocol can be useful in many different settings. In the context of
distributed web crawling, for instance, it is desirable to send a crawling request to nodes that are closest
to the target web server, since this reduces the network latency for the subsequent downloads between
the crawler and the web server. Another application might be in the context of distributed file sharing. A
user looking for a file would benefit greatly by first querying nodes which belong to the same ISP. Moreover,
once the file is located, the actual download time might also be reduced significantly.
1.1 Motivation
Our work is motivated by our study of Gnutella network traffic in [4]. In this study we found, among
other things, that most of the network hosts found in the Gnutella network belong to only a small
number of ISPs or domains (see figure 1). More specifically, we found that 58.73% of the IP addresses
[Figure 2: node 66-215-0-xx1.oc-nod.charterpipeline.net (Riverside) and its measured peers:
66-215-0-xx2.oc-nod.charterpipeline.net (Riverside, 9 ms average RTT, 4 router hops),
12-224-0-236.client.attbi.com (Seattle, 46 ms, 13 hops, 1,544 km),
roc-24-169-109-208.rochester.rr.com (Rochester, 130 ms, 22 hops, 3,933 km),
sdcax6-097.dialup.optusnet.com.au (Melbourne, 184 ms, 22 hops, 12,764 km),
pc-62-30-117-83-cr.blueyonder.co.uk (London) and p237-165.yahoo.co.jp (Tokyo); the two remaining
measurements are 140 ms / 19 hops (8,806 km) and 163 ms / 19 hops (8,747 km).]
Fig. 2. Intra-domain routing becomes attractive as compared to routing between Autonomous Systems that are
geographically dispersed because the Average Round Trip Time (RTT) is reduced significantly.
found in the Gnutella network are owned by only 20 ISPs. In this work we want to exploit this observation
in order to build and evaluate a more efficient routing algorithm for overlay networks.
The effect of creating overlay topologies without being aware of the underlying topology can have a
dramatic impact on the performance of the application running on top of the overlay. Let us consider for
example figure 2, where node 66-215-0-xx1.oc-nod.charterpipeline.net, located in Riverside (USA), is
connected to six other peers in Tokyo (Japan), Melbourne (Australia), London (U.K.), Rochester (U.S.A.),
Seattle (U.S.A.) and Riverside (U.S.A.). The presented IP addresses are taken from the set of IP addresses
found in the Gnutella network. For each host we calculate the average Round Trip Time (RTT) and
the number of intermediate routers. We can see that the RTT for intra-domain routing is minimal (≈ 10 ms).
Therefore the propagation delay, which is a function of the physical distance between A and its peers, is
kept minimal. The same holds for the queuing delays, which are a function of the number of intermediate
routers. Therefore it would be desirable to have node 66-215-0-xx1.oc-nod.charterpipeline.net
# Sub-domain Number of IPs
1 .tampabay.rr.com 1364
2 .nyc.rr.com 1276
3 .houston.rr.com 1047
4 .austin.rr.com 951
5 .twcny.rr.com 855
Table 1. The first five sub-domains of the RoadRunner ISP (rr.com). The table shows that it is feasible to form
clusters of IPs which have the longest suffix domain name match.
connect to other fellow users in the same domain (e.g. 66-215-0-xx2.oc-nod.charterpipeline.net)
rather than to somebody in a different domain.
One might expect that the main obstacle in our effort is that domain names reveal little about
the geographic location of a host. For example, the web server of Telia.com resides physically in
Sweden although it has a .com domain name. Fortunately, as we will describe in this report, the DNSR
algorithm does not rely on the geographic location of a domain but rather on the assumption that users
of the same domain are clustered together. Given such a topology, DNSR tries to keep message routing
local (i.e. within hosts of the same domain).
Another point to note is that hosts within the same domain (e.g. rr.com) may be further structured.
We can see this in table 1, where we present the first five sub-domains of the RoadRunner ISP. Interestingly,
this particular ISP, as well as some others, reveals the geographic location of a particular host. This
observation is utilized in the GeoTrack technique presented in [12]. Unfortunately it is not always possible
to infer geographic location, because a domain name does not inherently contain any such indication. For
example, we have observed that many ISPs, like America Online (aol.com), use domain names of the form
hashcode.aol.com. Although such a structure can be characterized as a flat domain name structure, we
believe that the hashcode assignment is not random. Given the reverse hash function, one could map
the domain name to some geographic location, but this is just a conjecture.
The main design objectives of the DNSR routing scheme can be summarized as follows:
1. Decentralized Routing Algorithm. Routing algorithms that use global knowledge, such as the
Link State Routing algorithm used in OSPF [15], tend to have a significant communication overhead,
which makes them inefficient for dynamic environments where the participants are not known a priori.
DNSR is a decentralized routing algorithm which aims to use information about direct neighbors only.
This characteristic provides DNSR with scalability.
[Figure 3: architecture of gnuDC, the gnutella Distributed Crawler. Multiple gnuDC Bricks, each with a Log Manager and a P2P Network Module configured via config.txt, crawl the Gnutella network, report to an IP Index Server and write local logs; a Logs Aggregator collects the local logs into a local repository, which a Logs Analyzer processes into the final results.]
Fig. 3. gnuDC - Gnutella Distributed Crawler.
2. Avoid Routing Updates. Our routing scheme is targeted towards dynamic environments where nodes
join and leave in an ad-hoc manner. Routing updates in dynamic environments come at a certain
price, though, because by the time a change is propagated it may already be outdated. Besides that,
they also introduce a large communication overhead which is difficult to sustain. Therefore we want
to avoid the propagation of routing updates as much as possible. In order to overcome failure cases we
will use a certain amount of redundancy.
3. Constant Routing Table Size. The routing table size is an important issue in routing. Approaches
like Chord [8] use a routing table of size log N, where N is the size of the network. DNSR, on the
contrary, does not define the routing table as a function of the network size but rather uses a constant
table of size k (i.e. if a node can accept k connections, it maintains information for those nodes only).
This reduces both the lookup time and the time to add/remove entries for nodes that join or leave.
4. Simplicity. The last and most important design issue is that we want to build a simple and efficient
routing algorithm. We believe that overlay networks can become difficult to analyze and predict if
the protocols are too complicated. Successful client-server protocols such as HTTP [16], SMTP [18] and
POP3 [17] owe much of their success to their simplicity.
# Country Dom. IPs % # Country Dom. IPs %
1 Network .net 94,456 38.88% 11 Belgium .be 2,527 1.04%
2 US Commercial .com 81,943 33.73% 12 Italy .it 2,038 0.84%
3 Canada .ca 8,039 3.31% 13 Sweden .se 1,532 0.63%
4 France .fr 5,565 2.29% 14 Spain .es 1,495 0.62%
5 US Educational .edu 5,102 2.10% 15 Singapore .sg 1,333 0.55%
6 England .uk 4,118 1.69% 16 Switzerland .ch 1,256 0.52%
7 Germany .de 3,693 1.52% 17 Japan .jp 1,089 0.45%
8 Australia .au 3,663 1.51% 18 Norway .no 1,010 0.42%
9 Austria .at 2,962 1.22% 19 Brazil .br 775 0.32%
10 Netherlands .nl 2,625 1.08% 20 New Zealand .nz 651 0.27%
Table 2. Distribution of Gnutella IP Addresses to Domains.
2 Analysis and Experiences with the Gnutella Network.
In this section we describe some interesting network properties of the Gnutella network that we obtained
with gnuDC in [4]. gnuDC is our large-scale distributed Gnutella crawler, which allows us to obtain various
Gnutella network traffic metrics. Our analysis includes 294,000 unique IP addresses that were gathered
from pong messages routed through our system in June 2002, over a period of 5 hours. The gathered data
and observations initiated our effort to build and evaluate DNSR.
gnuDC consists of several gnuBricks, which are Gnutella clients that log various network activities.
Each gnuBrick maintains a local hashtable of all IPs it has seen before in order to avoid sending
duplicate IP addresses to the Index Server. The Index Server is responsible for filtering out duplicate IP
entries, since it maintains a global view of all IP addresses observed by the system.
After performing a reverse DNS lookup on the IP addresses that we gathered, we ended up with a set of
244,522 resolved IP addresses. An aggregate of 49,478, or 16.92%, were not resolvable. We note that the
non-resolvable set of IPs contains both hosts which were not reachable at the time of the resolution and
IP addresses which are allocated for private networks [19] (i.e. 192.168.x.x, 172.16.x.x and 10.x.x.x).
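The private-range filter described above can be expressed with Python's standard ipaddress module; the function name below is our own, a minimal sketch rather than the crawler's actual code:

```python
import ipaddress

# RFC 1918 private ranges mentioned above; hosts in these ranges
# cannot resolve to public DNS names.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_private(ip: str) -> bool:
    """Return True if ip falls within one of the private ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PRIVATE_NETS)
```

Addresses flagged by such a check would be excluded before the reverse DNS resolution step.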
2.1 Domain Distributions of Gnutella Internet Hosts
In this subsection we investigate which domains Gnutella users come from. Clip2 [25]
reported in 2000 that Gnutella was a truly international phenomenon. Our measurements indicate that
although Gnutella has a worldwide audience, most of its users come from only a few countries (i.e. U.S.A.,
Germany, Canada, France and England).
Overall Ranking of Organizations (ISPs)
# ISP Domain Country % # ISP Domain Country %
1 Road Runner rr.com US 9.43% 11 Adelphia Comm. adelphia.net US 1.73%
2 America Online aol.com US 7.49% 12 Wanadoo wanadoo.fr France 1.67%
3 T-Online t-dialin.net Germany 6.48% 13 Rogers Comm. rogers.com Canada 1.62%
4 AT&T attbi.com US 6.02% 14 Woolworths Gr. co.uk England 1.58%
5 Comcast comcast.net US 3.98% 15 ntl Group LTD ntl.com England 1.34%
6 Cox Comm. cox.net US 3.35% 16 Verizon verizon.net US 1.27%
7 Shaw shawcable.net Canada 2.30% 17 SBC Pacific Bell pacbell.net US 1.26%
8 Sympatico Lycos sympatico.ca Canada 2.15% 18 Verizon (DSL) dsl-verizon.net US 1.03%
9 CSC Holdings optonline.net US 2.09% 19 British Telecom btopenworld.com England 1.00%
10 BellSouth Telec. bellsouth.net US 2.00% 20 SBC Internet swbell.net US 0.94%
Table 3. Overall ranking of domains based on the number of hosts they contribute to the Gnutella Network.
Table 2 presents the top 20 domains from which Gnutella users come. Although it was
expected that both the .net and .com domains would dominate this measurement, since these domains are
used globally by ISPs, we also found that the number of Gnutella users from various domains is more a
function of how advanced the networks of the ISPs in these countries are than of the actual
number of Internet users in these countries.
2.2 Internet Service Provider share of Gnutella Internet Hosts
Table 3 presents the overall ranking of ISPs based on the share of Gnutella hosts they contribute
to the network. We can see clearly that US, Canadian, German, French and English organizations
dominate the Gnutella network. This table shows that the largest part of the Gnutella network is occupied
by only a few countries. The table also reveals that Asian countries with advanced networks, such as
Japan, are not particularly active in this community, although their popular Napster-like File Rogue [24]
service was suspended.
The DSS group also verified that Gnutella is a truly international phenomenon, since one out of three hosts
was found to be located in a non US-centric domain. Their study analyzed 3.3 million addresses, of which
1.3 million (39%) were resolvable to non-numeric hostnames. On this subset of addresses they found
a domination ratio of 19 : 8 : 2 : 1 for the COM and NET and EDU domains and the combined
{ORG, US, GOV, MIL} domains, respectively.
3 DNSR - Domain Name Suffix-based Routing.
The DNSR protocol is a decentralized routing algorithm whose main objective is to keep the traffic
generated by P2P applications within the same domain. In this section we provide a technical
description of the algorithm and show how it can be deployed in a real setting.
3.1 Basic Notation
In the rest of this paper we will make use of the following conventions:
Network or Topology, denoted as N, consists of m hosts {n1, n2, ..., nm} inter-connected by some
topology (such as random or DNSR).
Degree of a node, denoted as di, is the number of connections a node ni can maintain at any given point.
di can be further divided into di^in and di^out, denoting the numbers of incoming and outgoing connections
respectively. We assume that d bounds the sum of d^in and d^out, so that for a given d, d^in + d^out = d.
λ-suffix of a node, denoted as λi, is the suffix of the node's DNS name consisting of its last i+1 labels. For
example, a node with the DNS name "cs6368146-17.austin.rr.com" has λ0 = "com", λ1 = "rr.com" and
λ2 = "austin.rr.com". The λl-suffix of a node is the largest λ-suffix of its DNS name. For the DNS name
cs6368146-17.austin.rr.com, the λl-suffix is equal to "austin.rr.com".
λ-similarity(dns1, dns2), between two DNS names dns1 and dns2, is the largest index at which the λ-suffixes
of the two names match. For dns1 = "cs6368146-17.austin.rr.com" and dns2 = "othernode.aol.com", λ-
similarity = 0, since the two DNS names match only on λ0. For dns1 = "cs6368146-17.austin.rr.com" and
dns2 = "othernode.rr.com", λ-similarity = 1, since the two DNS names match up to λ1. If the two DNS
names share no common suffix at all, λ-similarity = -1.
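The λ-suffix and λ-similarity definitions above can be sketched in a few lines of Python; the helper names are our own:

```python
def lambda_suffixes(dns: str) -> list:
    """Return [λ0, λ1, ..., λl]: progressively longer domain suffixes,
    excluding the host label itself."""
    labels = dns.split(".")
    return [".".join(labels[-i:]) for i in range(1, len(labels))]

def lambda_similarity(dns1: str, dns2: str) -> int:
    """Largest index i such that the λi-suffixes of both names match;
    -1 if the names share no common suffix at all."""
    s1, s2 = lambda_suffixes(dns1), lambda_suffixes(dns2)
    sim = -1
    for i in range(min(len(s1), len(s2))):
        if s1[i] != s2[i]:
            break
        sim = i
    return sim
```

With these helpers, the λl-suffix of a name is simply the last element of `lambda_suffixes(dns)`.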
Sibling Factor (sfi) of a node is the fraction of its connections that a node ni aims to maintain to λl-suffix
match nodes. For a node ni = cs6368146-17.austin.rr.com, ni aims to maintain at any given point a fraction sfi of its
connections (either incoming or outgoing) to other nodes with the same λl-suffix (i.e. austin.rr.com).
Parent Factor (pfi) of a node is the fraction of connections a node ni aims to maintain to its λ(l-q)-suffix
match nodes, for some q ∈ [1..l). ni tries to maintain connections to parent nodes with the value of q as small
as possible. This yields a form of hierarchical topology, which is desirable in DNSR. For a
node ni = cs6368146-17.austin.rr.com, ni aims to maintain at any given point a fraction pfi of its connections (either
incoming or outgoing) to other nodes with a λ(l-1)-suffix match (i.e. ".rr.com") or a λ(l-2)-suffix match (i.e.
".com") if the former are not available.

[Figure 4: notation example. ni = cs6368146-17.austin.rr.com with di = 10, sfi = 0.8, pfi = 0.1 and cfi = 0.1; its parent nj = node-17.rr.com has cfj = pfi = 0.1, and ni's siblings lie in *.austin.rr.com.]
Fig. 4. Notation.
Child Factor (cfi) of a node is the fraction of connections a node ni accepts from its λ(l+q)-suffix match
nodes, for some q > 0. ni tries to accept connections from child nodes with the value of q as small as
possible. For a node ni = cs6368146-17.austin.rr.com, ni aims to maintain at any given point a fraction cfi of its
connections to other nodes with a λ(l+1)-suffix match (i.e. ".subdomain.austin.rr.com") or larger λ-suffix
match nodes if no closer node is available.
The relation of Degree (di) and Parent/Sibling/Child Factors (pfi/sfi/cfi). The purpose of
the pfi, sfi and cfi factors (pfi + sfi + cfi = 1) is to allow ni to determine how to allocate its di
connections to its peers. Whether these connections are incoming or outgoing is orthogonal, since each ni
can in any case sustain only di connections at any given point. The three factors allow a node to set an
order of preference for the connections it establishes or accepts. In the current scheme we assume that
a node creates outgoing connections only to parent and sibling nodes, and accepts incoming
connections only from its child and sibling nodes. To make the text more readable, references
to all three factors (Parent, Sibling and Child) will be denoted as Level factors.
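As an illustration, the split of the di connection slots induced by the Level factors can be sketched as follows. The rounding scheme is our own assumption, since the report does not specify how fractional slots are handled:

```python
def allocate_slots(d: int, pf: float, sf: float, cf: float) -> dict:
    """Split the d connection slots of a node among parent, sibling
    and child links according to its Level factors (pf + sf + cf = 1)."""
    assert abs(pf + sf + cf - 1.0) < 1e-9
    parents = round(pf * d)
    siblings = round(sf * d)
    children = d - parents - siblings   # remainder, keeps the total at d
    return {"parent": parents, "sibling": siblings, "child": children}
```

For the values of figure 4 (d = 10, sf = 0.8, pf = cf = 0.1) this yields 8 sibling slots, 1 parent slot and 1 child slot.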
3.2 DNSR Topology
A DNSR topology is a semi-hierarchical topology in which nodes having the same λl-suffix (i.e. sibling nodes)
are highly connected, while connections to the parent or child layers are sparser. This objective is
achieved by tuning the pfi, sfi and cfi factors. Since the objective of DNSR is to keep traffic within the
same domain, we assign a large sfi value to ni such that sfi ≫ pfi and sfi ≫ cfi. In figure 4, we can see
that ni has sfi = 0.8, pfi = 0.1 and cfi = 0.1. Given that di = 10, ni aims to be connected to sfi * di = 8
sibling nodes, pfi * di = 1 parent node and cfi * di = 1 child node. Since ni has no child nodes, it may
temporarily assign that slot to a sibling or parent node (preferably a sibling). In that way ni
achieves greater connectivity and potentially obtains a larger horizon.

[Figure 5: a DNSR topology instance with two levels; level 1 contains n1.ucr.edu, n2.ucr.edu and n6.ucsd.edu, while level 2 contains nodes in cs.ucr.edu and cs.ucsd.edu.]
Fig. 5. A snapshot of a DNSR topology of 11 nodes. Each node has a degree d = 3 and each node launches
outgoing connections to 2d/3 sibling nodes and d/3 parent nodes.
A DNSR topology snapshot for 11 hosts, each having a degree of 3, can be seen in figure 5.
As we can see, the upper levels of the DNSR topology have sparser connections among them, while the leaf
nodes have denser connections. In this particular example the number of hosts is very small, which
does not allow us to illustrate the full potential of a DNSR topology. If, on the contrary, the topology
were larger and the sfi factor larger, we would be able to observe more clearly that most of the connections
are within the same domain.
As we will see later in subsection 3.6, with such a topology a node can route messages in such a way
that most of the incurred traffic remains within the same domain.
3.3 Joining the DNSR Network
Let nj denote a P2P client which wants to join a DNSR network. Since nj does not know which other
nodes are currently active in the network, nj has to consult a discovery service D to obtain an initial list.
DNSR does not specify the details of the initial discovery phase. It assumes that an out-of-band discovery
service will provide nj with a random list L = {n_rand1, n_rand2, ..., n_randk} of active hosts. The Discovery
Service D might in fact be implemented in a way similar to techniques that are currently deployed in
P2P networks, such as Gnutella. In Gnutella two different techniques are deployed: i) discovery through a
HostCache [21, 22] and ii) discovery by randomly probing nodes to which nj was connected in the past.
The role of D might in fact be extended in such a way that it provides nj with a "selective" set of
hosts (i.e. hosts that better match the Level factor needs of nj). We nevertheless believe that the Discovery
Service can be implemented in many different ways and that its exact operation depends on the service
that will utilize the DNSR protocol.
Having obtained the random list L = {n_rand1, n_rand2, ..., n_randk}, DNSR requires nj to probe the k
random nodes for the best entry point(s). It does so by sending to all k nodes a message of the form:
nj: LOOKUP mynode.domainX.com
nrandi: +LOOKUPOK othernode.domainX.com
nrandi finds the "best appropriate" node for nj's request by performing a Domain-Name Lookup
in the DNSR network. The "best appropriate" node is the node with the largest λ-suffix match for nj. The
lookup operation is described in further detail in the next subsection.
3.4 Domain-Name Lookup in the DNSR Network
We already described in the previous subsection that the Domain-Name Lookup is used for admitting a
node nj to the network. It provides nj with the nodes that have the largest λ-suffix match with nj itself.
In order to achieve this, each node ni receiving the lookup for dns chains the query by forwarding it to
the one of its connections with the largest λ-suffix match to dns. If no such entry exists, ni can
forward the dns lookup to a random node.
From the example of figure 6 we can see that a lookup is initiated on nj's behalf and is then chained
through a number of intermediate hosts until we reach a host with the highest λ-similarity to the .cs.ucsd.edu
domain suffix we are looking for. In order to keep routing simple we deploy the following scheme. Each
time a node receives a lookup message, it calculates the λ-similarity between itself and the predicate (the
name being looked up on nj's behalf). If the λ-similarity is -1 (no similarity at all) or 0 (similarity only on λ0),
the lookup is forwarded to a parent node. If, on the contrary, λ-similarity = λl (which means that the predicate is a sibling node), the
lookup terminates and a response is sent back along the same path the lookup arrived on. Finally, if 0 < λ-
similarity < λl (which means that the predicate is a child node), the query is forwarded downwards
until λ-similarity = λl, at which point the query terminates.
Of course, there is a possibility that the λ-similarity never becomes equal to λl, since a node with the same
λl-suffix might not exist in the network. To cope with this problem, the last node in the chain can either
simply return a LOOKUPOK message urging nj to join it (since the last node in the chain is in any case the
most appropriate node) or forward the query to one of its sibling nodes, since they are equally
appropriate.
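A minimal sketch of the per-node forwarding decision described above; we assume the λl bound refers to the λl-suffix of the name being looked up, which matches the behaviour in the example of figure 6:

```python
def _suffixes(dns):
    """λ-suffixes of a DNS name, shortest first: λ0, λ1, ..., λl."""
    labels = dns.split(".")
    return [".".join(labels[-i:]) for i in range(1, len(labels))]

def route_lookup(self_dns, target_dns):
    """Forwarding decision a node makes for an incoming LOOKUP."""
    s_self, s_target = _suffixes(self_dns), _suffixes(target_dns)
    sim = -1
    for i in range(min(len(s_self), len(s_target))):
        if s_self[i] != s_target[i]:
            break
        sim = i
    lam_l = len(s_target) - 1      # index of the target's λl-suffix
    if sim <= 0:                   # λ-similarity is -1 or 0
        return "forward-to-parent"
    if sim == lam_l:               # we share the target's full domain
        return "terminate"         # reply +LOOKUPOK along the path
    return "forward-to-child"      # 0 < λ-similarity < λl
```

On the figure 6 example, a node in cs.ucr.edu forwards a lookup for n5.cs.ucsd.edu upwards, n6.ucsd.edu forwards it down, and a node in cs.ucsd.edu terminates it.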
[Figure 6: a Domain-Name Lookup in a DNSR topology. nj = n5.cs.ucsd.edu issues a lookup that is chained via a random hop, two level-up hops and a level-down hop through n1 = n4.cs.ucr.edu, n2 = n5.cs.ucr.edu, n3 = n2.ucr.edu and n4 = n6.ucsd.edu to n5 = n3.cs.ucsd.edu, which returns the lookup response along the same path.]
Fig. 6. A Domain-Name Lookup in the DNSR topology is being used to find the node(s) that have the highest
λ-suffix match to the dns we are looking for.
3.5 Leaving a DNSR Topology
Leaving a DNSR topology does not require any form of a priori notification. Therefore nodes can leave
the network in an ad-hoc manner. It is expected that each node will try to keep its degree at some pre-determined
value di. Therefore, if a node nj leaves, nj's neighbors must try to establish a connection to a
different host while keeping the Level factors at the right values. If for some reason a node is
not able to find appropriate nodes to keep its Level factors at the pre-determined values, it may
temporarily keep them unbalanced until a more appropriate node is found.
Of course, the problem remains of how to discover new nodes in the network. Again, a number of different
techniques can be deployed. A potential technique might be to exchange PING/PONG descriptors (as in
Gnutella) and actively discover nodes for which a particular Level factor is not satisfied. If, for example, a
node needs to have 2 parent links but is able to find only 1, it may decide to send PING descriptors to
its parent in order to discover another parent that is able to accommodate its connection request.
An alternative technique would be to repeat the procedure described in the joining phase, where a
node contacts a Discovery Service, obtains a random list and then performs a LOOKUP to find the most
appropriate entry point. In the context of this project we did not have adequate time to evaluate any of
the above techniques and therefore leave the issue open for future work.
3.6 Searching in a DNSR Network
In the previous subsection we have shown how a node nj joins a DNSR topology. This positions
nj near other nodes that have the largest λ-similarity to nj. In this subsection we show how nj can
search the contents of other nodes.

[Figure 7: searching in a DNSR topology using BFS; the nodes shown include n1 = n2.cs.ucr.edu and n2 = n3.cs.ucr.edu (siblings in cs.ucr.edu) and their parent n3 = n2.ucr.edu; QUERY messages flood the topology and QUERYHITs return along the reverse path.]
Fig. 7. A search in the DNSR topology can be performed with a number of different techniques. In this example we
use BFS for simplicity. The important point to notice is that only a modest fraction of query messages make
their way through to a different level. (In this example sfi = 0.33.)

Fortunately, the DNSR topology does not restrict which search algorithms can
be deployed on top of it. A number of different techniques, such as Breadth-First Search (BFS),
Random BFS or ISM [5], can be deployed. The bottom line with all these techniques is that the bulk of the incurred
traffic remains within the same domain, since the sfi factor is set to a large value such that sfi ≫ pfi
and sfi ≫ cfi. Therefore only a modest amount of traffic makes its way to a different level of the
network. A node nj searching for some content sends a message of the following form to some of its neighbors:
nj: QUERY some query
ni: +QUERYHIT IP, PORT
If the nodes deploy the BFS algorithm, nj sends the query to all of its neighbors (parents, siblings
and children). The same happens at each node that receives the query, until a TTL (time-to-live)
parameter becomes zero. The TTL parameter starts at some constant value (e.g. 7) and is decreased
by one at each query forward. Therefore after 7 hops the query terminates. The TTL technique is
widely used in network applications. The important point in the context of DNSR is that at each forward
only a fraction cfi + pfi of the messages make their way to a different level.
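A TTL-bounded BFS flood of this kind can be sketched as follows; the neighbors table is a hypothetical adjacency list, and real peers would of course forward over the network rather than walk a local dict:

```python
from collections import deque

def bfs_query(start, neighbors, ttl=7):
    """Flood a query Gnutella-style: each node forwards to all its
    neighbors, and the TTL drops by one per hop until it reaches zero."""
    seen = {start}
    frontier = deque([(start, ttl)])
    reached = []
    while frontier:
        node, t = frontier.popleft()
        reached.append(node)
        if t == 0:
            continue          # TTL exhausted: do not forward further
        for peer in neighbors.get(node, []):
            if peer not in seen:
                seen.add(peer)
                frontier.append((peer, t - 1))
    return reached
```

In a DNSR topology, most entries in each node's neighbor list are siblings, which is why most of the flooded traffic stays within one domain.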
The DNSR topology leaves room for more sophisticated search techniques. A large-scale application,
for instance, may decide not to forward a query to any parent or child node at all, given that the query
might be satisfiable by the sibling nodes alone. For example, in table 1 we have shown that one can
find approximately 1,300 Gnutella nodes in the .tampabay.rr.com domain alone. If these nodes were
interconnected with a DNSR topology, it would probably make sense to route messages to sibling
nodes only. Nevertheless, we believe that a DNSR topology will be able to host different search techniques
depending on the context in which they are used.
4 Experimental Evaluation.
In order to test the applicability of the DNSR algorithm we would need access to a number of
hosts running on hundreds or thousands of sites (i.e. domains). Since this was not feasible in the context
of this project, we decided to simulate the DNSR algorithm over the PeerWare simulation infrastructure
in our LAN. PeerWare [5] is our distributed middleware infrastructure, which runs on a network of 50
workstations and allows us to benchmark different routing algorithms for P2P networks. Probing
different query-routing algorithms over middleware P2P systems is interesting from several points of view:
1. In real settings the scalability of various query-routing algorithms can be explored to the fullest extent,
since there are none of the assumptions which are typical in simulation environments.
2. Moreover, many properties, such as network failures, dropped queries due to overloaded peers and
others, may reveal many interesting patterns.
3. Finally, in a middleware infrastructure we are also able to capture the actual time to satisfy queryhits.
PeerWare consists of three main components:
1. graphGen - the Network Graph Generator, which generates a network topology to be used for the simulation.
graphGen generates a number of files which contain initialization information for the various
dataPeers that will comprise a simulation.
2. dataPeer - the Data Node, which is a P2P client that answers queries with queryhits if it meets the
search criterion. A dataPeer initializes a number of connections to hosts as indicated by graphGen.
3. searchPeer - the Search Node, which is a P2P client that submits a number of queries to a PeerWare
network and harvests the returned results. In contrast to dataPeer, searchPeer consists only of a
Network Module and a Result Logging Mechanism. Besides logging the number of results, it also gathers
a number of other statistics, such as the number of nodes that answered a particular query and the
time to receive the results.
In order to customize PeerWare to the needs of the DNSR algorithm we first extended graphGen to generate two different types of topologies: i) a Random topology and ii) a DNSR topology. Onto each topology we map a subset of the DNS entries that we analyzed in Section 2. These DNS addresses are mapped over the
 #  Domain   Number of Nodes   Percentage
 1  com            397             39%
 2  net            388             38%
 3  ca              35              3%
 4  edu             21              2%
 5  fr              19              1%
 6  au              18              1%
 7  be              13              1%
 8  de              12              1%
 9  uk              12              1%
10  at               9              0%
Table 4. Distribution of λ0 for the 1000 randomly sampled DNS entries. This table shows that the sample drawn from the initial set of 294,000 IP addresses is accurate and captures the actual distribution (see Table 2) of DNS entries.
 #  Domain          Number of Nodes   Percentage
 1  rr.com               109             10%
 2  aol.com               87              8%
 3  t-dialin.net          69              6%
 4  attbi.com             64              6%
 5  comcast.net           35              3%
 6  cox.net               31              3%
 7  bellsouth.net         22              2%
 8  shawcable.net         22              2%
 9  sympatico.ca          22              2%
10  optonline.net         17              1%
Table 5. Distribution of λ1 for the 1000 randomly sampled DNS entries. This table shows that the sample drawn from the initial set of 294,000 IP addresses is accurate and captures the actual distribution (see Table 3) of DNS entries.
physical IP addresses and port numbers of the hosts participating in a given simulation. In that way a
dataPeer can simulate a DNS name although its physical address is different, which allows us to simulate
wide-area scenarios within our network of workstations.
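The mapping from logical DNS identities to the physical workstations can be sketched as follows. This is an illustrative Python sketch (PeerWare itself is written in Java); the function name and the round-robin assignment policy are our assumptions, not PeerWare's actual allocation scheme.

```python
# Sketch: map logical DNS identities onto physical workstation addresses,
# so that a dataPeer can simulate a DNS name while actually listening on
# a lab machine. Names and the round-robin policy are illustrative only.

def assign_identities(dns_names, physical_hosts, base_port=10000):
    """Spread the sampled DNS names over the available workstations,
    giving each simulated peer a unique (host, port) endpoint."""
    mapping = {}
    for i, dns in enumerate(dns_names):
        host = physical_hosts[i % len(physical_hosts)]
        mapping[dns] = (host, base_port + i)
    return mapping

peers = assign_identities(
    ["acbee1bf.ipt.aol.com", "abbef1ef.ipt.aol.com", "cable-33-247.sssnet.com"],
    ["283-25.cs.ucr.edu", "283-22.cs.ucr.edu"])
# Two logical peers from the same domain may land on different machines.
```

With such a table each dataPeer knows both its simulated DNS name and the physical endpoint it must bind to.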
4.1 Generating Simulation Topologies
In Section 2 we presented an analysis of a set of ≈ 244,000 IP addresses gathered from the Gnutella network. Since we are not able to simulate 244,000 DNS names with the machines at our disposal, we decided to randomly sample 1000 entries from the initial set.
Tables 4 and 5 show the distribution of λ0 and λ1 for the sampled hosts. The tables indicate that the sampling is uniform and that it preserves the initial distributions of Tables 2 and 3, respectively. For example, in both the initial and the sampled sets, λ1 for the .rr.com domain is ≈ 10%.
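The suffix extraction behind λ0 and λ1 can be sketched as follows. This is an illustrative Python sketch; the helper names `suffix` and `distribution` are ours, and a 1000-entry sample would simply be drawn with `random.sample(all_names, 1000)` beforehand.

```python
# Sketch: compute the lambda_0 / lambda_1 suffix distributions of a set of
# DNS names, as done to validate Tables 4 and 5. Helper names are ours.
from collections import Counter

def suffix(dns, level):
    """lambda_0 is the TLD (last label); lambda_1 the last two labels, etc."""
    labels = dns.split(".")
    return ".".join(labels[-(level + 1):])

def distribution(dns_names, level):
    """Fraction of names falling under each level-`level` suffix."""
    counts = Counter(suffix(d, level) for d in dns_names)
    total = len(dns_names)
    return {s: c / total for s, c in counts.items()}

sample = ["a.rr.com", "b.rr.com", "c.aol.com", "d.t-dialin.net"]
print(distribution(sample, 0))  # {'com': 0.75, 'net': 0.25}
print(distribution(sample, 1))  # {'rr.com': 0.5, 'aol.com': 0.25, 't-dialin.net': 0.25}
```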
After obtaining the sampled set we generate two different topologies: i) a Random and ii) a DNSR topology. In the Random topology we use an out-degree of 3, which yields nodes whose average degree is 6 (incoming plus outgoing connections). We use the same degree value for the DNSR topology so that the two topologies are directly comparable. graphGen generates a set of configuration files which can be read by the
Table 6. The myhosts.graph file for ”acbee1bf.ipt.aol.com” shows the outgoing connections that will be established
during initialization.
# UCR Random Graph Generator
# These are my settings
MYDNS = acbee1bf.ipt.aol.com
MYIP = 283-25.cs.ucr.edu
MYPORT = 10094
# Peers that I should connect to:
abbef1ef.ipt.aol.com = 283-22.cs.ucr.edu, 10707
bcdec1ba.ipt.aol.com = 283-21.cs.ucr.edu, 10720
cable-33-247.sssnet.com = 283-20.cs.ucr.edu, 10020
various nodes that comprise the simulation network topology. graphGen starts out by reading graph.conf, which contains, among others, the following parameters:
1. Out-degree of a node, which is used in the case of a random graph.
2. Topology of the P2P network (e.g. random graph).
3. IP list of the hosts that will participate in a simulation. This allows us to map a logical topology (e.g. Node1 -> Node10) to many different IP topologies.
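The Random-graph generation step performed by graphGen can be sketched as follows. This is an illustrative Python sketch (graphGen itself belongs to the Java PeerWare code base); the seeding and sampling details are our assumptions.

```python
# Sketch: build a random topology with a fixed out-degree, as graphGen does
# for the Random case. With out-degree d each node also receives d incoming
# links on average, so the expected total degree is 2d (6 for d = 3).
import random

def random_topology(nodes, out_degree=3, seed=42):
    rng = random.Random(seed)  # seeded for reproducible configurations
    topo = {}
    for n in nodes:
        others = [m for m in nodes if m != n]   # no self-loops
        topo[n] = rng.sample(others, out_degree)
    return topo

nodes = [f"peer{i}" for i in range(10)]
topo = random_topology(nodes)
# Total degree counts each edge at both endpoints: (10 * 3 * 2) / 10 = 6.
avg_degree = sum(len(v) for v in topo.values()) * 2 / len(nodes)
```

A DNSR topology would differ only in how the neighbour candidates are ranked (by domain-name suffix) before the connections are chosen.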
The output of graphGen is a directory of several myhosts.graph files (see Table 6). Each file contains the IP addresses and ports of the hosts to which a particular node must connect. Each dataPeer ni reads upon initialization a myhosts.graph file which contains the IPs and ports of the other dataPeers to which ni must connect. Each dataPeer continuously tries to establish and maintain its outgoing connections; we are therefore not required to incorporate any topological sort algorithm. Connections among dataPeers are achieved by the use of TCP sockets and are persistent (they remain open until ni shuts down). If a TCP connection goes down because of an overloaded peer, then a node automatically re-establishes the connection after some small interval.
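The initialization step of reading a myhosts.graph file can be sketched as follows. This is an illustrative Python sketch of the file format shown in Table 6 (the real parser is part of the Java PeerWare code); the field names MYDNS, MYIP, and MYPORT follow the example above.

```python
# Sketch: parse a myhosts.graph file of the form shown in Table 6.
# Key/value lines are "KEY = value"; '#' starts a comment line.

def parse_myhosts(text):
    me, neighbours = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        key, _, value = (p.strip() for p in line.partition("="))
        if key in ("MYDNS", "MYIP", "MYPORT"):
            me[key] = value               # this peer's own identity
        else:
            host, port = (p.strip() for p in value.split(","))
            neighbours[key] = (host, int(port))
    return me, neighbours

conf = """\
# These are my settings
MYDNS = acbee1bf.ipt.aol.com
MYPORT = 10094
# Peers that I should connect to:
abbef1ef.ipt.aol.com = 283-22.cs.ucr.edu, 10707
"""
me, nbrs = parse_myhosts(conf)
```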
4.2 Experiments
For the purpose of the experimentation we deploy 1000 dataPeers running on a network of 25 workstations, each of which has an AMD Athlon4 1.4 GHz processor with 1GB RAM running Mandrake Linux 8.0 (kernel 2.4.3-20), all interconnected over a 10/100 Mbps LAN.
Obviously launching a large number of dataPeers on many different machines is a tedious procedure.
We have therefore constructed a set of UNIX shell scripts which automatically (by the use of ssh and
Fig. 8. Distribution of QUERY messages reaching λ0 hosts in a Random topology (.com 47%, .net 45%, .ca 4%, .edu 2%, .fr 2%). The distribution of hosts contacted closely matches the actual distribution of the hosts (see Table 4).
Fig. 9. Distribution of QUERY messages reaching λ0 hosts in a DNSR topology (com 85%, ca 7%, fr 4%, net 2%, be 2%). The hosts contacted belong almost exclusively to the ".com" domain: 85% of the traffic remains within .com.
public/private keys) connect to any number of machines and launch the dataPeers. Bringing up a PeerWare network of 1000 dataPeers on 25 machines takes about a minute.
After the PeerWare network is brought up we connect to one host ni and search, using BFS, with a total of 40 queries, each with a TTL of 7. None of the hosts actually answers any of the queries, as we are not interested in the QUERYHIT messages. What we are interested in is the distribution of hosts contacted by these queries. Our evaluation metric for the experiments is therefore the distribution of hosts contacted in a DNSR topology as compared to a Random topology. We expect DNSR to keep the bulk of the traffic within the same domain, yielding distributions in which one particular domain receives most of the queries.
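The metric can be sketched as a TTL-limited flood that tallies the λ0 domain of every host reached. This is an illustrative Python sketch (function names are ours; the overlay shown is a toy example, not one of the simulated topologies).

```python
# Sketch: flood a QUERY message BFS-style with a TTL and tally which
# lambda_0 (TLD) domains the message reaches.
from collections import Counter, deque

def flood(topology, start, ttl):
    """Return the TLD distribution of nodes reached by a TTL-limited
    broadcast starting at `start` (the start node itself included)."""
    seen = {start}
    frontier = deque([(start, ttl)])
    while frontier:
        node, t = frontier.popleft()
        if t == 0:
            continue                       # TTL exhausted: stop forwarding
        for nbr in topology.get(node, []):
            if nbr not in seen:            # duplicate suppression
                seen.add(nbr)
                frontier.append((nbr, t - 1))
    return Counter(n.rsplit(".", 1)[-1] for n in seen)

overlay = {
    "a.rr.com": ["b.rr.com", "c.aol.com"],
    "b.rr.com": ["d.free.fr"],
    "c.aol.com": [],
}
print(flood(overlay, "a.rr.com", ttl=2))  # Counter({'com': 3, 'fr': 1})
```

In the experiments the same tally is taken over 40 queries with TTL 7, once over the Random and once over the DNSR topology.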
As we can see in pie charts 8 and 9, the distribution of QUERY messages reaching λ0 hosts in a Random topology is much like the actual distribution of the hosts (see Table 4). The Random topology does not favor any particular domain, and the distribution of hosts that receive a QUERY message is clearly a function of the actual distribution of hosts in the network. On the other hand, the DNSR topology favors the hosts that have the greatest λ-similarity with the host ni we initially connected to. This happens because ni is expected to be connected preferentially to hosts that have the largest possible λ-similarity with it. In this case, since ni belonged to the rr.com domain, Figure 9 shows that 85% of the traffic affected ".com" hosts.
Pie charts 10 and 11 show even more clearly that the percentage of "rr.com" hosts contacted in the DNSR topology is 24%, compared to 11% in the Random topology. This number could be much greater, in the
Fig. 10. Distribution of QUERY messages reaching λ1 hosts in a Random topology (Other Domains 64%, .rr.com 11%, .aol.com 10%, .t-dialin.net 8%, .attbi.com 7%). The graph again shows that the distribution of hosts contacted is much like the actual distribution of the hosts (see Table 5).
Fig. 11. Distribution of QUERY messages reaching λ1 hosts in a DNSR topology (.rr.com 24%, .aol.com 12%, .attbi.com 12%, .rogers.com 4%, Other Domains 48%). The distribution of hosts contacted favors the ".rr.com" domain (24%); the node that we submitted the queries to belonged to the rr.com domain.
case of the DNSR topology, if the topology included more hosts and if the sfi factor was larger. In this experiment the sfi factor was 0.6, because we used only 3 outgoing connections per host.
4.3 Implementation
The PeerWare infrastructure is implemented entirely in Java; its implementation consists of approximately 10,000 lines of code. For this project we extended PeerWare with the package dnsr.apps.*, which contains the implementations of the DataNode and the QueryNode in approximately 1000 lines of code. The DNSR networking core package dnsr.core.* contains only a few additions to the initial PeerWare package set: in DNSR, each node along a queryhit path inserts its identity into the QUERYHIT message, so that the sender becomes aware of the path the QUERYHIT message travelled. The package dnsr.graphgen.* is an extended version of the initial PeerWare graphgen package; it consists of 2200 lines of code and is able to generate both random and DNSR graphs. Finally, a prototype hostcache implementation is included in dnsr.multiserver.*; it uses connection pooling to increase the performance of the system.
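The path-recording feature added to dnsr.core.* can be sketched as follows. This is an illustrative Python sketch (the actual PeerWare code is Java, and the message layout below is an assumption, not the PeerWare wire format).

```python
# Sketch: each node along the QUERYHIT's return path appends its identity,
# so the querying node learns the full path the message travelled.

def forward_queryhit(message, my_dns):
    """Called at every hop on the reverse path; returns the updated message
    without mutating the caller's copy."""
    message = dict(message)
    message["path"] = message.get("path", []) + [my_dns]
    return message

msg = {"query_id": 17, "path": []}
for hop in ["d.free.fr", "b.rr.com", "a.rr.com"]:
    msg = forward_queryhit(msg, hop)
# msg["path"] now lists the hops in the order they were traversed.
```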
Java was chosen for a variety of reasons. Its object-oriented design enhances the software development
process, supports rapid prototyping and enables the re-use and easy integration of existing components.
Java class libraries provide support for key features of PeerWare: platform independence, multithreading,
network programming, high-level programming of distributed applications, string processing, code mobility,
compression, etc. Other Java features, such as automatic garbage collection, persistence and exception
handling, are crucial in making our system more tolerant to run-time faults.
The choice of Java, however, comes with a certain risk factor that arises from the known performance problems of this programming language and its run-time environment. Notably, performance and robustness are issues of critical importance for a distributed system like PeerWare, which is expected to run on several machines and to sustain high loads for short periods of time. In our experiments, we found the performance of Java SDK 1.3 satisfactory.
5 Related Work.
5.1 The Domain Name Service (DNS)
The Domain Name Service protocol, described in [20], is an application-layer protocol that uses UDP and translates "mnemonic" hostnames (e.g. www.cs.ucr.edu) to their underlying numeric IP address(es) (e.g. 138.23.169.15). At its core the protocol consists of a distributed database implemented as a hierarchy of name servers. There are three different types of name servers: (i) Local Name Servers, which are in charge of caching and maintaining the DNS-to-IP mappings so that they can serve the clients of a given Autonomous System, (ii) Authoritative Name Servers, which register DNS-to-IP mappings (there are usually two name servers per IP), and (iii) Root Name Servers, which forward DNS resolution requests from Local Name Servers to Authoritative Name Servers when the former do not know the mapping.
The success of the distributed DNS database is attributed, in our opinion, to the facts that (i) domain names don't change often, which allows for sparse DNS updates (which are then reflected in the LNSs' caches), and (ii) the name server hierarchy is assumed to be static (since DNS daemons run on high-end servers). In the context of overlay P2P networks, both advantages of DNS are unfortunately not applicable, since we have a completely dynamic topology where hosts join and leave at a high pace. Therefore, designing a DNS-like hierarchy of nodes for an overlay network, in which nodes can locate other nodes that belong to the same domain, might not be efficient or even applicable.
5.2 Topologically-Aware Overlay Construction and Server Selection.
In [1] the authors present a binning scheme in which nodes partition themselves into disjoint "bins", such that nodes that fall in the same bin are relatively close to each other in the network. Such a scheme, like DNSR, is incorporated for performance optimization rather than correct operation. Their scheme is simple and relies only on a small number of landmarks which are positioned at well-known addresses in the network. Prior to joining the network, a node measures the network latency (i.e. RTT) between itself and the k landmarks; the ascending ordering of the k landmarks determines the bin of that node. It is expected that two nodes with the same or similar ordering of the landmarks are actually close to each other. For the server selection process, a node would query a specialized DNS server that holds DNS-to-{IP, bin} mappings and returns the most appropriate entry. The idea of the specialized DNS server could be extended into a Hostcache server that returns peers that are relatively close to the peer making the request.
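The bin computation of [1] can be sketched as follows. This is an illustrative Python sketch; the RTT values are made up for the example.

```python
# Sketch of the binning scheme in [1]: a node's bin is the ordering of the
# k landmarks by measured RTT; nodes with the same ordering are assumed to
# be close to each other in the underlying network.

def bin_of(rtts):
    """rtts: {landmark_name: measured_rtt_ms}. The bin is the tuple of
    landmark names sorted by ascending RTT."""
    return tuple(sorted(rtts, key=rtts.get))

node_a = {"L1": 12.0, "L2": 80.0, "L3": 45.0}   # made-up measurements
node_b = {"L1": 15.0, "L2": 95.0, "L3": 50.0}
node_c = {"L1": 90.0, "L2": 10.0, "L3": 40.0}

assert bin_of(node_a) == bin_of(node_b)   # same ordering -> same bin
assert bin_of(node_a) != bin_of(node_c)   # different ordering -> different bin
```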
A main disadvantage of the scheme is that it relies on the reliability of the landmarks. For example, if one or more landmarks leave the network, because they might in fact be ordinary nodes, then the bins maintained by the various peers are no longer preserved. These nodes then need to find another landmark, which might become an expensive operation in dynamic topologies where nodes join and leave at a high pace.
5.3 Narada and End System Multicast
The Narada application-layer multicast protocol is described in [2]. The main objective of the protocol is to make multicast successful by moving its logic from the network layer to the application layer, so that the protocol does not rely on the intermediate routers. The protocol initially constructs a richly connected graph, denoted as a mesh, and then uses a mesh-optimization algorithm to generate a mesh with certain performance properties. More specifically, it attempts to ensure that (i) the shortest-path delay between any pair of members along the mesh is at most K, where K is a small constant, and (ii) each member has a limited number of neighbors in the mesh. Narada uses mechanisms for member joins, leaves, and failures, ensuring that the mesh is kept connected and that the mesh quality improves over time.
On top of the self-improving mesh, Narada runs a Distance Vector Routing (DVR) algorithm to deliver data to the members of a multicast group. The metric used in the DVR algorithm is the latency between neighbors. The protocol is simulated over a number of different types of topologies as well as over a real setting of 13 hosts which are geographically distributed throughout the United States. Their prototype system shows that the generated overlay spanning tree matches the actual underlying physical topology.
The main differences between Narada and DNSR are the following:
1. Narada is a multicast protocol for overlay topologies, while DNSR is a variation of a broadcast protocol in which messages are routed to hosts which have the longest domain-name suffix match.
2. DNSR is targeted at large ad-hoc communities where nodes join/leave at a high pace.
3. As part of the mesh-quality improvement algorithm, Narada nodes randomly probe each other and calculate the perceived gain in utility. In DNSR, on the other hand, this costly procedure is avoided, since nodes are already assumed to be connected to the best peers (i.e. peers within the same domain).
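The longest-suffix-match rule mentioned in point 1 can be sketched as follows. This is an illustrative Python sketch; counting matching trailing labels is our simplification of the paper's λ-similarity, not its exact definition.

```python
# Sketch of DNSR's core routing preference: among candidate neighbours,
# prefer the one sharing the longest domain-name suffix with the sender.

def lam_similarity(a, b):
    """Number of trailing DNS labels the two names share (our simplified
    stand-in for the paper's lambda-similarity)."""
    la, lb = a.split(".")[::-1], b.split(".")[::-1]  # compare from the TLD
    n = 0
    for x, y in zip(la, lb):
        if x != y:
            break
        n += 1
    return n

def best_neighbour(me, candidates):
    return max(candidates, key=lambda c: lam_similarity(me, c))

peer = best_neighbour("a.rr.com", ["b.rr.com", "c.aol.com", "d.free.fr"])
# "b.rr.com" shares two trailing labels ("rr", "com") with "a.rr.com".
```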
5.4 Connecting to Semantically Similar Nodes.
Semantically clustering nodes provides a different direction in P2P optimization: the parameter optimized is user satisfaction, as a function of the quantity and quality of the returned results. Techniques such as [13, 14] follow an altogether different philosophy and are not directly comparable to DNSR, since the latter optimizes network efficiency. In fact, the network efficiency and user satisfaction criteria might be conflicting, and the tradeoff between these two parameters is not evident.
6 Conclusions.
In this work we propose and evaluate DNSR (Domain Name Suffix-based Routing), which is a novel
technique to route query messages in Overlay Networks, based on the ”domain closeness” of the node
sending the message. We describe DNSR and show simulation experiments which are performed over
PeerWare, our distributed infrastructure which runs on a network of 50 workstations. Our simulations are
based on real data gathered from one of the largest open P2P networks, namely Gnutella.
The experiments show that the idea of Domain Name Suffix-based routing of messages in large-scale P2P communities is highly applicable, and that ISPs and corporations stand to benefit considerably from the deployment of such a scheme. The DNSR scheme is a simple technique which is expected, in the worst case, to behave like a Random topology.
In the future we plan to more carefully investigate the integration of DNS routing updates at the
highest level of the DNSR topology. Such a feature may provide nodes with the ability to more easily
locate domains in the case of Lookup queries. We are also interested in deploying a larger simulation
over the PlanetLab [28] distributed overlay testbed which is expected to run over 1000 geographically
distributed machines in the next 2 years.
References
1. Sylvia Ratnasamy, Mark Handley, Richard Karp, Scott Shenker. Topologically-Aware Overlay Construction and Server Selection. Proceedings of IEEE INFOCOM'02, 2002.
2. Yang-hua Chu, Sanjay G. Rao and Hui Zhang ”A Case For End System Multicast”, Proceedings of ACM
SIGMETRICS, Santa Clara,CA, June 2000, pp 1-12.
3. Arturo Crespo and Hector Garcia-Molina, ”Routing Indices for Peer-to-peer Systems”, In ICDCS, 2002.
4. D. Zeinalipour-Yazti and T. Folias, ”Quantitative Analysis of the Gnutella Network Traffic”, Dept. of Computer
Science, University of California, Riverside, June 2000
5. V. Kalogeraki, D. Gunopulos and D. Zeinalipour-Yazti ”A Local Search Mechanism for Peer-to-Peer Networks,
”11th International Conference on Information and Knowledge Management (CIKM’2002) , McLean, Virginia
USA, November 4-9, 2002
6. Francisco Matias Cuenca-Acuna and Thu D. Nguyen. ”Text-Based Content Search and Retrieval in ad hoc P2P
Communities”, International Workshop on Peer-to-Peer Computing, Springer-Verlag, May 2002
7. B. Yang, H. Garcia-Molina, Efficient Search in Peer-to-Peer Networks. Proc. Int. Conf. on Distributed Com-
puting Systems, 2002.
8. I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan. Chord: A scalable peer-to-peer lookup service
for Internet applications. Proc. of ACM SIGCOMM 2001.
9. A. Rowstron and P. Druschel, ”Pastry: Scalable, distributed object location and routing for large-scale peer-to-
peer systems”. IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidel-
berg, Germany, pages 329-350, November, 2001
10. Miguel Castro, Peter Druschel, Y. Charlie Hu, and Antony Rowstron ”Topology-aware routing in structured
peer-to-peer overlay networks” IFIP/ACM International Conference on Distributed Systems Platforms (Mid-
dleware), Heidelberg, Germany, pages 329-350, November, 2001
11. Marcel Waldvogel and Roberto Rinaldi ”Efficient Topology-Aware Overlay Network” ACM Computer Com-
munication Review, Vol.33, No: 1, January 2003
12. Venkata N. Padmanabhan and Lakshminarayanan Subramanian ”An Investigation of Geographic Mapping
Techniques for Internet Hosts” Proceedings of ACM SIGCOMM 2001, San Diego, CA, USA, August 2001
13. M. K. Ramanathan, V. Kalogeraki and J. Pruyne ”Finding Good Peers in Peer-to-Peer Networks” International
Parallel and Distributed Computing Symposium (IPDPS), Fort Lauderdale, Florida (April 2002)
14. Arturo Crespo and Hector Garcia-Molina ”Semantic Overlay Networks” Stanford University
15. J. Moy Open Shortest Path First (OSPF) v2.0 Request for Comments: 2328, Network Working Group, April
1998
16. J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach and T. Berners-Lee Hypertext Transfer Protocol
(HTTP/1.1) Request for Comments: 2616, Network Working Group, June 1999
17. J. Myers and M. Rose Post Office Protocol - Version 3 (POP3) Request for Comments: 1939, Network Working
Group, May 1996
18. Jonathan B. Postel Simple Mail Transfer Protocol (SMTP) Request for Comments: 821, Network Working
Group, August 1982
19. Y. Rekhter, B. Moskowitz, D. Karrenberg, G. J. de Groot, E. Lear, ”RFC1918 - Address Allocation for Private
Internets”, February 1996.
20. P. Mockapetris Domain Names - Implementation and Specification Request for Comments: 1035, Network
Working Group, November 1987
21. GWebCache, http://www.zero-g.net/gwebcache/specs.html.
22. GNetCache, http://sourceforge.net/projects/gnetcache/.
23. ”The SETI@home (Search for Extraterrestrial Intelligence at Home) Project”, UC Berkeley,
http://setiathome.ssl.berkeley.edu/.
24. File Rogue, File Rogue Inc. http://www.filerogue.com/.
25. Clip2. , Clip2.com, http://www.clip2.com/.
26. Gnutelliums, Gnutella, http://www.gnutelliums.com/.
27. Groove, Groove Networks Inc. http://www.groove.net/.
28. PlanetLab ”An open testbed for developing, deploying, and accessing planetary-scale services.”
http://www.planet-lab.org/.
29. Napster, Napster.com, http://www.napster.com/.