2009:015 CIV
MASTER'S THESIS
REDS - Redundant and Expandable Distributed file Storage
system for a serverless network
Paula Arenas Lindmark
Luleå University of Technology
MSc Programmes in Engineering
Computer Science and Engineering
Department of Computer Science and Electrical Engineering
Division of Computer Communication
2009:015 CIV - ISSN: 1402-1617 - ISRN: LTU-EX--09/015--SE
Redundant and Expandable Distributed file Storage system for a serverless network
Paula Arenas Lindmark
Luleå University of Technology
Department of Computer Science and Electrical Engineering
September 2008
ABSTRACT
Peer-to-peer (P2P) storage systems are an interesting and emerging field, providing new
possibilities for distributed applications. P2P storage systems allow a network of
collaborating nodes to increase the availability of their data by storing replicas of the
data on other nodes in the network. Reasons they are preferred over traditional
client-server systems include fault tolerance, availability and scalability. Despite their
significant potential, current peer-to-peer storage systems lack defenses against cheating
nodes that attempt to use more storage space than they provide.
This thesis addresses this deficiency and presents “REDS - Redundant and Expandable
Distributed file Storage system for a serverless network”, which takes a novel approach to
the problem of cheating nodes that try to falsify information in a peer-to-peer storage
system. REDS uses a ranking system based on different kinds of requests to locate and
suspend malicious nodes in the system. A working prototype of the proposed system has
been implemented in Java on top of the FreePastry implementation of the Pastry routing
layer. Furthermore, a graphical user interface for the prototype has been implemented.
Simulation results, based on FreePastry’s own simulator, indicate that REDS scales well
and is able to efficiently support a large number of nodes.
PREFACE
This Master’s Thesis is the final part of my Master of Science in Computer Science
and Computer Communication at Luleå University of Technology. The project was
developed at the Department of Computer Science at the University of Colorado at
Boulder, USA.
Many people have contributed to the realization of this Master’s Thesis. First of all I
would like to thank my supervisor at the Department of Computer Science at the
University of Colorado at Boulder, Dr. Douglas C. Sicker, for making this Master’s Thesis
possible and for all the help during the development of this project.
Furthermore I would like to thank Mr. Kevin S. Bauer at the Department of Computer
Science at the University of Colorado at Boulder for help and advice regarding the
theoretical aspects of anonymous networks. My thanks also go to my examiner, Håkan
Jonsson at the Department of Computer Science and Electrical Engineering at Luleå
University of Technology.
I would also like to thank Staffan Backen at the Department of Computer Science and
Electrical Engineering at Luleå University of Technology for contributing valuable
ideas and feedback throughout the work, and also for standing by my side and
having faith in me. You have been invaluable.
Special thanks go to my family, who have been a source of constant support and have
always been there when I needed them. Finally, I would like to thank my friends Jessica
Hilb and Suzanne Short in Boulder, USA, for making my stay in Boulder really great.
Paula Arenas Lindmark
CONTENTS
Chapter 1: Introduction
1.1 Background and problem description
1.2 Thesis outline
Chapter 2: Anonymous Networks
2.1 David Chaum’s Mix Network
2.2 Onion Routing
2.3 Tor
2.4 Cashmere
2.5 Anonymous Remailers
2.6 Measuring Anonymity
2.7 Attacks against Anonymous Networks
2.8 Users of Anonymous Networks
Chapter 3: Peer-to-peer Systems
3.1 Description of P2P Systems
3.2 Napster
3.3 Gnutella
3.4 BitTorrent
3.5 MorphMix, A P2P Anonymous Communications System
3.6 Freenet
Chapter 4: Peer-to-peer Storage Systems
4.1 OceanStore
4.2 Eternity Service
4.3 PAST
4.4 Scribe
4.5 Anonymity in P2P Storage Systems
Chapter 5: REDS
5.1 Introduction
5.2 REDS Design
5.3 Implementation
Chapter 6: Conclusions and Future Work
CHAPTER 1
Introduction
1.1 Background and problem description
The need for and use of large-scale distributed storage have increased rapidly in the last
few years, and much research has therefore been done in this area. Storage has
traditionally been a localized task requiring specific backup hardware and administration.
The first backup practices and strategies started to emerge in the early 1960s. Tape
backup is very reliable and scalable, which makes it an attractive solution even today, but
its high cost is a big disadvantage.
The first hard drive was introduced by IBM [1] in 1956. Over the years, hard drive
technology has improved rapidly while its cost has decreased significantly. An important
event was the introduction of the redundant array of inexpensive disks (RAID)
technology [2]. This data storage scheme uses multiple hard drives to
share or replicate data among them. In 1969 the first floppy disk was introduced and was
considered a revolutionary medium for transporting data from one computer to another.
Floppy disks could not store as much data as hard drives but, being much cheaper and more
flexible, they became very widespread. Because of their very low capacity, however, they
were replaced by the next step in the development of storage media, the CD and
DVD. Hard drive capacity has increased rapidly over the years, and studies suggest that
the majority of hard drive space today often remains underused.
Peer-to-peer (P2P) systems are an exciting and emerging field and have in recent years
become some of the most popular Internet applications. This has led to a significant
growth in the research on P2P systems. The novelty of these systems is that the
traditional client-server network is replaced by a decentralized network where peers play
the part of both client and server. P2P systems like Napster [3], Gnutella [4] and
BitTorrent [5], to name a few, are often associated with illegal file sharing.
However, P2P systems have many more uses than simply enabling illegal file sharing,
such as offering a decentralized, scalable, self-sustaining and fault-tolerant
network of peers that provide access to some of their resources. Because peers join
and leave the network, ensuring high availability of the stored data is an interesting
and challenging problem, as is preventing “freeloaders” who use disproportionately
more storage on other peers than they contribute to the network.
Anonymous networks are a large and growing field of research and are used to preserve
anonymity and privacy during communication over public networks. Technological
support for anonymity has been the subject of much research, starting with Chaum’s seminal
paper [6]. Recently, much of the research has focused on P2P anonymous systems.
P2P systems are designed to scale well, which is an advantage for anonymous systems
that benefit from large user populations.
Instead of purchasing extra hardware or leasing online storage, the unused gigabytes
of hard drive space could be used to store replicas of data within a collaborating
group of networked peers. Although a large amount of research has already been
invested in P2P storage systems, a successful file storage system has yet to be found.
This master’s thesis presents the design and implementation of “REDS - Redundant
and Expandable Distributed file Storage system for a serverless network”, a new P2P file
storage system that inherits characteristics from previous storage, P2P and anonymous
systems. REDS tries to overcome the problem of cheating nodes using an
innovative ranking system.
1.2 Thesis outline
In Chapter 2, anonymous networks are introduced and some well-known anonymous
systems are described. Chapter 3 briefly defines what P2P is and then presents the basic
concepts of well-known P2P systems. Chapter 4 presents major research on distributed
P2P storage systems. In Chapter 5, the design and implementation of REDS are presented,
as well as an analysis of the performance of REDS.
CHAPTER 2
Anonymous Networks
Concern about preserving anonymity and privacy during communication over public
networks has grown with the expanding online world. There is therefore a need for
solutions that ensure the user’s anonymity by hiding the user’s identity during, for
example, online payments and voting, but also email and web browsing. Much effort has
been expended to achieve anonymity in these kinds of electronic interactions, which has
led to the development of different protocols and applications for anonymous
communication.
2.1 David Chaum’s Mix Network
The idea of anonymous communication can be traced back to David Chaum, who in
1981 proposed a system for anonymous email [6]. The proposed system used a
computer proxy, called a Mix, for the processing of emails. The Mix, located between
the sender and the receiver, used public key cryptography to act as a clearinghouse for
anonymous emails. To send an email with Chaum’s Mix, the sender must build a specially
formatted message for transmission to the Mix. While the encapsulated and encrypted
content of the email remains unchanged, the source and destination addresses as well
as the cryptographic keys are specific to each Mix. Each Mix knows only the identity
of the previous and next hop in the route, which makes it difficult to identify the complete
communication route. Chaum’s Mix network further improves anonymity by processing
messages in batches at random intervals. The random delay makes coincidence and
timing attacks more difficult.
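The batching behaviour described above can be illustrated with a minimal Python sketch. It is not Chaum's protocol: cryptography is left out entirely, and the class name Mix, the parameter batch_size and the rule of flushing once a fixed number of messages has arrived (instead of at random intervals) are assumptions made only for this illustration.

```python
import random


class Mix:
    """Toy mix: collects messages and forwards them in shuffled batches."""

    def __init__(self, batch_size, rng=None):
        self.batch_size = batch_size        # hypothetical flush threshold
        self.rng = rng or random.Random()   # source of randomness for shuffling
        self.pool = []                      # messages waiting to be forwarded

    def receive(self, message):
        """Queue a message; return a shuffled batch once the pool is full."""
        self.pool.append(message)
        if len(self.pool) < self.batch_size:
            return []                       # keep waiting for more traffic
        batch, self.pool = self.pool, []
        self.rng.shuffle(batch)             # hide the order of arrival
        return batch
```

Because a batch leaves the mix in an order unrelated to the order of arrival, an observer cannot simply match the n-th incoming message to the n-th outgoing one.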
2.2 Onion Routing
Goldschlag, Reed, and Syverson developed a technique for anonymous communication
over a computer network, called Onion Routing [7]. The technique is based on Mix
networks but was extended to work with real-time communication such as web browsing.
Onion Routing is an infrastructure for private communication over a public network.
The system works as follows: encrypted messages, or packets of information,
are sent through a distributed network of randomly selected servers, called Onion Routers,
each of which knows only its predecessor and successor. Messages flowing through the
network are unwrapped with symmetric encryption keys: each onion router peels
off one layer, which reveals instructions for the next downstream node. The content of the
message can only be revealed by the final receiver. An onion is a data structure that the
onion routers treat as the destination address. Onion routing does not guarantee perfect
anonymity, but it helps protect users from eavesdroppers who are not watching both the
initiator and the recipient of the message at the time of the transaction. It is also strongly
resistant to traffic analysis.
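The layered structure of an onion can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the cipher is a toy XOR keystream built from SHA-256 and is not secure, the JSON layer format and all function names (keystream_xor, build_onion, peel) are hypothetical, and key distribution is ignored by handing the sender a ready-made dictionary of keys.

```python
import hashlib
import json


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT secure): XOR data with a SHA-256 keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def build_onion(message: str, route: list, keys: dict, receiver: str) -> bytes:
    """Encrypt for the receiver, then add one layer per router, last router first."""
    payload = keystream_xor(keys[receiver], message.encode())
    next_hop = receiver
    for router in reversed(route):
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload = keystream_xor(keys[router], layer)
        next_hop = router
    return payload


def peel(router_key: bytes, onion: bytes):
    """Remove one layer; returns the next hop and the still-wrapped remainder."""
    layer = json.loads(keystream_xor(router_key, onion))
    return layer["next"], bytes.fromhex(layer["data"])
```

Each router that calls peel learns only the name of the next hop. The innermost payload stays encrypted under the receiver's key, so in this sketch only the final receiver can read the message.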
2.3 Tor
In [8], Tor, the second-generation onion router, is described as a circuit-based
low-latency anonymous communication service that addresses limitations in the original
design of onion routing. This is done by adding forward secrecy, congestion control,
directory servers, integrity checking, configurable exit policies, and a practical design for
location-hidden services via rendezvous points.
Tor can anonymize applications that use TCP, such as web browsing and publishing, Instant
Messaging (IM), Internet Relay Chat (IRC) and Secure Shell (SSH). The main goal of
the Tor project is to defeat traffic analysis, a form of network monitoring that threatens
anonymity and privacy. Tor uses TCP for transport and Transport Layer Security (TLS)
[9] for encryption. The design of Tor, as described in [10][8][11], has some key
elements: the Onion Router (the server component of the network), the Onion Proxy (the
client part of the network), and the circuit, which is a path of three onion routers through
the Tor network from the onion proxy to the destination server. The first onion router is
called the entrance router, the second the mix router and the final one
the exit router. Finally, the unit of transmission through the Tor network is
called a cell, which is a fixed-size 512-byte packet that is padded if necessary. Each onion
router maintains two keys: a long-term identity key, used to sign TLS certificates and
the Onion Router’s router description (a summary of its keys, address, bandwidth
etc.), and an onion key, used to decrypt requests from users to set up a circuit and
negotiate ephemeral keys. This approach improves forward secrecy by eliminating the
risk that nodes along the circuit could be compromised and forced to decrypt captured
traffic. Once a circuit is torn down, the keys are discarded, which also offers protection
from replay attacks. There is also a short-term link key, established by the TLS protocol,
that is used when Onion Routers communicate with each other. Tor uses the Tor
Authentication Protocol (TAP), described and analyzed in [12]. In their paper, Bauer and
McCoy conclude that TAP is secure in the random oracle model. This means that a
man-in-the-middle attack has only a small chance of success.
2.4 Cashmere
Cashmere [13] is a Mix-based, failure-resilient anonymous routing layer implemented on
structured overlays. Cashmere enables anonymous routing and provides both source
anonymity and unlinkability of source and destination. The key benefit of Cashmere
over traditional approaches is its increased resilience to node failures and
node churn, which generally degrade the performance of traditional anonymous routing
protocols based on Chaum Mixes. Traditional routing protocols based on Chaum Mixes
achieve anonymity by relaying the traffic through a sequence of nodes, such that any
two nodes that are not adjacent to each other along the path are unable to identify
each other. Thus, if the relayed path contains more than two nodes, there is no
way for the destination to identify the source. More specifically, no downstream node can
identify the upstream nodes. Such anonymous routing has several weaknesses. The
most important one is that if any node along the route fails or misbehaves, the
message is never delivered to the destination. Moreover, in an anonymous setting it
is difficult for the source to detect such failures and to identify which node has failed.
Another problem is that an attacker who can observe all routers can use timing
analysis to identify the route from the source to the destination. Cashmere presents a
novel anonymous routing protocol that attempts to address these problems with
traditional onion routing.
2.5 Anonymous Remailers
A remailer is a server that receives messages and forwards them without revealing the
identity of the sender. Remailers are not anonymous networks but are still interesting,
especially since they have adopted some of the techniques used by the anonymous networks
mentioned above. There are four different kinds of remailers, each
offering different functionality.
1. Cypherpunk remailers remove the sender’s identity from a message, which makes
it impossible to reply to the message.
2. Mixmaster remailers send a message in small fixed-size packets and reorder
the packets, which makes traffic sniffing harder.
3. With Mixminion remailers, the messages are sent through several mix nodes. The
mix nodes decrypt the messages and mix them before forwarding them.
As in Onion Routing, the messages are encrypted in layers; Mixminion uses
asymmetric encryption for this.
4. Pseudonymous remailers give the sender a pseudonym, an alias, which hides
the real identity of the sender but still makes it possible for the receiver to reply
to the message.
2.6 Measuring Anonymity
Measuring the actual anonymity that a given system provides can be hard. Nevertheless,
several papers have been published on how to quantify the degree of anonymity that an
anonymous communication system provides to a user. One definition of the degree of
anonymity, proposed by Reiter and Rubin [14], is 1 − p, where p is the probability
that an attacker, after observing the system, assigns to a given user
being the originator of a message. Berthold et al. [15] define the anonymity as log2(N),
where N is the number of users of the system. However, both definitions are rejected
by Diaz, Seys, Claessens, and Preneel [16]. They claim that the first definition does not
give information about how distinguishable a user is within the anonymity set. Their
objection to the second definition is that it depends only on the number of users
in the system and does not take into account the information the attacker may obtain
by observing the system. Instead they propose a method that measures the information
the attacker obtains, taking into account the whole set of users and the probabilistic
information the attacker obtains about them.
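The three measures can be compared on small examples with the Python sketch below. The first two functions follow the definitions quoted above. The third is an assumption about the form of the method in [16], which is not spelled out in this text: it is taken here to be the Shannon entropy of the attacker's probability distribution over the users, normalized by the maximum entropy log2(N). All function names are hypothetical.

```python
import math


def reiter_rubin_degree(p: float) -> float:
    """Degree of anonymity 1 - p for a user suspected with probability p."""
    return 1.0 - p


def berthold_anonymity(n_users: int) -> float:
    """Anonymity log2(N); depends only on the number of users N."""
    return math.log2(n_users)


def entropy_degree(probabilities) -> float:
    """Normalized entropy H(X) / log2(N) of the attacker's distribution.

    Assumed form of the entropy-based measure: 1 means the attacker has
    learned nothing, 0 means the originator is known with certainty.
    """
    n = len(probabilities)
    if n < 2:
        return 0.0
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return entropy / math.log2(n)
```

For four users that all look equally likely, entropy_degree([0.25] * 4) gives the maximum value, while a distribution in which one user has probability 1 gives 0, although N and therefore log2(N) are the same in both cases.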
2.7 Attacks against Anonymous Networks
Øverlier and Syverson [17] presented an attack against Tor that revealed the location of
hidden servers. The attack required only a single hostile Tor node and was accomplished
in just a couple of minutes. According to Øverlier and Syverson, their results also apply to
other anonymous networks. The predecessor attack presented in [18], where the attacker
controls a subset of the nodes in the anonymous system and passively logs possible
initiators of a stream of communications, was used to statistically identify the hidden
service as the server that appeared most often on these linked paths. The Sybil attack [19],
which applies to all open peer-to-peer systems, shows that a single attacker presenting
many identities can control a significant fraction of the network. Both the Tarzan and
MorphMix peer-to-peer anonymous communications systems, however, implement a partial
defense. In [10], “Low-Resource, End-to-End Anonymity Attacks Against Tor” is presented.
This attack shows that Tor is vulnerable to non-global adversaries that control only a
few high-resource nodes, or nodes that are perceived to be high-resource. By controlling
both the start and end node, it is possible to learn who is communicating with
whom through the Tor network. That paper also presents a few defense methods.
2.8 Users of Anonymous Networks
Anonymity can of course be used for various purposes, both good and bad. Anonymity
allows people to express their views freely, without fear of repercussions. For example,
people living under an oppressive political regime may use anonymity to avoid persecution
for their political opinions. In anonymous discussions people are also more equal,
since factors like status and gender do not influence how what is said is evaluated.
There has always been a dark side of anonymity, however, and anonymity on the Internet
is unfortunately no exception. Criminals of course want anonymity so
that they can break the law without revealing their identity. However, criminals in
practice already have anonymity, and the likelihood of these individuals committing crimes
is probably not tied to the availability of any single technology. Technologies for
anonymity are a way for ordinary people to preserve their anonymity and privacy. They
keep a person’s identity hidden and protected not only from criminals and potential
harassment, but also from unwanted marketing and junk email. An Expectation
Maximization (EM) algorithm for distribution reconstruction is described and analyzed in
[20] as a solution to the issue of privacy preservation. Governments can also benefit from
anonymous networking in many ways, for example by using it to create anonymous tip
lines for whistle-blowers and anonymous means for citizens to submit leads
in criminal investigations.
CHAPTER 3
Peer-to-peer Systems
Peer-to-peer systems are an exciting and emerging field and have in recent years become
some of the most popular Internet applications. Reasons they are preferred over traditional
centralized systems include fault tolerance, availability, scalability and performance.
P2P networks are widely used to share all kinds of digital content with other users, both
legal and illegal (i.e. copyrighted or subject to censorship). This chapter provides
an overview of P2P systems.
3.1 Description of P2P Systems
The idea behind P2P networks is to have a system that enables end-point resources to
be shared. The resources can be of various kinds, such as files, storage space and CPU
cycles. The defining feature of a P2P system is the ability of end systems (i.e. peers) to
communicate with each other. In this sense, a P2P system can be viewed as an overlay
network over the Internet. According to [21], three criteria define a P2P
system:
• Self-organizing: Nodes organize themselves into a network through a discovery
process. There is no global directory of peers or resources.
• Symmetric communication: Peers are considered equals; they both request and offer
services, rather than being confined to either client or server roles.
• Decentralized control: Peers determine their level of participation and their course
of action autonomously. There is no central controller that dictates behavior to
individual nodes.
The P2P approach existed before file sharing systems like Napster, Gnutella and Freenet
made the idea of P2P popular. The early Internet (Arpanet), one of the first P2P systems,
was itself based on this approach.
3.2 Napster
Napster [3], the best-known P2P system, was created in 1999. The idea behind Napster
was a program that allowed computer users to share and swap files, specifically music,
through a centralized server. Napster has a server holding a central database of all
files made available by the connected peers (clients). Clients log in to this server and send
a list of the files they offer. To be able to use Napster, each client first has to create an
account on the Napster server. After creating an account, a client can send
file requests to the server and receive a list of the clients that hold a matching file.
The requester can then choose clients from this list and download the file
directly from them. Because Napster uses a server to hold the
information about all files, it is not a pure P2P system. Napster is actually
a combination of a client/server and a P2P system, where the client/server technology is
used for login and file requests and the P2P technology for downloading files.
Since Napster uses a centralized database, it avoids many of the query and routing
problems of other P2P systems. The original Napster was forced to shut down
after legal action by the music industry. It was taken over
by Bertelsmann and remodeled into a pay service, but has not regained its original
popularity.
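The division of labour described above, a central index for searching and direct transfers between clients, can be sketched in Python as follows. This is not Napster's actual protocol: the class CentralIndex and its methods are hypothetical, and accounts, authentication and the file transfer itself are left out.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central directory: maps file names to the clients offering them."""

    def __init__(self):
        self.files = defaultdict(set)   # file name -> set of client ids

    def login(self, client, shared_files):
        """Register the list of files a client offers."""
        for name in shared_files:
            self.files[name].add(client)

    def logout(self, client):
        """Remove a client from every file entry."""
        for holders in self.files.values():
            holders.discard(client)

    def search(self, name):
        """Return the clients that hold the file; the download is peer-to-peer."""
        return sorted(self.files.get(name, ()))
```

A search is a single lookup on the server, which is why no query routing between peers is needed. The same server is also a single point of failure and control.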
3.3 Gnutella
Gnutella [4] is a decentralized file-sharing system in which the participants form a virtual
network and communicate in a P2P fashion via the Gnutella protocol [4], a simple
protocol for distributed file search. The first step in participating in Gnutella is to
connect to a known Gnutella host. The Gnutella protocol consists of five basic message
types: Ping, Pong, Query, QueryHit and Push. The messages are routed by the peers
using a constrained broadcast mechanism: when a peer receives a message, it
decrements the message’s time-to-live (TTL) field. If the TTL is still greater than 0 and
the peer has not seen the message’s identifier before, it resends the message to all peers
it knows about. Additionally, the peer checks whether it should respond to the message:
if it receives a Query message, it checks whether it can satisfy the request and, if so,
responds with a QueryHit message. The response is routed back along the same path as
the originating message.
Gnutella is a simple yet effective protocol. Hit rates for search queries are reasonably
high, and the protocol is fault tolerant and adapts to a dynamically changing P2P system
topology. However, Gnutella consumes a lot of bandwidth,
since search requests are broadcast over the network and each peer receiving a
search request scans its local database to see if it can satisfy the request.
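The constrained broadcast can be simulated with a short Python sketch. The function flood_query and its arguments are hypothetical, the network is modelled as a dictionary of neighbour lists, and only Query handling is covered. Ping, Pong, Push and the routing of QueryHit messages back along the path are left out.

```python
def flood_query(network, files, origin, wanted, ttl):
    """Return the set of peers that would answer a Query with a QueryHit.

    network: dict mapping each peer to the list of peers it knows about
    files:   dict mapping a peer to the set of file names it shares
    """
    seen = {origin}                  # peers that already saw this message id
    hits = set()
    frontier = [(neighbor, ttl) for neighbor in network[origin]]
    while frontier:
        peer, remaining = frontier.pop(0)
        if peer in seen:
            continue                 # duplicate message identifier: drop it
        seen.add(peer)
        if wanted in files.get(peer, ()):
            hits.add(peer)           # this peer would send a QueryHit
        remaining -= 1               # decrement the TTL on receipt
        if remaining > 0:
            frontier.extend((n, remaining) for n in network[peer])
    return hits
```

On a chain of peers A, B, C, D, a query from A with a TTL of 2 reaches B and C but never D. This shows both how the TTL bounds the bandwidth cost and why files on distant peers can be missed.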
3.4 BitTorrent
BitTorrent [5] is a file transfer protocol developed by Bram Cohen and implemented in
a client/server package of the same name. Using this software it is possible to distribute
files over unreliable networks. The transfers are coordinated by special files called
torrent files, which help the BitTorrent clients coordinate the peer-based transfers. By
connecting to other BitTorrent users it is thus possible to upload and download
a common file using the combined bandwidth of all users.
In BitTorrent, a shared file is divided into multiple small chunks. A user can download
different chunks concurrently from multiple users and at the same time upload the
chunks it already holds to other BitTorrent users. This means that users do not need to
have the complete file that is being transferred; everyone who has even a small
piece of it may participate in the transfer. Each peer is responsible for maximizing its
own download rate by connecting to suitable peers, and peers that offer a high upload
rate will probably also get a high download rate. When a peer has finished
downloading a file, it may become a seed by staying online and making the file available
for other peers to download.
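The chunking idea can be illustrated with the following Python sketch. It is not the BitTorrent wire protocol: the function names are hypothetical, peers are modelled as dictionaries from chunk index to chunk data, and the per-chunk SHA-1 check is an addition to the description above, included so that a chunk fetched from an arbitrary peer can be verified.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int):
    """Divide a file into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hashes(chunks):
    """Hash every chunk so that downloaded copies can be verified."""
    return [hashlib.sha1(chunk).hexdigest() for chunk in chunks]


def assemble(expected_hashes, peers):
    """Rebuild the file by taking each chunk from any peer with a valid copy."""
    pieces = []
    for index, expected in enumerate(expected_hashes):
        for held in peers:                      # held: {chunk index: bytes}
            chunk = held.get(index)
            if chunk is not None and hashlib.sha1(chunk).hexdigest() == expected:
                pieces.append(chunk)
                break
        else:
            raise LookupError(f"no peer holds a valid copy of chunk {index}")
    return b"".join(pieces)
```

No single peer in the list needs to hold the complete file. It is enough that every chunk is available from at least one of them.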
3.5 MorphMix, A P2P Anonymous Communications System
Mix-based systems for anonymous communication on the Internet give users protection
against eavesdropping and traffic analysis. These systems can suffer from scalability
problems, with potentially large bandwidth and system overhead costs, and the servers
themselves can be targets of attack. These problems have been addressed by Rennhard
and Plattner in [22]. In their paper, they introduce a peer-to-peer network for anonymous
communication called MorphMix. In this system, as in Tarzan [23] and Freenet
[24], every client is at the same time also a proxy. This makes MorphMix scale very well
and makes it the first system that enables anonymous low-latency Internet access for a
large number of users. MorphMix lets users tunnel their traffic through a low-latency
network and includes some interesting collusion detection algorithms. The most significant
difference compared to traditional Mix-based systems is that, because of its network size,
MorphMix does not have to send cover traffic to prevent traffic analysis and therefore
incurs less overhead.
3.6 Freenet
Freenet [24] is a P2P system for the publication, replication and retrieval of data files. Its
central goal is to provide an infrastructure that protects the anonymity of authors and
readers of the data. It is designed in a way that makes it infeasible to determine the origin
of files or the destination of requests. It is also difficult for a node to determine what
it stores, since the files are encrypted when they are stored and sent over the network.
Since Freenet can potentially contain illegal content, this encryption gives node owners
deniability: the owner of a node knows nothing of what is stored on it.
Following the reasoning of the designers of Freenet, no user can therefore
be sued for storing illegal content. Besides anonymity protection, Freenet
implements another interesting concept: an adaptive routing scheme for efficiently
routing requests to the physical locations where the requested data is most likely to
reside. To improve search efficiency, Freenet maintains routing tables that are dynamically
updated as searches and insertions of data occur. Freenet also dynamically replicates
the more popular files, so that files can migrate to peers where they are more likely to
be requested. Thus, unlike Napster, Freenet does not require a central server, and unlike
Gnutella it avoids inefficient message broadcasts.
CHAPTER 4
Peer-to-peer Storage Systems
Traditional storage systems rely on robust servers and magnetic tapes on which the data
is stored. This is a reliable storage solution, but it is expensive and hence mostly aimed
at companies. The growth of storage volume and bandwidth and the greater demand from
private individuals have fundamentally changed the way applications are constructed and
have led to a new type of storage system that uses distributed P2P infrastructures.
Compared to traditional storage systems, these systems are inexpensive but pose problems
of reliability, confidentiality, availability, routing and more. This chapter briefly presents
some well-known distributed storage systems.
4.1 OceanStore
OceanStore [25] is a proposed system for an Internet-based, distributed, global storage
infrastructure. It consists of many cooperating servers provided by different companies.
OceanStore is designed around a cooperative utility model in which consumers pay a fee to
the service providers to ensure access to persistent storage. Because OceanStore is a
storage utility composed of untrusted servers, it encrypts data stored in
the network. The data is split into fragments that can be stored redundantly anywhere in
the network, which provides high availability and protection against denial-of-service
attacks. Files are uniquely identified by a globally unique identifier (GUID). To locate
files, OceanStore uses either a fast but nondeterministic algorithm or a slower,
deterministic algorithm. To achieve high fault tolerance, OceanStore provides
self-monitoring, a mechanism that continually monitors and repairs neighbor links.
OceanStore uses access control lists (ACLs) to restrict write access to
data, while read access requires a key.
4.2 Eternity Service
The basic idea behind The Eternity Service [26] is the use of redundancy and distribution
techniques for the replication of data over a large set of nodes. An anonymity mechanism
is also used to try to prevent selective denial attacks.
The user uploads data along with a requested file duration. The user has to pay for
this service, and the cost is based on the size of the data and the desired duration. When a
user uploads data to, for example, 100 servers, the user only has to remember 10 of these
servers for the purpose of auditing their performance.
Because the user does not record most of the servers to which the data has been
distributed, there is no way to identify which of the participating Eternity servers are storing
the data. Data queries are done via broadcast, and data delivery is achieved through
one-way anonymous remailers.
4.3 PAST
PAST [27] is a large-scale P2P persistent storage management utility built on top of
the Pastry [28] lookup system. It consists of a self-organizing, Internet-based overlay
network of storage nodes that cooperatively route file queries, store replicas
and cache files. Each PAST node is identified by
a 128-bit node identifier. Every file has a fileId that is a SHA-1 hash of the file name
and the public key of the node. To retrieve a file in PAST, the fileId is used and, in some
cases, the decryption key. In PAST, there are three main operations:
• Insert: insert a file to be stored in the network, replicated k times, where k is a
user-specified number.
• Lookup: reliably retrieve a copy of the requested file identified by a fileId.
• Reclaim: reclaim the storage occupied by the k copies of the file.
PAST uses smart cards that produce a signed endorsement of a node’s request to consume
remote storage. The consumed space is charged to an internal counter, and when storage is
reclaimed the counter is credited.
4.4 Scribe
Scribe [29] is a decentralized and scalable publish/subscribe system built on top of Pastry.
Users create topics to which other users can subscribe. Each Scribe group has a 160-bit
groupId, which serves as the address of the group. The nodes subscribed to a group
form a multicast tree, consisting of the union of the Pastry routes from all group members
to the node with the nodeId numerically closest to the groupId. Users create and insert
messages into the system. The messages are encrypted before they are inserted. To send a message
to another user of the group, the notification service is used to provide the recipient with
the necessary information to locate and decrypt the message. The recipient may then
modify their personal metadata to incorporate the message into their view (e.g., into a
private mail folder).
4.5 Anonymity in P2P Storage Systems
Anonymity is a way of dissociating actions from identities. It is a key privacy technology,
since the value of private information is greatly diminished if it cannot be tied to a
particular identity. Privacy can sometimes be used to hide the actions made, but is mostly
used to prevent observers from knowing the identity of a user. Moreover, there are some
kinds of private information that can only be protected with anonymity technologies,
because the actions themselves cannot otherwise be hidden, while the association with the actors
is sensitive information.
Technological support for anonymity has been the subject of much research, starting
with Chaum’s seminal paper. Recently, much of the research has focused on P2P anonymous
systems. P2P, in this context, means a dynamic, decentralized network comprising
a large number of peers, each of which both provides services for others in the network
and uses the services provided by others. The interest in P2P anonymous systems is
motivated by several factors.
The anonymity community has long been concerned about central points of failure, and
research into mix networks, starting with Chaum’s original paper, has aimed to defend
against attacks on a particular server. The decentralized nature of P2P systems provides
a mechanism to distribute trust among a very large population. At the same time, P2P
systems are designed to scale to very large numbers of users, in part by using scalable
algorithms for network maintenance, and in part by exploiting the capacity scaling that
comes from users also providing services to others. Anonymity systems greatly benefit
from large user populations, since users can hide their actions among a larger crowd of
potential actors. Finally, there is an extra level of deniability that can be achieved by
contributing to an anonymous system, rather than being just a user. Therefore, users have
an incentive to provide services to other users, mitigating the free-rider problem of P2P
networks.
When considering a P2P publishing system, the need to dissociate actions from identities
can be an important aspect; hence these kinds of systems implement author/publisher
anonymity, reader anonymity, or both. This means that the system
prevents an adversary from linking an author/publisher to a document, and prevents a
document from being linked with its readers. In a P2P storage system that uses data
encryption, these forms of anonymity are not really needed.
CHAPTER 5
REDS
This chapter presents REDS - Redundant and Expandable Distributed file Storage system
for a serverless network, a large-scale peer-to-peer persistent storage application.
REDS is based on a self-organizing network of peers.
While traditional network storage systems rely on a central server machine, a serverless
system utilizes computers cooperating as peers to provide different services. A serverless
system can also provide better performance and scalability than traditional server-based
systems. Furthermore, the design of REDS provides high availability via redundant data
storage. To demonstrate the functionality of REDS, a prototype has been implemented
in Java, together with a Graphical User Interface (GUI).
5.1 Introduction
REDS is a data storage application designed to provide persistent access to the user’s
files within a P2P network. Inserted files are replicated on multiple nodes to ensure
persistence and availability. With high probability, the set of nodes over which a file is
replicated is diverse in terms of geographic location, ownership, network connectivity etc.
To ensure file content anonymity (i.e., only the owner of a file can read the original
content of the file), files are encrypted before distribution. Files are split into chunks, and each
node keeps a list of the nodes storing its chunks. The user chooses how many replicas of
a file should be distributed on the network, and hence how much disk space
this user must allocate to store other users’ files. A main problem
in current P2P storage systems is to prevent cheating nodes, which use disproportionately
more storage on other peers than they contribute to the network or falsify information
about either themselves or some innocent node. To overcome the problems with cheating
nodes, an innovative approach consisting of a ranking system for the nodes is used.
REDS builds on the Pastry [28] P2P routing scheme, ensuring that messages are reliably
routed to the appropriate nodes. Pastry provides fault tolerance and scalability. A
request to retrieve a file within the REDS network is routed to a node numerically
close to the node that issued the request, among all live nodes that store the file. In
[28], it is shown that the maximum expected number of REDS nodes traversed while
performing such a request is $\lceil \log_{2^b} N \rceil$, where N is the number of nodes in the network and b is a configuration parameter with typical value 4.
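As a rough illustration of this bound, the following Python sketch evaluates $\lceil \log_{2^b} N \rceil$ using integer arithmetic only. The function name max_expected_hops is chosen here for illustration and is not part of REDS or FreePastry.

```python
def max_expected_hops(num_nodes, b=4):
    """Return ceil(log_{2^b}(num_nodes)), the bound on expected routing hops."""
    if num_nodes < 1:
        raise ValueError("the network must contain at least one node")
    base = 2 ** b      # each routing step resolves one base-2^b digit
    hops = 0
    reach = 1          # number of identities distinguishable after `hops` digits
    while reach < num_nodes:
        reach *= base
        hops += 1
    return hops
```

With b = 4, a network of 10 000 nodes gives a bound of 4 hops, and a network of one million nodes gives a bound of 5 hops.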
5.1.1 Project Ambitions and Challenges
In REDS, each node plays the role of both provider (each node provides disk space
to other nodes) and consumer (each node is able to store data at other nodes).
A transaction between nodes can be a request to store, retrieve, delete or verify
a file. After a transaction, a node gets credits (in the form of ranking points) for replying
to the request. The decentralized REDS system should satisfy the following requirements:
Cheating proof: the system should be resistant to abuse by ”cheating” nodes. A
decentralized storage system has no central authority that can control the nodes’ storage
records. In such a system it is reasonable to assume that some nodes will try to behave
unfairly. This thesis considers two possible ways of cheating:
Freeloading - when a node, acting on its own, tries to store more data in the network
than it stores locally for other nodes.
Node collaboration - when a group of nodes collaborates to falsify information about either
themselves or some innocent node.
Adaptable: the system should be efficient, scalable and able to handle the dynamic
nature of P2P systems, such as nodes joining and leaving the network. If a node crashes
and is unable to restore itself, it should be possible to set up a new node and retrieve its
previously stored data. Furthermore, unlike other systems of this type, file deletion should be
supported, since typical nodes have finite local storage space.
Content Anonymous: the system should provide file content anonymity, meaning
that the original content of a file stored in the network should not be readable by
users other than the owner of the file.
Meeting these requirements poses several challenges in a decentralized environment. The
main challenges are ensuring that information about transactions is accurately recorded,
and making this information available to other nodes upon request. In decentralized
systems it is not obvious who should record this type of information, why other peers
should trust this entity and how nodes are storing the required information.
5.1.2 Contribution
The primary contribution of this thesis is the design and implementation of REDS. This
thesis presents a novel approach to overcome the problems with cheating nodes that
attempt to falsify information in a P2P storage system. In comparison to previous work,
the design of REDS tolerates nodes which are occasionally offline. An analysis of the
REDS system has been performed using FreePastry’s built-in simulator. The simulations
provide a number of performance-critical characteristics, even for large numbers of nodes
in the network.
5.1.3 Related Work
A number of P2P storage systems have preceded REDS. Some of the earliest systems
include The Eternity Service and Freenet, designed to provide uncensorable storage.
Like REDS, these systems use cryptography and redundancy to protect data, but unlike
REDS, neither The Eternity Service [26] nor Freenet [24] has a defense against cheating
nodes that try to use more storage space than they provide. In Freenet, data which
is not frequently accessed is deleted to make room for newer data. The Eternity
Service uses redundancy and secret sharing to replicate data, and adds anonymity
to prevent denial-of-service attacks. Queries in the Eternity Service are broadcast, and
delivery is achieved through anonymous remailers.
More recent peer-to-peer storage systems include PAST [27] and OceanStore [25]. These
systems do not attempt to provide uncensorability, and are thus simpler than the
previous systems. PAST aims to provide a global-scale storage system using data
replication for durability. PAST uses smart cards issued by trusted third parties for user
quota management, so that users cannot use more remote storage than they are providing
locally. A drawback of this solution is that the smart cards have to be re-issued after a
certain time, which increases the cost for the user.
OceanStore is a federated system where utility companies pool their resources to provide
storage to users. Each user contracts with a single company, the responsible party, to
receive storage for a fee. That company then exchanges storage with the other companies
for greater reliability and geographic range. OceanStore, because of its need to support
concurrent updates, is very complicated and requires a great deal of central resources.
Unlike REDS, both PAST and OceanStore involve a cost for the users of these systems.
P2P file sharing systems, like Napster [3] and Gnutella [4], are in wide use and provide
a mechanism for file search and retrieval among a large set of peers. A centralized index
server is used in Napster to handle the searches, while broadcast queries are used in the
Gnutella system. These systems provide anonymity through encrypted search keys, data
caching, source-node spoofing and time-to-live values.
5.2 REDS Design
Any host connected to the Internet can act as a node in REDS by installing the REDS
software. A REDS network is formed by nodes connected to the Internet that
store each other’s data. Each node has a unique, randomly chosen 128-bit identifier. For
the purpose of routing, the node identifiers are thought of as a sequence of digits with base
$2^b$ (b is a configuration parameter with typical value 4). When a node joins the system,
a node identity (nodeId) is randomly assigned to it. The identity of the node is then
used by Pastry to route messages between the nodes in the network. The node identity
indicates a node’s position in a circular namespace, which ranges from 0 to $2^{128} - 1$. The
identity of the node is generated by computing a cryptographic hash of
the node’s IP address, and node identities are assumed to be uniformly distributed in the circular
namespace. As a result, there is with high probability no correlation between
the value of the node identity and the node’s geographic location, network connectivity,
or ownership.
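The derivation of a node identity can be sketched as follows. The text does not name the hash function used for node identities, so this sketch assumes SHA-256 truncated to 128 bits; the function names node_id_from_ip and to_digits are hypothetical and are not taken from REDS or FreePastry.

```python
import hashlib

ID_BITS = 128  # size of the circular namespace: 0 .. 2**128 - 1

def node_id_from_ip(ip):
    """Map an IP address string to a 128-bit identity (assumed: truncated SHA-256)."""
    digest = hashlib.sha256(ip.encode("utf-8")).digest()
    return int.from_bytes(digest[:ID_BITS // 8], "big")

def to_digits(node_id, b=4):
    """Split an identity into its base-2^b digits, most significant digit first."""
    n_digits = ID_BITS // b
    mask = (1 << b) - 1
    return [(node_id >> (b * (n_digits - 1 - i))) & mask for i in range(n_digits)]
```

With b = 4, every identity is a sequence of 32 hexadecimal digits, which is the form the routing layer works with.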
Files are encrypted and split into data chunks of a specified size before being inserted into the
network. Each data chunk has an associated chunk identity computed as an MD5 (Message-Digest
algorithm 5) [30] checksum of the chunk’s content. A chunk is then stored in
the network at the node whose node identity is numerically closest to the chunk
identity. An index file is created that describes each file in terms of the ordered list of
chunks of which the file is composed. To retrieve a file in REDS, the user needs
to know the chunk identities, the file’s decryption key and the node identity of the storing node.
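A minimal sketch of the chunking and placement step is given below. It assumes that the input bytes are already encrypted (encryption is omitted), and that ”numerically closest” is measured with wrap-around in the circular namespace, which is an assumption of this sketch. All function names are hypothetical.

```python
import hashlib

RING = 2 ** 128  # MD5 digests and node identities are both 128 bits wide

def split_into_chunks(data, chunk_size):
    """Split (already encrypted) bytes into chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def chunk_id(chunk):
    """Chunk identity: the MD5 checksum of the chunk content, as an integer."""
    return int.from_bytes(hashlib.md5(chunk).digest(), "big")

def ring_distance(a, b):
    """Distance between two identities, allowing wrap-around (assumption)."""
    d = abs(a - b)
    return min(d, RING - d)

def closest_node(key, node_ids):
    """Node identity numerically closest to key; ties go to the smaller identity."""
    return min(node_ids, key=lambda n: (ring_distance(n, key), n))

def build_index(data, chunk_size, node_ids):
    """Ordered (chunk identity, storing node) pairs, playing the role of the index file."""
    return [(chunk_id(c), closest_node(chunk_id(c), node_ids))
            for c in split_into_chunks(data, chunk_size)]
```

The ordered list returned by build_index is what a node would need, together with the decryption key, to fetch and reassemble the file.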
In the design of REDS, each node has a ranking list containing the ranking status
of all the nodes storing its data. In essence, this approach decentralizes a
previously centralized task, so that no central server or node is needed. REDS
does not provide facilities for searching or key distribution. REDS is, unlike many other
P2P systems, intended as a storage and content distribution utility and not as a file
sharing system.
5.2.1 Routing with Pastry
All distributed systems need a routing layer to get messages to their intended recipients.
Pastry is a P2P routing layer used by REDS. It is a scalable, decentralized and self-
organizing overlay network that automatically adapts to arrival, departure and failure of
nodes. A short overview of the Pastry design will be presented to provide the necessary
background for understanding REDS.
Each Pastry node has a unique, randomly chosen 128-bit identifier, called a node
identity. For the purpose of routing, the node identities are thought of as a sequence of
digits with base $2^b$ (b is a configuration parameter with typical value 4).
5.2.1.1 Node State
A Pastry node’s state consists of a routing table R, a leaf set L, and a neighborhood set M.
The routing table is organized into rows, and the maximum number of rows is $\log_{2^b} N$
(N is the number of nodes in the overlay). Each row can have $2^b - 1$ entries, where
each entry maps a node identity to the associated node’s IP address. Each entry in row
n refers to a node whose node identity matches the present node’s node identity in the
first n digits, followed by a digit equal to the column number and then the rest of the node identity. A routing
table entry is left empty if no node with the appropriate node identity prefix is known.
Table 1 shows an example of a routing table for a node with node identity 10233102; the
associated IP addresses are not shown in the table.
Routing Table
Row   0          1          2          3
0     -          -          -          -
1     -          11301233   12230203   13021022
2     10031203   10132102   -          10323302
3     10200230   10211302   10222302   -
4     10230322   10231000   10232121   -
5     10233001   -          10233232   -
6     -          -          10233120   -
7     -          -          -          -
Table 1: Routing table for the node with node identity 10233102. Entry format: matched digits, column number, rest of node identity.
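The position of an entry in the routing table follows directly from the shared prefix. The sketch below, with the hypothetical helper names shared_prefix_len and table_slot, computes the row and column for a given node identity written as a digit string, using the identities from Table 1.

```python
def shared_prefix_len(a, b):
    """Number of leading digits that two identity strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def table_slot(own_id, other_id):
    """Return (row, column) of other_id in own_id's routing table.

    The row is the length of the shared prefix and the column is the first
    digit in which other_id differs. Returns None for the node's own identity.
    """
    row = shared_prefix_len(own_id, other_id)
    if row == len(own_id):
        return None
    return row, int(other_id[row], 16)
```

For the node 10233102, the identity 10323302 shares the prefix 10 and continues with the digit 3, so it belongs in row 2, column 3.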
Each node also maintains a leaf set (see Table 2) of the l/2 numerically closest larger
node identities and the l/2 numerically closest smaller node identities, relative to the present node (l
is a configuration parameter with typical value of 16 or 32). The leaf set is used during
message routing, as described in Section 5.2.1.2. Each node also maintains a neighborhood set,
which lists the nodes that are closest to the present node according to the proximity metric.
The proximity metric is a scalar value that represents the distance between a pair
of nodes, such as the round-trip time. The neighborhood set is not normally used in
routing messages, but is useful in maintaining locality properties.
Leaf set
Smaller                 Greater
10233033   10233021     10233120   10233122
10233001   10233000     10233230   10233232
Table 2: Leaf set for node 10233102
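How a leaf set such as the one in Table 2 can be selected from a set of known identities is sketched below. The sketch uses l = 8 to match the table, treats identities as base-4 digit strings, and ignores wrap-around at the ends of the namespace; these are simplifying assumptions, and the function name leaf_set is hypothetical.

```python
def leaf_set(own_id, known_ids, l=8, base=4):
    """Return (smaller, larger): the l/2 closest identities on each side of own_id.

    Both lists are ordered from nearest to farthest. Wrap-around in the
    circular namespace is not handled in this sketch.
    """
    own = int(own_id, base)
    value = lambda i: int(i, base)
    smaller = sorted((i for i in known_ids if value(i) < own), key=value, reverse=True)
    larger = sorted((i for i in known_ids if value(i) > own), key=value)
    return smaller[:l // 2], larger[:l // 2]
```

Identities that are numerically farther away, such as 10200230 or 13021022 from Table 1, are left out once each half of the leaf set is full.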
5.2.1.2 Routing
Messages in Pastry are routed according to a longest-prefix-match principle on the
destination’s Pastry key. If the key of a message is within the range of the present node’s leaf set,
the message is routed to the leaf set node whose node identity is closest to the key. If the key
is not covered by the leaf set, the node looks up in its routing table a node whose node
identity shares a longer prefix with the key than its own node identity does, and routes the
message to this node. If there is no such node, the message is routed to a node that shares
a prefix with the key of the same length as the present node does, but is numerically closer to the destination
address. This process is repeated at each hop until the message reaches its destination. Assuming a REDS network consisting of N nodes, Pastry can route
to the node numerically closest to a given chunk identity in less than $\lceil \log_{2^b} N \rceil$ steps on
average [28]. With concurrent node failures, eventual delivery is guaranteed unless $\lfloor l/2 \rfloor$ nodes
with adjacent node identities fail simultaneously (l is a configuration parameter
with typical value 16).
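The routing decision taken at a single node can be sketched as follows. This is a simplified illustration and not the FreePastry implementation: identities are base-4 digit strings as in Tables 1 and 2, the routing table is a dictionary keyed by (row, column), wrap-around is ignored, and the names next_hop and shared_prefix_len are hypothetical.

```python
def shared_prefix_len(a, b):
    """Number of leading digits that two identity strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(own_id, key, leaf_set, routing_table, base=4):
    """Return the identity to forward to, or own_id if this node is the destination."""
    k = int(key, base)
    dist = lambda n: abs(int(n, base) - k)
    candidates = list(leaf_set) + [own_id]
    values = [int(n, base) for n in candidates]
    # Case 1: the key lies within the leaf set range, so pick the closest node.
    if min(values) <= k <= max(values):
        return min(candidates, key=lambda n: (dist(n), n))
    # Case 2: use the routing table entry that extends the shared prefix by one digit.
    row = shared_prefix_len(own_id, key)
    entry = routing_table.get((row, int(key[row], base)))
    if entry is not None:
        return entry
    # Case 3: fall back to any known node with an equally long prefix that is closer.
    known = set(leaf_set) | set(routing_table.values())
    better = [n for n in known
              if shared_prefix_len(n, key) >= row and dist(n) < dist(own_id)]
    return min(better, key=lambda n: (dist(n), n)) if better else own_id
```

Repeating this decision at every node on the path yields the hop-by-hop route; each hop either extends the matched prefix or moves numerically closer to the key.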
Figure 5.1 shows the steps that a query takes while being routed through the Pastry
routing substrate. In the example, the node with identity 65A1FC is trying to locate
data using the key D46A1C.
[Figure: nodes D13DA3, D4213F, D462BA, D467C4 and D471F1 in the circular identifier space from 0 to 2^128 - 1; node 65A1FC issues route(D46A1C).]
Figure 5.1: Routing in Pastry
5.2.1.3 Node Arrival and Departure
When a new node joins the Pastry network, it has to initialize a routing table, a leaf
set and a neighborhood set. After the initialization, the node informs other nodes of its
presence. If a node with identity X wants to join the Pastry network, it first locates
a node A in the network. Node A can be located by performing an expanding ring search
using a multicast mechanism [31]. Node X sends a ”join message” with its node identity
X as the key to node A. Node A routes the message to node Z, which is the node numerically
closest to X. Each node along the path to node Z sends a row from its routing
table to node X: the i-th node sends its i-th row. The neighborhood set is taken from
node A, because A is very likely to be close to node X. The leaf set is taken from
Z, because it is numerically the nearest node to X. Finally, node X informs any node
that needs to be aware of its arrival.
As nodes may fail or depart without warning, nodes in the leaf set periodically exchange
keep-alive messages. A Pastry node is considered failed when its immediate neighbors in
the node identity space can no longer communicate with it. If a node fails, all the
members of the failed node’s leaf set update their leaf sets. Stale entries in the routing
table are repaired by first trying to find a new route via its downstream node and, if that is not
successful, by starting its route table management mechanism.
The main functions in the Pastry API are route and deliver (Tables 3 and 4). The route function is
used to send a message to the node which is responsible for a given key, and deliver is a
call-back invoked at the target node when the message has arrived. The method forward
(Table 5) is a call-back that is invoked on each node on the routing path.
VOID route(KEY key, MESSAGE msg, NODEHANDLE hint)
Routes the message msg to the node which is responsible for key. The call is
asynchronous, no acknowledgment is sent, and no quality of service is guaranteed. The
optional parameter hint may contain the address of a node which will be used as the
first routing hop. If the hint is good (e.g., it contains the target node’s address),
the message can be delivered with only one hop. A bad hint, on the contrary, may
introduce a further unnecessary routing step.
Table 3
VOID deliver(KEY key, MESSAGE msg)
This method is a call-back which has to be provided by the application. The P2P
routing mechanism calls this method upon arrival of a new message for this node.
Table 4
VOID forward(KEY key, MESSAGE msg, NODEHANDLE nextHopNode)
This is a call-back method which is called at each node on the routing path of
message msg, including the source node and the target node. It is called just before
the message is forwarded to the next hop, which has already been determined as
nextHopNode. Within the method, each parameter may be modified. When key or
nextHopNode is modified, the routing behavior is altered.
Table 5
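The interplay of the three calls can be illustrated with a small single-process stand-in for the routing layer. This is not FreePastry code: the classes Application, ToyOverlay and RecordingApp are hypothetical, node identities are plain integers, the toy routes directly from the source (optionally via the hint) to the responsible node, and modifications of the parameters inside forward are ignored.

```python
class Application:
    """Call-backs that an application provides to the routing layer."""
    def deliver(self, key, msg):
        raise NotImplementedError
    def forward(self, key, msg, next_hop):
        pass

class ToyOverlay:
    """Toy routing layer: the responsible node is the one numerically closest to key."""
    def __init__(self):
        self.apps = {}  # node identity -> Application

    def register(self, node_id, app):
        self.apps[node_id] = app

    def responsible_node(self, key):
        return min(self.apps, key=lambda n: (abs(n - key), n))

    def route(self, source, key, msg, hint=None):
        target = self.responsible_node(key)
        path = [source]
        if hint is not None and hint in self.apps and hint != source:
            path.append(hint)  # the hint is used as the first routing hop
        if path[-1] != target:
            path.append(target)
        # forward is invoked on every node of the path, including source and target
        for here, nxt in zip(path, path[1:] + [None]):
            self.apps[here].forward(key, msg, nxt)
        self.apps[target].deliver(key, msg)

class RecordingApp(Application):
    """Example application that records the call-backs it receives."""
    def __init__(self):
        self.events = []
    def forward(self, key, msg, next_hop):
        self.events.append(("forward", key, next_hop))
    def deliver(self, key, msg):
        self.events.append(("deliver", key, msg))
```

In this toy, a hint that is not the responsible node adds one hop to the path, which mirrors the remark in Table 3 about bad hints.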
5.2.2 Operation Logic
At present REDS supports a very basic set of operations.
• Insert: Stores a file with the user-specified level of replication, k, which determines how
many copies of the file should be stored in the network. The file is encrypted and
split into fixed-size data chunks. Each data chunk has an associated chunkID. A
chunk is stored in the network at the node whose node identity is numerically closest
to the chunkID.
• Retrieve: Retrieves a copy of the specified file by retrieving all
chunks belonging to the file, identified by their chunkIDs, provided that they exist in REDS
and are available in the network. The data chunks are then put together and
decrypted with the node’s decryption key.
• Delete: Reclaims the storage occupied by the k replicas of the file.
It also sends a delete request to each storing node to inform it that it can
delete the file.
5.2.3 Ranking System
In a P2P storage system, the ability to consume storage space can be seen as a kind
of currency. In such a system, it is plausible that remote storage is more valuable to
a node than its local storage. When a node exchanges its local storage for another
node’s remote storage, both parties benefit from the trade, giving them an incentive to
cooperate. However, one of the main problems in a P2P storage system is to guarantee
fairness, that is, to ensure that a node only uses as much storage space as it provides to the
system. In a P2P storage system, a decentralized approach to this problem is needed,
one which ensures that all nodes are equal and no peer has higher authority than other nodes.
A P2P storage system needs to assure its users that their data has not been modified in
the time between the initial store and a later retrieval. REDS generates an MD5 checksum
of the data before distribution. Upon retrieval of the data, this checksum is compared to
the checksum of the retrieved data. Besides ensuring file integrity at the time of retrieval,
there must be some way for the owner of a data chunk to confirm that the storage
node continues to store an accurate copy of the originally uploaded data.
One of the first quota approaches in P2P storage systems was proposed in PAST,
which suggests the use of smart cards that produce a signed endorsement of a node’s request
to consume remote storage. The consumed space is charged to an internal counter, and when
storage is reclaimed the counter is credited. The smart card approach has
disadvantages, however. A trusted organization that issues the cards is needed. After a period
of time the smart cards have to be re-issued to invalidate compromised cards,
which increases the cost for the users.
In REDS, a different approach is used to ensure fairness in the system. Unlike the
smart card design, nodes are required to maintain their own usage records and to send
them upon request, so that other nodes can inspect them. This approach has to consider the
fact that nodes have no natural incentive to tell the truth about their records. Because
of that, the approach must include disincentives for nodes that lie about their records.
Every node in REDS maintains a usage file, available for other nodes to verify. The
usage file contains information about:
• The amount of disk space the node is providing to the system.
• Local storage list, a list consisting of (nodeId, chunkId) pairs, containing the
identifiers and sizes of all files that the node is storing on its local disk on behalf of
others.
• Remote storage list, a list consisting of (chunkId, nodeId) pairs, containing the
identifiers and sizes of all files that the node is storing in the system on behalf of
itself.
The two lists together describe all credits and debits to the node’s quota. The aggregate
file size, which includes the original file size and the sum of all its replicas, is debited
against the node’s quota when a file is saved into the system. The delete operation adds
the aggregate file size back to the node’s quota. The number of replicas may be changed
autonomously by each client node, according to file availability considerations and
the available network storage. Every node in REDS also maintains a ranking file,
containing all node identities from the remote storage list and the ranking status of these
nodes.
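The quota bookkeeping described above can be sketched as a small data structure. The sketch assumes that a node’s quota equals the disk space it provides and that every stored copy counts with its full size; both are assumptions made for illustration, and the class UsageFile with its method names is hypothetical rather than taken from the REDS implementation.

```python
from dataclasses import dataclass, field

@dataclass
class UsageFile:
    provided_space: int                                  # bytes offered to the system
    local_storage: dict = field(default_factory=dict)    # (owner_id, chunk_id) -> size
    remote_storage: dict = field(default_factory=dict)   # (chunk_id, storer_id) -> size

    def quota_left(self):
        """Assumed quota rule: provided space minus everything stored remotely."""
        return self.provided_space - sum(self.remote_storage.values())

    def record_insert(self, chunk_id, storer_ids, size):
        """Debit the aggregate size (one copy per storing node) against the quota."""
        aggregate = size * len(storer_ids)
        if aggregate > self.quota_left():
            raise ValueError("aggregate size exceeds remaining quota")
        for storer_id in storer_ids:
            self.remote_storage[(chunk_id, storer_id)] = size

    def record_delete(self, chunk_id):
        """Credit the quota by removing all copies of the chunk from the remote list."""
        for key in [k for k in self.remote_storage if k[0] == chunk_id]:
            del self.remote_storage[key]
```

Storing a 200-byte chunk on three nodes with 1000 bytes provided leaves a quota of 400 bytes; deleting the chunk restores the full 1000 bytes.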
There are two possible ways for a node to cheat others. The first is to inflate
its advertised capacity beyond the resources of its disk, which might attract storage
requests that the node cannot satisfy. The node may try to compensate for this by creating
fraudulent entries in its local storage list, to claim that the storage is being used. The second
possibility is to deflate the remote storage list by deleting entries without informing the
storage node that it should delete the file.
To prevent nodes from cheating, a storage list control is performed. For each file in its
local storage list, a node checks whether there is a corresponding entry in the owning node’s remote
storage list. If the entry is missing, a negative ranking point is debited to that node’s
entry in the ranking file. Furthermore, a message with the negative ranking point is sent
to the nodes in the cheating node’s remote storage list, to inform these nodes to update
their ranking status for the cheating node. In addition, a node randomly sends different
requests to the nodes in its ranking file. Whenever the response is positive, the responding node’s
ranking status is increased. The three types of requests are:
• Alive request: A randomly performed alive request to check whether the node is alive.
A node’s reply to the alive request increases its ranking points.
• Checksum request: To ensure a stored file’s existence in the network, a randomly
generated checksum request is periodically sent to the node holding the file.
The node that owns the file then checks that the checksum in the reply is correct.
The storing node’s ranking points increase if the checksum is correct and decrease
otherwise.
• Data control request: A second control to ensure the stored file’s existence is
made, less often than the checksum request, in which a copy of the actual file is
requested from the storing peer. When the file is sent to its owner as a reply
to the file request, it is checksummed to ensure that it is the correct file
and has not been modified from its original form.
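The storage list control and the effect of an alive reply on the ranking file can be sketched as follows. The point values of +1 and -1 are assumptions, since the text does not state how many points each event is worth, and the names RankingFile and storage_list_control are hypothetical. The callable fetch_remote_list stands in for requesting another node’s usage file over the network.

```python
class RankingFile:
    """Ranking points per node identity (point values are assumed)."""
    def __init__(self):
        self.points = {}

    def adjust(self, node_id, delta):
        self.points[node_id] = self.points.get(node_id, 0) + delta

    def on_alive_reply(self, node_id):
        self.adjust(node_id, +1)

def storage_list_control(my_id, local_storage_list, fetch_remote_list, ranking):
    """Check every locally stored chunk against its owner's remote storage list.

    local_storage_list: iterable of (owner_id, chunk_id) pairs held by this node.
    fetch_remote_list: callable mapping owner_id to the set of
        (chunk_id, storer_id) pairs that the owner advertises.
    Returns the owners whose remote storage list lacks the expected entry.
    """
    suspects = []
    for owner_id, chunk_id in local_storage_list:
        if (chunk_id, my_id) not in fetch_remote_list(owner_id):
            ranking.adjust(owner_id, -1)  # the owner has deflated its remote list
            suspects.append(owner_id)
    return suspects
```

The returned list of suspects is what a node would use to decide which negative ranking messages to send to other nodes.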
5.2.4 Protocols
In order to facilitate the logical operations described in section 5.2.2, the following pro-
tocols have been defined. The file storage protocol is used for the insert operation, when
a node wants to store a file into the network. To perform a data control of a node’s
stored files or in case of file retrieval, the storage control protocol is used. The file delete
protocol is used during the delete operation.
The following notation is used to explain the protocols used in REDS.
Symbol                  Description
S(A)                    A’s storage request
Sr(B)                   B’s storage reply
reply(B)                B’s reply, either Yes or No
leafSetList(B)          A list of the nodes in B’s leaf set
localStorageList(B)     A list of the nodes B stores data for
remoteStorageList(B)    A list of the nodes storing B’s data
F(A)                    A’s file control request
Fr(B)                   B’s file control reply
CH(A)                   A’s checksum control request
CHr(B)                  B’s checksum control reply
D(A)                    A’s delete request
F                       File F
name(F)                 File name of F
Size(F)                 Size of file F
ch(F)                   Checksum of file F
Info(M)                 Message M
ID_A                    ID of node A
t                       Timestamp
File Storage Protocol
Node A contacts node B with a request to store file F. Algorithm 1 details the procedure.
Algorithm 1 File Storage Protocol
1. Storage Request
A → B : S(A) = Info(storage, ID_A, ID_B, Size(F))
A sends a storage request to B, which contains the identities of both nodes and the file size.
2. Storage Reply
B → A : Sr(B) = Info(storageReply, reply(B), localStorageList(B), remoteStorageList(B), leafSetList(B))
B sends a storage reply to A containing the reply (Yes or No), its localStorageList (a list of the nodes B is storing data for), its remoteStorageList, and its leafSetList, which contains node identities that A can send storage requests to instead of B.
2.a
If the storage reply is No, A sends a request to the nodes in B’s localStorageList, asking these nodes whether they are storing data at node B. If one of them answers No, A sends out a ranking message saying that node B is lying, to the nodes holding B’s data and to the nodes storing data at B. If node B’s localStorageList is confirmed, A sends a storage request to the nodes in B’s leaf set.
2.b
If the storage reply from B is Yes, a file transfer between A and B is performed.
3. File Transfer
A → B : F
A sends file F to B for storage.
A adds B to its remote storage list.
B adds A to its local storage list.
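The decision that node A takes after receiving B’s storage reply (steps 2.a and 2.b) can be sketched as a single function. The function name handle_storage_reply and the returned action labels are hypothetical; the callable stores_data_at_b stands in for asking a node from B’s localStorageList over the network.

```python
def handle_storage_reply(reply_yes, local_list, remote_list, leaf_set_list, stores_data_at_b):
    """Return (action, recipients) describing A's next step.

    reply_yes: True if B answered Yes to the storage request.
    local_list: nodes that B claims to store data for.
    remote_list: nodes that B claims are storing B's data.
    leaf_set_list: nodes from B's leaf set that A may ask instead of B.
    stores_data_at_b: callable node_id -> bool, the node's answer to A's question.
    """
    if reply_yes:
        return ("transfer_file", [])                      # step 2.b, then step 3
    if all(stores_data_at_b(n) for n in local_list):
        return ("send_storage_request", list(leaf_set_list))
    # B's claimed local storage list was not confirmed: report B as lying to the
    # nodes holding B's data and to the nodes storing data at B (without duplicates).
    recipients = list(dict.fromkeys(list(remote_list) + list(local_list)))
    return ("send_negative_ranking", recipients)
```

Keeping the decision separate from the message transport makes the three outcomes of the protocol easy to check in isolation.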
Storage Control Protocol
The Storage control protocol is containing two parts, the checksum control request and
the file control request. Both these request are to control a nodes files stored in the
network and they are shown in Algorithm 2.
Algorithm 2 Storage Control Protocol
1A. Checksum Control Request
A → B : CH(A) = Info(checksum, name(F))
A sends a checksum control request to B, containing the filename.
1B. Checksum Control Reply
B → A : CHr(B) = Info(checksumReply, name(F), ch(F), t)
B sends a checksum control reply containing the filename, the checksum and a timestamp for the checksum.
1C. Ranking List Update
If A receives a checksum control reply from B, A checks whether the checksum is correct and then updates its ranking list, giving B either a positive credit for a correct checksum or a negative credit for a false checksum reply. If A does not receive a reply from B, A updates the ranking list and gives B a negative credit for not replying.
2A. File Control Request
A → B : F(A) = Info(fileControl, name(F))
A sends a file control request containing the filename of the requested file.
2B. File Control Reply
B → A : Fr(B) = Info(fileControlReply, name(F), F)
B sends a file control reply to A containing file F.
2C. Ranking List Update
If A receives a file control reply from B, A computes the checksum of the received file and then updates its ranking list, giving B either a positive credit for a correct checksum or a negative credit for a false checksum. If A does not receive a reply from B, A updates the ranking list and gives B a negative credit for not replying.
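The ranking list updates in steps 1C and 2C can be sketched as follows. The sketch is illustrative only: the class and method names are hypothetical, the credit values of +1 and -1 are assumptions, and MD5 is used merely as an example hash because this section does not name the checksum function.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of steps 1C and 2C: node A checks a reply from B and updates its ranking list.
class RankingListSketch {
    // Ranking list: node identity -> accumulated credit (the +1/-1 values are assumptions).
    private final Map<String, Integer> credits = new HashMap<>();

    // Example checksum function (MD5 is an assumption, see lead-in).
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Step 1C: reportedChecksum is null when B did not reply.
    void onChecksumReply(String nodeId, String expectedChecksum, String reportedChecksum) {
        boolean correct = reportedChecksum != null && reportedChecksum.equals(expectedChecksum);
        credits.merge(nodeId, correct ? 1 : -1, Integer::sum);
    }

    // Step 2C: returnedFile is null when B did not reply; otherwise its checksum is recomputed by A.
    void onFileReply(String nodeId, String expectedChecksum, byte[] returnedFile) {
        onChecksumReply(nodeId, expectedChecksum, returnedFile == null ? null : md5Hex(returnedFile));
    }

    int creditOf(String nodeId) {
        return credits.getOrDefault(nodeId, 0);
    }
}
```

Treating a missing reply like a wrong checksum mirrors the algorithm text, which gives a negative credit in both cases.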
File Delete Protocol
The protocol for deleting files in REDS is described in Algorithm 3.
Algorithm 3 File Delete Protocol
1. Storage reclaim
Node A reclaims the storage occupied in the network by the file F and all the
replicas of the file. The aggregate reclaimed storage is added back to A's usage quota.
2. File Deletion Information
A→ B : D(A) = Info(fileDelete, name(F ), size(F ))
A sends a file delete request to B containing the file name and the size of the
file, informing B that the file can be deleted. B deletes the file, removes the
file's entry from its local storage list, and updates its usage quota.
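Step 2 on the receiving node B can be sketched as below. The class, its fields and the rule that a request with a mismatching size is ignored are assumptions made for this illustration. The sketch also assumes that the local storage list holds one entry per stored file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of node B handling D(A) = Info(fileDelete, name(F), size(F)).
class FileDeleteSketch {
    // Local storage list: file name -> size in bytes (one entry per stored file is an assumption).
    private final Map<String, Long> localStorageList = new HashMap<>();
    private long usedBytes = 0;

    void store(String fileName, long sizeBytes) {
        Long previous = localStorageList.put(fileName, sizeBytes);
        usedBytes += sizeBytes - (previous == null ? 0 : previous);
    }

    // Returns true if the file was deleted; unknown files or mismatching sizes are ignored (assumed rule).
    boolean onFileDelete(String fileName, long sizeBytes) {
        Long stored = localStorageList.get(fileName);
        if (stored == null || stored != sizeBytes) {
            return false;
        }
        localStorageList.remove(fileName); // remove the file's entry from the local storage list
        usedBytes -= stored;               // update B's record of used storage
        return true;
    }

    long usedBytes() {
        return usedBytes;
    }
}
```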
5.3 Implementation
This section describes important aspects of the implementation. REDS has been
implemented in Java on top of the FreePastry implementation of the Pastry routing layer.
There are several reasons why Java was used as the programming language. One of the
main reasons was the speed of development. Unlike C and C++, Java is type-safe
and uses garbage collection. These two features greatly reduce debugging time, especially
for a large project with a rapid development pace. Another reason for choosing Java was
to make REDS platform independent.
Each node in REDS has a request system and a ranking system. The request system
consists of two modules: the consumer module that generates storage, verification, rank-
ing and deletion requests; the provider module that handles storage, verification, ranking
and deletion requests issued by other nodes. Each module is implemented as a thread
with a message queue for incoming messages. The provider module is event driven and
is activated when a message is put in the queue. The Pastry routing layer is used to
route messages between the nodes in the REDS network.
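A module implemented as a thread with a message queue could look roughly like the sketch below. It is not taken from the REDS source code: the class name, the use of plain strings as messages and the SHUTDOWN marker are assumptions made to keep the example self-contained.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of an event-driven provider module with a queue for incoming messages.
class ProviderModuleSketch implements Runnable {
    // Hypothetical marker message used to stop the module.
    static final String SHUTDOWN = "SHUTDOWN";

    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    private final AtomicInteger handled = new AtomicInteger();

    // Called by the routing layer when a message for this node arrives.
    void deliver(String message) {
        inbox.add(message);
    }

    @Override
    public void run() {
        try {
            while (true) {
                String message = inbox.take(); // blocks until a message is put in the queue
                if (SHUTDOWN.equals(message)) {
                    return;
                }
                handle(message);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop
        }
    }

    // Placeholder for handling storage, verification, ranking and deletion requests.
    private void handle(String message) {
        handled.incrementAndGet();
    }

    int handledCount() {
        return handled.get();
    }
}
```

In a running node the module would be started with `new Thread(module).start()`, so that the thread sleeps in `take()` until a request arrives.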
5.3.1 Security
The very nature of P2P indicates that nodes are organized in a flat structure where
no node is more important than another. In other words, users are in control of their
content. The REDS security model is based on the assumption that it is computationally
infeasible to break the symmetric key cryptosystem and the cryptographic hash function
used in REDS.
5.3.2 Initiation
To join REDS, a node identity is required; one is automatically generated the first
time REDS is started. REDS then joins the new node to the network by using Pastry's
built-in mechanism.
A 128-bit encryption/decryption key is generated for the Advanced Encryption
Standard (AES) [32]. AES is a symmetric-key block cipher, which means that it uses the
same key to encrypt and decrypt data. REDS uses the generated key to encrypt all data
before distribution, which ensures file content anonymity (i.e. only the owner of a
file can read the original content of the file).
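Generating a 128-bit AES key and using it for encryption can be sketched with the standard Java cryptography API. The thesis text only specifies AES with a 128-bit key; the cipher mode (GCM), the random 12-byte IV stored in front of the ciphertext, and all class and method names are assumptions of this sketch.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch: 128-bit AES key generation and symmetric encryption/decryption.
class AesKeySketch {
    private static final int IV_BYTES = 12;  // assumed IV length for GCM
    private static final int TAG_BITS = 128; // assumed authentication tag length

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128); // 128-bit key, as stated in the text
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns IV || ciphertext so that the same key alone is enough to decrypt.
    static byte[] encrypt(SecretKey key, byte[] plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(plain);
            byte[] out = new byte[iv.length + sealed.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(sealed, 0, out, iv.length, sealed.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] decrypt(SecretKey key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, data, 0, IV_BYTES));
            return cipher.doFinal(data, IV_BYTES, data.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```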
5.3.3 File Storage
When the initiation process is completed, the node can start storing its data in the
network. After the files to store have been selected, they are encrypted with the
encryption key generated in the initiation step. The encrypted files are then
split into chunks of a specified size and distributed into the network. The node's
usage record and ranking system are updated with the necessary information. In this phase it
is also recommended to use REDS's built-in restore file generator, which creates a restore
file containing all the information needed for a node restore. The restore file needs to be
stored somewhere safe outside the node.
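Splitting an encrypted file into chunks of a specified size can be sketched as below. The class and method names are hypothetical, and the sketch works on an in-memory byte array for brevity; the thesis does not state how the prototype reads or buffers files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: split encrypted data into fixed-size chunks and join them again.
class ChunkerSketch {
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int length = Math.min(chunkSize, data.length - offset); // last chunk may be shorter
            chunks.add(Arrays.copyOfRange(data, offset, offset + length));
        }
        return chunks;
    }

    // Reassembles the chunks in list order, as needed when a file is retrieved.
    static byte[] join(List<byte[]> chunks) {
        int total = 0;
        for (byte[] chunk : chunks) {
            total += chunk.length;
        }
        byte[] data = new byte[total];
        int offset = 0;
        for (byte[] chunk : chunks) {
            System.arraycopy(chunk, 0, data, offset, chunk.length);
            offset += chunk.length;
        }
        return data;
    }
}
```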
5.3.4 Data Verification
Because nodes have no natural incentive to behave fairly, REDS has a verification
system (described in section 5.2.3). Verification requests are sent to randomly chosen
nodes in the remote storage list. Based on the replies, or the lack of replies, the ranking
status of the nodes is updated. Each node's ranking status is shown in the graphical user
interface (Figure 5.2) by one of three indicators: OK, WATCH OUT or BAD. If the
indicator is BAD for a certain node, the node is deleted from the remote/local storage list
and a new node takes its place. The GUI also indicates the status of the files. A file
has the status OK if at least one replica of the file is stored at a node with status OK;
otherwise it has the status NOT OK.
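The mapping from ranking credit to the three indicators, and the file status rule, can be sketched as follows. The numeric thresholds are pure assumptions chosen for illustration, since this section does not state at which credit a node becomes WATCH OUT or BAD; the class and method names are hypothetical as well.

```java
import java.util.List;

// Hypothetical sketch of the node and file status indicators shown in the GUI.
class StatusSketch {
    enum NodeStatus { OK, WATCH_OUT, BAD }

    // Assumed thresholds: credit >= 0 is OK, credit <= -3 is BAD, anything between is WATCH OUT.
    static final int OK_AT_OR_ABOVE = 0;
    static final int BAD_AT_OR_BELOW = -3;

    static NodeStatus nodeStatus(int credit) {
        if (credit >= OK_AT_OR_ABOVE) {
            return NodeStatus.OK;
        }
        return credit <= BAD_AT_OR_BELOW ? NodeStatus.BAD : NodeStatus.WATCH_OUT;
    }

    // A file is OK if at least one of its replicas is held by a node with status OK.
    static boolean fileIsOk(List<NodeStatus> replicaHolders) {
        return replicaHolders.contains(NodeStatus.OK);
    }
}
```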
5.3.5 Node Restorage
If a node crashes and a new node is set up, the crashed node's old data can be
restored by using the restore file. REDS first checks whether the old node
identity is available and, if so, assigns it to the new node. If it is occupied, a new node
identity is generated. The necessary update information is routed to the nodes that need
to know about the node restore, and the remote/local storage lists are updated.
[Screenshot: file tree with FILE, FILE STATUS and NODE STATUS columns (status values OK, WATCH OUT, BAD); INSERT, RETRIEVE and DELETE actions; STORAGE panel; MESSAGES log]
Figure 5.2: Graphical user interface
5.3.6 Simulations
This section describes the two simulations that have been performed. The simulations
analyze the file availability threshold and the bandwidth overhead in the REDS system.
The file availability simulation shows the number of replicas needed to achieve a desired
availability of the files stored in REDS. The overhead simulation shows the overhead
of the verification and ranking system. If the overhead becomes very high, the REDS
system will not be efficient.
Figure 5.3: File availability
Figure 5.4: Bandwidth overhead with ranking system
Figure 5.3 presents a simulation of 10000 nodes with unlimited storage space. The
simulation computes the minimum number of replicas necessary to achieve a certain
availability threshold in the presence of node failures. The goal is to use the minimum
number of replicas of a file that provides a desired level of availability. Nodes join and
leave the network with a specified probability, and the assumption is made that a failing
or leaving node loses all the replicas it stores. The system needs at least one replica to be
available; the file availability is therefore defined as the probability that one or more of
the nodes holding a replica are up. During the simulation, a certain number of nodes go
down and a percentage of the nodes that are up check for replicas of their files. The
simulation assumes a replica location accuracy of 0.8 and a probability of 0.5 that a node
is up. The result of the file availability simulation shows that only 4 replicas are needed
to achieve an availability of 0.8.
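The order of magnitude of this result can be checked with a simple closed-form estimate. This is not the simulation used in the thesis: it assumes that replicas fail independently and that a replica is usable only if its node is up (probability 0.5) and the replica is located correctly (accuracy 0.8). Under these assumptions the availability with k replicas is 1 - (1 - 0.5 * 0.8)^k. The class and method names are hypothetical.

```java
// Hypothetical back-of-the-envelope estimate, assuming independent replicas.
class ReplicaEstimateSketch {
    // Probability that at least one of the replicas is up and correctly located.
    static double availability(double pUp, double locationAccuracy, int replicas) {
        double pUsable = pUp * locationAccuracy;
        return 1.0 - Math.pow(1.0 - pUsable, replicas);
    }

    // Smallest number of replicas whose estimated availability reaches the target.
    static int minReplicas(double pUp, double locationAccuracy, double target) {
        if (pUp * locationAccuracy <= 0.0 || target >= 1.0) {
            throw new IllegalArgumentException("target cannot be reached");
        }
        int replicas = 1;
        while (availability(pUp, locationAccuracy, replicas) < target) {
            replicas++;
        }
        return replicas;
    }
}
```

With the stated parameters, three replicas give an estimate of about 0.78 and four replicas about 0.87, so `minReplicas(0.5, 0.8, 0.8)` returns 4, which agrees with the simulation result.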
A second simulation, concerning the bandwidth overhead of the data verification and
ranking system, is presented in Figure 5.4. The storage space of each node is randomly
chosen from 1 gigabyte up to 100 gigabytes. In each day of simulated time, each node
performs one checksum request and one data control request to random nodes in its remote
storage list. Furthermore, each node also verifies its local storage list once a day. The
simulation is done with 100, 1000 and 10000 nodes with 300 files per node, simulated
over 7 days. The result of the simulation shows a quite low overhead, even for a large set
of nodes, which indicates that REDS scales well.
CHAPTER 6
Conclusions and Future Work
This thesis has introduced anonymous networks and described some well-known
anonymous systems. The P2P concept has been described, and a more detailed description
of Napster, among others, has been presented. Some proposed distributed P2P storage
systems were identified, as well as their lack of defense against cheating nodes.
This thesis addressed this deficiency and proposed a new P2P file storage system,
“REDS - Redundant and Expandable Distributed file Storage system for a serverless
network”, a large-scale and fully decentralized file storage system with the aim of
supporting a large number of collaborating nodes. An innovative ranking system has been
presented, a novel approach to overcoming the problem of cheating nodes that try to
falsify information in a P2P storage system. Using this ranking system, REDS is designed
to detect and suspend malicious nodes within the network. The simulation results show
that the ranking system has low bandwidth overhead. REDS has been implemented on top of
FreePastry, a Java implementation of Pastry, a P2P object location and routing
substrate overlaid on the Internet. REDS leverages the scalability, locality, fault-resilience
and self-organization properties of Pastry. The simulation results, based on FreePastry's
own simulator, indicate that REDS scales well and is able to efficiently support a large
number of nodes. However, the system still has to prove its usability in practice. REDS
has been described and characterized, and a prototype has been implemented. While many
important challenges remain, this prototype is a working subset of the vision presented
in this thesis. An important future direction would be to investigate the
optimization of the ranking system used by REDS. The ideal goal would be to
present a fully optimized ranking system, such that cheating nodes are always detected
and suspended by REDS.
REFERENCES
[1] “IBM homepage,” August 2008. http://www-03.ibm.com.
[2] “RAID,” August 2008. http://en.wikipedia.org/wiki/RAID.
[3] “Napster inc.,” August 2008. http://www.napster.com.
[4] “Gnutella,” August 2008. http://en.wikipedia.org/wiki/Gnutella.
[5] J. A. Pouwelse, P. Garbacki, D. H. J. Epema, and H. J. Sips, “The BitTorrent P2P
file-sharing system: Measurements and analysis,” in IPTPS, pp. 205–216, 2005.
[6] D. Chaum, “Untraceable electronic mail, return addresses, and digital pseudonyms,”
Communications of the ACM, vol. 24, pp. 84–90, February 1981.
[7] P. F. Syverson, D. M. Goldschlag, and M. G. Reed, “Anonymous connections and
onion routing,” in SP 1997: Proceedings of the 1997 IEEE Symposium on Security
and Privacy, (Washington, DC, USA), p. 44, IEEE Computer Society, 1997.
[8] R. Dingledine, N. Mathewson, and P. Syverson, “Tor: The second-generation onion
router,” in Proceedings of the 13th USENIX Security Symposium, pp. 303–320,
2004.
[9] T. Dierks and E. Rescorla, “The transport layer security (TLS) protocol version
1.1,” RFC 4346, Internet Engineering Task Force, April 2006.
[10] K. Bauer, D. McCoy, D. Grunwald, T. Kohno, and D. Sicker, “Low-resource routing
attacks against anonymous systems,” tech. rep., University of Colorado at Boulder,
Boulder, CO, USA, February 2007.
[11] S. J. Murdoch and G. Danezis, “Low-cost traffic analysis of Tor,” in SP ’05:
Proceedings of the 2005 IEEE Symposium on Security and Privacy, (Washington, DC,
USA), pp. 183–195, IEEE Computer Society, 2005.
[12] I. Goldberg, “On the security of the Tor authentication protocol,” in Proceedings of
the Sixth Workshop on Privacy Enhancing Technologies (PET 2006), (Cambridge,
UK), pp. 316–331, Springer, June 2006.
[13] L. Zhuang, F. Zhou, B. Y. Zhao, and A. Rowstron, “Cashmere: Resilient anonymous
routing,” in Proc. of NSDI, (Boston, MA), ACM/USENIX, May 2005.
[14] M. K. Reiter and A. D. Rubin, “Anonymous web transactions with crowds,” Com-
munications of the ACM, vol. 42, no. 2, pp. 32–48, 1999.
[15] O. Berthold, A. Pfitzmann, and R. Standtke, “The disadvantages of free mix routes
and how to overcome them,” in International workshop on Designing privacy en-
hancing technologies, (New York, NY, USA), pp. 30–45, Springer-Verlag New York,
Inc., 2001.
[16] C. Diaz, S. Seys, J. Claesson, and B. Preneel, “Towards measuring anonymity,”
in Proceedings of the Privacy Enhancing Technologies Workshop (PET 2002),
Springer, vol. 2482, pp. 54–68, April 2002.
[17] L. Overlier and P. Syverson, “Locating hidden servers,” in SP ’06: Proceedings
of the 2006 IEEE Symposium on Security and Privacy, (Washington, DC, USA),
pp. 100–114, IEEE Computer Society, 2006.
[18] M. Wright, M. Adler, B. N. Levine, and C. Shields, “An analysis of the degradation
of anonymous protocols,” in Proceedings of the Network and Distributed Security
Symposium - NDSS ’02. IEEE, February 2002.
[19] J. R. Douceur, “The sybil attack,” in IPTPS, pp. 251–260, 2002.
[20] D. Agrawal and C. C. Aggarwal, “On the design and quantification of privacy pre-
serving data mining algorithms,” in PODS ’01: Proceedings of the twentieth ACM
SIGMOD-SIGACT-SIGART symposium on Principles of database systems, (New
York, NY, USA), pp. 247–255, ACM, 2001.
[21] M. Roussopoulos, M. Baker, D. S. H. Rosenthal, T. J. Giuli, P. Maniatis, and J. C.
Mogul, “2 p2p or not 2 p2p?,” in IPTPS, pp. 33–43, 2004.
[22] M. Rennhard and B. Plattner, “Introducing morphmix: peer-to-peer based anony-
mous internet usage with collusion detection,” in WPES ’02: Proceedings of the
2002 ACM workshop on Privacy in the Electronic Society, (New York, NY, USA),
pp. 91–102, ACM, 2002.
[23] M. J. Freedman and R. Morris, “Tarzan: A peer-to-peer anonymizing network layer,”
in Proceedings of the 9th ACM Conference on Computer and Communications
Security (CCS 2002), 2002.
[24] I. Clarke, O. Sandberg, B. Wiley, and T. Hong, “Freenet: A distributed anonymous
information storage and retrieval system,” in Proceedings of Designing Privacy
Enhancing Technologies: Workshop on Design Issues in Anonymity and
Unobservability, pp. 46–66, July 2000.
[25] J. Kubiatowicz, D. Bindel, Y. Chen, S. E. Czerwinski, P. R. Eaton, D. Geels,
R. Gummadi, S. C. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Y.
Zhao, “Oceanstore: An architecture for global-scale persistent storage,” in ASP-
LOS, pp. 190–201, 2000.
[26] R. J. Anderson, “The eternity service,” in Proceedings of Pragocrypt, pp. 242–252,
1996.
[27] P. Druschel and A. Rowstron, “Past: A large-scale, persistent peer-to-peer stor-
age utility,” in HOTOS ’01: Proceedings of the Eighth Workshop on Hot Topics in
Operating Systems, (Washington, DC, USA), p. 75, IEEE Computer Society, 2001.
[28] A. Rowstron and P. Druschel, “Pastry: Scalable, decentralized object location and
routing for large-scale peer-to-peer systems,” in Proceedings of IFIP/ACM
International Conference on Distributed Systems Platforms, pp. 329–350, November 2001.
[29] A. I. T. Rowstron, A.-M. Kermarrec, M. Castro, and P. Druschel, “Scribe: The
design of a large-scale event notification infrastructure,” in NGC ’01: Proceedings of
the Third International COST264 Workshop on Networked Group Communication,
(London, UK), pp. 30–43, Springer-Verlag, 2001.
[30] “MD5,” August 2008. http://en.wikipedia.org/wiki/MD5.
[31] S. Floyd, V. Jacobson, C.-G. Liu, S. McCanne, and L. Zhang, “A reliable multi-
cast framework for light-weight sessions and application level framing,” IEEE/ACM
Trans. Netw., vol. 5, no. 6, pp. 784–803, 1997.
[32] J. Daemen and V. Rijmen, The Design of Rijndael. Secaucus, NJ, USA: Springer-
Verlag New York, Inc., 2002.