

Peer-assisted content distribution on a budget

Pietro Michiardi a, Damiano Carra b,*, Francesco Albanese a, Azer Bestavros c

a Networking and Security Department, Eurecom, 2229 Route des Cretes, BP 193, F-06560 Sophia-Antipolis, France
b Computer Science Department, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
c Computer Science Department, Boston University, 111 Cummington Street, Boston, MA 02215, USA

Article info

Article history:
Received 18 August 2011
Received in revised form 13 February 2012
Accepted 17 February 2012
Available online 28 February 2012

Keywords:
Internet content distribution
Peer-assisted content distribution
Peer-to-peer

1389-1286/$ - see front matter © 2012 Elsevier B.V. All rights reserved.
doi:10.1016/j.comnet.2012.02.011

* Corresponding author.
E-mail addresses: [email protected] (P. Michiardi), [email protected] (D. Carra), [email protected] (F. Albanese), [email protected] (A. Bestavros).

Abstract

In this paper, we propose a general framework and present a prototype implementation of a peer-assisted content delivery application. Our system – called CYCLOPS – dynamically adjusts the bandwidth consumed by content servers (which represents the bulk of content delivery costs) to feed a set of swarming clients, based on a feedback signal that gauges the real-time health of the swarm. Our extensive evaluation of CYCLOPS in a variety of settings – including controlled PlanetLab and live Internet experiments involving thousands of users – shows a significant reduction in content distribution costs when compared to existing swarming solutions, with a minor impact on content delivery times.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Traditional Content Delivery Networks (CDNs) such as Akamai [1] were conceived as special-purpose services catering almost exclusively to large, highly-popular content providers such as iTunes and CNN. Today, however, the advent of cheap Internet storage and delivery services – for example Amazon S3 [2] and CloudFront [2] – makes it possible for much smaller-scale content providers to deploy and provision their own "ad hoc" CDNs in an almost real-time fashion.

In general, the major cost contributor for CDNs is the bandwidth consumed to deliver content from content servers to the clients. Essentially, the bandwidth costs associated with content distribution are reflected in the "content delivery plans" that customers of a CDN service are required to pay. With the goal of bounding such costs – and thus attracting a variety of customers, including small content producers with budget constraints – it is increasingly the case that content delivery solutions are evolving from simple client-CDN interactions (reminiscent of the traditional client-server model) into swarm-CDN interactions, wherein the content servers are not merely responding to individual client requests, but rather to the collective demand of a set of clients that are looking for the same content, usually referred to as a "swarm". Indeed, to reduce bandwidth requirements, an increasing number of CDN solutions (including those offered by major market players such as Akamai [1], Limelight [3], and Amazon [2]) rely on swarm-based, peer-assisted approaches that leverage the uplink capacity of end-users to reduce the CDN bandwidth consumption. This approach, which is particularly effective for highly-popular content, can be seen as seamlessly bridging client-CDN and swarm-CDN interactions: for less-popular content, a peer-assisted CDN behaves as a traditional CDN system, whereas for highly-popular content, it behaves as a peer-to-peer system.

Existing cloud-based peer-assisted CDNs rely on swarm-based protocols such as BitTorrent [4]. While such swarm-based protocols are quite efficient for exchanging content among peers (in terms of download time, resource utilization, and fairness), they are not designed to provide the content source with the means to gauge the marginal utility of its contribution to the swarm. Specifically, in a peer-assisted CDN setting, swarm-based protocols do not enable the content server to gauge and hence manage the inherent tradeoffs between server bandwidth utilization and the efficacy of content delivery. This is precisely the capability that the work presented in this paper aims to provide.


1.1. Paper scope and contributions

In this paper, we propose a framework for peer-assisted CDN solutions in which the content server is able to adjust the bandwidth it contributes to the swarm (the set of clients downloading content) so as to achieve a specific objective based on a feedback signal related to the state of the swarm. Our framework is general enough to allow for many possible combinations of objectives and feedback signals. For instance, the objective may simply be to keep the swarm alive based on a feedback signal indicating the level of redundancy for particular pieces of content in the swarm. Alternatively, the objective may be to ensure a desirable level of service based on a feedback signal gauging average delivery time to clients.

To establish a reference model for these as well as other combinations of objectives and feedback signals, in Section 2, we discuss the cost-performance tradeoff for peer-assisted content delivery. Our findings suggest the existence of a quiescent (close to optimal) operating point beyond which the marginal utility from additional content server bandwidth utilization is negligible.

Based on this understanding, in Section 3, we present the design and prototype implementation of CYCLOPS, a peer-assisted content delivery service. The content server in CYCLOPS is able to modulate its bandwidth contribution to the swarm so as to remain in the vicinity of the aforementioned quiescent operating point – thus minimizing its cost without sacrificing performance. Our design relies on the feedback signal provided through an on-line monitoring tool, which we have implemented as part of CYCLOPS.

To demonstrate the effectiveness of our approach, in Sections 4 and 5 we report on a fairly extensive series of Internet experiments, in which we compare the performance of CYCLOPS to that of the "open-loop" swarm-based protocols used by cloud-based content delivery services. Our experiments are carried out both in a controlled environment (by delivering content to PlanetLab clients) and in the wild (by delivering content to a real Internet user population). These experiments show that our feedback-based approach drastically reduces the volume of data served from the cloud (and hence the cost incurred by the content provider) with negligible performance degradation.

For illustrative purposes, our experimental setting involves an instance of CYCLOPS deployed as a "cloud-based" service. In particular, we used Amazon EC2 instances to create a possible deployment scenario and measure the performance and costs of content delivery. In live experiments involving more than 10,000 users exhibiting highly dynamic arrival and departure patterns, we were able to document monetary savings of up to two orders of magnitude for our system.

Fig. 1. Download time [s] as a function of the content server rate α [piece/sec], for λ = 1.5μ, 3μ, and 6μ.

2. Cost-performance tradeoff

When designing a content server, one of the first problems to solve is bandwidth allocation, i.e., how many resources are necessary to provide the content delivery service. In this Section we consider the tradeoff between the bandwidth utilization of a content server and the average delivery time perceived by a set of swarming users (clients).

To this end, we have developed a simple Markovian model that provides some interesting insights. Since the details of the model are not necessary for understanding the design of our system, we refer the interested reader to Appendix A for a complete description of the model. Here we report only the results with some sample inputs.

Fig. 1 quantifies the tradeoff between the server bandwidth utilization (i.e., the average upload rate α of the content server) and the average delivery rate to clients involved in a swarm with upload capacity λ – the parameter μ represents the departure rate, i.e., the rate at which the content disappears, in contrast to λ, which can be interpreted as the rate at which the content is replicated (for the details see Appendix A).

The figure shows three operating regions. The first region (left side of the plot) is when α tends to zero, resulting in piece starvation, and a corresponding increase in download time. The second operating region (right side of the plot) is when α tends to values that far exceed λ, resulting in a client–server-like mode of operation. The third and more interesting operating region is an intermediate one, within which an increase in α does not result in a corresponding decrease in download time. The "width" of this region depends on the health of the swarm, which is a function of the content popularity captured by the client departure rate μ, and the mean client upload bandwidth λ.

The behavior described by our model suggests the existence of a quiescent operating point (at the transition between the first and second operating regions depicted in Fig. 1), beyond which the marginal utility from additional bandwidth utilization is negligible. A content server operating around this quiescent point would be fully leveraging the uplink bandwidth of its clients, while minimizing its own cost: operating below this quiescent point would jeopardize performance, and operating above this quiescent point would be cost inefficient. Note that previous models, such as the one proposed in [5], show that the download time depends linearly on the server bandwidth, while our model highlights the existence of different operating regions.

We are now ready to describe the design and prototype implementation of a content server that uses a feedback signal to adjust its bandwidth contribution to the swarm so as to remain in the vicinity of a nominal quiescent operating point. It is important to notice that the prototype we present in the following is not an explicit implementation of the model: the design is inspired by the key observations made above.

While our framework allows for many combinations of objectives and feedback signals, in the remainder of this paper we focus on the objective of maximizing the performance per unit cost, using the availability of content in the swarm as the feedback signal.

3. System design and implementation

We now present the design of CYCLOPS, our peer-assisted content delivery service. As depicted in Fig. 2, our CYCLOPS service consists of a content server and a swarm monitor. The swarm monitor interprets the signaling messages exchanged between swarming clients, and generates a feedback signal that enables the content server to gauge the marginal utility of its contribution to the swarm. The content server participates in the swarming protocol to satisfy client requests, but only feeds the swarm when its contribution is deemed necessary (based on the feedback signal). In CYCLOPS, the swarm feeding rate is set to maximize the swarm performance-per-unit-cost, using the availability of content in the swarm as the feedback signal. As established in our model in Section 2, the quiescent operating point for this objective is the minimum rate that avoids swarm starvation.

CYCLOPS is conceived to work with any swarm-based application/protocol that features (1) a coordinating entity that tracks all swarm participants, enabling them to establish peer-to-peer connections; (2) content that is divided into pieces to be distributed/exchanged independently; and (3) a control messaging scheme used by swarm participants to advertise piece availability.
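One way to read this requirement list is as a minimal protocol interface. The abstract class below is our own schematic rendering of these three assumptions (the class and method names are illustrative, not taken from the CYCLOPS code):

```python
from abc import ABC, abstractmethod

class SwarmProtocol(ABC):
    """Minimal interface assumed from the underlying swarm protocol."""

    @abstractmethod
    def peers(self):
        """(1) Coordinating entity: return the swarm participants known
        to the tracker, so that peer-to-peer connections can be set up."""

    @abstractmethod
    def num_pieces(self):
        """(2) The content is split into independently exchangeable pieces."""

    @abstractmethod
    def on_piece_advertised(self, peer_id, piece_index):
        """(3) Invoked whenever a participant advertises a piece it holds
        (e.g., via BitTorrent's have/bitfield control messages)."""
```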

For practical reasons, we present our system and conduct our experiments focusing on a single content server, used to deliver a single content (file) to a set of clients.

Fig. 2. Overview of the CYCLOPS architecture: the content server and swarm monitor reside in the CDN in distinct virtual machines, with bandwidth used for the data feed (to the swarm) and the control feed (from the swarm).

Problems related to concurrent swarms are orthogonal to our approach, and the solutions proposed in the literature, e.g., [6], can be integrated independently. Similarly, issues related to the efficiency of the distribution process, solved using approaches based on traffic locality, are complementary to our solution, and previous work on this topic, e.g., [7,8], can be incorporated seamlessly.

CYCLOPS was conceived and implemented as a service that can be deployed on any content delivery platform. As an illustrative example, in this work we focused on a "cloud-based" service: we used the Amazon Elastic Compute Cloud (EC2) environment, and produced an Amazon Machine Image (AMI) that supports both the content server and the swarm monitor functionalities.

The source code of CYCLOPS can be found in [9].

3.1. The CYCLOPS swarm monitor

Swarm monitoring in CYCLOPS is achieved using a set of components that we call On-line Feedback (OF) nodes. OF nodes connect to a live swarm, but neither download nor upload content: they monitor all clients in the swarm and collect the signaling messages they exchange. Using this information, OF nodes construct snapshots in time that characterize the health/performance of the swarm. In our particular implementation, these snapshots are used to derive the instantaneous piece availability, which constitutes the feedback signal fed to the CYCLOPS content server using a complementary protocol.

To ensure scalability (and seamless elasticity of the service), we adopted a distributed design for OF nodes, whereby new clients joining the swarm are assigned to different OF nodes to balance load. Accordingly, a swarm S is partitioned into Np non-overlapping sets, where Np is the number of OF nodes in the system. Swarm partitioning is achieved using consistent hashing [10]: each OF node is responsible for a fraction of the key-space, defined by the client ID (e.g., IP address).

3.2. The CYCLOPS content server

The main objective of our approach is to minimize server bandwidth consumption without running the risk of starving the swarm. Based on the feedback signal provided by the swarm monitor, the content server feeds the swarm only when necessary, i.e., when piece availability falls below a desirable threshold. To that end, in our design we adopted an ON/OFF control strategy, whereby the content server operation oscillates between two states: serving and idle.

When in the serving state, the content server dedicates its full uplink capacity to serve missing pieces of content. By design, the server avoids injecting duplicate pieces into the swarm. The rationale for doing so is that pieces can be quickly replicated by the swarm participants themselves. All clients connected to the content server are induced to request the set of missing pieces, which constitutes the serving set maintained by the content server: this is possible since the server masquerades as a set of virtual clients holding a fraction of all available pieces. This serving set is partitioned into k non-overlapping subsets that are announced as "available". For instance, if the serving set consists of pieces {1,2,3,4} and k = 2, then k messages, each announcing pieces {1,2} and {3,4}, respectively, will be sent to k users that will eventually issue download requests. Once a piece has been served, it is removed from the serving set, provided that the swarm monitor has confirmed the presence of the piece in the swarm. When the server has finished injecting all missing pieces into the swarm, it transitions to the idle state.

When in the idle state, the content server simply closes all connections to remote clients, and refuses any incoming connection. The content server remains in the idle state until the feedback signal triggers a transition to the serving state.
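The serving/idle cycle and the serving-set bookkeeping described above can be summarized as a small state machine. The sketch below is our own schematic rendering, not the CYCLOPS implementation: the feedback callback and the way announcements reach clients are placeholders.

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"
    SERVING = "serving"

class FeedbackControlledServer:
    """ON/OFF seeding driven by the swarm monitor's availability signal."""

    def __init__(self, num_pieces, k=2, threshold=1):
        self.num_pieces = num_pieces
        self.k = k                      # number of announced subsets
        self.threshold = threshold      # missing pieces tolerated before serving
        self.state = State.IDLE
        self.serving_set = set()

    def on_feedback(self, missing_pieces):
        # missing_pieces: pieces with no replica left in the swarm,
        # as reported by the OF nodes.
        if self.state is State.IDLE and len(missing_pieces) >= self.threshold:
            self.serving_set = set(missing_pieces)
            self.state = State.SERVING

    def announcements(self):
        # Partition the serving set into k non-overlapping subsets, each
        # advertised as "available" to a different group of clients.
        pieces = sorted(self.serving_set)
        return [pieces[i::self.k] for i in range(self.k)] if pieces else []

    def on_piece_confirmed(self, piece):
        # The swarm monitor has seen the piece replicated in the swarm:
        # drop it from the serving set, and go idle once nothing is missing.
        self.serving_set.discard(piece)
        if not self.serving_set:
            self.state = State.IDLE
```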

4. Experimental methodology

In this Section, we summarize the specifics of the CYCLOPS instance we have experimented with, along with various details regarding its illustrative deployment on a commercial cloud provider. We also describe the different types of experiments we have conducted, both in a controlled environment (involving PlanetLab clients under our control), and in the wild (involving thousands of real Internet users accessing content we advertised and made available).

4.1. BitTorrent-based swarming

As we alluded to in Section 3, CYCLOPS can be instantiated to work with any swarm-based content distribution protocol supporting a specific set of features. For experimental purposes, we created an instance of CYCLOPS that is compatible with the popular BitTorrent (BT) client. Note that, in all our experiments, clients execute unmodified BT code. This choice is partly motivated by the wide adoption of BT by Internet users, as well as its adoption by many content delivery services (including Amazon S3 and many others [11]) as an underlying swarming protocol. The details of the BT protocol and algorithms are not essential to understanding CYCLOPS, thus we refer interested readers to [12] for a technical description of BT. Here we only mention that the coordinating entity that maintains the list of clients in the swarm is called the tracker, and that the two control messages used by BT to advertise pieces available at a client are the have and the bitfield messages: they indicate the availability at a client of a specific (single) piece, and of a set of pieces, respectively [12].
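For reference, have and bitfield are ordinary BitTorrent peer-wire messages: a 4-byte length prefix, a 1-byte message ID (4 for have, 5 for bitfield), and a payload. The sketch below shows one way a monitor could decode the two message bodies into sets of piece indices; it assumes the length prefix has already been stripped, and the function name is illustrative.

```python
import struct

HAVE, BITFIELD = 4, 5  # standard BitTorrent peer-wire message IDs

def decode_message(body: bytes, num_pieces: int):
    """Decode a peer-wire message body (message ID followed by payload).

    Returns the set of piece indices the message declares as available,
    or None for message types the monitor does not care about.
    """
    msg_id = body[0]
    if msg_id == HAVE:
        # have: a single 4-byte big-endian piece index.
        (index,) = struct.unpack("!I", body[1:5])
        return {index}
    if msg_id == BITFIELD:
        # bitfield: one bit per piece, most significant bit first.
        bits = body[1:]
        return {
            i for i in range(num_pieces)
            if bits[i // 8] & (0x80 >> (i % 8))
        }
    return None
```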

In the remainder of this paper, we use open-loop-BT to refer to a standard peer-assisted content delivery system based on BitTorrent, whereas we use CYCLOPS to refer to our "feedback-controlled" content delivery service.

4.2. Deployment details

In our experiments, we used Amazon's Elastic Compute Cloud (EC2) to host, on separate virtual machines, the open-loop-BT content server (called the seed), the tracker, and CYCLOPS (including the content server and the swarm monitor). To mitigate the negative impacts on networking performance due to shared resources (CPU and I/O) in a virtualized environment, we used large EC2 instances, which were all located in a single US-based data center. Our open-loop-BT and CYCLOPS content servers were well-provisioned, with an upload capacity of 2.4 Mbps. Note that, in our experiments, a single OF node proved to be sufficient to monitor the entire swarm fed by CYCLOPS.

As for the threshold used to trigger the transition from the idle state to the serving state, in order to let the system work using the minimum amount of bandwidth, we have set it to one piece.

4.3. Flash crowd experiments

To emulate a flash crowd arrival process – representative of a scenario in which a large number of clients exhibit a sudden interest in some specific content – we deployed a set of clients on PlanetLab machines, whereby all clients initiate their requests as a result of a centralized trigger: clients start downloading the content within 1 min of that trigger signal. Once a user is done downloading the content, it continues to serve other clients until the end of the experiment. We conducted our experiments using two flash crowd sizes of L = 50 and L = 300 clients, respectively. In order to minimize the resource utilization of PlanetLab nodes, we used a homogeneous configuration with an application-level cap of 160 Kbps for the client's uplink capacity, which is the default setting for BT. The content size was set to 50 MB.

4.4. Waves of arrivals experiments

We synthesized extreme swarm dynamics on PlanetLab, with the goal of studying CYCLOPS under stress: in practice, we created a scenario in which availability problems would hinder the content distribution process, requiring CYCLOPS to intervene more often than in a real swarm. The dynamics consisted of three successive bursts of client arrivals: a first burst of 100 clients arrive in a 10-min span and leave after completing their download (within 50 min of arrival); a second burst of 100 clients join the swarm just before the mass exodus of the first wave of users. This process is then repeated for a third burst of arrivals. The interval between the mass exodus from one wave and the burst of arrivals from the next wave is set up in such a way that there would not be sufficient time for content pieces to propagate fully from the clients of one wave to the next (which should cause the swarm monitor's feedback signal to trigger the CYCLOPS content server to rev up its contribution to the swarm). As before, the client's uplink capacity was capped at 160 Kbps, and the content size was set to 50 MB.

4.5. Live internet experiments

We conducted experiments to evaluate our system under realistic CDN operating conditions, including web-driven arrival and departure processes for users drawn from a diverse set of ISPs and with diverse software settings. To do so, we distributed a non-copyrighted movie packed in a 350 MB file. We created two distinct torrent meta-files (one for distribution using CYCLOPS and the other for distribution using open-loop-BT), and we publicized both simultaneously on popular content search web-sites. In these experiments, both the CYCLOPS and the open-loop-BT content servers had no cap on their uplink capacity (beyond what is possible through a large EC2 instance), and needless to say, we had no control on the settings (or even the BT variants) of the clients.

We note that, due to their very nature, live Internet experiments cannot be repeated: as such, they should be regarded as illustrative deployment scenarios involving real Internet users that exhibit uncontrolled and unconditioned behavior.

4.6. Performance Metrics

In all of our experiments, we considered two main performance metrics. From the content server perspective, we measured the aggregate volume of data uploaded during an experiment, i.e., the server bandwidth utilization. Since content servers are under our control, we can measure their bandwidth utilization using local log files. From the client side, we measured the content delivery times. For PlanetLab experiments, we did that by collecting application-level logs from the clients. For live experiments, where we do not have access to client logs, we measured the content delivery times using our swarm monitor, which aggregates information provided by OF nodes. The accuracy of this approach was validated using the PlanetLab experiments: we compared the download times computed using individual log files (of PlanetLab clients) to those obtained from OF nodes, and verified the match between the empirical cumulative distribution functions of download times for the two methodologies. Furthermore, to assert the statistical significance of our results, our PlanetLab experiments were performed five times for each configuration.

5. Experimental results

5.1. Flash crowd experiments

End-users' performance in downloading content is expressed in terms of individual download times. Fig. 3 reports the most important percentiles (25th, 50th and 75th) of the empirical cumulative distribution function (ECDF) of download times.

Fig. 3. Flash crowd: content download times [min] for L = 50 and L = 300 peers (file size: 50 MB).

As a general trend, we observe that the median download time of open-loop-BT swarms is lower than that of CYCLOPS swarms, with the gap reduced in larger swarms. The reason lies in the fact that an open-loop-BT seed keeps feeding the swarm during the whole experiment, resulting in a larger fraction of users receiving data from the content server itself (which is faster than the user), and hence the shorter content delivery time. Furthermore, we note that aside from visible but relatively small variations, the download time for CYCLOPS clients was less sensitive to the swarm size.

The above explanation is further confirmed by the results in Table 1, which reports the average bandwidth utilization expressed in volume of data served by both the CYCLOPS and the open-loop-BT content servers, normalized by content size. An open-loop-BT seed injects the swarm with 10–15 times the size of the original content, whereas CYCLOPS feeds the swarm only when necessary, which given the static nature of this experiment is once. These results corroborate the intuition discussed in Section 2. A content server that can gauge the marginal utility of its contribution to a swarm can settle in the vicinity of an operating point in which an additional expense of server bandwidth resources has a marginal effect on the swarm performance.

5.2. Waves of arrivals

Fig. 4 shows the key percentiles of the empirical cumulative distribution function (ECDF) for the delivery times experienced by clients in the successive waves of arrivals. In this case, the difference between the delivery times achieved by CYCLOPS and the open-loop-BT content servers is small: the median value of the distribution indicates an advantage of roughly 15% in favor of the latter.

Table 2 shows the average volume of data served by both schemes, as well as information on traffic overhead (namely, the volume of control messages involving bandwidth resources). For CYCLOPS, we show the aggregate overhead incurred by the content server and the swarm monitor. For completeness, we report the feedback traffic exchanged between the content server and the OF node, noting that these messages are exchanged within the confines of the cloud and hence do not entail additional costs.

The data in Table 2 corroborates our conclusion that CYCLOPS achieves low server bandwidth utilization, even when the system is artificially stressed by complex client dynamics.

Next we examine the evolution in time of the feedback signal (namely, system-wide piece availability) generated by the CYCLOPS swarm monitor and the content server state transitions it triggers.

Table 1
Flash crowd: average server load (file size: 50 MB).

          BT      CYCLOPS
L = 50    12.2    1
L = 300   15.36   1


Fig. 4. Waves of arrivals: content download times [min] (file size: 50 MB).

Table 2
Waves of arrivals: server load and overhead (file size: 50 MB).

                          BT      CYCLOPS
Normalized server load    39.86   1.5
Outgoing overhead (KB)    55      52
Incoming overhead (KB)    2560    716
Feedback overhead (KB)    –       145


Let M be the number of pieces into which a file is divided, and let I(i, t), i = 1, ..., M, be the indicator function for piece i at time t, i.e., I(i, t) = 1 if there is at least one copy of piece i at time t, and I(i, t) = 0 otherwise. The availability feedback signal A(t) at time t is computed as:

\[
A(t) = \frac{\sum_{i} I(i,t)}{M} \tag{1}
\]

Fig. 5 shows the time-series for the swarm size, the availability feedback signal, and the content server state transitions induced by this signal.

Fig. 5. Waves of arrivals: availability over time.

It shows that as soon as the feedback signal indicates piece starvation (i.e., availability is less than 1), the content server switches to the serving state and feeds the swarm. Piece availability is zero when the swarm bootstraps, and drops whenever clients holding the unique copy of a particular piece depart from the system. The content server switches from the idle state to the serving state only when necessary to restore piece availability to 1. Note that in this experiment we have purposefully created an extreme case of swarm dynamics: in a real swarm, user behavior is not as synchronous.

5.3. Live internet experiments

In the set of experiments we present in this Section, we do not control the client arrival and departure processes, but rather we let these processes reflect the popularity of the content we disseminate. Furthermore, clients participating in our swarms exhibit realistic uplink and downlink capacities, unlike our PlanetLab experiments in which all clients have the same uplink capacity.

For CYCLOPS, out of a total of 7633 users we tracked, 3509 obtained the full content. All other users departed before finishing the download process. For the open-loop-BT content server, 2486 out of a total of 5044 users completed the content download. Despite the diversity in the user base for each experiment, we stress the relevance of a live Internet content distribution: our goal is to show that CYCLOPS copes well with a heterogeneous environment, with uncontrolled clients, and that it offers measurable benefits in an illustrative cloud-based scenario.

Fig. 6 depicts the instantaneous number of users for both swarms. In our experiments, after the transients of the first few hours have subsided, the user arrival and departure rates within each swarm equalized, with approximately 35–40 users joining each swarm per minute.

Fig. 7 shows the content delivery times (with the most important percentiles) achieved by all users that were able to complete the download. These results indicate that, in our experiments, the median delivery time achieved by both content servers is very similar. For the CYCLOPS content server, the ECDF indicates longer tails: this is mainly due to a larger swarm size, which included clients with poor Internet connectivity.

Fig. 6. Live experiment: swarm size [peers] over time [minutes] for the CYCLOPS and BT swarms.


Fig. 7. Live experiment: content download times [min] (file size: 357.5 MB).


From the end-users' perspective, the difference in the download performance when they are served by CYCLOPS or by open-loop-BT is negligible.

The content server bandwidth utilization, the associated volume of data, and the related costs supported by the content servers underscore the superiority of CYCLOPS. Table 3 indicates that the CYCLOPS content server served a total of 731.6 MB of content data, while the open-loop-BT seed injected a whopping 133.03 GB of content data! Table 3 also reports the overhead traffic, as defined in the previous Section.

These results support our conclusion that the framework discussed in Section 2 and the particular instance we presented in this work are viable candidates for real Internet content distribution systems. Note that both experiments lasted 38 h, and that the swarm sizes allowed us to assume equivalent uplink capacity distributions for users in each torrent.

Since we deployed our content servers on Amazon EC2 instances, we were able to quantify the economic value of our proposed scheme: for the experiment we carried out, the total cost (including overheads) for distributing the same content when using a legacy BT seed is roughly 180 times higher than that of a CYCLOPS content server.

6. Additional considerations

We now discuss several points that complement the work presented in this paper.

6.1. Dealing with alternative objectives and feedback signals

The framework proposed in this work is general enough to allow many possible combinations of objectives and feedback signals.

Table 3
Live experiment: service statistics (file size: 357.5 MB).

                                              BT        CYCLOPS
Total number of users observed in the swarm   5044      7633
Normalized server load                        381.04    2.05
Outgoing overhead (MB)                        6.5       0.2
Incoming overhead (MB)                        160.8     24.6
Cost of delivery                              $23.73    $0.13

For example, an alternative objective may be to ensure some minimal level of service based on feedback regarding the average content delivery time. The swarm monitor described in Section 3 can readily measure the average content delivery times, using the same swarm signaling traffic we discussed earlier. Indeed, clients advertise whenever they receive a new content piece, information that can be simply used to compute the average download rate of the swarm. Based on this information, the content server can choose the appropriate level of bandwidth (i.e., the cost it incurs) to complement the serving capacity λ of the swarm, with the constraint of remaining in the vicinity of the quiescent operating point discussed in Section 2. With reference to Fig. 1, this approach corresponds to a content server selecting to contribute bandwidth resources that move across the various operating regions obtained for different values of λ.
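As a sketch of this alternative signal, the snippet below estimates the swarm's mean per-client download rate from the stream of have advertisements over a sliding window; the class and parameter names, as well as the windowing choice, are our own illustrative assumptions.

```python
import time
from collections import deque

class SwarmRateEstimator:
    """Estimates the swarm's mean download rate from have advertisements."""

    def __init__(self, piece_size_bytes, window_s=60.0):
        self.piece_size = piece_size_bytes
        self.window = window_s
        self.events = deque()  # timestamps of observed have messages

    def on_have(self, now=None):
        self.events.append(now if now is not None else time.time())

    def mean_rate_per_client(self, swarm_size, now=None):
        # Bytes per second downloaded, on average, by each client over
        # the last `window` seconds.
        now = now if now is not None else time.time()
        while self.events and self.events[0] < now - self.window:
            self.events.popleft()
        if swarm_size == 0:
            return 0.0
        return len(self.events) * self.piece_size / (self.window * swarm_size)
```

The content server could then compare this estimate against the rate implied by a target delivery time and adjust its contribution accordingly.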

6.2. Dealing with alternative ways to collect feedback signals

The swarm monitor described in Section 3 is achieved using a set of OF nodes that connect to all users. We show in Section 5 that the cost of this solution, in terms of overheads, is not significant. Nevertheless, maintaining many connections may pose some challenges. An alternative solution is to use periodic sampling of the swarm state: the OF nodes, instead of connecting to all the users in the swarm, periodically obtain a subset of users from the tracker and connect temporarily to this subset to collect the information about pieces owned by the users. Using sampling statistics, it is possible to infer system-wide piece availability, subject to preset levels of confidence. Clearly, the larger the sampling set, the more precise the availability information: in practice, approximating data availability may yield higher server load, since pieces may not be detected even if they are in the swarm.
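A sketch of this sampled variant is shown below; fetch_bitfield stands in for the temporary connection that retrieves a sampled peer's advertised pieces and is a hypothetical helper. As noted above, the union over the sample can only under-estimate availability, so a server driven by it would err on the side of serving more.

```python
import random

def sampled_availability(tracker_peers, fetch_bitfield, num_pieces, sample_size):
    """Estimate piece availability by probing a random subset of peers.

    tracker_peers is a list of peers returned by the tracker;
    fetch_bitfield(peer) is assumed to return the set of piece indices
    the probed peer currently holds.
    """
    sample = random.sample(tracker_peers, min(sample_size, len(tracker_peers)))
    seen = set()
    for peer in sample:
        seen |= fetch_bitfield(peer)
    # Pieces never observed in the sample may still exist in the swarm,
    # so this estimate errs on the side of extra serving.
    return len(seen) / num_pieces
```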

6.3. Dealing with multiple content servers

In this paper, we conducted experiments in which a single content server is deployed. There are many obvious reasons to consider a more general scenario involving multiple content servers. For example, a CDN operator may wish to use CYCLOPS on edge servers positioned in multiple locations so as to serve clients efficiently: in this scenario, end-users might be directed to their geographically closest CYCLOPS content server. Enforcing traffic locality, to mitigate the impact on ISP economics, calls for a technique to create distinct swarms. This can be achieved with techniques proposed in the literature without requiring any modification to the design of CYCLOPS. Alternatively, multiple CYCLOPS servers could be combined to contribute to the same swarm. In this case, such content servers would have to coordinate what content pieces they serve and when, to avoid inefficiencies. Our current implementation does not have provisions for avoiding the overlap between the serving sets compiled by different content servers. That said, standard distributed algorithms could easily be used to manage such situations for production-scale systems.


6.4. Dealing with cloud services cost models

In this work we have illustrated the benefits, in terms of bandwidth consumption, of our approach for an instance of a "cloud-based" content delivery service. Although CYCLOPS is applicable to peer-assisted content delivery networks in general, here we focus the discussion on cloud-based services. Since we focused on a simple objective, i.e., to keep a swarm alive, CYCLOPS exhibits an intermittent behavior: the system leaves the idle state only when the swarm risks starvation. Under this operational mode, it is natural to question whether this behavior matches current cost models that apply to resources rented in the Cloud. For example, the granularity for paying for an Amazon Elastic Compute Cloud (EC2) instance is one hour. As such, although bandwidth resources are neither used nor paid for when CYCLOPS is in the idle state (probing traffic aside), the virtual machine is paid for independently of the bandwidth consumption.

With respect to the above discussion, we stress that the experimental setting we used in this work is not intended to be deployed in production. Ideally, our system is suitable for a deployment with the Amazon Simple Storage Service (S3) and its CDN extension (called CloudFront). Alternatively, our approach is suitable for more elaborate scenarios in which multiple contents are distributed via multiple instances of CYCLOPS that coexist in the same virtual machine. In such a case, the unit cost of the virtual machine can be amortized by having CYCLOPS content servers coordinate: when one CYCLOPS instance is idle, another instance can be in the serving state, if necessary. Note that such coordination is not trivial: the intermittent behavior of CYCLOPS is a result of swarm dynamics, which cannot be controlled. It is outside the scope of this work to extend the CYCLOPS architecture to cope with such additional complexity.

Regarding the cost of distributing the content, it is worth noticing that the proposed evaluation does not take into account possible volume discounts, which may decrease the difference between a pure CDN and our solution. In any case, we believe that the impact of this aspect should be minimal.

6.5. Dealing with adversarial workloads

Denial of Service (DoS) attacks, as well as other improper behavior of end-users aiming to exploit swarm resources, are a concern that has to be considered when embracing a peer-assisted CDN solution such as ours. Although this is an important problem to address, here we focus on deliberate attacks by a client (or a set of colluding clients) targeting the specifics of our CYCLOPS framework. Other types of attacks typical of P2P systems, such as Sybil or Eclipse attacks, can be solved using the techniques already presented in the literature [13]. We recognize two possible adversarial exploits, where the aim is to pollute the feedback signal computed by the CYCLOPS swarm monitor.

In the first, an adversary may seek to consume as much server bandwidth as possible. This can be done by inducing the content server to detect piece starvation (when none truly exists), thus causing the server to wastefully inject content. Since the CYCLOPS swarm monitor tracks all clients in a swarm, such an attack would require a colluding set of malicious users of a size approximately equal to the whole swarm size, which can be safely assumed impractical.

In the second, a set of colluding users may engage in a DoS-like attack to hinder content distribution, by inducing the content server to conclude that the swarm is healthy (when the contrary is true). This causes starvation of legitimate clients. This can be addressed by letting the swarm monitor compute the average download rate of the swarm. Based on this information, in case of content starvation, the swarm monitor may trigger an alarm, indicating, for instance, the least replicated pieces.

An alternative issue related to our scheme is the single point of failure represented by the server: if the server becomes unavailable, the content may disappear. In this case, any solution based on server replication (e.g., in a hot-standby configuration, or with multiple online servers as discussed before) would be sufficient to solve this problem.

7. Related work

7.1. Peer-assistance

Peer-assisted content distribution has been the subject of many recent studies. Of these, the work of Huang, Wang, and Ross [14] could be seen as similar in nature to the work presented in this paper. In that work, the authors advocate the use of peer-assisted content distribution by evaluating the potential gain from peer-assisted video distribution using real-world traces of two large CDN companies, Akamai and Limelight (the underlying architecture of both of which they characterized). Their approach uses the model in [15] to obtain bounds on the server load and download times, should swarming among end-users be allowed. They also quantify the potential reduction in ISP peering traffic resulting from traffic localization. In the same vein, our work is based on an analytical model that gives key insights as to the benefits of peer-assisted content distribution (although our focus is on bulk as opposed to video transfers). Beyond a "proof of concept" using a tractable mathematical formulation, we go one step further by presenting practical feedback-control content injection policies that aim to satisfy performance objectives while minimizing the provider's costs. Our implementation is evaluated in realistic contexts, and our results go beyond a purely theoretic estimation of the benefits of peer-assisted content distribution.

Liu et al. [16] propose a peer-assisted file distribution system (FS2You) which has a similar architecture to our system. Besides the fact that FS2You and CYCLOPS were designed at approximately the same time, our system is based on the BitTorrent protocol, which is readily available to many clients, while FS2You requires the installation of a proprietary client.

7.2. Frugal seeding

To the best of our knowledge, the only work that has a similar objective to ours – in terms of reducing the load/cost on a content source, albeit in a very different setting – is Sanderson and Zappala's work [17]. In that work, once the seed has determined a subset of pieces that should be injected in a swarm, it will satisfy any number of requests for those pieces. As a consequence, their technique does not offer the same level of control on the seed workload as the policies we study in this work. Indeed, we observe that for experiments carried out in similar settings, our content servers inject orders of magnitude less traffic than what was documented in [17]. Additionally, our system does not require any parameter to be empirically set.

Chen et al. [18] study the "SuperSeeding" mode introduced by an alternative BT client to help peers with slow Internet connections perform initial content seeding. The objectives of "SuperSeeding" are different from ours. Moreover, a number of problems due to multiple peers using "SuperSeeding" have been reported. The work in [19] proposes a "Smartseed" policy, which advocates serving just one copy of each piece. Besides the fact that Smartseed does not take into account dynamic scenarios, it requires the modification of clients, while our system involves changes only to the server, with no modification to the client.

7.3. Models and bounds

The literature is rich with analytical models that dissect many aspects of P2P content distribution. In [20] and [21], the authors derive lower bounds for the minimum content distribution time of a swarm-based P2P application: we build upon those works, but focus instead on the relation between the content server upload rate and the download rate achieved by peers. The work in [22] belongs to the family of fluid models of BitTorrent-like applications: however, in this model it is the number of peers (as opposed to traffic) in the system that is taken as fluid. The authors in [22] develop a differential equation for the fluid model, from which they determine the performance of the dynamic system. We also model content replication in a dynamic setting, but instead consider the number of piece replicas as the dynamic variable, modeled using a Markov process.

7.4. Bandwidth allocation in P2P systems

While the study of alternative mechanisms that improve the bandwidth allocation in P2P systems is orthogonal to our work, results from such studies could clearly have positive implications on content server utilization. In [6], the authors design a content distribution system with the objective of maximizing the download rate of all participants in a managed swarm. The system design in [6] is based on a wire protocol that induces peer participation (using virtual currency) to achieve a global system optimization. In our work, we focus on a different objective: we try to address the question of whether it is possible to optimize the bandwidth utilization by content servers without negatively impacting the performance perceived by clients. We note that the model we use in this work can also explain, though in more general terms, the key intuition behind the Antfarm work [6].

The problem of devising efficient uplink allocation algorithms for swarm-based P2P bulk data transfers is addressed in [23]. Instead of using empirically set parameters, as done in BT, to determine the amount of uplink capacity dedicated to each remote connection, the authors cast uplink allocation as a fractional knapsack problem, and design a simple heuristic utility function to decide the amount of bandwidth a peer should dedicate to each remote connection. The focus of their work is on a cooperative P2P setting, in which peers are assumed to fully abide by the prescribed algorithms.

8. Conclusion

In this paper, we have demonstrated that peer-assisted content distribution could be leveraged to supplant, as opposed to supplement, the content provider's resources for purposes of efficient and scalable content distribution, without negatively impacting the performance perceived by clients. Our approach is based on a feedback-controlled swarm feeding mechanism, which we have modeled analytically and evaluated empirically using CYCLOPS – a full-fledged service that we have implemented and deployed both in controlled and uncontrolled environments.

Our extensive experimental results – including the live distribution of content to thousands of real Internet users – show that CYCLOPS achieves enormous cost savings for the content provider (as high as two orders of magnitude when compared to non-feedback-controlled BitTorrent-based services) without noticeably impacting the performance perceived by end-users. We were able to show that the mechanisms we developed as part of this work have a clear impact on content distribution economics, including a significant reduction of costs for content providers, and much more efficient resource utilization for content hosts and distributors.

Our on-going work is focused on exploring alternative objectives and alternative feedback signaling processes in CYCLOPS, as well as extensions that take into account multiple (possibly competing) content servers involved in the distribution of content from multiple sources.

Acknowledgments

This research was supported in part by NSF awards #0720604, #0735974, #0820138, #0952145, and #1012798, and has been partially supported by the French ANR-VERSO project VIPEER.

Appendix A

In this Section we develop a model that relates the bandwidth utilization by a content server in the CDN to the average delivery time perceived by a set of swarming users (clients).

A.1. Model

We consider a dynamic environment, where clients join a swarm, download the content, and eventually leave the system. The number of clients in the swarm is not known a priori, but it can be characterized by arrival and departure rates. These rates may fluctuate drastically, and such fluctuations are typical for "hot" viral Internet content, which gets published, gains significant popularity fairly quickly, but eventually dies off over time. In this work, we assume that on the content download timescale (say, minutes) they remain constant, allowing the system to reach a steady state in which the arrival and departure rates equalize, and consequently the average number of clients in the swarm is constant.

Let N be the steady-state average number of clients in the swarm, and let the content be divided into M independent pieces. If M ≫ 1, then a client holds M/2 pieces on average. For analytical tractability, we do not model network bottlenecks or losses.

Consider a birth–death Markov chain whose state s_k represents k, the number of replicas of a single (arbitrary) piece of content. Note that one can envision an identical, independently evolving Markov chain for each one of the M pieces that make up the content. For a generic state s_k, there are two possible transitions: (1) either the piece is replicated, resulting in a piece birth, and thus a transition from state s_k to state s_{k+1}, or (2) a client holding a replica of the piece leaves the swarm and is replaced by a new client that does not have the piece, resulting in a piece death, and thus a transition from state s_k to state s_{k-1}.

Let α_k indicate the average rate at which the content server injects a piece in the swarm at state s_k. Let λ denote the piece replenishment rate resulting from client contributions: λ is computed by dividing the aggregate upload capacity of all N clients by the total number of pieces M. Both α_k and λ are expressed in pieces per second.

For the sake of simplicity, we assume a random piece replication strategy: in contrast to more sophisticated replication strategies [24], random piece selection simplifies the analysis and provides conservative performance bounds. Thus, the probability of choosing to replicate the particular piece (modeled by the Markov chain) out of the M/2 pieces available at the client is 2/M. The probability that no client will choose to replicate that piece is (1 − 2/M)^k, since k is the number of clients holding the piece in state s_k. This yields a probability of 1 − (1 − 2/M)^k of going from state s_k to state s_{k+1}.

To compute the transition rate from state s_k to state s_{k+1} we must also account for the rate α_k at which the content server independently injects the piece into the swarm. This yields a transition rate of λ (1 − (1 − 2/M)^k) + α_k. Notice that state s_0 is a special state in which only the content server can inject the piece. Thus, the transition rate from state s_0 to state s_1 is equal to the server upload rate α_0.

Let μ denote the client departure rate (measured in clients per second). The probability of a death out of state s_k is the probability that any one of the k clients holding the piece leaves the swarm. The probability that a given departure is by one of these k users is k/N. Thus, the transition rate from state s_k to state s_{k-1} is given by μk/N.

In summary, the transition rates from state s_k to state s_{k'}, denoted by σ_{k,k'}, can be expressed as follows:

\[
\sigma_{k,k'} =
\begin{cases}
\alpha_0 & \text{if } k = 0 \text{ and } k' = 1 \\
\lambda \left( 1 - (1 - 2/M)^{k} \right) + \alpha_k & \text{if } k' = k + 1,\ 0 < k < N \\
\mu k / N & \text{if } k' = k - 1,\ 0 < k \le N \\
0 & \text{otherwise}
\end{cases}
\tag{A.1}
\]

Note that, since the Markov chain is finite (N + 1 states), the steady state solution exists.

We now compute the probability p_0 of being in state s_0. For simplicity, we consider the case in which the content server uploads a piece at an average rate α_k = α, ∀k, irrespective of its state; by solving the Markov chain we get:

\[
p_0 = \left[ 1 + \frac{\alpha}{\mu}\, N\, (1 + \Phi) \right]^{-1}
\tag{A.2}
\]

where

\[
\Phi = \sum_{k=2}^{N} \left( \frac{N}{\mu} \right)^{k-1} \frac{1}{k!} \prod_{i=1}^{k-1} \left[ \lambda \left( 1 - \left( 1 - \frac{2}{M} \right)^{i} \right) + \alpha \right]
\]

We now proceed to finding the relationship between the average server rate α and the mean download time. Each client obtains 1/N of the swarm's upload capacity, which is M(λ + α). Since the content is composed of M pieces, the mean download time can be computed as T = M/(M(λ + α)/N) = N/(λ + α). This is true as long as the probability of being in state s_0 is small enough. If this probability increases, then we have an additional term for the mean time spent in state s_0: this can be computed by multiplying the probability of state s_0 (p_0) by the time spent in state s_0 (1/α). Hence, the mean download time is bounded by:

\[
T \le \frac{N}{\lambda + \alpha} + \frac{p_0}{\alpha}
\tag{A.3}
\]

To illustrate the utility of this model, consider a swarm of N = 100 clients downloading content consisting of M = 2000 pieces, with a client departure rate of μ = 0.5 clients per second, and a mean client upload rate of λ = {1.5μ, 3μ, 6μ} piece/s. Fig. 1 in Section 2 shows the average download time as a function of the server upload rate, as predicted by Eq. (A.3).
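Eq. (A.3) is straightforward to evaluate numerically. The sketch below is our own reimplementation of Eqs. (A.2) and (A.3) for the settings above; it reproduces the shape of the curves in Fig. 1, not the exact plotted values.

```python
import math

def p0(alpha, lam, mu, N, M):
    # Eq. (A.2): probability that a piece has no replica in the swarm.
    phi = sum(
        (N / mu) ** (k - 1) / math.factorial(k)
        * math.prod(
            lam * (1.0 - (1.0 - 2.0 / M) ** i) + alpha
            for i in range(1, k)
        )
        for k in range(2, N + 1)
    )
    return 1.0 / (1.0 + (alpha / mu) * N * (1.0 + phi))

def download_time_bound(alpha, lam, mu, N, M):
    # Eq. (A.3): bound on the mean download time (seconds).
    return N / (lam + alpha) + p0(alpha, lam, mu, N, M) / alpha

N, M, mu = 100, 2000, 0.5
for lam in (1.5 * mu, 3 * mu, 6 * mu):
    for alpha in (0.001, 0.01, 0.1, 1.0, 10.0, 100.0):
        T = download_time_bound(alpha, lam, mu, N, M)
        print(f"lambda={lam:.2f}  alpha={alpha:8.3f}  T <= {T:10.2f} s")
```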

For the particular settings used in Fig. 1, the intermediate region is given by α ∈ [10, 1000] piece/s.

References

[1] Akamai, 2011. <http://www.akamai.com>.
[2] Amazon AWS, 2011. <http://aws.amazon.com>.
[3] Limelight, 2011. <http://www.limelightnetworks.com>.
[4] BitTorrent, 2011. <http://www.bittorrent.com>.
[5] F. Liu, Y. Sun, B. Li, B. Li, Quota: rationing server resources in peer-assisted online hosting systems, in: Proc. of IEEE ICNP, 2009.
[6] R.S. Peterson, E.G. Sirer, Antfarm: efficient content distribution with managed swarms, in: Proc. of USENIX NSDI, 2009.
[7] D.R. Choffnes, F.E. Bustamante, Taming the torrent: a practical approach to reducing cross-ISP traffic in P2P systems, in: Proc. of ACM SIGCOMM, 2008.
[8] R. Cuevas, N. Laoutaris, X. Yang, G. Siganos, P. Rodriguez, Deep diving into BitTorrent locality, Tech. rep., Telefonica Research, 2009, arxiv.org/abs/0907.3874.
[9] Cyclops, 2011. <http://www.eurecom.fr/michiard/downloads/cyclops.tar.gz>.
[10] D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, D. Lewin, Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web, in: Proc. of ACM STOC, 1997.
[11] BitTorrent Protocol, 2011. <http://en.wikipedia.org/wiki/BitTorrent_(protocol)>.
[12] A. Legout, G. Urvoy-Keller, P. Michiardi, Rarest first and choke algorithms are enough, in: Proc. of ACM IMC, 2006.
[13] M. Steiner, E.W. Biersack, T. En-Najjary, Exploiting KAD: possible uses and misuses, Computer Communication Review 37 (5) (2007).
[14] C. Huang, J. Li, A. Wang, K. Ross, Understanding hybrid CDN-P2P: why Limelight needs its own Red Swoosh, in: Proc. of ACM NOSSDAV, 2008.
[15] C. Huang, J. Li, K. Ross, Can Internet VoD be profitable?, in: Proc. of ACM SIGCOMM, 2007.
[16] F. Liu, Y. Sun, B. Li, B. Li, X. Zhang, FS2You: peer-assisted semi-persistent online hosting at a large scale, IEEE Transactions on Parallel and Distributed Systems 21 (10) (2010) 1442–1457.
[17] B. Sanderson, D. Zappala, Reducing source load in BitTorrent, in: Proc. of IEEE ICCCN, 2009.
[18] Z. Chen, Y. Chen, C. Lin, V. Nivargi, P. Cao, Experimental analysis of super-seeding in BitTorrent, in: Proc. of IEEE ICC, 2008.
[19] A.R. Bharambe, C. Herley, V.N. Padmanabhan, Analyzing and improving a BitTorrent network's performance mechanisms, in: Proc. of IEEE INFOCOM, 2006.
[20] R. Kumar, K. Ross, Optimal peer-assisted file distribution: single and multi-class problems, in: Proc. of IEEE HOTWEB, 2006.
[21] R. Sweha, A. Bestavros, J. Byers, Angels – in-network support for minimum distribution time in P2P overlays, Tech. Rep. BUCS-TR-2009-003, Boston University, 2009.
[22] D. Qiu, R. Srikant, Modeling and performance analysis of BitTorrent-like peer-to-peer networks, in: Proc. of ACM SIGCOMM, 2004.
[23] N. Laoutaris, D. Carra, P. Michiardi, Uplink allocation beyond choke/unchoke or how to divide and conquer best, in: Proc. of ACM CoNEXT, 2008.
[24] B. Fan, D.-M. Chiu, J.C.S. Lui, Stochastic analysis and file availability enhancement for BT-like file sharing systems, in: Proc. of IEEE IWQoS, 2006.

Pietro Michiardi received his M.S. in Computer Science from EURECOM and his M.S. in Electrical Engineering from Politecnico di Torino. Pietro received his Ph.D. in Computer Science from Telecom ParisTech (former ENST, Paris). Today, Pietro is an Assistant Professor of Computer Science at EURECOM. Pietro currently leads the Distributed System Group, which blends theory and system research focusing on large-scale distributed systems (including data processing and data storage), and scalable algorithm design to mine massive amounts of data. Additional research interests are on system, algorithmic, and performance evaluation aspects of computer networks and distributed systems.

Damiano Carra received his Laurea in Telecommunication Engineering from Politecnico di Milano, and his Ph.D. in Computer Science from the University of Trento. He is currently an Assistant Professor in the Computer Science Department at the University of Verona. His research interests include modeling and performance evaluation of peer-to-peer networks and distributed systems.

Francesco Albanese received the B.A. and the M.Sc. degrees in computer science from Politecnico di Torino in 2006 and 2009, respectively. His interests include computer networks, peer-assisted technologies and distributed databases.

Azer Bestavros received the PhD degree in computer science from Harvard University in 1992. He is a professor and former chairman of computer science at Boston University. He is the chair of the IEEE Computer Society TC on the Internet, and the Founding Director of the BU Hariri Institute for Computing, which was set up in 2010 to "create and sustain a community of scholars who believe in the transformative potential of computational perspectives". He is the recipient of the 2010 United Methodist Scholar Teacher Award in recognition of "outstanding dedication and contributions to the learning arts and to the institution" and of the ACM Sigmetrics Inaugural Test of Time Award for research "whose impact is still felt 10–15 years after its initial publication". His research contributions include pioneering the push web content distribution model adopted years later by CDNs, seminal work on Internet and web characterization, and work on formal verification of networks and systems. Funded by over $18M of grants from government agencies and industrial labs, his research yielded 14 PhD theses, 4 issued patents, 2 startup companies, and hundreds of refereed papers that are cited over 6,000 times. His current research projects focus on mechanism design for efficient and secure cloud computing.