Peer-assisted Content Distribution in Akamai NetSession

  • View

  • Download

Embed Size (px)

Text of Peer-assisted Content Distribution in Akamai NetSession

  • Peer-Assisted Content Distributionin Akamai NetSession

    Mingchen Zhao Paarijaat Aditya Ang Chen Yin Lin?

    Andreas Haeberlen Peter Druschel Bruce Maggs? Bill Wishon Miroslav PonecUniversity of Pennsylvania MPI-SWS ?Duke University Akamai Technologies

    ABSTRACTContent distribution systems have traditionally adopted oneof two architectures: infrastructure-based content deliv-ery networks (CDNs), in which clients download contentfrom dedicated, centrally managed servers, and peer-to-peer CDNs, in which clients download content from eachother. The advantages and disadvantages of each architec-ture have been studied in great detail. Recently, hybrid,or peer-assisted, CDNs have emerged, which combine ele-ments from both architectures. The properties of such sys-tems, however, are not as well understood.

    In this paper, we discuss the potential risks and ben-efits of peer-assisted CDNs, and we study one specific in-stance, Akamais NetSession system, to examine the im-pact of these risks and benefits in practice. NetSessionis a mature system that has been operating commerciallysince 2010 and currently has more than 25 million usersin 239 countries and territories. Our results show thatNetSession can deliver several of the key benefits of bothinfrastructure-based and peer-to-peer CDNsfor instance,it can offload 7080% of the traffic to the peers with-out a corresponding loss of performance or reliabilityand that the risks can be managed well. This sug-gests that hybrid designs may be an attractive option forfuture CDNs.

    Categories and Subject DescriptorsC.2.4 [Computer Systems Organization]: Computer-Communication NetworksDistributed Systems; C.4[Computer Systems Organization]: Performance ofSystems

    KeywordsContent distribution networks, peer-to-peer systems

    1. INTRODUCTIONThis paper presents a study of a hybrid CDN called Net-Session in which most content is delivered by a set of peerswhose operation is coordinated (and backstopped) by a ded-icated infrastructure. Our study is motivated by the obser-

    Permission to make digital or hard copies of part or all of this work for personalor classroom use is granted without fee provided that copies are not made ordistributed for profit or commercial advantage, and that copies bear this noticeand the full citation on the first page. Copyrights for third-party componentsof this work must be honored. For all other uses, contact the owners/authors.Copyright is held by the authors/owners.IMC13, October 2325, 2013, Barcelona, Spain.ACM 978-1-4503-1953-9/13/10.

    vation that hybrid CDNs seem to have reconciled two ar-chitectures with very different tradeoffs: On the one hand,peer-to-peer systems are inexpensive and easy to scale butseem to be plagued by security issues and low quality of ser-vice (QoS); on the other hand, infrastructure-based systemsare expensive to set up and scale but can provide predictableQoS and reliable accounting, and ensure content integrity.We felt that the time was ripe to investigate what advan-tages a hybrid of these approaches might convey.

    We find, perhaps surprisingly, that NetSession is able toachieve the best of both worlds: it can offer most of thebenefits of both architectures while avoiding most of thedrawbacks. This is because the strengths of the infrastruc-ture and the peers complement each other: The infrastruc-ture provides a central point of coordination that can quicklymatch up peers and can add resources when peers cannotprovide adequate QoS; the peers provide resources and scal-ability, and they extend the reach of the infrastructure tounderserved areas.

    Prior work has identified several potential concerns abouthybrid CDNs; in particular, it has been suggested thathybrid CDNs might become a burden to Internet ServiceProviders (ISPs) by increasing inter-AS traffic [17, 24, 35].We show that, at least in the case of NetSession, this con-cern appears to be unfounded. Finally, we report some ob-servations from the day-to-day operation of the NetSessionsystem. Among other things, we discover a surprising de-gree of user mobility in the system, and we describe whatappear to be the effects of cloning and re-imaging client in-stallations. In summary, this paper makes the following fivecontributions:

    an overview of the hybrid CDN design space, includinga discussion of potential risks and benefits (Section 2);

    a description of NetSessions design goals and imple-mentation (Section 3);

    a measurement study that provides an overview of thescale at which NetSession operates and the types ofcontent it is used to deliver (Section 4);

    an analysis of whether NetSession meets its designgoals and realizes the potential benefits of hybrid sys-tems (Section 5); and

    an investigation into whether the risks associated withhybrid systems, such as undue burdens on ISPs orproblems with untrusted end-user machines, are evi-denced by NetSession in practice (Section 6).

    We discuss related work in Section 7, and present our con-clusions in Section 8.


  • 2. THE CDN DESIGN SPACEWe begin by sketching the design space for content distri-bution systems, and discuss the potential risks and benefitsof hybrid designs.

    2.1 To Peer or Not To Peer?Traditionally, CDNs have fallen into two categories: some re-lied exclusively on a centrally managed infrastructure, whileothers operated entirely without one. Akamais CDN [10]is an example of an infrastructure-based CDN: it uses aninfrastructure of more than 137,000 servers in 87 countrieswithin over 1,150 Internet networks, all owned and operatedby Akamai. BitTorrent [8] is an example of a peer-to-peer(p2p) CDN: other than the tracker (which serves as an initialpoint of contact) and perhaps an initial seeder, it requires noinfrastructure at all; all data is exchanged directly betweenthe peers. Recent BitTorrent clients have even replaced thetracker with a distributed hashtable [9], removing the lastremaining infrastructure element.

    Infrastructure-based CDNs typically have the bene-fit of professional administrators and amply provisionedresourcesthey can control and authenticate the contentthey distribute, and they can achieve high performance andreliability, but they are also expensive to scale. Peer-to-peerCDNs must rely on resources contributed by their peers,which means that their properties are often the exact op-posite: they require little up-front investment and scale or-ganically, but can be plagued by spam and low-quality con-tent, and their quality of service tends to be unpredictable.Thus, it is natural to ask whether it might not be possibleto achieve the best of both worlds in a single system.

    2.2 Why decide?In this paper, we focus on hybrid CDNs that combine ele-ments from both architectures: like peer-to-peer CDNs, theyachieve scalability by leveraging resources from the clientmachines, but, like infrastructure-based CDNs, they alsorely on dedicated, centrally managed elements, such as acentral directory of file locations and dedicated servers thatprovide a backstop of delivery capacity. We refer to suchCDNs as peer-assisted CDNs.

    One way to characterize peer-assisted CDN designs isto look at the role they assign to the infrastructure. Forinstance, the infrastructure can provide resources, such asbandwidth, computation, or storage, which might be usedto improve the quality of service or to keep rarely requestedcontent available; it can provide coordination by building aglobal view of the system, e.g., to recognize attacks, to allo-cate resources, or to pair up peers with matching NATs; andit can provide control, e.g., by deciding which content maybe distributed. Thus, there is an entire spectrum of possibledesigns, ranging from systems that rely mostly on the peersand have a very small infrastructure (such as tracker-basedBitTorrent) to systems that rely primarily on the infrastruc-ture and use the peers only as an optimization.

    2.3 Potential benefitsPeer-assisted CDNs have substantial flexibility in dividingtasks between the peers and the infrastructure. Thus, theycan potentially use each element where it is strongest. Thepotential benefits relative to a pure peer-to-peer CDN in-clude:

    Better quality of service: The infrastructure can be usedas a backstop for the peers, i.e., it can help out with re-quests where the peers do not have much bandwidth, orwhere there is a lot of churn.Reduced reliance on peer contributions: The infras-tructure can absorb the cost of a certain degree of freeload-ing. Thus, it is not as important to force peers to recipro-cate, or to fairly distribute the workload.More reliable delivery: The infrastructure can authenti-cate content; there is no need to rely on heuristics or rep-utations to recognize content that has been corrupted ortampered with.Less legal exposure: The infrastructure can ensure thatall content is properly licensed; thus, peers can safely par-ticipate in swarming and upload content to others, withoutbeing at risk for accidental violations of the DMCA (andsimilar laws).Better security: The infrastructure can use its global viewto recognize and fight attacks, and it can provide profession-ally managed servers that can perform trusted functions.Higher efficiency: The infrastructure can maintain aglobal view and perform global optimizations, e.g., byquickly locating copies of a file, or by matching up peers.

    The potential benefits over a pure infrastructure-basedsystem include:Lower cost: Content providers can potentially deliver theirdata at lower cost (and can pass some of the resulting savingson to end users). Also, the CDN operator may be able tooffer services to content providers who otherwise may notbe able to afford a CDN.Global coverage: Finally, in geographic areas with sparseinfrastructure deployment, serving content from peers mayincrease the quality of se