Hybrid Network Coding Peer-to-Peer Content Distribution


    Dinh Nguyen and Hidenori Nakazato

Abstract: Network coding has been applied successfully in peer-to-peer systems to shorten distribution time. Pieces of data, i.e. blocks, are combined, i.e. encoded, by the sending peers before being forwarded to other peers. Although requiring all peers to encode might achieve the shortest distribution time, it is not necessarily optimal in terms of computational resource consumption. A short finish time can, in many cases, be achieved when only a subset of carefully chosen peers encode. Peer-to-peer systems, in addition, tend to be heterogeneous: some peers, such as hand-held devices, may not have the capacity required to encode. We therefore envision a P2P system where some peers encode to improve distribution time and other peers, due to limited computational capacity or some system-wide optimization, do not encode. Such a system gives rise to a design problem that arises in neither pure non-coding nor full network coding-enabled P2P systems. We identify the problem and propose solutions to address it. Simulation evaluation confirms the robust performance of our proposed hybrid network coding peer-to-peer content distribution.

    Index Terms

    content distribution, network coding, peer-to-peer

    1 INTRODUCTION

Network coding [1][13], which allows content to be coded at intermediate nodes while being forwarded in the network, has been shown to achieve significantly shorter distribution time in peer-to-peer (P2P) content distribution [4][16]. It is, however, too expensive and in many cases impossible to require encoding at every peer. Recent work has demonstrated that encoding is only needed at a subset of carefully chosen peers, and in some particular instances only at the source, to achieve performance comparable to network coding [11][12][15]. Many other studies have focused on minimizing the number of required network coders to achieve optimal multicast throughput [8][21][22]. P2P networks in reality usually consist of heterogeneous peers with quite different capabilities. More powerful peers can be ready for network coding-enabled operations, yet such jobs are beyond the capacity of resource-limited peers like hand-held and mobile devices. A successful network coding solution to optimize P2P network performance, therefore, cannot impose encoding at every network node.

Interested in using network coding to shorten distribution time in P2P networks, we envision a P2P system where encoding is applied at some peers while other peers, due to resource limitations or for optimization reasons, might not code. The system, which we call a hybrid network coding P2P system, gives rise to a design problem that has not arisen before. In a pure BitTorrent P2P system [3], the source and all peers exchange pieces, i.e. blocks, of the file using rarest-block selection to quickly disseminate the file into the system. A peer chooses the rarest blocks in the neighborhood to download first. In full network coding-enabled P2P, all peers code. Before downloading from a neighbor, a peer communicates with the neighbor to determine whether it can provide new data. When some peers encode and others do not, there is a mixture of coded and non-coded blocks in the neighborhood for each peer to choose from. The question is how to design a protocol and a block-selection algorithm that handle such a mixture of coded and non-coded blocks while preserving the efficiency and simplicity of the BitTorrent protocol.

In this paper, we design our hybrid network coding P2P system. Our contributions are as follows.

1) We devise information exchange protocols which enable hybrid network coding systems to work seamlessly. Our design, backward-compatible with BitTorrent, requires only the addition of one field in the meta-exchange messages.

2) We propose a block-selection algorithm for a partly network coding-enabled system to operate efficiently. Our block-selection algorithm, an extension of BitTorrent's rarest-first selection, is derived from extensive observations of the way network coded data benefit content distribution.

Our design and algorithm noticeably improve system performance in terms of distribution time compared with current network coding P2P systems.

In the remaining parts, we review related work in section 2. The system model is given in section 3. We describe the protocol for peers to communicate in section 4. Section 5 focuses on the block-selection problem and our proposed algorithm. Section 6 presents performance evaluation results and, finally, we conclude the paper in section 7.

The authors are with Waseda University, Tokyo 169-0051, Japan. This work has been extended from a previous paper [14].

JOURNAL OF COMPUTING, VOLUME 5, ISSUE 4, APRIL 2013, ISSN (Online) 2151-9617
https://sites.google.com/site/journalofcomputing
WWW.JOURNALOFCOMPUTING.ORG

2 RELATED WORK

BitTorrent [3], a popular P2P file sharing system that uses parallel downloads to accelerate download speed, divides the file into equal-size blocks (also called chunks or pieces), which peers send and receive in parallel, utilizing both available upload and download bandwidth. Each newly joining peer connects to a set of random existing peers, so as to construct a mesh overlay network with a random topology. Furthermore, receiving peers choose the rarest blocks first to quickly disseminate the whole file into the system. To encourage peers to contribute upload bandwidth to the system, a peer uploads to, i.e. unchokes, a certain number of neighboring peers at a time: those that provide it with the best download rates. Rarest-first block selection and unchoking are shown to be the reasons underlying BitTorrent's excellent performance [5].

Network coding [1][2][13], which allows intermediate nodes to encode, has been applied to BitTorrent in order to shorten distribution time [4][16]. Whenever there is an opportunity to transmit, a peer combines all the blocks it has to make a new coded block and sends it to the requesting peer.

For full-scale network coding P2P where all peers encode, [4][6] propose a mechanism by which, before downloading from a neighbor, a peer checks whether the neighbor can provide it with meaningful blocks, i.e. blocks which are linearly independent of its own set of blocks. We call that a try-and-download approach which, compared to BitTorrent, requires a major update in the way peers exchange metadata:

1. a peer sends a request message to its neighbor,
2. the neighbor replies either with a newly generated encoding vector or with its decoding matrix¹,
3. the requesting peer downloads a newly coded block from the neighbor if the encoding vector, or the neighbor's decoding matrix, is linearly independent of its own decoding matrix.

Try-and-download is synchronous in the sense that a peer has to stay in sync with its neighbors by continuously checking whether they can provide it with new data. Moreover, a receiving peer cannot know in advance exactly which and how many blocks it is going to receive from each neighbor. Such knowledge would help the receiving peer decide which blocks are most valuable to it.

In full-scale network coding systems where peers are roughly homogeneous in terms of the computational resources needed to encode, try-and-download is feasible, albeit with a protocol overhead. Within a hybrid network coding P2P system, however, not all peers can necessarily code. In such a scenario, requiring a resource-limited peer to frequently compare its own decoding matrix with the decoding matrices of its neighbors is beyond its capacity. We need a simple yet effective way to identify useful blocks that every peer, encoding-enabled or not, can afford.

To facilitate hybrid network coding P2P systems where encoding and non-encoding nodes mix together, we depart from the try-and-download approach and introduce an extension to the BitTorrent metadata exchange. We furthermore propose a block selection algorithm to improve distribution time. Our proposed solution is backward-compatible with BitTorrent and requires virtually no more protocol overhead than pure BitTorrent, yet the performance improvement is noticeable compared with original network coding P2P systems.

¹ We explain encoding vector and decoding matrix in section 3.2.

    3 SYSTEM MODEL

    3.1 Hybrid Network Coding Peer-to-Peer System

We consider P2P content distribution from a source to many peers, in which each peer maintains overlay connections to some random peers, i.e. its neighbors, to exchange data.

A file exists at the source and is distributed to all peers which, at the beginning, do not have any part of the file. The file is originally divided into K equal blocks, the same as in [3], which are transferred in the system in parallel. To increase throughput, some peers in the system encode while other peers do not. Since coded data exist in the system, however, all peers that have received coded data are required to decode to recover the original data.

As in BitTorrent systems [3][4], block exchange complies with two rules: (1) rarest-block-first selection at the receiver's side: receivers choose the rarest blocks in their neighborhood to download, and (2) an incentive scheme at the sender's side: senders send blocks to their neighbors reciprocally.

    3.2 Random Linear Network Coding

Encoders in our system use random linear network coding (RLNC) [2][6] to create new coded blocks from the blocks they have received. In RLNC, an encoding vector of $K$ coefficients is attached to each coded block to specify how that coded block is generated from the $K$ original blocks. Suppose we have a coded block $C_0$ with encoding vector $(c_{01}, c_{02}, \ldots, c_{0K})$, and $K$ original blocks $B_1, B_2, \ldots, B_K$. That means $C_0 = c_{01}B_1 + c_{02}B_2 + \cdots + c_{0K}B_K$. The coefficients are taken from, and the multiplication and addition operations take place in, a Galois field, e.g. $GF(2^8)$.

Now suppose a peer A, having received two blocks $C_1$ and $C_2$, wants to make a new coded block to send to a neighboring peer. Peer A picks two random coefficients $a_1$ and $a_2$ and generates a new coded block $C$:

$$C = a_1 C_1 + a_2 C_2 = (a_1 c_{11} + a_2 c_{21})B_1 + (a_1 c_{12} + a_2 c_{22})B_2 + \cdots + (a_1 c_{1K} + a_2 c_{2K})B_K.$$

The coded block $C$, together with the $K$ coefficients above, is sent to the requesting peer.

At the receiving peer, all encoding vectors are stored in a decoding matrix with the corresponding coded data blocks. After a peer collects $K$ independent coded blocks, i.e. the $K$ associated encoding vectors form a full-rank matrix, it can decode to get the $K$ original blocks by solving the set of $K$ linear equations.
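The encoding, re-encoding, and decoding steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the reduction polynomial 0x11B for GF(2^8), the helper names (`gf_mul`, `gf_inv`, `combine`, `recode`, `decode`), and the toy 3-byte blocks are all assumptions made for the example.

```python
import random

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8); the polynomial 0x11B is an assumed choice."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a):
    """Inverse of a nonzero byte, computed as a^254 by square-and-multiply."""
    result, power, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        exp >>= 1
    return result

def combine(coeffs, rows):
    """Return sum_i coeffs[i] * rows[i], element-wise over GF(2^8)."""
    out = [0] * len(rows[0])
    for c, row in zip(coeffs, rows):
        for j, x in enumerate(row):
            out[j] ^= gf_mul(c, x)
    return out

def recode(vectors, payloads, rng):
    """Make one new coded block (vector, payload) from the coded blocks a peer holds."""
    coeffs = [rng.randrange(1, 256) for _ in vectors]
    return combine(coeffs, vectors), combine(coeffs, payloads)

def decode(vectors, payloads):
    """Gauss-Jordan elimination; returns the K original blocks, or None if rank < K."""
    k = len(vectors[0])
    rows = [list(v) + list(p) for v, p in zip(vectors, payloads)]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, x) for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [x ^ gf_mul(factor, y) for x, y in zip(rows[r], rows[col])]
    return [row[k:] for row in rows[:k]]

originals = [[10, 20, 30], [1, 2, 3], [7, 7, 7]]   # K = 3 toy blocks of 3 bytes each
vectors = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]        # hand-picked full-rank encoding vectors
coded = [combine(v, originals) for v in vectors]
new_vec, new_payload = recode(vectors[:2], coded[:2], random.Random(1))
```

Here `recode` plays the role of peer A combining $C_1$ and $C_2$: the returned vector is exactly the coefficient tuple that accompanies the new block, and `decode` succeeds only once the collected vectors reach rank $K$.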

    3.3 Network Coder Assignment

Since only some peers in the system encode, the questions are which peers become network coders and who is responsible for assigning them.

In our view, peers at key locations of the network can selectively be assigned as network coders. We discuss this problem in detail in [15], where we propose to place network coders at nodes with high centrality [17][18] values. This approach, however, requires a centralized server to compute and assign coders. In practice, in P2P systems such as BitTorrent [3][4][19], we can let trackers do that task, since the trackers know which peers are currently participating in the torrents.

Network coders can also be assigned in a distributed manner without any centralized server by using, for example, degree information.² Given a threshold, peers with degrees higher than the threshold become encoders.

In scenarios where computational resources are limited, we can approximately predict the amount of required resources, based on which a peer can determine by itself whether to become an encoder, i.e. whether it meets the resource requirements. Such an encoder assignment does not need a centralized server either.
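As a rough illustration of such a distributed decision, the sketch below lets every peer decide locally from its own degree and, optionally, its own resources. The function name, the parameters, the toy overlay, and the choice to combine the degree test with the resource test are assumptions made for this example; the paper describes the two criteria separately and does not prescribe a placement rule.

```python
def becomes_encoder(degree, degree_threshold, capacity=None, required_capacity=None):
    """Local decision made by one peer, with no central server involved."""
    # Degree rule: only peers with degree strictly above the threshold encode.
    if degree <= degree_threshold:
        return False
    # Optional resource rule (hypothetical units): the peer must meet the requirement.
    if capacity is not None and required_capacity is not None:
        return capacity >= required_capacity
    return True

# Hypothetical overlay, listed as peer -> neighbors.
overlay = {
    "A": ["E", "F", "G", "H"],
    "E": ["A", "G", "T"],
    "F": ["A", "H", "T"],
    "G": ["A", "E"],
    "H": ["A", "F"],
    "T": ["E", "F"],
}
encoders = sorted(p for p, nbrs in overlay.items()
                  if becomes_encoder(len(nbrs), degree_threshold=3))
print(encoders)  # ['A']
```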

4 INFORMATION EXCHANGE PROTOCOL

In pure BitTorrent without network coding, there are two phases to distribute blocks. These two phases interleave and take place asynchronously.

• Notification phase: after downloading a block, the downloading peer notifies its neighbors about the block it has just downloaded.

• Selection phase: whenever bandwidth is available for downloading, a peer, based on the information it has about which blocks are available in the neighborhood, chooses one block to download using a block selection algorithm. The download can then proceed if the downloading peer is currently unchoked by the neighbor that has the chosen block and the neighbor has enough bandwidth to sustain the download. If that fails, the peer can repeat this process to choose another block. This phase stops when the peer runs out of bandwidth or has no more blocks to choose from.

In the following subsections, we concentrate on the protocols used to communicate between peers and the format of the exchanged metadata. We discuss the block selection algorithm in detail in the next section. The BitTorrent unchoking algorithm, which handles fairness and free-rider issues, is a topic in itself and is not discussed in this paper. We instead assume peers in our system are altruistic and willing to contribute their bandwidth.

    4.1 Block Format

To identify data blocks, each block is associated with one unique block-id. However, one extension is needed to support network coding. Unlike non-coding systems, in which the assignment is done by the source where all the blocks originate, in network coding P2P systems that assignment is done wherever the block is created: both at the source and at all the encoders. To assist our block selection algorithm, the block-id is generated in increasing order: a new block-id generated by a particular encoder is greater than all previous block-ids generated by that encoder.

With network coding, an encoding vector is attached to each coded block as described in the previous section. We propose an additional encoder-id field (Fig. 1) which stores the identification of the encoder that generated the coded block. Encoder-id will be used in our selection algorithm later on.

² Degree-based routing has been proposed in [20]. In this paper, however, we do not propose any network coder placement but devote ourselves to designing the protocol and data selection algorithm for the hybrid network coding system.

For each block, the metadata exchanged between neighbors in a notification message thus consists of three fields: block-id, its encoder-id, and its encoding vector (Fig. 1(a)). The data block consists of block-id, encoder-id, and the payload data (Fig. 1(b)). If the notification or data block is a non-coded one, its encoding vector and encoder-id can be omitted.
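The two formats can be pictured as plain records. The sketch below is only one possible in-memory representation: the class names, the field types, and the `BlockIdGenerator` helper are assumptions for illustration, and the paper does not specify a wire encoding.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional, Tuple

@dataclass(frozen=True)
class Notification:
    """Metadata sent in a notification message (Fig. 1(a))."""
    block_id: int
    encoder_id: Optional[str] = None                    # omitted for non-coded blocks
    encoding_vector: Optional[Tuple[int, ...]] = None   # omitted for non-coded blocks

@dataclass(frozen=True)
class DataBlock:
    """Block carrying the payload (Fig. 1(b))."""
    block_id: int
    payload: bytes
    encoder_id: Optional[str] = None                    # omitted for non-coded blocks

class BlockIdGenerator:
    """Per-encoder counter: every new block-id exceeds all earlier ones from that encoder."""
    def __init__(self, start=1):
        self._counter = count(start)

    def next_id(self):
        return next(self._counter)

ids = BlockIdGenerator()
coded = Notification(block_id=ids.next_id(), encoder_id="A", encoding_vector=(3, 0, 7, 1))
plain = Notification(block_id=2)   # a non-coded block keeps only its block-id
```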

Having defined the block formats, we next present details of two communication protocols, either of which can be used in the hybrid network coding system.

• Pre-code protocol: encoding vectors of coded blocks are generated in the notification phase, when encoders notify their neighbors about newly coded blocks.

• Post-code protocol: the encoding vector for a given coded block is generated in the selection phase, just before the block is downloaded.

We discuss the pros and cons of these two protocols below.

Fig. 1. Notification and data block formats with the newly proposed encoder-id field.

Fig. 2. Pre-code protocol that peers use to communicate. There are two asynchronous phases: notification phase and selection phase. This protocol is an extension of BitTorrent: the notification messages and data blocks have an additional encoder-id. Encoding vectors are also attached to the notification messages as described in section 3.2.

    4.2 Pre-code Protocol

Without the assumption that every peer can code, we propose a simple adaptation of the BitTorrent metadata exchange mechanism. To facilitate coding in our system, if a peer is an encoder, then for each newly downloaded block the peer notifies each of its neighbors with the metadata of one newly encoded block. The newly encoded block differs from one neighbor to another. We note that, to save computational resources, only the metadata, i.e. encoder-id, block-id, and the newly generated encoding vector of the encoded block (Fig. 1(a)), are sent to the neighbors in a notification message. The actual data of that block is encoded only when a neighbor decides to choose the notified coded block. For an ordinary non-encoding peer, the metadata exchange is the same as in BitTorrent: the peer notifies its neighbors of the block it has just received. The communication protocol is illustrated in Fig. 2. Since the system is a hybrid network coding one, notifications (message 1) and data blocks (message 3) transferred between peers can be either encoded or original ones.

One might argue for using try-and-download here, but that would make the operation more complicated because each peer would have to implement two protocols: one for encoding-enabled neighbors and one for ordinary neighbors. With our approach, all a peer has to do is choose, from the candidate blocks, one particular block to download based on the metadata it received in the notification phase, which is the same as what happens in a pure BitTorrent system.

When a peer receives notification of a block newly encoded by a neighbor, i.e. message 1 in Fig. 2, the peer stores that block in a candidate list if the block is independent of all blocks it has downloaded. Otherwise, it ignores the notification. Unlike encoding-enabled peers, non-encoding peers do not encode but forward what they have received: a mixture of coded and non-coded blocks. As in BitTorrent, when receiving notification from a non-encoding neighbor, a peer updates the count of that block, i.e. the number of neighbors at which the block exists.

When a peer can download, it sends a block request to the corresponding neighbor (message 2), and if the request is accepted, the neighbor uploads the data block to the requesting peer (message 3).

Coding generates a large number of coded blocks, usually larger than the number of original blocks, many of which are redundant. As a peer continuously downloads new blocks, some blocks in its candidate list might become dependent on what it has downloaded. Each peer is therefore required to check and discard candidate blocks which are dependent on what has been downloaded.
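The candidate-list check can be sketched as an incremental linear-independence test. To keep the example short, it works over GF(2), with each encoding vector packed into an integer bitmask; this is a simplifying assumption, since the paper uses a larger field such as GF(2^8), where the same elimination applies with field multiplication added. The class and function names are hypothetical.

```python
class DecodingMatrix:
    """Row space of the encoding vectors downloaded so far (GF(2), bitmask vectors)."""

    def __init__(self):
        self._basis = {}   # position of highest set bit -> reduced vector

    def _reduce(self, vec):
        # Eliminate leading bits using stored rows; 0 means vec is dependent.
        while vec:
            top = vec.bit_length() - 1
            if top not in self._basis:
                return vec
            vec ^= self._basis[top]
        return 0

    def is_independent(self, vec):
        return self._reduce(vec) != 0

    def add(self, vec):
        """Record a downloaded block's vector; returns False if it added nothing new."""
        reduced = self._reduce(vec)
        if reduced:
            self._basis[reduced.bit_length() - 1] = reduced
        return bool(reduced)

def prune_candidates(candidates, matrix):
    """Keep only candidates whose vectors are still independent of the downloads."""
    return {bid: vec for bid, vec in candidates.items() if matrix.is_independent(vec)}

matrix = DecodingMatrix()
matrix.add(0b0011)
matrix.add(0b0101)
candidates = {"x": 0b0110, "y": 0b1000}     # x = 0011 xor 0101, so it is now redundant
print(prune_candidates(candidates, matrix))  # {'y': 8}
```

Running the prune step after every download keeps the candidate list free of blocks that can no longer increase the rank of the decoding matrix.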

    4.3 Post-code Protocol

As mentioned before, the notification phase and the selection phase are asynchronous. That is, after peer A notifies an encoded block in message 1, some amount of time passes before peer B requests that encoded block in message 2. The elapsed time can be arbitrarily long if, for example, peer B decides to download several blocks from other neighbors before choosing the encoded block from peer A. In the meantime, peer A might receive some new blocks. Under the pre-code protocol, that new information is not included in the encoded block, since the way the block is generated, i.e. its encoding vector, was fixed at notification time.

Encoders combine the blocks they currently have to make new coded blocks. If we can delay the act of encoding until just before the coded blocks are downloaded, we can provide the receiving peers with the most up-to-date information. Based on this observation, we propose an alternative protocol, namely the post-code protocol, which is illustrated in Fig. 3.

Fig. 3. Using the post-code protocol, the encoding vector is generated not in the notification phase but just before the requested block is sent to the receiving peer.

Fig. 4. Nodes E and F receive notifications from nodes A, G, and H about the candidate blocks the two nodes can download, of which B1 and B2 are non-coded blocks from nodes G and H, and A1-A4 are newly encoded blocks from encoder A.

The post-code protocol differs from the pre-code protocol as follows.

• The encoding vector is not included in the notification message, i.e. message 1 in Fig. 3. Only encoder-id and block-id are notified to the neighbor (peer B) each time peer A downloads a new block. As stated before, encoder-id is the ID of peer A and block-id is an increasing number generated by peer A.

• The encoder (peer A) generates the encoding vector and sends it to the receiving peer in the selection phase (message 3), just before the actual coded data (message 5). The receiving peer (peer B) needs to check whether that encoding vector is independent of its own decoding matrix before requesting the encoded block (message 4).

Fig. 5. With original rarest-first selection, there is a probability of 1/8 that nodes E and F choose the same block, B1 or B2. The result is that node T can only download one new block while its bandwidth allows two blocks.

Fig. 6. If coded blocks from encoder A are preferred, nodes E and F can always download independent blocks. As a result, node T can utilize all its bandwidth to download 2 new independent blocks.

Fig. 7. Encoder A, having 2 blocks B1 and B2, notifies node E and node F of newly encoded blocks A1-A4.

Fig. 8. Encoder A, after downloading 2 new blocks B3 and B4, notifies node E and node F of blocks A5-A8 encoded from all 4 blocks B1-B4.

Fig. 9. With original rarest-first selection, there is a probability of 1/9 that node E chooses block A3 and node F chooses block A4. The result is that node T can only download 2 independent blocks, A1 and A2, in 2 units of time. Blocks A3 and A4 in node E and node F are not useful to node T because they are dependent on A1 and A2.

Fig. 10. If the newest blocks are preferred, the 4 blocks downloaded by node E and node F are independent, which means node T can download 4 independent blocks in total in 2 units of time.

The post-code protocol has the advantage of producing fresher coded blocks, which are expected to accelerate content distribution. The limitation, however, is that it requires more protocol overhead: 5 messages in total for each downloaded block, compared with 3 messages for the pre-code protocol.
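The difference in freshness between the two protocols can be made concrete with a toy encoder. The sketch below is a simplification under stated assumptions: the encoder is modeled as holding original blocks by index (a real encoder may hold coded blocks), only the encoding vector is generated, and the class and method names are hypothetical. The message counts are the ones given in the text.

```python
import random

MESSAGES_PER_BLOCK = {"pre-code": 3, "post-code": 5}   # counts stated in the text

class Encoder:
    """Toy encoder that draws one nonzero GF(2^8) coefficient per block it holds."""

    def __init__(self, encoder_id, k, seed=0):
        self.encoder_id = encoder_id
        self.k = k
        self.held = set()            # indices of the blocks currently held
        self.next_block_id = 0
        self.fixed = {}              # block-id -> vector fixed at notification time
        self.rng = random.Random(seed)

    def receive(self, index):
        self.held.add(index)

    def _draw_vector(self):
        vec = [0] * self.k
        for i in self.held:
            vec[i] = self.rng.randrange(1, 256)
        return vec

    def _new_block_id(self):
        self.next_block_id += 1      # block-ids increase per encoder
        return self.next_block_id

    def notify_pre_code(self):
        # Pre-code: the vector is generated now and travels in the notification.
        block_id = self._new_block_id()
        self.fixed[block_id] = self._draw_vector()
        return self.encoder_id, block_id, self.fixed[block_id]

    def notify_post_code(self):
        # Post-code: the notification carries only encoder-id and block-id.
        return self.encoder_id, self._new_block_id()

    def vector_pre_code(self, block_id):
        return self.fixed[block_id]

    def vector_post_code(self, block_id):
        # Post-code: the vector is generated only when the block is requested.
        return self._draw_vector()

a = Encoder("A", k=4)
a.receive(0)
a.receive(1)
_, pre_id, pre_vec = a.notify_pre_code()
_, post_id = a.notify_post_code()
a.receive(2)                         # new blocks arrive after the notifications
a.receive(3)
post_vec = a.vector_post_code(post_id)
```

In this run `pre_vec` has zero coefficients for blocks 2 and 3, which arrived after the notification, whereas `post_vec` covers all four blocks. The price is the two extra messages per block.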

    5 BLOCK SELECTION PROBLEM

In this section, we describe in detail the block selection problem associated with hybrid network coding systems and propose our solution to it. The proposed block selection, which can be used with either of the two protocols in the last section, completes our proposal for an efficient, high-performance P2P content distribution with network coding. We begin by describing the duplication problem in such a system using the original rarest-first block selection.

5.1 Duplication Problem with Current Rarest-first Block Selection

The block selection algorithm used by BitTorrent is rarest-first, by which peers choose the rarest block in the neighborhood to download first. If there are several rarest blocks, a random one is selected from among them.³ Rarest-first selection is not enough for two reasons.

1) Encoders combine more information in the neighborhood. When available bandwidth is limited, for example when a bottleneck exists, non-coded blocks and coded blocks cannot be given the same attention. Coded blocks from the encoders should be preferred because they contain, in a sense, more information and can accelerate content distribution through the bottleneck.

2) Coded blocks are not equally important. Even though each coded block is always unique, i.e. rare, in the sense that two coded blocks are almost never identical, the level of importance of each coded block is different. Coded blocks are created progressively from all the blocks an encoder has downloaded. In the beginning, as there are only a few blocks to encode, the coded blocks created then contain only the information from those few blocks. The more blocks an encoder has, the more data are combined to create new coded blocks. Because of that, only at the source, or when an encoder has downloaded the full file, are the coded blocks equally important. In other cases, the most recently coded blocks likely contain more information.

To make this clear, we illustrate the problem in the two following examples.

³ At the beginning of the distribution session, when peers have no blocks to exchange with others, BitTorrent uses random block selection, by which peers choose a random block in the neighborhood to download. Nevertheless, after a peer has acquired some blocks, it switches to rarest-block selection.

Example 1 (Fig. 4-6) illustrates a partial overlay topology with 6 nodes: A, G, H, E, F, T, of which A is the only encoder. Nodes A, G, and H each have two blocks, B1 and B2. Encoder A has notified nodes E and F of 4 blocks A1-A4, two newly coded blocks for each node. Nodes G and H have notified E and F of blocks B1 and B2. The count of each block in the neighborhood is given in the tables (Fig. 4). Suppose that, due to bottlenecks, E and F can each download only one new block. If E and F select blocks using the original rarest-first algorithm, there is a 1/8 chance that both will download the same block, B1 or B2, with the result that node T can only download one new block while its available bandwidth allows two (Fig. 5). In Fig. 6, if E and F prefer coded blocks from encoder A over other blocks, T can always download two new blocks.

Example 2 (Fig. 7-10) considers a partial overlay topology in which an encoder A is delivering coded blocks to non-coding nodes E, F, and T. At the beginning, A has two blocks, B1 and B2, and notifies E and F of 4 newly encoded blocks, A1-A4, two blocks for each node (Fig. 7). Nodes E and F can then each download one block, e.g. A1 and A2, due to the bandwidth limit. In the meantime, A downloads two more blocks, B3 and B4, and sends new notifications about blocks A5-A8 to E and F (Fig. 8). If E and F select blocks using rarest-first, there is a 1/9 chance that E chooses A3 and F chooses A4, which results in peer T being able to obtain only 2 independent blocks in 2 units of time (Fig. 9). In contrast, T can download 4 new blocks if E and F prefer new encoded blocks over old ones (Fig. 10).

The problem therefore is: given a mixture of coded and non-coded blocks in the neighborhood, which blocks should a peer choose to download?
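The two probabilities quoted in the examples can be checked by enumeration. The sketch below assumes that every candidate at a node is tied for rarest and that ties are broken uniformly at random, which is what the stated values imply. The assignment of coded blocks to nodes (for instance A1 and A2 to E in Example 1, and A3, A5, A6 remaining at E in Example 2) is an assumption inferred from the text and figure captions; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def joint_probability(choices_e, choices_f, is_bad):
    """Probability that the pair (E's pick, F's pick) satisfies is_bad,
    with each node picking uniformly among its equally rare candidates."""
    pairs = list(product(choices_e, choices_f))
    hits = sum(1 for e, f in pairs if is_bad(e, f))
    return Fraction(hits, len(pairs))

# Example 1: each node sees B1, B2 and two coded blocks from encoder A.
p1 = joint_probability(["B1", "B2", "A1", "A2"],
                       ["B1", "B2", "A3", "A4"],
                       lambda e, f: e == f)

# Example 2: after downloading A1 and A2, each node has one old and two new coded blocks.
p2 = joint_probability(["A3", "A5", "A6"],
                       ["A4", "A7", "A8"],
                       lambda e, f: (e, f) == ("A3", "A4"))

print(p1, p2)  # 1/8 1/9
```

In Example 1 the bad outcomes are the two pairs (B1, B1) and (B2, B2) out of 16, and in Example 2 the single pair (A3, A4) out of 9.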

    5.2 Proposed Block Selection Algorithm

Our proposed algorithm is given in Algorithm 1. It works seamlessly in all types of networks: pure non-coding, full-scale network coding, and hybrid network coding. We extend rarest-first selection to give preference to coded blocks from immediate neighbors over other ones (Algorithm 1, step 2a). Also, from the same encoding neighbor, newer coded blocks are preferred over older ones. In doing so, we allow valuable newly encoded blocks in the neighborhood to be quickly disseminated while preserving the power of rarest-first in distributing new information. The improvement from our algorithm is generally significant. Without it, newly coded blocks, which carry virtually more information, can be held up arbitrarily in the network because neighboring peers may choose not to download them.

ALGORITHM 1
PROPOSED BLOCK SELECTION ALGORITHM

Given a set of coded and non-coded blocks with their corresponding numbers of occurrences:

1. Sort blocks in descending order of their rareness (i.e. ascending order of their occurrence).
2. Choose the rarest block for download. If there are several blocks with the same rareness, choose a block in the following order:
   a. a block encoded by one neighbor, i.e. a coded block which has the encoder-id of the neighbor; if there are several coded blocks from a neighbor, prefer the block with the largest block-id (the most recent one);
   b. a block at random (coded or non-coded).
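A compact Python sketch of Algorithm 1 follows. The `Candidate` record and the function name are hypothetical. One detail is an assumption that goes beyond the listing: when several encoding neighbors have tied blocks, the sketch picks one of those neighbors at random before taking its largest block-id, because block-ids are only ordered per encoder.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Candidate:
    block_id: int
    count: int                          # number of neighbors holding the block
    encoder_id: Optional[str] = None    # None for non-coded blocks

def select_block(candidates, neighbor_ids, rng=random):
    """Rarest-first with a preference for fresh blocks coded by a direct neighbor."""
    if not candidates:
        return None
    # Steps 1-2: keep only the blocks tied for the lowest occurrence count.
    rarest = min(c.count for c in candidates)
    tied = [c for c in candidates if c.count == rarest]
    # Step 2a: per encoding neighbor, remember its most recent (largest block-id) block.
    newest_by_encoder = {}
    for c in tied:
        if c.encoder_id in neighbor_ids:
            best = newest_by_encoder.get(c.encoder_id)
            if best is None or c.block_id > best.block_id:
                newest_by_encoder[c.encoder_id] = c
    if newest_by_encoder:
        # Assumed tie-break between several encoding neighbors: pick one at random.
        return newest_by_encoder[rng.choice(sorted(newest_by_encoder))]
    # Step 2b: otherwise any tied block, coded or not, at random.
    return rng.choice(tied)

# Node E in Example 2: old block A3 and new blocks A5, A6, all from neighbor A.
example2 = [Candidate(3, 1, "A"), Candidate(5, 1, "A"), Candidate(6, 1, "A")]
print(select_block(example2, {"A"}).block_id)  # 6
```

With this rule node E in Example 2 always takes the newest coded block, which avoids the 1/9 case in which it falls back to the stale block A3.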

    6 PERFORMANCE EVALUATION

    We implemented a C++ simulator of the hybrid network

    coding P2P content d istribution system. We evaluate the

    proposed block-selection algorith m in sect ion 5 usin g

    either pre-code or post-code protocol in section 4 and

    comp are the performance with a baseline network codingBitTorrent system. The baseline system uses BitTorrents

    original rarest first block selection and the pre-code pro-

    tocol.

    A file is distributed from the source to all participating peers, among which a preset number of peers are
    allowed to encode. The file is divided into smaller fixed-size parts, i.e., blocks. The source and all peers
    exchange blocks until every peer has acquired enough blocks to reconstruct the original file; the simulation
    then finishes.

    The simulations are round-based. At the beginning of each round, each peer chooses blocks to download
    according to its available bandwidth, rarest-block-first selection, and the incentive scheme. The chosen
    blocks are downloaded by the peer at the end of the round, and the system then moves to the next round. After
    a peer has collected enough blocks, it stops downloading but stays in the system to serve other peers. The
    capacity of each overlay link is measured in blocks per round, i.e., how many blocks can be transferred
    through the link in one round. We disregard the overhead of sending the encoding coefficients associated with
    random linear coding.
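
    The round structure described above (choose at the start of a round, receive at the end) can be sketched as follows. This is a simplified illustration under stated assumptions: blocks are non-coded integers, each peer takes the lowest-numbered missing blocks instead of applying rarest-first or the incentive scheme, and the names `run_round`, `simulate`, `have`, and `links` are hypothetical.

```python
from typing import Dict, Set, Tuple

Holdings = Dict[int, Set[int]]
Links = Dict[Tuple[int, int], int]  # (sender, receiver) -> blocks per round


def run_round(have: Holdings, links: Links) -> Holdings:
    """Advance the system by one round and return the new holdings."""
    # Decisions are based on what every peer held when the round began.
    snapshot = {p: set(b) for p, b in have.items()}
    pending: Holdings = {p: set() for p in have}
    for (u, v), capacity in links.items():
        # Blocks that u can offer and v neither has nor already requested.
        wanted = sorted(snapshot[u] - snapshot[v] - pending[v])
        pending[v].update(wanted[:capacity])
    # Chosen blocks arrive only at the end of the round.
    return {p: snapshot[p] | pending[p] for p in have}


def simulate(have: Holdings, links: Links, num_blocks: int,
             max_rounds: int = 10_000) -> int:
    """Return the number of rounds until every peer holds num_blocks blocks."""
    rounds = 0
    while any(len(b) < num_blocks for b in have.values()):
        if rounds >= max_rounds:
            raise RuntimeError("distribution did not finish")
        have = run_round(have, links)
        rounds += 1
    return rounds
```

    A peer that already holds every block requests nothing further but still appears as a sender on its outgoing links, which mirrors the rule that finished peers stay to serve others.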

    To capture the essence of the system, we assume a static scenario, i.e., neither the physical topology nor
    the overlay topology changes during a content distribution session. The insight obtained from this static
    case is important for future work that investigates the dynamic scenario.

    We implemented a mutual exchange incentive scheme in the simulations: when there is contention for uploading,
    a sending peer preferentially uploads to the neighbors from which it is also downloading. After such peers
    are exhausted, other neighbors are chosen for upload. This kind of incentive scheme has previously been used
    in [4].
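
    One way to express this preference is as an ordering of the requesting neighbors. The sketch below is illustrative only; the function name `choose_upload_targets` and the notion of a fixed number of upload `slots` per round are assumptions, not details given in the paper.

```python
from typing import Iterable, List, Set


def choose_upload_targets(requesters: Iterable[int],
                          downloading_from: Set[int],
                          slots: int) -> List[int]:
    """Serve reciprocating neighbors first, then the remaining requesters."""
    requesters = list(requesters)
    # Neighbors we are currently downloading from get priority.
    reciprocal = [n for n in requesters if n in downloading_from]
    # Everyone else is served only if upload slots remain.
    others = [n for n in requesters if n not in downloading_from]
    return (reciprocal + others)[:slots]
```

    For example, `choose_upload_targets([5, 2, 9, 4], {9, 4}, 3)` serves neighbors 9 and 4 before neighbor 5.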

    For each network topology, we run the simulation 100 times and report the average finish time of all peers.
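
    The reported metric can be computed as in the sketch below. It assumes that the average is taken over peers within a run and then over runs, and that the percentage improvements quoted in the results are relative to the baseline finish time; neither definition is spelled out in the text, and the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def average_finish_time(runs: Sequence[Sequence[int]]) -> float:
    """runs[r][p] is the round in which peer p finished during run r."""
    return mean(mean(per_peer) for per_peer in runs)


def improvement_percent(baseline: float, proposed: float) -> float:
    """Relative reduction of the finish time with respect to the baseline."""
    return 100.0 * (baseline - proposed) / baseline
```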

    6.1 Clustered Topologies

    We first evaluate performance in a simple topology of two clusters (Fig. 11). A middle node i sits between
    the source and the clusters to simulate a situation in which blocks arrive progressively at node i. Within a
    cluster, peers are arranged in k-regular random topologies, where k ranges from 3 to 6. Each cluster has 1000
    nodes, with a bandwidth of 1 block per round between neighbors within a cluster. The bandwidth from the
    source to node i is 8 blocks per round, and from node i to each cluster it is 4 blocks per round. The two
    clusters are connected by a link with a capacity of 1 block/round. The source delivers a 200-block file to
    all peers.

    Fig. 12 compares the finish time of our system using the proposed block selection algorithm (with either the
    pre-code or the post-code protocol) with the finish time of the baseline system in three cases: no coding,
    coding at node i, and network coding. As expected, the no coding finish time is the same for both systems.
    The finish time improvement of the proposed system becomes evident when node i codes (around 5%) and when all
    nodes code (around 10%). In this topology, the finish time is the same for the pre-code and post-code
    protocols.

    6.2 Small-world Network Topologies

    P2P networks have been shown to exhibit small-world properties [10]. We use the Watts and Strogatz
    small-world network model [7] to generate more complex topologies with 5000 peers. By varying the rewiring
    probability p of the small-world network, we can change the network topology. We set the capacity of all
    links to 1 block/round and the degree to k=6, and vary the rewiring probability p across simulations.
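
    For readers who want to reproduce such topologies, the sketch below generates a Watts-Strogatz graph with the standard library only. It follows the usual construction (a ring lattice whose edges are rewired with probability p, avoiding self-loops and duplicate edges); the function name, the adjacency-map representation, and the `seed` parameter are assumptions, and the authors' generator may differ in detail.

```python
import random
from typing import Dict, Set


def watts_strogatz(n: int, k: int, p: float, seed: int = 0) -> Dict[int, Set[int]]:
    """Return an undirected graph as an adjacency map (k even, k < n)."""
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    # Ring lattice: each node links to its k/2 nearest neighbors on each side.
    for v in range(n):
        for d in range(1, k // 2 + 1):
            w = (v + d) % n
            adj[v].add(w)
            adj[w].add(v)
    # Visit each lattice edge (v, v+d) once and rewire it with probability p.
    for d in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + d) % n
            if w in adj[v] and rng.random() < p:
                # New endpoint must not create a self-loop or a duplicate edge.
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj
```

    Calling `watts_strogatz(5000, 6, 0.02)` would correspond to the parameters quoted above. Rewiring moves one endpoint of an edge, so the total number of edges stays at n*k/2 while individual degrees may change.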

    Fig. 11. A 2-cluster topology with a middle node i.

    Fig. 12. Finish time of the proposed system compared with the baseline system in a clustered topology.

    We simulate two real-life scenarios.

    1) Optimization scenario: encoders are placed at selected peers to minimize distribution time. We use two
       placement methods:

       - betweenness centrality placement [15]: nodes with high betweenness centrality values [9] are chosen as
         encoders. Betweenness centrality measures the degree to which a node lies on the shortest paths between
         other nodes. Coding at peers with high betweenness centrality can improve content distribution to more
         downstream peers (see footnote 4).

       - degree-based placement: encoders are placed at the nodes with the highest degrees first.

    2) Resource-constraint scenario: nodes with higher capacity can encode; nodes with limited resources cannot.
       Among the peers, we pick some at random, treat them as resource-rich, and assign them as encoders.
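
    The three ways of choosing encoders can be sketched as follows. Betweenness is computed here with Brandes' algorithm for unweighted graphs, which is a standard method and not necessarily the one used in [15]; the names `betweenness` and `place_encoders`, the adjacency-map input, and the tie-breaking by node id are assumptions made for illustration.

```python
import random
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def betweenness(adj: Graph) -> Dict[int, float]:
    """Unnormalized shortest-path betweenness (Brandes, unweighted graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order: List[int] = []                 # nodes in BFS visiting order
        preds: Dict[int, List[int]] = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}           # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}


def place_encoders(adj: Graph, count: int, method: str = "betweenness",
                   seed: int = 0) -> List[int]:
    """Return the ids of the peers selected as encoders."""
    if method == "random":                    # resource-constraint scenario
        return random.Random(seed).sample(sorted(adj), count)
    if method == "degree":
        key = {v: float(len(adj[v])) for v in adj}
    elif method == "betweenness":
        key = betweenness(adj)
    else:
        raise ValueError(f"unknown placement method: {method}")
    # Highest score first; ties are broken by the smaller node id.
    return sorted(adj, key=lambda v: (-key[v], v))[:count]
```

    Brandes' algorithm runs one breadth-first search per node, so on a 5000-peer, degree-6 overlay this pure-Python version is slow but feasible; a compiled implementation would be preferable for repeated experiments.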

    We increase the number of encoders from 0 (no coding) to 5000 (network coding) and compare the performance of
    the proposed system with the baseline system in the two scenarios above. The results are given in Fig. 13,
    Fig. 14, and Fig. 15 for rewiring probability p=0.02.

    When betweenness centrality is used to optimize encoder placement, the proposed block selection together with
    the pre-code protocol shortens distribution time by about 15% compared to the baseline system with only 250
    encoders (Fig. 13). With more encoders, the improvement is higher and reaches more than 25% when all nodes
    encode. The finish time of no coding, i.e., when the number of encoders is zero, is the same regardless of
    which block selection algorithm and protocol are used.

    With 2000 or fewer encoders chosen at random, i.e., when a set of random peers is allowed to encode, there is
    little finish time improvement with either rarest-first or the proposed block selection (Fig. 15). This is
    because a few encoders placed at random, without proper placement, are not effective in improving
    distribution time. When a large number of encoders, e.g., 3000 or 4000, are randomly deployed, the proposed
    block selection with the pre-code protocol can improve distribution time by around 15% compared to the
    baseline system.

    The finish time using degree-based placement lies between those of the other two placements. As before, the
    proposed system achieves a noticeable finish time improvement compared to the baseline system using
    rarest-first selection (Fig. 14).

    Footnote 4: We discuss the strategy for placing encoders in another work [15].

    We next change the rewiring probability p from 0.02 to 0.05 to evaluate our system in a wide range of
    topologies. Using 250 encoders among the total of 5000 peers, the finish time improvement compared with
    non-coding BitTorrent (with no encoder) is presented in Fig. 16, Fig. 17, and Fig. 18.

    Our proposed system (proposed selection + pre-code and proposed selection + post-code) always achieves
    improved performance compared with the baseline system. The improvement, however, is more visible in
    topologies with low rewiring probabilities (p=0.02). High rewiring probabilities generate almost random
    topologies, in which the effect of coding is, in general, less noticeable.

    The post-code protocol (proposed selection + post-code) performs better than the pre-code protocol (proposed
    selection + pre-code) in low-rewiring topologies with p=0.02 (Fig. 13-15). This is because, with the
    post-code protocol, encoders can combine more up-to-date information to send to the receivers, as discussed
    before. In topologies with higher rewiring probabilities (p ≥ 0.1) (Fig. 16, 17, and 18), new information can
    travel through more rewired links, so the post-code protocol is no longer more effective than the pre-code
    protocol. We note that the simulations do not take into account the overhead of the protocols used to
    communicate between peers.

    Fig. 13. Finish time of the proposed system when choosing nodes with highest betweenness centrality as encoders, compared with the finish time of the baseline system.

    Fig. 14. Finish time using the proposed system compared with a baseline system when encoders are placed at high-degree peers.

    Fig. 15. The performance of the proposed system compared with the baseline system in the resource-constraint scenario, when only (random) high-capacity peers are allowed to encode.

    7 CONCLUSION

    We have proposed information exchange protocols and an associated block-selection algorithm to improve the
    performance of a hybrid network coding P2P system in which encoding-enabled and non-encoding peers coexist.
    Our design is simple and backward compatible with BitTorrent, yet efficient in the way it handles blocks of
    data, coded and non-coded alike.

    We proposed two protocols. The first one, the pre-code protocol, is an extension of BitTorrent with the
    addition of an encoder-id field in the exchanged messages to identify the peer that generated each block. The
    second one, the post-code protocol, postpones the encoding process so that it can combine and deliver more
    up-to-date information to the receivers and achieve a shorter finish time. The post-code protocol is more
    effective in severely bottlenecked topologies. The trade-off, however, is higher protocol overhead.

    Our block-selection algorithm is derived from observations on the benefit of network coding in eliminating
    data duplication. Using our proposed algorithm, peers can effectively choose blocks to download, which
    results in a considerable improvement in distribution time.

    We believe our proposed solution, which promotes network coding as a method to shorten distribution time even
    if encoding is not fully enabled at every peer, will be of great use in heterogeneous P2P systems and/or when
    there is a need to minimize resource consumption.

    For future work, we plan to evaluate the proposed design and block selection algorithm in a dynamic setting.
    We are also interested in implementing the proposal in a real system, especially to evaluate the actual
    trade-off and effectiveness of the post-code protocol.

    In this paper, we have not addressed incentive issues, i.e., how to motivate peers to encode, which is
    another interesting problem we leave for future work.

    ACKNOWLEDGMENT

    This work was supported by JSPS KAKENHI Grant

    Number (24500098).

    REFERENCES

    [1] R. Ahlswede, N. Cai, S. R. Li, and R. W. Yeung, "Network Information Flow," IEEE Transactions on Information Theory, July 2000.
    [2] T. Ho, R. Koetter, M. Medard, D. Karger, and M. Effros, "The Benefits of Coding over Routing in a Randomized Setting," ISIT, Japan, 2003.
    [3] B. Cohen, "Incentives Build Robustness in BitTorrent," P2P Economics Workshop, 2003.
    [4] C. Gkantsidis and P. R. Rodriguez, "Network Coding for Large Scale Content Distribution," IEEE INFOCOM, March 2005.
    [5] A. Legout, G. Urvoy-Keller, and P. Michiardi, "Rarest first and choke algorithms are enough," ACM SIGCOMM IMC 2006.
    [6] P. Chou, Y. Wu, and K. Jain, "Practical network coding," in Proc. Annual Allerton Conference, 2003.
    [7] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature 393 (6684): 440-442, June 1998.
    [8] M. Langberg, A. Sprintson, and J. Bruck, "The encoding complexity of network coding," IEEE Trans. on Information Theory, pp. 2386-2397, June 2006.
    [9] L. C. Freeman, "A set of measures of centrality based on betweenness," Sociometry, vol. 40, no. 1, pp. 35-41, 1977.
    [10] N. Leibowitz, M. Ripeanu, and A. Wierzbicki, "Deconstructing the Kazaa network," in Proceedings of WIAPP 2003.

    Fig. 16. Finish time improvement of the proposed system and the baseline system using 250 encoders placed at high betweenness centrality peers, compared with non-coding BitTorrent.

    Fig. 17. Finish time improvement of the proposed system and the baseline system using 250 encoders placed at high-degree peers, compared with non-coding BitTorrent.

    Fig. 18. Finish time improvement of the proposed system and the baseline system using 250 encoders placed at random peers, compared with non-coding BitTorrent.


    [11] D. Nguyen and H. Nakazato, "Peer-to-Peer Content Distribution in Clustered Topologies with Source Coding," IEEE GLOBECOM 2011, Dec. 2011.
    [12] N. Cleju, N. Thomos, and P. Frossard, "Network Coding Node Placement for Delay Minimization in Streaming Overlays," IEEE ICC 2010.
    [13] S. Li, R. Yeung, and N. Cai, "Linear network coding," IEEE Transactions on Information Theory, 2003.
    [14] D. Nguyen and H. Nakazato, "Rarest-first and Coding Are Not Enough," IEEE GLOBECOM 2012, Dec. 2012.
    [15] D. Nguyen and H. Nakazato, "Network Coder Placement for Peer-to-Peer Content Distribution," IEICE Tech. Report CS2012-74, Nov. 2012.
    [16] C. Gkantsidis, J. Miller, and P. Rodriguez, "Comprehensive view of a live network coding P2P system," ACM SIGCOMM IMC '06.
    [17] L. C. Freeman, "A set of measures of centrality based on betweenness," Sociometry, vol. 40, no. 1, pp. 35-41, 1977.
    [18] L. C. Freeman, S. P. Borgatti, and D. R. White, "Centrality in valued graphs: a measure of betweenness based on network flow," Social Networks 13, pp. 141-154, 1991.
    [19] A. Legout, G. Urvoy-Keller, and P. Michiardi, "Understanding BitTorrent: An Experimental Perspective," Tech. Report, INRIA-00000156, Version 3, Nov. 2005.
    [20] C. Yin, B. Wang, W. Wang, T. Zhou, and H. Yang, "Efficient routing on scale-free networks based on local information," Physics Letters A, vol. 351, issues 4-5, pp. 220-224, 6 March 2006.
    [21] M. Kim, M. Medard, V. Aggarwal, U.-M. O'Reilly, W. Kim, C. W. Ahn, and M. Effros, "Evolutionary Approaches to Minimizing Network Coding Resources," Proc. IEEE INFOCOM 2007, May 2007.
    [22] K. Bhattad, N. Ratnakar, R. Koetter, and K. R. Narayanan, "Minimal network coding for multicast," Proc. IEEE ISIT 2005, Sep. 2005.

    Dinh Nguyen received his Bachelor of Electronics and Telecomm. degree from Hanoi University of Technology,
    Vietnam, in 1999. From 1999 he was with NetNam ISP Corporation. He received his MSc in 2006 and is currently
    a Ph.D. candidate at the Graduate School of Global Information and Telecommunications Studies, Waseda
    University, Tokyo, Japan. His research interests include peer-to-peer systems, network coding, and content
    distribution systems.

    Hidenori Nakazato received his B. Engineering degree in electronics and telecommunications from Waseda
    University in 1982 and his MS and Ph.D. degrees in computer science from University of Illinois in 1989 and
    1993, respectively. He was with Oki Electric from 1982 to 2000. Since 2000, he has been a faculty member of
    the Graduate School of Global Information and Telecommunications Studies, Waseda University. He served as the
    editor of IEICE Transactions on Communications from 1999 to 2002 and in other positions in the executive
    committee of IEICE Communication Society from 1997 to 2004, and from 2008 to now. He also served as an
    executive committee member of IEEE Region 10 and is serving as a member of several IEEE Member and Geographic
    Activity committees. His research interests include performance issues in distributed systems and networks.
    He is a member of ACM, IEEE, and IPSJ.
