

HAL Id: hal-01619906, https://hal.inria.fr/hal-01619906

Submitted on 19 Oct 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


An Adaptive Peer-Sampling Protocol for Building Networks of Browsers

Brice Nédelec, Julian Tanke, Pascal Molli, Achour Mostefaoui, Davide Frey

To cite this version: Brice Nédelec, Julian Tanke, Pascal Molli, Achour Mostefaoui, Davide Frey. An Adaptive Peer-Sampling Protocol for Building Networks of Browsers. World Wide Web, Springer Verlag, In press, 21 (3), pp. 629-661. 10.1007/s11280-017-0478-5. hal-01619906


An Adaptive Peer-Sampling Protocol for Building Networks of Browsers

Brice Nédelec1, Julian Tanke1, Pascal Molli1, Achour Mostefaoui1, and Davide Frey2

1 LS2N, University of Nantes, 2 rue de la Houssinière, BP 92208, 44322 Nantes Cedex 3, France, [email protected]

2 INRIA Bretagne-Atlantique, Campus Universitaire de Beaulieu, 35042 Rennes Cedex, France, [email protected]

October 19, 2017

Abstract

Peer-sampling protocols constitute a fundamental mechanism for a number of large-scale distributed applications. The recent introduction of WebRTC facilitated the deployment of decentralized applications over a network of browsers. However, deploying existing peer-sampling protocols on top of WebRTC raises issues about their lack of adaptiveness to sudden bursts of popularity over a network that does not manage addressing or routing. Spray is a novel random peer-sampling protocol that dynamically, quickly, and efficiently self-adapts to the network size. Our experiments show the flexibility of Spray and highlight its efficiency improvements at the cost of a small overhead. We embedded Spray in a real-time decentralized editor running in browsers and ran experiments involving up to 600 communicating web browsers. The results demonstrate that Spray significantly reduces the network traffic according to the number of participants and saves bandwidth.

1 Introduction

Peer-sampling protocols [24, 38, 39] constitute a fundamental mechanism for a number of large-scale distributed applications, both on the Cloud [11] and in a peer-to-peer setting [16, 40, 43]. By providing each node with a continuously changing partial view of the network, they make applications resilient to churn [6] and inherently load balancing [15]. In the context of video streaming, for example, a peer-sampling protocol makes it possible to distribute the streaming load over all peers without requiring the creation and the maintenance of rigid structures like multiple trees [15, 29].

The recent introduction of WebRTC [3] has renewed research interest in a variety of applications that require peer-sampling protocols, such as video streaming [34, 35], content-delivery networks [44], or real-time collaborative editors [32]. However, deploying existing peer-sampling protocols on top of WebRTC raises important technical challenges. (1) WebRTC manages neither addressing nor routing; this makes connection establishment much more costly than on IP networks and more likely to fail. (2) Browsers run on desktops, laptops, and mobile phones. This requires protocols that reduce resource consumption as much as possible. (3) The ability to launch WebRTC sessions through simple HTTP links exposes applications to sudden bursts of popularity. Consider the example of a user who is streaming a video directly from his mobile phone to some of his friends. The user suddenly witnesses some dramatic event, and his friends spread the news by tweeting the stream's address. Instantly, a huge number of users connect and start watching the stream on their laptops and phones. The streaming mechanism and the protocols it relies on must be able to adapt to this sudden burst of popularity, maintaining their quality of service, while being able to return to their initial configuration when the popularity burst subsides.

Unfortunately, existing peer-sampling protocols lack either adaptiveness or reliability in the context of WebRTC. On the one hand, Scamp [19, 20] provides adaptiveness by maintaining partial views of size ln(n) + k, n being the number of nodes and k being a constant. However, Scamp is not reliable in the context of WebRTC, for its connection establishment process is based on random walks and cannot handle the connection failures that often occur in WebRTC. On the other hand, the most popular approaches, like Cyclon [39] and the whole RPS protocol family [24], are reliable in the context of WebRTC but are not adaptive: developers have to configure fixed-size views at deployment time. This forces developers to oversize partial views to handle potential bursts, either wasting resources or not provisioning for large-enough settings. A simple solution to this problem estimates the actual number of network nodes by periodically running an aggregation protocol [31], then resizes the partial views. However, this aggregation protocol would have to run quite frequently, resulting in significant network overhead to anticipate a popularity burst that may never happen. Our approach provides adaptiveness and reliability by locally self-adjusting the size of partial views at join, leave, and shuffle times. At a marginal cost, the partial view size converges towards ln(n). This not only makes our approach adaptive but also exposes the size of the network to the application level, allowing applications to adapt to sudden bursts in popularity at no additional cost. We demonstrate the impact of our approach on two use cases that use an RPS to broadcast messages. The first one implements live video streaming based on three-phase gossip [15, 28, 29]; in this case, adaptiveness allows message delivery to remain stable even in the presence of bursts in popularity. The second one implements a real-time collaborative editor; in this case, adaptiveness allows the editors to adjust the traffic to the size of the network. Some distributed hash table protocols such as D2HT [6] require both knowledge of the size of the network, to compute the view of each node, and a random peer-sampling protocol, to provide connectivity in the presence of high churn. Our approach provides both at the same cost.

In this paper, we address the challenge of dynamically adaptive peer-sampling by introducing Spray, a novel random peer-sampling protocol inspired by both Scamp [19, 20] and Cyclon [39]. Spray improves the state of the art in several ways. (i) It dynamically adapts the neighborhood of each peer; thus, the number of connections scales logarithmically with the network size. (ii) It only uses neighbor-to-neighbor interactions to establish connections; thus, a node needs to rely only on a constant number of other nodes to establish a connection, regardless of network size. (iii) It quickly converges to a topology with properties similar to those of random graphs; thus, the network becomes robust to massive failures and can quickly disseminate messages. (iv) Experiments show that Spray provides adaptiveness and reliability at a marginal cost. (v) Use cases demonstrate how the estimation of the network size positively impacts two RPS-based broadcast protocols, in terms of either traffic or message delivery.

The rest of this paper is organized as follows: Section 2 reviews the related work. Section 3 states the scientific problem. Section 4 details the Spray protocol. Section 5 presents experimental results for Spray and compares them to the state of the art. Section 6 details our use cases. We conclude in Section 7.

2 Related work

WebRTC enables real-time peer-to-peer communication between browsers, even in complex network settings with firewalls, proxies, or Network Address Translators (NAT). However, WebRTC manages neither addressing nor routing. To establish a connection, two browsers need to exchange offers and acknowledgments through a common mediator, e.g., mails, dedicated signaling services [2], or existing WebRTC connections [1].


Figure 1: Creating an overlay network on top of WebRTC. (a) p1 connects to p2 using the signaling service: 1: p1 pushes its offer ticket; 2: p2 pulls the ticket; 3: p2 pushes its response; 4: p1 pulls the response and establishes a connection with p2; p3 does the same with p2. (b) p1 connects to p3 using p2 as mediator: 1: p1 sends its offer ticket to p2; 2: p2 forwards it to p3 and registers p1 as the emitter; 3: p3 sends its response to p2; 4: p2 forwards it to the emitter p1, which connects to p3. (c) The resulting network overlay: a fully connected network composed of 3 members.

Figure 1a describes the very first connection of one peer p1 with another p2 using a signaling server. Figure 1b shows instead that p3 can later use p2 as mediator instead of the signaling service. Figure 1c shows the resulting network. Note that if p2 crashes or leaves during the forwarding process, the connection establishment fails, even if alternative routes exist, as WebRTC does not manage routing.
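For illustration, the steps of Figure 1 map onto the standard WebRTC API roughly as follows. This is a minimal sketch, not the paper's code: pushToMediator and onAnswer are assumed helpers standing for the signaling service or the existing connection used as mediator, and ICE candidate exchange is omitted for brevity.

```typescript
// Minimal sketch of Figure 1's offer/answer exchange with the WebRTC API.
// pushToMediator/onAnswer are assumed helpers; ICE exchange is omitted.
declare function pushToMediator(ticket: RTCSessionDescriptionInit): void;
declare function onAnswer(cb: (t: RTCSessionDescriptionInit) => void): void;

// Steps 1 and 4: the initiating peer creates an offer ticket, pushes it
// through the mediator, and finalizes once the response travels back.
async function initiate(): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  pc.createDataChannel("spray"); // a channel is needed to trigger negotiation
  await pc.setLocalDescription(await pc.createOffer());
  pushToMediator(pc.localDescription!);
  onAnswer(async (answer) => { await pc.setRemoteDescription(answer); });
  return pc;
}

// Steps 2 and 3: the contacted peer answers the pulled offer ticket.
async function accept(offer: RTCSessionDescriptionInit): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  await pc.setRemoteDescription(offer);
  await pc.setLocalDescription(await pc.createAnswer());
  pushToMediator(pc.localDescription!); // the response travels back
  return pc;
}
```

The sketch makes the failure mode visible: every hop of the round trip depends on the mediator staying alive until step 4 completes.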

Using signaling services and existing WebRTC connections makes it easy to deploy random peer-sampling protocols [21] in browsers that can run on mobile phones or tablets connected to mobile networks [10]. However, the complexity of the WebRTC connection process further requires nodes to establish as few connections as possible, in order to reduce the generated traffic and limit resource consumption. Unfortunately, existing protocols fail at this task.

Random peer-sampling protocols [21, 24] produce overlay networks by populating the partial views with references to peers chosen at random among the network members, following a uniform distribution. Relying solely on local knowledge, they converge to a topology exposing properties similar to those of random graphs [8, 13]. Among others, they efficiently provide connectedness and robustness. A wide variety of gossip-based protocols use random peer-sampling at their core. For instance, topology optimization protocols [23, 41] aim at improving localization or latency, or at addressing user preferences.

The representatives of random peer-sampling protocols using a fixed-size partial view comprise Lpbcast [14], HyParView [27], Newscast [38], and Cyclon [39]. Developers deploying applications that use such protocols must foresee the dimensions of the networks in order to set up the partial view sizes properly. These decisions cannot easily be retracted at runtime. As a consequence, applications tend to overestimate the required view sizes.

For example, a Cyclon-based application may maintain 7 connections in the browser despite requiring only 4. While this causes little harm in standard IP-based networks, maintaining too many active WebRTC connections may uselessly overload a device.


This calls for a dynamic peer-sampling service that can adapt to a dynamic number of participants.

A way to achieve this adaptive behavior may consist in using one of the several protocols for network-size estimation. For example, [18] samples and analyzes a subset of the network and deduces the overall network size using probabilistic functions. Sketching techniques [5] instead use hashing to represent a large number of distinct node identities and estimate the size of the network thanks to the resulting collisions in hash values. Finally, averaging techniques [22] use aggregations that converge over exchanges to a value which depends on the network size. Unfortunately, while they can be accurate in their estimations, these approaches add a communication overhead which makes them too expensive for mobile phones and similar devices.

The sole representative of adaptive-by-design random peer-sampling is Scamp [19, 20]. Its interesting property lies in its partial view sizes, which grow logarithmically with the size of the network. However, Scamp suffers from other drawbacks. In particular, Scamp uses random walks to establish new connections between peers. In WebRTC, each hop in these random paths must be traveled back and forth to finalize the connection establishment (see Figure 1). This drastically increases the probability that Scamp will fail to establish connections. In the presence of churn or message loss, the number of connections decreases quickly, eventually leading to network partitions (see Section 5.8).

3 Problem statement

We consider a set V of peers labeled from p1 to pN. The peers can crash or leave at any time without giving notice, and they can recover or re-enter the network freely. We consider only non-byzantine peers.

A multiset of arcs E ⊆ V × V connects the peers: an arc (pf, pt) allows pf to communicate with pt but not the converse. The multiset of arcs is dynamic, i.e., arcs appear and disappear over time. We denote by N^t = (V^t, E^t) the state of the network overlay at time t.

The multiset of arcs starting from pf is called the partial view of pf, noted Pf. The destination peers in this view are the neighbors of pf. A partial view can contain stale arcs, i.e., arcs whose neighbors have left or crashed.

The peers communicate by means of messages. They only send messages to their neighbors. A departed peer cannot send messages. An arc may fail to deliver a message. A stale arc systematically fails.

Building an adaptive-by-design random peer-sampling protocol that meets WebRTC constraints raises the following scientific problem:

Problem Statement 1: Let t be an arbitrary time frame, let V^t be the network membership at that given time t, and let P^t_x be the partial view of peer p_x ∈ V^t. An adaptive and reliable random peer-sampling protocol should provide the following properties:

1. Partial view size: ∀p_x ∈ V^t, |P^t_x| = O(ln |V^t|)

2. Connection establishment by a node: O(1)

The first condition states that the partial view size is relative to the size of the network at any time. It also states that partial views should grow and shrink, at worst, logarithmically compared to the size of the network. This is a condition for adaptiveness and not a requirement for network connectivity. The second condition states that a node should be able to establish a connection by relying only on a constant number of intermediary peers, independently of network size. Lpbcast, HyParView, Newscast, and Cyclon fail to meet the first condition of the problem statement, since they do not adapt their views to the network size. Scamp fails to meet the second condition, since each connection implies an unsafe random dissemination. The next section introduces Spray, a protocol that meets both of these constraints without any global knowledge.

4 Spray

Spray is an adaptive-by-design random peer-sampling protocol inspired by Scamp [19, 20] and Cyclon [39]. It comprises three parts representing the lifecycle of a peer in the network. First, the joining process injects a logarithmically growing number of arcs into the network; hence, the number of arcs scales with the network size. Next, each peer runs a periodic process in order to balance the partial views, both in terms of partial view size and of uniformity of the peers referenced within them. The overlay network quickly converges to a topology exposing properties close to those of random graphs. Finally, a peer is able to leave at any time without giving notice (equivalent to a crash) without degrading the network properties.

The key to adaptiveness consists in keeping the global number of arcs consistent during the whole protocol. Indeed, unlike Cyclon, Spray always stays at the edge of a number of arcs that is logarithmic in the network size. Since Spray nodes never create additional arcs after joining, any removal is a definitive loss. Thus, firstly, Spray's joining adds some arcs to the network. Secondly, Spray's shuffling preserves all arcs in the network. Thirdly, Spray's leaving cautiously removes some arcs, ideally the number of arcs introduced by the last joining peer.

Occasionally, keeping the global number of arcs constant forces the shuffling and leaving processes to create duplicates in partial views. Thus, a partial view may contain multiple occurrences of a particular neighbor. In this paper, we show that the number of duplicates remains low and does not impact network connectivity.
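To make this invariant concrete, a partial view can be modeled as a multiset of ⟨neighbor, age⟩ pairs. The following TypeScript sketch is ours (the paper's artifact is linked in Section 5); it only captures the multiset bookkeeping that the later algorithms rely on:

```typescript
// Sketch of a Spray partial view: a multiset of <neighbor, age> pairs.
// Duplicate neighbor identifiers are kept as distinct entries, so adding
// or removing one occurrence changes the global arc count by exactly one.
type Entry = { id: string; age: number };

class PartialView {
  entries: Entry[] = [];

  add(id: string, age = 0): void {
    this.entries.push({ id, age }); // duplicates are allowed on purpose
  }

  removeOne(id: string): boolean {
    const i = this.entries.findIndex((e) => e.id === id);
    if (i < 0) return false;
    this.entries.splice(i, 1); // removes a single occurrence only
    return true;
  }

  incrementAges(): void {
    for (const e of this.entries) e.age += 1;
  }

  oldest(): Entry | undefined {
    let best: Entry | undefined;
    for (const e of this.entries) if (!best || e.age > best.age) best = e;
    return best;
  }

  get size(): number {
    return this.entries.length;
  }
}
```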

4.1 Joining

Spray's joining algorithm is the only part of the protocol where the global number of arcs in the network can increase. To meet the first requirement of the problem statement, this number of arcs must grow logarithmically compared to the network size. Therefore, each peer assumes that it has such a logarithmically growing view to propagate the identity of the joining peer. Algorithm 1 describes Spray's joining protocol. Line 6 shows that the contacted peer only multicasts the new identity to its neighborhood. Afterwards, to limit the risk of connection failures, each neighbor immediately adds the joining peer to its own neighborhood. This fulfills the second condition of the problem statement. In total, the number of arcs in the network increases by 1 + ln(|V^t|) using only neighbor-to-neighbor interactions, which leads to a total number of |V^t| ln(|V^t|) arcs [19].

Algorithm 1 The joining protocol of Spray running at Peer p.

1: INITIALLY:
2:   P ← ∅;                               ▷ the partial view is a multiset
3:
4: EVENTS:
5: function onSubs(o)                     ▷ o: origin
6:   for each ⟨q, _⟩ ∈ P do sendTo(q, 'fwdSubs', o);
7: function onFwdSubs(o)                  ▷ o: origin
8:   P ← P ⊎ {⟨o, 0⟩};

The partial view is a multiset of pairs ⟨n, age⟩, which associates each neighbor n with an age. The multiset allows nodes to manage duplicates, while the age plays the same role as in Cyclon, i.e., it accelerates the removal of crashed or departed peers by shuffling with the oldest neighbors first. The onSubs event is raised each time a peer joins the network; it forwards the identity of the joining peer to all neighbors, independently of their age. The onFwdSubs event is raised when a peer receives such a forwarded subscription; it adds the peer as one of its neighbors with an age set to 0, meaning that it is a brand-new reference.
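A minimal sketch of these two handlers, under the same multiset-view assumption as the earlier sketch and with the transport stubbed out (sendTo is ours, not the paper's code):

```typescript
// Sketch of Algorithm 1's handlers. The view is a multiset of entries;
// sendTo is a stub standing for the WebRTC transport used in the paper.
type Entry = { id: string; age: number };
const view: Entry[] = [];

function sendTo(to: string, msg: { type: "fwdSubs"; origin: string }): void {
  /* transport stub: would forward msg to `to` over WebRTC */
}

// Raised at the contact peer when `origin` joins through it (line 5):
// the newcomer's identity is multicast to the whole neighborhood.
function onSubs(origin: string): void {
  for (const { id } of view) sendTo(id, { type: "fwdSubs", origin });
}

// Raised at each neighbor of the contact peer (line 7): each of them
// opens an arc toward the newcomer with age 0, a brand-new reference.
function onFwdSubs(origin: string): void {
  view.push({ id: origin, age: 0 });
}
```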

Figure 2 depicts a joining scenario. Peer p1 contacts p2 to join the network composed of {p2, p3, p4, p5, p6}. For simplicity, the figure shows only the new arcs and the neighborhoods of p1 and p2. Peer p1 directly adds p2 to its partial view. Peer p2 forwards the identity of p1 to its neighborhood. Each of these neighbors adds p1 to its partial view. In total, Spray establishes 5 connections and the network is connected.

Unfortunately, the partial views of the newest peers are clearly unbalanced and violate the first condition of our problem statement. The periodic protocol described in the next section re-balances the partial views.


Figure 2: Example of the joining protocol of Spray. (a) p1 contacts p2 to join the network: p1 adds p2 to its neighborhood and sends its request to p2. (b) The onSubs(p1) event is raised at p2, which forwards the subscription to its neighbors. (c) The onFwdSubs(p1) event is raised at p3−6; these peers add p1 to their neighborhood.


4.2 Shuffling

Unlike Cyclon, Spray shuffles partial views of different sizes. The shuffling balances the partial view sizes and randomly mixes the neighborhoods between peers. Nevertheless, the global number of arcs in the network remains unchanged.

In Spray's shuffling protocol, each of the involved peers sends half of its partial view to the other. After integration, their view sizes both tend to the average of their initial sizes, while the sum of their elements remains unchanged. In order to keep the arc count invariant, the partial views of Spray are multisets: if a peer receives an already known reference, it still stores it, as a duplicate. Thus, Spray's shuffling protocol never increases nor decreases the arc count.

Although duplicates can have a negative impact on network properties, most of them disappear after shuffling, and they become proportionally negligible as the network grows.

Algorithm 2 shows the Spray protocol running at each peer. It is divided between an active thread looping to update the partial view, and a passive thread which reacts to messages. The functions which are not explicitly defined are the following:

• incrementAge(view): increments the age of each element in the view and returns the modified view.
• getOldest(view): retrieves the oldest peer contained in the view.
• getSample(view, size): returns a sample of the view containing size elements.
• replace(view, old, new): replaces all occurrences of the old element in the view by the new element and returns the modified view.
• rand(): generates a random floating-point number between 0 and 1.

In the active thread, Function loop is called every ∆t time units. First, the function increments the age of each neighbor in P. Then, it chooses the oldest peer q with which to exchange a subset of its partial view. Peer p selects a sample of its partial view, excluding one occurrence of q and including itself. The size of this sample is half of its partial view, with at least one peer: the initiating peer (see Line 5). It sends this subset to peer q and awaits its sample. If peer q does not send its sample in time (p may retry to cope with message losses), peer p considers q as crashed or departed and executes the corresponding function (see Section 4.3). The answer of q contains half of its partial view. Since peers can appear multiple times in P, the exchanging peers may send references to the other peer; e.g., peer p's sample can contain references to q. Such a sample, without further processing, would create self-loops (q's partial view would contain references to q). To avoid this undesirable behavior, all occurrences of the other peer are replaced with the emitting peer (see Lines 6 and 18). Afterwards, both peers remove the sample they sent from their view and add the received sample. It is worth noting that peer p removes at least 1 occurrence of q and peer q adds at least 1 occurrence of p, for the sake of connectedness.

Figure 3 depicts Spray's shuffling procedure. This scenario follows from Figure 2: peer p1 just joined the network. Peer p6 initiates an exchange with p1 (the oldest in p6's partial view). It randomly chooses ⌈|P6| ÷ 2⌉ − 1 = 1 peer among its neighborhood. In this case, it picks p9 from {p7, p8, p9}. It sends the chosen peer plus its own identity to peer p1. In response, the latter picks ⌈|P1| ÷ 2⌉ = 1 peer from its partial view. It sends back its sole neighbor p2 and directly adds the received neighbors to its partial view. After receipt, peer p6 removes the sent neighbor from its partial view, removes an occurrence of p1, and adds the peer received from p1. Peers {p6, p9} compose p1's partial view. Peers {p2, p7, p8} compose that of p6.

The example shows that, at first, the initiating peer has 4 peers in its partial view, while the receiving peer has only 1 peer. After the exchange, the former has 3 neighbors, including 1 new peer. The receiving peer has 2 neighbors, and both of them are new. Thus, the periodic procedure tends to even out the partial view sizes of network members. It also scatters neighbors in order to remove the highly clustered groups which may appear because of the joining protocol.

Concerning the convergence time of the shuffling algorithm, there exists a close relationship between Spray and the proactive aggregation protocol introduced in [22].


Figure 3: Example of the shuffling protocol of Spray. (a) Peer p6 initiates the exchange with p1 by sending the multiset {p6, p9}. (b) Peer p1 receives p6's message; it sends back the multiset {p2} and adds {p6, p9} to its partial view. (c) Peer p6 receives p1's response and adds {p2} to its partial view.


Algorithm 2 The cyclic protocol of Spray running at Peer p.

1: ACTIVE THREAD:
2: function loop( )                                        ▷ every ∆t
3:   P ← incrementAge(P);
4:   let ⟨q, age⟩ ← getOldest(P);
5:   let sample ← getSample(P \ {⟨q, age⟩}, ⌈|P| ÷ 2⌉ − 1) ⊎ {⟨p, 0⟩};
6:   sample ← replace(sample, q, p);
7:   let sample′ ← sendTo(q, ′exchange′, sample);
8:   while (¬timeout(sample′) ∧ ¬sample′) do wait();       ▷ awaiting q′s sample
9:   if ¬timeout(sample′) then
10:    sample′ ← replace(sample′, p, q);
11:    P ← (P \ sample) ⊎ sample′;
12:  else
13:    onPeerDown(q);                                      ▷ see Algorithm 3
14:
15: PASSIVE THREAD:
16: function onExchange(o, sample)                         ▷ o: origin
17:   let sample′ ← getSample(P, ⌈|P| ÷ 2⌉);
18:   sample′ ← replace(sample′, o, p);
19:   sendTo(o, sample′);
20:   sample′ ← replace(sample′, p, o);
21:   P ← (P \ sample′) ⊎ sample;

Given a distribution of values associated with peers, and under the assumption of a sufficiently random peer-sampling, such an aggregation protocol yields the following mean µ_i and variance σ²_i at cycle i:

µ_i = (1 ÷ |V^t|) Σ_{x ∈ V^t} a_{i,x}        σ²_i = (1 ÷ (|V^t| − 1)) Σ_{x ∈ V^t} (a_{i,x} − µ_i)²

where a_{i,x} is the value held by peer p_x at cycle i. The estimated variance must converge to 0 over cycles. In other terms, the values tend to become the same over cycles. In the Spray case, the value a_{i,x} is the partial view size of peer p_x at cycle i. Indeed, each exchange from peer p1 to peer p2 is an aggregation resulting in |P1| ≈ |P2| ≈ (|P1| + |P2|) ÷ 2. Furthermore, at each cycle, each peer is involved in the exchange protocol at least once (they initiate one), and in the best case 1 + Poisson(1) times (they initiate one and, on average, receive another one). This relation being established, we know that Spray's partial view sizes converge exponentially fast to the global average size. Additionally, we know that each cycle decreases their variance in the overall system at a rate comprised between 1 ÷ 2 and 1 ÷ (2√e).

The shuffling algorithm provides adaptiveness at the cost of duplicates. Averaging the partial view sizes over exchanges quickly converges to a network topology where the partial views are balanced.
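To summarize the active half of Algorithm 2 in code, here is a hedged TypeScript sketch under the same assumptions as the earlier sketches: a multiset view, an exchangeWith request/response helper standing for the WebRTC round trip, and timeout handling reduced to a comment.

```typescript
// Sketch of Algorithm 2's active thread. exchangeWith is an assumed
// request/response helper; on timeout one would call onPeerDown(q).
type Entry = { id: string; age: number };
declare function exchangeWith(peer: string, sample: Entry[]): Promise<Entry[]>;

const self = "p"; // this peer's identifier (illustrative)
let view: Entry[] = [];

// Remove and return `size` entries of `from`, chosen at random.
function takeSample(from: Entry[], size: number): Entry[] {
  const out: Entry[] = [];
  for (let i = 0; i < size && from.length > 0; i++) {
    out.push(...from.splice(Math.floor(Math.random() * from.length), 1));
  }
  return out;
}

async function loop(): Promise<void> {
  if (view.length === 0) return;
  for (const e of view) e.age += 1;                     // incrementAge (line 3)
  let oldest = view[0];                                 // getOldest (line 4)
  for (const e of view) if (e.age > oldest.age) oldest = e;
  view.splice(view.indexOf(oldest), 1);                 // exclude one occurrence of q
  const sample = takeSample(view, Math.ceil((view.length + 1) / 2) - 1);
  sample.push({ id: self, age: 0 });                    // always send ourselves (line 5)
  for (const e of sample) if (e.id === oldest.id) e.id = self; // avoid self-loops at q
  const received = await exchangeWith(oldest.id, sample);
  for (const e of received) if (e.id === self) e.id = oldest.id; // avoid our own self-loops
  view = view.concat(received);                         // integrate q's half (line 11)
}
```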

4.3 Leaving or crashing

In Spray, peers can leave the network without notice. As such, we make no distinction between node departures and crashes. Peers that crash or leave the network do not save information about the network. Consequently, if they join the network afterwards, they are considered as completely new peers.

Spray must react properly to departures and crashes. Without reaction, the network would collapse due to an overzealous removal of arcs. Indeed, a peer joining the network injects 1 + ln(|V^t|) arcs. Nevertheless, after a few exchanges, its partial view becomes populated with more neighbors. Then, if this peer leaves, it removes ln(|V^t|) arcs from its partial view, and another ln(|V^t|) arcs from peers which have this peer in their partial views. Therefore, without any crash handler, we remove 2 ln(|V^t|) connections instead of 1 + ln(|V^t|). To solve this, each peer that detects a crash may re-establish a connection with anyone in its neighborhood. The peer re-establishes a connection with probability 1 − 1 ÷ |P|. Since |P| ≈ ln(|V^t|) peers have the crashed peer in their partial view, it is likely that all of them will re-establish a connection, except one. Therefore, when a peer leaves, it approximately removes the number of connections injected by the most recently joined node. Duplicates created by this procedure will disappear over time as peers shuffle their partial views.

Algorithm 3 The crash/departure handler of Spray running at Peer p.

1: function onPeerDown(q)                 ▷ q: crashed/departed peer
2:   let occ ← 0;
3:   for each ⟨n, age⟩ ∈ P do              ▷ remove and count occurrences
4:     if (n = q) then
5:       P ← P \ {⟨n, age⟩};
6:       occ ← occ + 1;
7:   for i ← 0 to occ do                   ▷ probabilistically duplicate
8:     if (rand() > 1 ÷ (|P| + occ)) then
9:       let ⟨n, _⟩ ← P[⌊rand() ∗ |P|⌋];
10:      P ← P ⊎ {⟨n, 0⟩};

11: function onArcDown(q, age)            ▷ q: peer at the end of the failed arc
12:   P ← P \ {⟨q, age⟩};
13:   let ⟨n, _⟩ ← P[⌊rand() ∗ |P|⌋];
14:   P ← P ⊎ {⟨n, 0⟩};                    ▷ systematically duplicate

Algorithm 3 shows the manner in which Spray deals with departures and crashes. Function onPeerDown shows the reaction of Spray when peer q is detected as crashed or departed. A first loop counts the occurrences of this neighbor in the partial view and removes all of them. Then, a second loop probabilistically duplicates references to known peers. The probability depends on the partial view size before the removals.

Figure 4 depicts Spray's crash/leaving handler. The scenario follows from the prior examples after a few more exchanges. Peer p1 leaves the network without giving notice. With it, 7 connections go down. Peers p3, p4, and p5 have the crashed/left peer in their partial view. Peer p5 has a 1 − 1 ÷ |P5| = 2 ÷ 3 chance of replacing the dead connections. In this case, it duplicates the connection to p13. Identically, p3 and p4 detect the crash/leaving and run the appropriate operation. Only p3 duplicates one of its connections. In total, 5 connections have been removed.

The example shows that some peers re-establish connections if they detect dead ones. The probability depends on the partial view size of each of these peers. On average, one of these peers will likely let the arc disappear while the other peers will duplicate one of their existing arcs. In this case, peer p1 injected 5 connections when it joined, and it removes 7 − 2 = 5 connections when it leaves. The global number of connections remains logarithmic with respect to the number of peers in the network. Nevertheless, we observe that connectedness is not entirely guaranteed, only with the high probability implied by random graphs. Indeed, if peer p1 is the sole bridge between two clusters, adding arcs is not enough to ensure connectedness.

Algorithm 3 also shows that Spray distinguishes peer crashes from arc crashes. Indeed, Function onArcDown deals with connection establishment failures. In this function, the failing arc is systematically replaced with a duplicate. Therefore, the arc count stays invariant even in the presence of connection establishment failures. The distinction between the functions onPeerDown and onArcDown is necessary because the former is supposed to remove a small quantity of arcs on departures, unlike the latter. Without this small removal, the global arc count would grow unbounded with network turnover.

In the context of WebRTC, Spray calls the onArcDown function when a connection establishment fails. This may happen due to network misconfiguration or to message losses. Spray calls the onPeerDown function when the connection was established once but the neighbor is no longer responding.
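The two handlers translate to the following sketch (ours, under the same multiset-view assumptions as above); note how onPeerDown lets one arc disappear on average while onArcDown preserves the arc count exactly:

```typescript
// Sketch of Algorithm 3 under the same multiset-view assumptions.
type Entry = { id: string; age: number };
let view: Entry[] = [];

// Peer q no longer responds: drop all its occurrences, then duplicate
// existing arcs with probability 1 - 1/(|P| + occ), so that on average
// exactly one of q's in-neighbors lets an arc disappear (lines 1-10).
function onPeerDown(q: string): void {
  const occ = view.filter((e) => e.id === q).length;
  view = view.filter((e) => e.id !== q);
  const base = view.length + occ; // view size before the removals
  for (let i = 0; i < occ && view.length > 0; i++) {
    if (Math.random() > 1 / base) {
      const pick = view[Math.floor(Math.random() * view.length)];
      view.push({ id: pick.id, age: 0 }); // duplicate an existing arc
    }
  }
}

// A single connection establishment toward q failed: the failing arc is
// systematically replaced with a duplicate, keeping the arc count
// invariant despite WebRTC connection failures (lines 11-14).
function onArcDown(q: string, age: number): void {
  const i = view.findIndex((e) => e.id === q && e.age === age);
  if (i >= 0) view.splice(i, 1);
  if (view.length > 0) {
    const pick = view[Math.floor(Math.random() * view.length)];
    view.push({ id: pick.id, age: 0 });
  }
}
```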


Figure 4: Example of the crash/leaving handler of Spray. (a) Peer p1 crashes; many connections go down. (b) Peers p3−5 notice that they cannot reach p1 anymore. (c) Peers p3 and p5 create a duplicate of one of their existing neighbors.

5 Experimentation

In this section, we evaluate how the adaptiveness of Spray impacts common metrics of peer-sampling performance, including clustering coefficient, average shortest path length, in-degree distribution, robustness, and arc count. We compare Spray with a representative of fixed-size partial view approaches, namely Cyclon. We expect Spray and Cyclon to exhibit similar behaviors when Cyclon is configured in advance to the logarithm of the targeted network size. We expect Spray to save resources when Cyclon provides larger views, and to be more robust when Cyclon provides smaller views. We expect Spray-based network-size estimators to provide an estimate at least accurate enough to indicate the order of magnitude of the actual network size. Finally, we expect Spray to keep a negligible number of duplicates in its partial views. We also evaluate the impact of the WebRTC connection establishment on Cyclon, Scamp, and Spray in networks subject to message loss. We expect a normal behavior for Cyclon and Spray, while Scamp quickly becomes partitioned.

The experiments run on the PeerSim simulator [30], a well-known program written in Java that allows simulations to reach high scale in terms of number of peers. Our implementation of the evaluated random peer-sampling protocols is available on GitHub at https://github.com/justayak/peersim-spray.

5.1 Clustering and convergence time

Objective: To observe how adaptiveness impacts clustering and convergence time.

Description: The average local clustering coefficient [42] measures peers' neighborhood connectivity in the network:

C = (1 ÷ |V^t|) Σ_{x ∈ V^t} C_x

where C_x is the local clustering coefficient of peer p_x. The higher the coefficient, the more peers are tied together. The coefficient of complete graphs is 1; the coefficient of random graphs is close to 0.
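For illustration, this metric can be computed directly from an adjacency representation of the overlay; the following sketch is ours, not the simulator's code:

```typescript
// Illustrative computation of the average local clustering coefficient C
// over a directed overlay given as adjacency sets (assumed non-empty).
function averageClustering(adj: Map<string, Set<string>>): number {
  let sum = 0;
  for (const neighbors of adj.values()) {
    const ns = [...neighbors];
    const k = ns.length;
    if (k < 2) continue; // C_x = 0 with fewer than two neighbors
    let links = 0; // arcs among the neighbors of x
    for (const a of ns)
      for (const b of ns)
        if (a !== b && adj.get(a)?.has(b)) links++;
    sum += links / (k * (k - 1)); // local coefficient C_x
  }
  return sum / adj.size; // C = (1 / |V|) * sum over x of C_x
}
```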



Figure 5: Clustering coefficient (log10) over cycles for Cyclon and Spray, in networks of 0.1k, 1k, 10k, and 100k peers.

For each approach, the experiment comprises 4 runs, respectively building a network of 0.1k, 1k, 10k, and 100k peers. Peers join the network at once, through a contact peer chosen at random following a uniform distribution. For the sake of fairness, we configure the representative of fixed-size approaches, Cyclon, to provide 7 neighbors, which is close to the number of neighbors provided by Spray when the network size reaches 1k peers (ln(1000) ≈ 7). Compared to Spray, Cyclon is oversized for 0.1k peers and undersized for 10k and 100k peers. During exchanges, the peers using Cyclon shuffle 3 out of their 7 neighbors.

Results: Figure 5 shows that Cyclon starts with a lower clustering coefficient than Spray. Yet, Cyclon and Spray have roughly similar convergence times, for both converge exponentially fast. Figure 5 also shows that both approaches converge to a low clustering coefficient, which is characteristic of random graphs. Nevertheless, Cyclon and Spray do not reach the same values after convergence. Except when the network comprises 1k peers, Spray's values are either below those of Cyclon, when the latter is comparatively oversized, or above them, when it is comparatively undersized.

These results show that both Spray and Cyclon can be deployed at once. They start ready to use but do not yet have their optimal properties. However, no matter the size of the deployment, they obtain these properties very quickly.

Reasons: Peers join one after the other in one round, using a contact peer chosen at random among already connected peers. Therefore, old peers are chosen more often during this starting round. In the starting topology, old peers are more clustered than newer ones. In particular, since Spray does not enforce a limit on the size of partial views, the oldest peers have larger partial views than the newest peers. Thus, the oldest peers are more tied together. Consequently, Spray starts with a higher clustering coefficient than Cyclon. The clustering coefficient measures how much a peer's neighbors are connected together. It directly depends on the partial view size of each peer which, in Cyclon, is constant. Thus, when the number of peers is multiplied by 10, the clustering coefficient after convergence is divided by 10. On the other hand, peers using Spray have a partial view whose size reflects the network size. When the network has 1k peers, Spray and Cyclon have roughly the same average partial view size (Spray 7.1 vs. Cyclon 7), hence almost identical clustering coefficient measurements. By extending the reasoning, this also explains why Spray yields lower C values when Cyclon is comparatively oversized, and higher values when Cyclon is comparatively undersized.

5.2 Information dissemination

Objective: To observe how adaptiveness impacts the average shortest path length, i.e., the speed of information dissemination.

Description: The average path length is the average of the shortest path lengths between peers in the graph. It counts the minimum number of hops to reach a peer from another given peer. It basically represents the traveling time of any piece of information to reach all the peers at least once. We average the path length over a subset of the network membership. We run the simulation on Spray 100 times to avoid any side effects due to randomness. The experiment comprises 3 runs for Cyclon.


Figure 6: Average shortest path length versus the number of peers (log10), for Cyclon with |P| = 7, 9, and 11, and for Spray.

We set Cyclon to provide 7 neighbors, which is close to the number of neighbors provided by Spray when the network reaches 1k peers; 9 neighbors, which is close to the number provided by Spray when the network reaches 8k peers; and 11 neighbors, which is close to the number provided by Spray when the network reaches 60k peers. We perform the measurements after convergence. The checkpoints for the measurements are 0.1k, 0.5k, 1k, 5k, 10k, 50k, and 100k peers.

Results: Figure 6 shows that both Cyclon and Spray have a relatively small average shortest path length. Figure 6 also shows that each run of Cyclon is divided into three parts compared to Spray. Cyclon starts with smaller values than Spray. Spray and Cyclon become equivalent when the latter is configured to the logarithm of the actual network size. After this point, Spray has smaller values than Cyclon. Overall, Spray scales better than Cyclon, since the slope of the former is lower than that of any configuration of the latter. Empirically, we observe that Cyclon builds networks with an average shortest path length of ln(|V^t|) ÷ ln(|P|), where |P| is constant, while Spray builds networks with an average shortest path length of ln(|V^t|) ÷ ln(ln(|V^t|)).

These results show that developers could use both Spray and Cyclon to build efficient information dissemination protocols. Nevertheless, they should prefer Spray over Cyclon if their concerns are about latency and the network size is unknown or can change over time.
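These two empirical trends are easy to tabulate; the snippet below (ours, not the paper's code) reproduces the crossover visible in Figure 6:

```typescript
// Tabulating the empirical trends: ln(n)/ln(|P|) for Cyclon with a fixed
// view size of 7, versus ln(n)/ln(ln(n)) for Spray.
for (const n of [100, 1000, 10000, 100000]) {
  const cyclon = Math.log(n) / Math.log(7);
  const spray = Math.log(n) / Math.log(Math.log(n));
  console.log(`n=${n}  cyclon7=${cyclon.toFixed(2)}  spray=${spray.toFixed(2)}`);
}
// n=100 favors Cyclon (2.37 vs 3.02); n=100000 favors Spray (5.92 vs 4.71).
```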

Reasons: We performed the measurements after convergence, when the network overlay becomes closely related to random graphs. In particular, the diameter and the average shortest path length remain small. Before reaching the point where the number of neighbors provided by Spray is equivalent to the number provided by Cyclon, Cyclon provides more neighbors than Spray. Since the number of arcs is greater, it yields a lower average path length. Conversely, when the number of peers in the network goes over the point where Spray and Cyclon are equivalent, Spray provides more neighbors than Cyclon. Therefore, the average path length is smaller using Spray. Overall, Spray scales better than any configuration of Cyclon, for it adds arcs as the network grows.

5.3 Load-balancing

Objective: To observe how adaptiveness impacts the in-degree distribution, i.e., the load-balancing among peers.

Description: The in-degree of a peer shows how well it is represented in others' partial views. The in-degree distribution of the network can highlight the existence of weakly connected peers and strongly connected hubs, i.e., peers whose in-degree is far from the average in-degree value. This directly impacts robustness: a few crashes or departures may disconnect a weakly connected peer, and disconnections of hubs deeply impact the rest of the network. For each approach, the experiment comprises 4 runs, respectively building a network of 0.1k, 1k, 100k, and 500k peers. We configure Cyclon with partial views of size 7, which is close to the number of neighbors provided by Spray when the network reaches 1k peers. We perform the in-degree measurements after convergence.

Results: Figure 7 shows the in-degree distribution of Cyclon and Spray. The x-axis denotes the in-degree. The y-axis denotes the percentage of peers with such an in-degree in the network.


Figure 7: In-degree distribution of Cyclon (top) and Spray (bottom) in networks of 0.1k, 1k, 100k, and 500k peers.

The top part of Figure 7 focuses on Cyclon. We observe that the in-degree distributions are identical independently of the network size: for instance, the distribution for 0.1k peers is identical to the one for 500k peers. The mean value is roughly 7 and we observe a strong peak at this value. The bottom part of Figure 7 focuses on Spray. We observe that the distribution of Spray follows the average partial view size, which itself follows the growth of the network size.

Figure 7 also shows that peers are tightly gathered around the mean partial view size for both Spray and Cyclon. For instance, for the run with 500k peers using Spray, the mean in-degree value is 13.37 and 88% of peers have an in-degree between 12 and 14 included. Even the highest in-degree value, 18, is not far from the average. Thus, there are neither weakly connected peers nor strongly connected hubs.

These results show that both Spray and Cyclon are resilient to random failures, for peers are equally important in terms of connectedness. These results also show that developers could use both Spray and Cyclon to build dissemination protocols that balance the download among peers. Using Spray, the download would increase and decrease following the fluctuations in network size.

Reasons: Once configured, Cyclon must handle any number of peers in the network with a fixed-size partial view, and since the partial view size is constant, the in-degree of peers stays stable. On the other hand, in Spray, each joining peer brings an increasing number of arcs into the network. Thus, the in-degree of peers slowly grows, reflecting the network size. Hence, the distribution in the bottom part of Figure 7 shifts slowly to higher in-degree values as the network size grows. Spray does not peak at a particular value as Cyclon does, because the average partial view size for a particular network size may fall in between integer values. For instance, if peers using Spray have partial views that contain 6.5 neighbors on average, it means that half of these peers have 6 neighbors while the other half have 7 (out-degree). As a consequence, the average in-degree value will also be 6.5. However, while Spray and Cyclon constrain the size of partial views respectively to the average and to a constant value, they do not constrain the size of in-views, hence the distributions of Figure 7.

5.4 Adaptive partial views

Objective: To show that, using Spray, the partial views grow and shrink logarithmically compared to the network size.

Description: In this experiment we focus on dynamic networks where peers can join and leave. We create 4 networks that vary from 50 to 100 peers, from 500 to 1k peers, from 5k to 10k peers, and from 50k to 100k peers. First, we create a network of half the maximum targeted size. Second, we repeatedly inject and then disconnect half the maximum targeted size, in periods of 240 cycles including a 40-cycle break between the injection and disconnection phases. For instance, in the network whose size varies from 50k to 100k, 500 peers join the network at each cycle during the injection phases until it reaches 100k peers; then 40 cycles pass; then 500 peers leave the network at each cycle during the disconnection phases until it reaches 50k peers, and so on. During the experiments, we measure the maximum, the minimum, the average, and the standard deviation of the size of partial views.

Results: Figure 8a focuses on the first 680 rounds of the experiment on the network varying from 5k to 10k peers.


Figure 8: Partial view size measurements in networks subject to churn. (a) Minimum, maximum, average, and standard deviation of the size of partial views over cycles. (b) Average size of partial views over a long period of time, for networks varying between 50-100, 500-1k, 5k-10k, and 50k-100k peers.


The top part of the figure shows the number of peers over time. The bottom part shows the measurements made on partial views. We observe that (i) partial views grow logarithmically when the network size increases; (ii) partial views shrink logarithmically when the network size shrinks; (iii) while the extreme values of partial view sizes are spread out during joining phases (from 1 to 12), the standard deviation remains small, meaning that most peers have the same partial view size. Figure 8b shows the results of the 4 runs over 28k cycles, i.e., 100 injection/removal phases. The x-axis denotes the time in cycles. The y-axis denotes the average partial view size of peers. We observe that (i) the average partial view size does not necessarily fit the theoretical expectation exactly; (ii) the average partial view size varies even for a network reaching the same number of peers; (iii) variations are more important when the number of peers is lower. These results show that developers could use Spray to deploy protocols such as distributed hash tables [9] or routing protocols [26] that benefit from the properties of random peer-sampling protocols and that normally require network-size estimators to work.

Reasons: A peer joining the network injects at least one arc, plus a number of arcs depending on the contact peer. Since we choose the contact peer at random, and the partial views are averaged over shuffles, the number of injected arcs equals the average partial view size on average. This leads to a logarithmic progression of the number of arcs. However, the peer joining the network only has one peer in its partial view while adding more arcs to its contact's neighborhood. Since we choose the contact peer at random, it may be chosen multiple times during one cycle, which quickly increases its neighbors' number of arcs above the average. Consequently, we observe a large difference between the minimum and the maximum, but the standard deviation remains low. The latter is lowered even further after shuffling, for shuffling makes partial views converge to their average size. When a peer leaves the network, a logarithmic number of peers detect the departure. They probabilistically duplicate an arc in such a manner that one of them does not. Overall, this removes a logarithmic number of arcs plus one, which corresponds to the number of arcs injected during the latest joining. Consequently, the partial views shrink logarithmically when the network size diminishes.

the number of arcs injected during the latest join-ing. Consequently, the partial views shrink logarith-mically when the network size diminishes.

If we choose the same contact peer multiple times, its neighbors' number of arcs increases above the average. If one of these neighbors is chosen as contact peer before any shuffling, the global number of arcs increases too quickly. On the other hand, we may choose the newest peers as contact nodes, hence increasing the number of arcs too slowly, for they have fewer neighbors. Depending on these choices, the average partial view size can grow too high or too low compared to the theoretical value. Yet, it remains logarithmic, for these choices are made at random following a uniform distribution. These choices are more influential when the network size is low, for any addition or removal of arcs is proportionally more important.

5.5 Network-size estimation

Objective: To show the accuracy of Spray's partial views when used as a network-size estimator.

Description: To evaluate the accuracy of Spray as a network-size estimator, we define estimation quality as the ratio of measured estimations to the actual network size. This allows us to see how far from the actual network size the estimation can be. In our experiment, peers use Spray and join the network by groups of 1k peers until the network reaches a size of 100k peers. We evaluate two estimators based on the partial views of Spray. The first estimator uses the local partial view only. Spray provides a peer x with a view of size |P^t_x| ≈ ln(|V^t|). To estimate the network size, peer x computes:

V^t = exp(|P^t_x|)

The second estimator uses the local partial view and asks for the neighbors' partial view sizes. Peer x averages the values and computes the estimate:

V^t = exp( (|P^t_x| + Σ_{i ∈ P^t_x} |P^t_i|) ÷ (|P^t_x| + 1) )
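Both estimators are a one-liner each; the sketch below is our reading of the formulas above, with the neighbor sizes assumed to have been fetched beforehand:

```typescript
// Sketch of the two Spray-based network-size estimators.
// localViewSize is |P_x|; neighborViewSizes holds one |P_i| per neighbor.
function estimateLocal(localViewSize: number): number {
  return Math.exp(localViewSize); // |P| ~ ln(|V|)  =>  |V| ~ exp(|P|)
}

function estimateWithNeighbors(localViewSize: number, neighborViewSizes: number[]): number {
  const sum = localViewSize + neighborViewSizes.reduce((a, b) => a + b, 0);
  return Math.exp(sum / (neighborViewSizes.length + 1)); // average of |P|+1 sizes
}
```

Since exp(11) ≈ 60k while exp(12) ≈ 163k, a one-unit change in the integer view size moves the local estimate considerably, which is exactly the variance effect discussed in the Reasons paragraph below.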


Figure 9: Quality of network-size estimation (mean and standard deviation) versus the number of peers, using local knowledge only (top) and using neighborhoods (bottom).

Results: Figure 9 shows the mean and standard deviation of the estimation quality with increasing network sizes. We observe that the estimation quality depends on the network size. Using local knowledge only, most peers are able to compute an estimation ranging from ±10% to ±30% of the actual network size. We also observe that increasing the available knowledge using peers' neighborhoods improves the estimation by decreasing the variation of estimates around the mean (from ±5% to ±10%). These results show empirically that applications that need to estimate the network size can use Spray's partial view size as input to know the order of magnitude of the actual network size. Using local knowledge only leads to an inaccurate estimate, but it comes at no cost, while full-fledged network-size estimators generate traffic [31]. Applications can easily increase accuracy on demand by asking their neighbors for their partial view sizes, for a limited cost.

Reasons: The standard deviation of the estimation quality grows and shrinks while the network size increases, particularly in the case of local-only knowledge. Figure 9 shows that the variance of estimates narrows before quickly rising again, at increasingly sized intervals. Since the size of views is an integer value, the logarithm of the estimation for a peer at a given time can be off by 1. For instance, when the network reaches 65k peers, the average partial view size just starts to go above 11, meaning that some peers have 12 neighbors in their partial views. Since the local-knowledge estimation of a peer x is V^t = exp(|P^t_x|), increasing peer x's view size by 1 results in a much larger estimate. Hence the sudden increase in standard deviation at such network sizes. The standard deviation decreases as the actual network size approaches the overestimated size.

This effect can be mitigated by averaging the local value with the neighbors' partial view sizes: the computed value is no longer an integer but an aggregation of representative values chosen at random.

5.6 Robustness

Objective: To show Spray's robustness to failures.

Description: To evaluate robustness, we count the number of weakly and strongly connected components of the network. Counting strong components allows us to estimate the effectiveness of information dissemination protocols built on top of random peer-sampling protocols. For example, with two strong components, bidirectional communication between the two components cannot be achieved: the first strong component may be able to reach the second, but the second may not reach the first. However, the network remains in a repairable state, for at least one link unites these components. After some shuffling, the two strong components add links to each other and merge into one. Counting the weak components, in turn, allows us to estimate to what extent the network can be repaired, i.e., two weak components exist when there is no link to unite them.

In our experiment, we configure Cyclon's view size to 9, which is approximately equivalent to Spray's view size when the network reaches 8k peers. The network contains 10k members. We perform the removals after the approaches have converged to a stable overlay network. We remove a random set of peers at once, following a uniform distribution. We remove from 25 to 95 percent of peers in steps of 5 percent, i.e., 16 runs for each approach, and perform a last measurement at 99 percent. We measure the number of components immediately after each removal.
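Counting weak components amounts to a union-find over arcs treated as undirected edges; the sketch below (ours, not PeerSim code) illustrates the measurement:

```typescript
// Counting weakly connected components with a union-find; arcs are
// treated as undirected edges, since orientation is irrelevant here.
function weakComponents(n: number, arcs: Array<[number, number]>): number {
  const parent = Array.from({ length: n }, (_, i) => i);
  function find(x: number): number {
    while (parent[x] !== x) {
      parent[x] = parent[parent[x]]; // path halving
      x = parent[x];
    }
    return x;
  }
  for (const [a, b] of arcs) parent[find(a)] = find(b);
  const roots = new Set<number>();
  for (let i = 0; i < n; i++) roots.add(find(i));
  return roots.size; // 1 means the overlay is still repairable as a whole
}
```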



Figure 10: Robustness to massive failures.

Results: Figure 10 shows the robustness of Spray and Cyclon to massive random failures. The x-axis denotes the percentage of peers removed at once. The y-axis denotes the ratio of strong/weak components over the network size after removals. First, Figure 10 shows that both random peer-sampling protocols suffer from degraded behavior at high removal percentages, Cyclon being slightly better in this regard. The number of strong components starts to increase at 45 percent of removals and rises quickly at 70 percent. Hence, information dissemination starts to degrade at 45 percent of removals: messages broadcast by some peers cannot reach all peers. Fortunately, Figure 10 also shows that both approaches are able to recover from this dire state up to a high removal ratio. Indeed, the number of weak components starts to increase only at 70 percent of removals, meaning that some parts of the network have become completely disjoint, with no link left to unite them. Shuffling cannot be initiated between members of disjoint parts. Consequently, the network cannot fully repair itself above 70 percent of removals. These results show that both Spray and Cyclon fit contexts such as the Internet, where numerous peers can join and leave freely within short periods of time.

Reasons: The random peer-sampling approaches Cyclon and Spray yield very similar results because the number of neighbors provided by both approaches is very close when the network reaches 10k peers. Yet Cyclon performs slightly better for two reasons. First, the variance in the degree distribution of Spray causes Cyclon to have more arcs in this specific experiment. Second, Spray wastes some of its neighbors due to the presence of duplicate entries in its views (yet the number of duplicates remains small; see Figure 11). Still, both protocols preserve the ability to disseminate information until very large removal percentages. Another way to interpret this result consists in observing that when all peers have similar degrees (see Figure 7), removing a particular peer does not greatly affect connectedness. As suggested above, the direction of arcs impacts information dissemination more than the peer-sampling protocol itself. If an arc constitutes the last link between two clusters of the network, the messages from one of these clusters cannot reach the other one. Yet, this arc is enough for the peer-sampling protocol to start shuffling the views, ultimately populating them with members from both clusters. Hence, the network repairs itself. When clusters become fully disjoint, neither Cyclon nor Spray is able to repair the network.

5.7 Duplicates

Objective: To show that only a small proportion of peers contain duplicates in their partial view.

Description: Using Spray as the random peer-sampling protocol, we measure the proportion of peers whose partial view contains at least one duplicated reference. We perform the measurements on networks containing 0.1k, 1k, 10k, 100k, and 500k peers. We measure the number of duplicates after convergence. We put this in relation with a theoretical approximation derived from the birthday paradox, where the view size plays the role of the number of randomly chosen people and the identities of peers play the role of the birthdates. The probability that a peer has no duplicates is approximately:

exp( − ln(|V^t|) · (ln(|V^t|) − 1) / (2 · |V^t|) )
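This approximation is straightforward to evaluate. The TypeScript sketch below is ours (noDuplicateProbability is a hypothetical name); it computes the approximation for a view size of ln(N) and matches the orders of magnitude reported in Figure 11.

// Birthday-paradox approximation of the probability that a partial
// view of size ln(N), drawn among N peers, contains no duplicate.
function noDuplicateProbability(networkSize: number): number {
  const k = Math.log(networkSize); // expected Spray view size, ln(N)
  return Math.exp(-(k * (k - 1)) / (2 * networkSize));
}

for (const n of [100, 1000, 10000, 100000, 500000]) {
  console.log(n, (100 * noDuplicateProbability(n)).toFixed(2) + " %");
}
// 100 peers -> about 92 %; 10k peers -> about 99.6 %: the proportion
// of duplicate-free views tends to 100 % as the network grows.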



Figure 11: Duplicates in networks of different size.

Results: Figure 11 shows the proportion of peers whose partial view contains duplicates. We observe that there always exist partial views with at least one duplicate. The proportion is larger when the network is small (e.g. 5 percent for 0.1k peers). It becomes a minor overhead when the network size is larger (e.g. less than 1 percent for 10k peers). The birthday paradox approximation follows the experimental results very closely, which empirically indicates a relation between the duplicates and the birthday paradox. The proportion of peers without duplicates tends to 100 percent as the network size grows.

Reasons: While the number of peers in the network grows linearly, the neighborhood size of each peer only grows logarithmically. This significant difference between the growth rates means that the chance for a particular peer to hold the reference to another peer twice becomes smaller as the number of peers in the network increases.

5.8 Failures in connection establishment

Objective: To show that Spray does not suffer from failures in connection establishment, contrary to Scamp.

Description: We measure both the arc count and the number of weak components in the network.


Figure 12: Number of arcs and partitioning in networks subject to failures in connection establishment.

The simulations involve Cyclon configured with partial views containing 9 neighbors, Scamp (a modified version of Scamp whose periodic protocol works properly when there are no connection failures, available at https://github.com/justayak/peersim-spray), and Spray. They run over 50k cycles. The network initially contains 10k members. To establish a connection, we use the WebRTC three-way handshake: the initiating peer emits an offer ticket, the arrival peer stamps the ticket, and the initiating peer finalizes the connection using the stamped ticket (see Section 2). The probability that the ticket fails to traverse a hop is set to 10^-3.

Results: The top part of Figure 12 shows the arc count of the random peer-sampling protocols, while the bottom part shows the weak components of the network. As expected, we observe that the arc counts of Cyclon and Spray stay constant over cycles: 90k and 93k arcs for Cyclon and Spray respectively. On the other hand, Scamp suffers from failures in connection establishment. This directly impacts network connectedness as measured by the weak-component ratio: the network of Scamp quickly degrades. These results show that both Spray and Cyclon fit the constraints set by the connection establishment protocol of WebRTC.

Reasons: The arc counts of Cyclon and Spray


remain constant over time, but for different reasons. In Cyclon, the shuffling protocol makes sure that the partial view is filled to its maximum. When a peer removes a broken connection, it replaces it with a fresh one in the following round. In Spray, when the protocol tries to use a broken connection, it replaces it with another known one. Thus, the arc count stays constant and the shuffling protocol makes sure that duplicates disappear over time (the arc moves to another peer where it is not a duplicate). Scamp, however, does not restrict connection establishment to the neighbors of its neighbors, and each hop of the connection establishment process is an opportunity for failure. Let P_f be the probability that an element of the dissemination path (either a peer or a connection) crashes or leaves during a hop of the three-way handshake, without any possible recovery. Let P_E be the probability that a connection establishment cannot be completed. Without the three-way handshake, P_E is straightforward:

P_E,1way^Scamp = 1 − (1 − P_f)^(k+1)    (1)

This corresponds to the probability that each element (arc and peer) in the path of size k + 1 stays alive during the random walk. In the context of WebRTC, the offer ticket must travel back to its emitter. As a consequence, the elements of the random walk cannot fail until the stamped ticket travels back. We obtain:

P_E,3way^Scamp = 1 − ((1 − P_f)^(2(k+1)) · (1 − P_f)^(2k) · · · (1 − P_f)^2)
             = 1 − (1 − P_f)^(k² + 3k + 2)    (2)

In other terms, the first chosen arc and peer in the path must stay alive for 2k + 2 hops, the second chosen arc and peer for 2k hops, and so on. This long exposure leads to a quicker degeneration of the connection count.
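Equations (1) and (2) can be evaluated directly. The following TypeScript sketch is ours, with hypothetical function names; it contrasts the two failure models for the per-hop failure probability P_f = 10^-3 used in the simulation.

// Probability that a Scamp-style connection establishment fails,
// without (Eq. 1) and with (Eq. 2) the WebRTC three-way handshake,
// for a dissemination path of k hops and per-hop failure probability pf.
function establishmentFailureOneWay(pf: number, k: number): number {
  return 1 - Math.pow(1 - pf, k + 1); // Eq. (1)
}

function establishmentFailureThreeWay(pf: number, k: number): number {
  return 1 - Math.pow(1 - pf, k * k + 3 * k + 2); // Eq. (2)
}

const pf = 1e-3; // per-hop failure probability used in the simulation
for (const k of [1, 5, 10]) {
  console.log(k,
    establishmentFailureOneWay(pf, k).toFixed(4),
    establishmentFailureThreeWay(pf, k).toFixed(4));
}
// k = 10: ~0.0109 one-way versus ~0.1237 three-way, hence the faster
// degeneration of Scamp's arc count under WebRTC.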

6 Use case: broadcasting messages

In this section, we highlight how protocols built on top of Spray can benefit from its adaptiveness. We focus on broadcasting messages. First, we consider the case of video streaming using a three-phase gossip protocol [28, 15, 29]. Using Spray, we expect a stable message-delivery rate even when the size of the network fluctuates. Using Cyclon, we expect degraded performance when the size of the network is larger than expected at configuration time. Second, we consider gossiping in decentralized real-time collaborative editing. Using Spray, we expect the generated traffic to scale logarithmically with the size of the network.

The first experiment runs on the PeerSim simulator [30]; its code is available on GitHub at https://github.com/justayak/peersim-spray. The second experiment is a simulation involving up to 600 interconnected Web browsers on the Grid'5000 testbed; its code is available at https://github.com/Chat-Wane/CRATE.

6.1 Streaming

Live video streaming has grown in importance over recent years, and companies offering peer-to-peer streaming gather large numbers of viewers. For example, HiveStreaming (https://www.hivestreaming.com/) declared having a customer network of 400k peers [35]. The uploading bitrate required to serve such a large number of users would lead to huge operational costs, which can instead be saved through direct peer-to-peer exchanges.

6.1.1 Operation

A number of existing systems [16, 29, 45] use some variant of gossip to achieve effective stream dissemination in the presence of churn. In the following, we consider the three-phase variant adopted by [15, 28, 29, 45]. Nodes gossip advertisement messages for available packets to a fanout of other nodes, which then pull the packets they need. Algorithm 4 details the operation of this protocol. First, with a time interval of δ (200 ms in [15, 16]), each peer advertises the packets it can serve to f (fanout) neighbors. Second, a peer receiving such an advertisement requests the packets it needs. Third, the advertising peer sends the requested packets.


In their analysis of this three-phase model, the authors of [15] observed that, in bandwidth-constrained scenarios, peers should vary their communication partners as often as possible in order to equalize everyone's contribution. There are two ways to achieve this in a large-scale setting: (i) refreshing the view of the peer-sampling protocol (e.g. Cyclon or Spray) before sending each advertisement packet, or (ii) using a view that is much larger than the gossip fanout and refreshing it less often. Due to the relatively small size of view-exchange messages, solution (ii) achieves the best trade-off, while solution (i) is extremely impractical and wastes a significant amount of bandwidth just for packet headers: [15, 16] have each node send 5 advertisement packets per second.

For this reason, [17] follows solution (ii) and setsthe fanout to ln(N) + c (according to theory [25]), Nbeing the number of participating peers, and c beinga positive constant.
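As a sketch of the difference between the two fanout policies, assuming views inflated to 6 · ln(N) as in the experiment described next, the TypeScript below contrasts a fanout frozen at configuration time with one derived from the current partial view size. The function names and the rounding are ours, for illustration only.

// Fixed fanout, chosen at configuration time for an expected size N.
function fixedFanout(expectedSize: number, c: number): number {
  return Math.ceil(Math.log(expectedSize)) + c;
}

// Adaptive fanout derived from Spray's current partial view. With
// views inflated to 6·ln(N), ln(N) can be recovered as |view| / 6.
function adaptiveFanout(currentViewSize: number, c: number): number {
  return Math.ceil(currentViewSize / 6) + c;
}

console.log(fixedFanout(100, 1));   // 6, frozen whatever the actual size
console.log(adaptiveFanout(30, 1)); // 6 with ~100 peers (view ~30)
console.log(adaptiveFanout(46, 1)); // 9 with ~2000 peers (view ~6·ln(2000) ~46)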

6.1.2 Experimentation

Objective: To show how broadcast protocols can benefit from an adaptive random peer-sampling protocol.

Description: In this experiment, we reproduce the first gossip phase of the protocol in [15, 17, 29] and model an application that streams live content to 100 peers. We configure Cyclon's partial view to 6 times the threshold fanout required for 100 nodes (6 ln(100) ≈ 6 · 5 = 30, analogously to [17]), and consider two configurations for the broadcast fanout: ln(100) + 1 ≈ 6 and ln(100) + 3 ≈ 8, which provide high reliability with 100 nodes. Similarly, we configure Spray to have a partial view size equal to 6 · ln(|V^t|); instead of adding only 1 arc targeting its contact (see Section 4.1), the newcomer adds 6 arcs. We set the fanout to one sixth of the view size plus 1 or 3. This gives both protocols the same configuration for 100 peers. We then consider network sizes growing from 100 to 2000 peers, and measure the ratio of fully delivered messages out of 1k sent messages, i.e., we evaluate the fraction of all messages that reach all the members of the network.


Figure 13: Ratio of messages fully delivered.

Results: Figure 13 shows the results of this experiment. First, we observe that the ratio of fully delivered messages using Cyclon suffers from a steady decrease; the higher fanout leads to better results. In contrast, Spray remains stable despite the presence of small jumps. A fanout set to ln(|V^t|) + 1 gives a full delivery ratio above 90%. A fanout set to ln(|V^t|) + 3 is very close to 100%. Overall, the broadcast mechanism benefits from the adaptive nature of Spray's partial view, which allows it to scale according to the size of the network.

Reasons: To ensure a fair use of the bandwidth contributed by peers, we configure the view size to be larger than the fanout. This allows stream packets in three-phase gossip to follow "almost" random subtrees of the complete overlay. But it also leads the first gossip phase to require a fanout of at least ln(N) + c [25], as the random sub-graphs of the overlay grafted in the first phase may fail to reach all nodes if the fanout is too low. In our case, the broadcast protocol built on top of Cyclon has a constant fanout set for a specific network size. As long as the network is smaller than this value, the full delivery ratio stays high. However, it quickly decreases with larger network sizes. Spray instead allows the application fanout to follow the growth of the partial view size, which scales logarithmically with the network size. This allows the streaming application to adapt to changes in the size of the network. As a result, the full delivery ratio provided by the broadcast mechanism on top of Spray remains stable.


Algorithm 4 Three-phase gossip.

1: INITIALLY:
2:   B ▷ buffer of packets
3:   b ▷ number of advertised packets
4:   f ▷ fanout, f ≤ |P|
5:
6: ACTIVE THREAD:
7: function advertisementLoop() ▷ every δ time
8:   let advertiseTo ← getPeers(P, f); ▷ f distinct random peers from P
9:   let advertisement ← getIdentifiers(B, b); ▷ b distinct random packet ids
10:   for each q ∈ advertiseTo do sendTo(q, 'advertisement', advertisement);
11:
12: PASSIVE THREAD:
13: function onAdvertisement(o, ads) ▷ o: advertiser
14:   for each id ∈ ads do
15:     if id ∉ B then sendTo(o, 'request', id);
16:
17: function onRequest(o, id) ▷ o: requester
18:   let packet ← getMessage(B, id); ▷ get the requested packet from the buffer
19:   sendTo(o, packet); ▷ finally send the packet

Since the values manipulated by peers (partial view size and fanout) are integers, the measurements make small jumps when a rounded value increases enough to be incremented.

Objective: To show how the full delivery ratio behaves during a buzz, i.e., a sudden massive increase in network size shortly followed by departures.

Description: We configure Spray and Cyclon as in the previous experiment. For Cyclon, we use a partial view size of 30 and two broadcast fanout values: ln(100) + 1 and ln(100) + 3. For Spray, we use a dynamic view of 6 · ln(|V^t|) and broadcast fanout values of ln(|V^t|) + 1 and ln(|V^t|) + 3. The simulation starts with 100 peers. The network quickly reaches 10k peers during the popularity burst. Then the network size decreases to 3k members. In this experiment, we measure the full delivery ratio over each group of 100 consecutive messages.

Results: Figure 14 shows the results of this experiment. The top part of the figure shows the evolution of the network size over time. The bottom part shows the full delivery ratio over time.


Figure 14: Full message delivery ratio during a buzz.


As in the previous experiment, Figure 14 shows that using Cyclon and a predefined fanout works well until the network reaches an unexpected size. During the peak, the delivery ratio falls quickly. Conversely, the broadcast mechanism built on top of Spray automatically adapts the fanout to the size of the network. Thus, it does not experience performance losses while a large number of nodes suddenly join, even though we observe a decrease in delivery ratio during the departures of peers, in particular with a fanout of ln(|V^t|) + 1. This demonstrates that, in the context of three-phase gossip, Spray makes broadcast more resilient to quick membership changes.

Reasons: The fanout of configurations involving Cyclon is constant. When the number of peers in the network exceeds expectations, the delivery ratio quickly degenerates. Concerning Spray, the fanout follows the evolution of the network thanks to adaptive partial views. Both Spray and Cyclon detect departing nodes during the shuffling phase. The associated delay leads to the use of some stale arcs from partial views, which explains the temporary decrease in delivery ratio during the shrinking phase. Since both Cyclon and Spray clean their partial views over time, the delivery ratio recovers its expected value.

6.2 Real-time editing

Crate is a real-time editor that allows authors to write anytime and anywhere, whatever the number of participants, without any third party [32]. Compared to trending Cloud-based editors such as Google Docs, it alleviates privacy, scalability, and single-point-of-failure issues while remaining easy to use. It can be used by small groups but also during events such as massive online lectures, TV shows, or conferences that gather larger groups. For instance, online lecture sessions reach thousands of students. Transitions between small groups and large groups are supported transparently thanks to Spray. Distributed real-time editing is a pertinent context for Spray, for group sizes differ depending on the document and change quickly over time.

Crate builds a network of browsers where each browser is able to communicate with a number of browsers that scales logarithmically with the global network size. Each change performed on documents transits through neighbors and reaches all members in a scalable way [7]. Unlike the state of the art [38, 39], Spray allows the diffusion cost to adapt to the network size without any probing mechanism or central authority. Without Spray, developers would have to overestimate the parameters of peer-sampling protocols to get a system that behaves well for large as well as small networks. This would increase the cost of maintaining connections alive (e.g. in a WebRTC context) and, more importantly, the cost of application protocols (e.g. broadcast protocols). Spray, on the other hand, ensures that small networks do not pay the price of large networks, without requiring the intervention of developers or end users.

6.2.1 Operation

To provide availability and responsiveness of documents, collaborative editors consider multiple authors, each hosting a replica of their shared document [36]. On updates, the local replica is directly modified. Then, each update is propagated to all members of the editing session, where it is integrated. Strong eventual consistency [4] states that a system is correct if and only if replicas converge to an equivalent state once they have integrated the same updates. In other terms, the users read identical documents.

Crate uses a distributed sequence data type [37] which provides two commutative operations: the insertion and the removal of an element. Commutativity allows users to avoid the difficult, time-consuming, and error-prone task of solving conflicts due to concurrent edits. However, commutativity comes at the price of metadata attached to each character. These metadata (referred to as identifiers) are unique and immutable. The structure representing an identifier is a list [ℓ1.ℓ2 . . . ℓk] whose size k is determined during allocation. Crate uses an allocator function [33] to keep these identifiers under acceptable growth, i.e., a polylogarithmic upper bound on their space complexity compared to the document size: O((log |document|)²).
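For intuition only, here is a toy TypeScript sketch of such list-shaped identifiers under a lexicographic order. It deliberately omits LSeq's allocation strategy and the per-level disambiguation metadata (site identifier, counter) that the real data type needs to keep identifiers unique.

// Toy identifier: a list [l1, l2, ..., lk]; real LSeq identifiers also
// carry per-level disambiguators to guarantee uniqueness.
type Identifier = number[];

// Lexicographic comparison: a prefix sorts before its extensions.
function compareIdentifiers(a: Identifier, b: Identifier): number {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

// An element inserted between [5] and [6] can receive the deeper
// identifier [5, 3], since [5] < [5, 3] < [6].
console.log(compareIdentifiers([5], [5, 3]) < 0); // true
console.log(compareIdentifiers([5, 3], [6]) < 0); // true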

An editor sends each pair of identifier and character


to all members of the editing session. For this purpose, Crate uses a very simple broadcast protocol built on top of Spray following the principles of epidemic dissemination, also known as gossiping [12]. Such mechanisms make extensive use of partial views to efficiently disseminate messages.

Algorithm 5 Gossip.

1: function broadcast(m) ▷ m: message
2:   for each ⟨q, ·⟩ ∈ P do sendTo(q, 'broadcast', m);
3: function receiveBroadcast(m) ▷ m: message
4:   if ¬alreadyReceived(m) then
5:     deliver(m);
6:     broadcast(m); ▷ rebroadcast

Algorithm 5 shows the instructions of this protocol. When a peer emits a message, it sends it to its neighborhood. Each peer receiving such a message for the first time forwards it to its neighborhood too. Messages thus transitively reach all network members.

Since the gossiping algorithm depends on the neighborhood provided by Spray, and since the latter grows logarithmically with the network size, the communication complexity of an application is upper-bounded by O(m · ln |V^t|), where m is the space complexity of a message. Since Crate's identifiers are sublinearly upper-bounded compared to the document size, Crate scales well.
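Algorithm 5 translates almost directly to the browser setting. Below is a minimal TypeScript rendition; it is our own sketch, not Crate's actual code, and the sendTo and deliver callbacks abstract the WebRTC data channels.

// Minimal gossip broadcast over a partial view, following Algorithm 5:
// deduplicate by message identifier, deliver once, then rebroadcast.
type Message = { id: string; payload: string };

class GossipBroadcast {
  private received = new Set<string>(); // identifiers already delivered

  constructor(
    private view: string[], // current neighbors, provided by peer sampling
    private sendTo: (peer: string, m: Message) => void,
    private deliver: (m: Message) => void,
  ) {}

  broadcast(m: Message): void {
    for (const q of this.view) this.sendTo(q, m); // whole neighborhood
  }

  onReceive(m: Message): void {
    if (this.received.has(m.id)) return; // already seen: stop here
    this.received.add(m.id);
    this.deliver(m);
    this.broadcast(m); // rebroadcast, as in Algorithm 5
  }
}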

6.2.2 Experimentation

We highlight the improvements brought to protocols built on top of Spray in a real-life use case: decentralized collaborative editing in Web browsers.

Objective: To show that the traffic generated by Spray stays low compared to the traffic generated by broadcasting, and to show the influence of Spray's adaptiveness on the traffic generated by decentralized editors running on a network of browsers.

Description: Experiments run on Grid'5000, where machines host 5 browsers each. Browsers open Crate and connect to an editing session through a signaling server. Runs comprise from 101 to 601 browsers in increments of 100 browsers, i.e., 6 different runs. The first editor creates the editing session, which is progressively joined by the other writers (1 joiner per 5 seconds). Each member starts sharing access to the editing session as soon as it joins. Outsiders join the network through one of the members chosen at random. Once all peers have joined the editing session, the document editing starts. The insertion rate is 100 insertions per second, uniformly distributed among peers. Each experiment runs for 8 hours, of which 7 hours are dedicated to editing. The document size reaches millions of characters.

Results: Figures 15a and 15b show the average traffic per second generated by members involved in collaborative editing. The x-axes denote the time progression of the warmup and the experiment, in percentage, over 45 minutes and 7 hours respectively. The y-axes denote the average traffic generated by Web browsers; the y-axis scale of Figure 15a is in KBytes while that of Figure 15b is in MBytes. The legend shows the average and variance of the partial view size associated with each run. Figure 15a shows that the traffic generated by the random peer-sampling protocol Spray increases as peers join the editing session over time. As expected, the more peers in the network, the more traffic is generated. Nevertheless, it stays an order of magnitude below the traffic generated by broadcasting. In Figure 15b, the height of the plots corresponds to the multiplicative factor coming from message dissemination. As expected, this factor grows logarithmically with the network size. Thus, 101 browsers have an average traffic lower than 601 browsers, because their partial views are smaller on average. In contrast, using Cyclon, the traffic would have been the same for all runs; since Cyclon commonly overestimates partial views to accommodate any network size, the traffic would have been higher. This matters, since even small differences in partial view size significantly impact traffic. Figure 15b also shows that the average partial view size follows the natural logarithmic expectation. Yet, the run involving 501 browsers has a slightly higher average partial view size than the run involving 601 browsers. Because the joining part of Spray establishes a number of WebRTC connections depending on the first contact member, there are variations between independent runs.



(a) Warmup time: peers join the session. Only Spray generates the outgoing traffic. (Axes: average traffic per browser in KBytes/s vs. warmup progression in %.)

(b) Editing time: peers write a document. Both Spray and Crate generate the outgoing traffic. (Axes: average traffic per browser in MByte/s vs. experiment progression in %. Legend, average partial view size and variance per run: 601 browsers, |P| 6.3, σ² 0.31; 501 browsers, |P| 6.7, σ² 0.62; 401 browsers, |P| 5.8, σ² 0.29; 301 browsers, |P| 5.3, σ² 0.34; 201 browsers, |P| 5.4, σ² 0.29; 101 browsers, |P| 4.2, σ² 0.39.)

Figure 15: Average traffic per second generated by a real application.


Still, Spray scales logarithmically overall. Figure 15b finally shows that the variance of partial views σ², displayed in the legend, stays small, which indicates that the network reached a state where neighborhood sizes are balanced, hence where the load is balanced.

Reasons: Random peer-sampling protocols generate little traffic, for they only exchange small messages periodically. The measurements of Figure 15a also take into account the traffic generated by WebRTC connections that constantly send a few bytes to check whether the connection is alive, i.e., heartbeat messages. Adding peers to the editing session increases partial view sizes and hence the generated traffic. Broadcast protocols use neighborhoods to disseminate messages. Each member receives and forwards each operation, which transitively reaches all members. Thus, the traffic depends on the message size multiplied by the neighborhood size, which scales logarithmically thanks to Spray. The growth during each run corresponds to the polylogarithmic growth of identifiers from the editors. Since the document size increases over time, LSeq's identifiers grow accordingly, which impacts message size [33].

7 Conclusion and perspectives

In this paper, we described Spray, an adaptive-by-design random peer-sampling approach designed to fit WebRTC's constraints. Spray provides: (i) logarithmically growing partial views reflecting the global network size, and (ii) constant time complexity on connection establishments, using solely neighbor-to-neighbor interactions. Our experiments demonstrate how Spray's adaptiveness improves the performance of random peer-sampling when the size of the network changes. In particular, the average shortest path length scales better and the in-degree evolves with the network size. Adaptiveness comes at the price of duplicates in the partial views. However, our simulations, supported by theoretical analysis, show that the number of duplicates remains very low and becomes negligible in large networks.

To highlight the improvements brought by Spray, we demonstrated how a broadcast protocol can take advantage of its adaptiveness to adjust the fanout and handle a sudden burst of popularity. We also built Crate, a decentralized collaborative editor directly accessible in web browsers, with a full implementation of Spray on top of WebRTC. We launched experiments involving up to 600 browsers showing the benefits of adaptive neighborhoods.

Spray makes building scalable decentralized applications in browsers easy and accessible. Developers do not need to foresee the network size targeted by their applications. Just as for broadcast, many topology-optimization protocols (e.g. based on geolocation, latency, or preferences) rely on a random peer-sampling protocol, such as the generic algorithms T-Man [23] and Vicinity [41]. Spray allows such protocols to benefit from its self-adjusting partial views. Protocols such as D2HT [6] require both an estimator of the network size and a random peer-sampling protocol. Since Spray provides both at the same cost, a Spray-based D2HT would be an interesting perspective.

Peer-sampling protocols build overlay networks. These overlay networks do not necessarily reflect the underlying topology made of routers. Using WebRTC, a connection establishment may fail depending on network configurations. Section 4.3 states that such failures should be handled by replacing failed links with duplicates to known peers. As future work, it would be interesting to see if this strategy leads to an overlay network that self-adapts to the real underlying topology.

Acknowledgments

This work was partially funded by the French ANR project SocioPlug (ANR-13-INFR-0003), and by the DeSceNt project granted by the Labex CominLabs excellence laboratory (ANR-10-LABX-07-01).

Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations (see https://www.grid5000.fr).


References

[1] P. http://ozan.io/p/.

[2] PeerJS. http://peerjs.com.

[3] WebRTC. http://www.webrtc.org.

[4] P. Bailis and A. Ghodsi. Eventual consistency today: Limitations, extensions, and beyond. Queue, 11(3):20:20–20:32, Mar. 2013.

[5] C. Baquero, P. Almeida, R. Menezes, and P. Jesus. Extrema propagation: Fast distributed estimation of sums and network sizes. IEEE Transactions on Parallel and Distributed Systems, 23(4):668–675, April 2012.

[6] M. Bertier, F. Bonnet, A.-M. Kermarrec, V. Leroy, S. Peri, and M. Raynal. D2HT: The best of both worlds, integrating RPS and DHT. In Dependable Computing Conference (EDCC), 2010 European, pages 135–144, April 2010.

[7] K. P. Birman, M. Hayden, O. Ozkasap, Z. Xiao, M. Budiu, and Y. Minsky. Bimodal multicast. ACM Transactions on Computer Systems, 17(2):41–88, May 1999.

[8] F. Bonnet, F. Tronel, and S. Voulgaris. Brief announcement: Performance analysis of Cyclon, an inexpensive membership management for unstructured P2P overlays. Pages 560–562. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.

[9] G. Camarillo and J. Maenpaa. Self-tuning distributed hash table (DHT) for resource location and discovery (RELOAD). RFC 7363, Ericsson, September 2014.

[10] R. Carvajal-Gomez, D. Frey, M. Simonin, and A.-M. Kermarrec. WebGC: Gossiping on browsers without a server. In Web Information Systems Engineering – WISE 2015: 16th International Conference, Miami, FL, USA, November 1-3, 2015, Proceedings, Part II, pages 332–336. Springer International Publishing, Cham, 2015.

[11] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. SIGOPS Oper. Syst. Rev., 41(6):205–220, Oct. 2007.

[12] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry. Epidemic algorithms for replicated database maintenance. In Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing, pages 1–12. ACM, 1987.

[13] P. Erdos and A. Renyi. On random graphs I. Publ. Math. Debrecen, 6:290–297, 1959.

[14] P. T. Eugster, R. Guerraoui, S. B. Handurukande, P. Kouznetsov, and A.-M. Kermarrec. Lightweight probabilistic broadcast. ACM Transactions on Computer Systems (TOCS), 21(4):341–374, Nov. 2003.

[15] D. Frey, R. Guerraoui, A.-M. Kermarrec, M. Monod, and V. Quema. Stretching gossip with live streaming. In DSN, 2009.

[16] D. Frey, R. Guerraoui, A.-M. Kermarrec, B. Koldehofe, M. Mogensen, M. Monod, and V. Quema. Heterogeneous gossip. In Middleware, 2009.

[17] D. Frey, R. Guerraoui, A.-M. Kermarrec, and M. Monod. Live streaming with gossip. Research report RR-9039, Inria Rennes Bretagne Atlantique, Mar. 2017.

[18] A. Ganesh, A.-M. Kermarrec, E. Le Merrer, and L. Massoulie. Peer counting and sampling in overlay networks based on random walks. Distributed Computing, 20(4):267–278, 2007.

[19] A. Ganesh, A.-M. Kermarrec, and L. Massoulie. Scamp: Peer-to-peer lightweight membership service for large-scale group communication. In J. Crowcroft and M. Hofmann, editors, Networked Group Communication, volume 2233 of Lecture Notes in Computer Science, pages 44–55. Springer Berlin Heidelberg, 2001.

[20] A. Ganesh, A.-M. Kermarrec, and L. Massoulie. Peer-to-peer membership management for gossip-based protocols. IEEE Transactions on Computers, 52(2):139–149, Feb 2003.

[21] M. Jelasity, R. Guerraoui, A.-M. Kermarrec, and M. van Steen. The peer sampling service: Experimental evaluation of unstructured gossip-based implementations. In H.-A. Jacobsen, editor, Middleware 2004, volume 3231 of Lecture Notes in Computer Science, pages 79–98. Springer-Verlag, 2004.

[22] M. Jelasity and A. Montresor. Epidemic-style proactive aggregation in large overlay networks. In Proceedings of the 24th International Conference on Distributed Computing Systems, pages 102–109, 2004.

[23] M. Jelasity, A. Montresor, and O. Babaoglu. T-Man: Gossip-based fast overlay topology construction. Computer Networks, 53(13):2321–2339, 2009.

[24] M. Jelasity, S. Voulgaris, R. Guerraoui, A.-M. Kermarrec, and M. van Steen. Gossip-based peer sampling. ACM Transactions on Computer Systems (TOCS), 25(3):8, 2007.

[25] A.-M. Kermarrec, L. Massoulie, and A. Ganesh. Probabilistic reliable dissemination in large-scale systems. TPDS, 14(3):248–258, 2003.

[26] J. Kleinberg. The small-world phenomenon: An algorithmic perspective. In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, STOC '00, pages 163–170, New York, NY, USA, 2000. ACM.

[27] J. Leitao, J. Pereira, and L. Rodrigues. HyParView: A membership protocol for reliable gossip-based broadcast. In Dependable Systems and Networks, 2007. DSN '07. 37th Annual IEEE/IFIP International Conference on, pages 419–429, June 2007.

[28] H. Li, A. Clement, M. Marchetti, M. Kapritsos, L. Robinson, L. Alvisi, and M. Dahlin. FlightPath: Obedience vs. choice in cooperative services. In OSDI, 2008.

[29] M. Monod. Live Streaming with Gossip. PhD thesis, IC, Lausanne, 2010.

[30] A. Montresor and M. Jelasity. PeerSim: A scalable P2P simulator. In Proceedings of the 9th International Conference on Peer-to-Peer (P2P'09), pages 99–100, Seattle, WA, Sept. 2009.

[31] A. Montresor, M. Jelasity, and O. Babaoglu. Robust aggregation protocols for large-scale overlay networks. In International Conference on Dependable Systems and Networks, pages 19–28, June 2004.

[32] B. Nedelec, P. Molli, and A. Mostefaoui. Crate: Writing stories together with our browsers. In Proceedings of the 25th International Conference Companion on World Wide Web, WWW '16 Companion, pages 231–234, Republic and Canton of Geneva, Switzerland, 2016. International World Wide Web Conferences Steering Committee.

[33] B. Nedelec, P. Molli, A. Mostefaoui, and E. Desmontils. LSEQ: An adaptive structure for sequences in distributed collaborative editing. In 13th ACM Symposium on Document Engineering, Sept. 2013.

[34] R. Roverso and M. Hogqvist. Hive.js: Browser-based distributed caching for adaptive video streaming. In Multimedia (ISM), 2014 IEEE International Symposium on, pages 143–146, Dec 2014.

[35] R. Roverso, R. Reale, S. El-Ansary, and S. Haridi. SmoothCache 2.0: CDN-quality adaptive HTTP live streaming on peer-to-peer overlays. In Proceedings of the 6th ACM Multimedia Systems Conference, MMSys '15, pages 61–72, New York, NY, USA, 2015. ACM.

[36] Y. Saito and M. Shapiro. Optimistic replication. ACM Comput. Surv., 37(1):42–81, Mar. 2005.

[37] M. Shapiro, N. Preguica, C. Baquero, and M. Zawirski. Conflict-free replicated data types. In Stabilization, Safety, and Security of Distributed Systems, pages 386–400, 2011.

[38] N. Tolgyesi and M. Jelasity. Adaptive peer sampling with newscast. In H. Sips, D. Epema, and H.-X. Lin, editors, Euro-Par 2009 Parallel Processing, volume 5704 of Lecture Notes in Computer Science, pages 523–534. Springer Berlin Heidelberg, 2009.

[39] S. Voulgaris, D. Gavidia, and M. van Steen. Cyclon: Inexpensive membership management for unstructured P2P overlays. Journal of Network and Systems Management, 13(2):197–217, 2005.

[40] S. Voulgaris, E. Riviere, A.-M. Kermarrec, and M. van Steen. Sub-2-Sub: Self-organizing content-based publish and subscribe for dynamic and large scale collaborative networks. Research Report RR-5772, INRIA, 2005.

[41] S. Voulgaris and M. van Steen. Epidemic-style management of semantic overlays for content-based searching. In J. Cunha and P. Medeiros, editors, Euro-Par 2005 Parallel Processing, volume 3648 of Lecture Notes in Computer Science, pages 1143–1152. Springer Berlin Heidelberg, 2005.

[42] D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440–442, 06 1998.

[43] F. Wuhib, M. Dam, R. Stadler, and A. Clem. Robust monitoring of network-wide aggregates through gossiping. Network and Service Management, IEEE Transactions on, 6(2):95–109, 2009.

[44] L. Zhang, F. Zhou, A. Mislove, and R. Sundaram. Maygh: Building a CDN from client web browsers. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys '13, pages 281–294, New York, NY, USA, 2013. ACM.

[45] M. Zhang, Q. Zhang, L. Sun, and S. Yang. Understanding the power of pull-based streaming protocol: Can we do better? JSAC, 25(9):1678–1694, 2007.