


arXiv:1208.6326v1 [cs.CR] 30 Aug 2012

Pisces: Anonymous Communication Using Social Networks

Prateek Mittal, University of California, Berkeley

[email protected]

Matthew Wright, University of Texas at Arlington

[email protected]

Nikita Borisov, University of Illinois at Urbana-Champaign

[email protected]

Abstract—The architectures of deployed anonymity systems such as Tor suffer from two key problems that limit users’ trust in these systems. First, paths for anonymous communication are built without considering trust relationships between users and relays in the system. Second, the network architecture relies on a set of centralized servers. In this paper, we propose Pisces, a decentralized protocol for anonymous communication that leverages users’ social links to build circuits for onion routing. We argue that such an approach greatly improves the system’s resilience to attackers.

A fundamental challenge in this setting is the design of a secure process to discover peers for use in a user’s circuit. All existing solutions for secure peer discovery leverage structured topologies and cannot be applied to unstructured social network topologies. In Pisces, we discover peers by using random walks in the social network graph with a bias away from highly connected nodes to prevent a few nodes from dominating the circuit creation process. To secure the random walks, we leverage the reciprocal neighbor policy: if malicious nodes try to exclude honest nodes during peer discovery so as to improve the chance of being selected, then honest nodes can use a tit-for-tat approach and reciprocally exclude the malicious nodes from their routing tables. We describe a fully decentralized protocol for enforcing this policy, and use it to build the Pisces anonymity system.

Using theoretical modeling and experiments on real-world social network topologies, we show that (a) the reciprocal neighbor policy mitigates active attacks that an adversary can perform, (b) our decentralized protocol to enforce this policy is secure and has low overhead, and (c) the overall anonymity provided by our system significantly outperforms existing approaches.

I. INTRODUCTION

Systems for anonymous communication on the Internet, or anonymity systems, provide a technical means to enhance user privacy by hiding the link between the user and her remote communicating parties (such as websites that the user visits). Popular anonymity systems include Anonymizer.com [8], AN.ON [3], [15], and Tor [13]. Tor is used by hundreds of thousands of users [55], including journalists, dissidents, whistle-blowers, law enforcement, and government embassies [18], [57].

Anonymity systems forward user traffic through a path (or circuit) of proxy servers. In some systems, including Tor, the proxies on the circuit are selected from among a large number of available proxies, each of which is supposed to be operated by a different person. An attacker, however, might run a substantial fraction of the proxies under different identities. He would then be able to deanonymize users whose circuits run through his attacker-controlled proxies. Thus, the security of the anonymity system hinges on at least some of the proxies in the circuit being honest [52]. Having some means to discern which proxies are likely to be honest would thereby greatly enhance the security of the system.

Recently, Johnson et al. proposed a method to incorporate trust—in which the user must judge which proxies are more likely to be honest—into a Tor-like system [24], [25]. However, their approach relies on central servers and offers only limited scalability (see Section II). Both Nagaraja [41] and Danezis et al. [9] describe a compelling vision for leveraging social relationships in a decentralized (peer-to-peer) anonymity system by building circuits over edges in a social network graph. Unfortunately, their protocols are limited in applicability to an honest-but-curious attacker model. Against a more powerful Byzantine adversary, both approaches are vulnerable to route capture attacks, in which the entire circuit is composed of malicious nodes. To our knowledge, no proposed system securely leverages social relationship information to improve the chances of attacker-free circuit construction in a decentralized anonymity system.

Our contributions: In this paper, we propose to use social networks to help construct circuits that are more robust to compromise than any prior approach among decentralized anonymity systems. We take advantage of the fact that, when protected from manipulation, random walks on social network topologies are likely to remain among the honest users [28], [63]. We thus propose to construct random walks on a social network topology to select circuits in such a way that they cannot be manipulated by a Byzantine adversary. We then build circuits from these protected random walks and show that they provide a very high chance for users to have an honest circuit, even for users who have a few social links to malicious peers.

The key challenge in this setting is to prevent the adversary from biasing the random walk by manipulating their routing tables. To this end, we propose the reciprocal neighbor policy: if malicious nodes try to exclude honest nodes during peer discovery, then honest nodes can use a tit-for-tat approach and reciprocally exclude the malicious nodes from their routing tables. The policy ensures that attempts to bias the random walk towards malicious nodes reduce the probability of malicious nodes being selected as intermediate nodes in the random walk, nullifying the effect of the attack. Further, to prevent an attacker from benefiting by creating a large clique of malicious peers in the social network, we bias random walks away from peers with many friends.

An important contribution of our work is a technique for enforcing the reciprocal neighbor policy in a fully decentralized fashion. We efficiently distribute each node’s current list of contacts (using Whanau [28]) so that those contacts can verify periodically that they are in the list. A contact that should be in the list, but is not, can remove the node permanently from its contacts in future time periods. Further, the list is signed by the node, so any conflicting lists for the same time period constitute proof that the node is cheating. Using this policy, we design Pisces, a decentralized anonymity system that uses random walks on social networks to take advantage of users’ trust relationships without being exposed to circuit manipulation.

We demonstrate through theoretical analysis, simulation, and experiments, that our application of the reciprocal neighbor policy provides good deterrence against active attacks. We also show that our distributed design provides robust enforcement of this policy, with manageable overhead for distributing and checking contact lists. Finally, using real-world social network topologies, we show that Pisces provides significantly higher anonymity than existing approaches. Compared with decentralized approaches that do not leverage social networks (like ShadowWalker [37]), Pisces provides up to six bits higher entropy in a single communication round. Compared with the naive strategy of using conventional random walks over social networks (as in the Drac system [9]), Pisces provides twice the entropy over 100 communication rounds.
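The entropy comparisons above can be made concrete with a small calculation. In the hedged sketch below, anonymity is measured as the Shannon entropy (in bits) of the adversary's probability distribution over possible initiators; the function name is ours, not from the paper.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (bits) of a distribution over candidate initiators."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform distribution over n candidates yields log2(n) bits of entropy,
# so a "six bits higher" entropy corresponds to a 2**6 = 64-fold larger
# effective anonymity set.
uniform_64 = [1 / 64] * 64
print(shannon_entropy(uniform_64))  # 6.0
```

Under a uniform distribution, entropy equals log2 of the anonymity-set size, which is why gains are naturally reported in bits.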

II. BACKGROUND AND RELATED WORK

The focus of this work is on low-latency anonymity systems that can be used for interactive traffic such as Web browsing and instant messaging. Low-latency anonymity systems aim to defend against a partial adversary who can compromise or monitor only a fraction of links in the system. Most of these systems rely on onion routing [53] for anonymous communication. Onion routing enables anonymous communication by using a sequence of relays as intermediate nodes to forward traffic. Such a sequence of relays is referred to as a circuit. A key property of onion routing is that each relay on the circuit only sees the identity of the previous hop and the next hop, but no single relay can link both the initiator and the destination of the communication.

A. Centralized/semi-centralized approaches

Most deployed systems for anonymous communication have a centralized or semi-centralized architecture, including Anonymizer [8], AN.ON [15], Tor [13], Freedom [7], Onion Routing [53], and I2P [22]. Anonymizer.com [8] is effectively a centralized proxy server with a single point of control. If the proxy server becomes compromised or is subject to subpoena, the privacy provided by the system would be lost. AN.ON [15] distributes the trust among three independently-operated servers; again, the compromise of just a few nodes suffices to undermine the entire system. Both Anonymizer.com and AN.ON are prone to flooding-based denial of service attacks. Furthermore, with both systems, it may be possible to eavesdrop on the server(s) and use end-to-end timing attacks (such as [21], [61], [65]) to substantially undermine the privacy of all users.

Tor is a widely used anonymous communication system, serving roughly 500,000 users [56] and carrying terabytes of traffic each day [55]. Tor is substantially more distributed than either Anonymizer.com or AN.ON, with users building circuits from among about 3,000 proxy nodes (onion routers) as of May 2012 [27]. This helps to protect against direct attacks and eavesdropping on the entire system. Tor relies on trusted entities called directory authorities to maintain up-to-date information about all relays that are online in the form of a network consensus database. Freedom [7] and I2P [22] also use such centralized directory servers. Users download the full database, and then locally select random relays for anonymous communication. Clients download this database every three hours to handle relay churn.

Although Tor has a more distributed approach than any other deployed system, there are several shortcomings with its architecture. First, Tor does not leverage a user’s trust relationships for building circuits. In Tor, an attacker could volunteer a set of proxy nodes under different identities and then use these nodes to compromise the anonymity of circuits going through them. Leveraging trust relationships has been shown to be useful for improving anonymity against such an attacker in Tor [24], [25]. Second, the trusted directory authorities are attractive targets for attack; in fact, some directory authorities were recently found to have been compromised [1]. Finally, the requirement for all users to maintain global information about all online relays becomes a scalability bottleneck. McLachlan et al. [32] showed that under reasonable growth projections, the Tor network could be spending more bandwidth to maintain this global system view than for the core task of relaying anonymous communications. The recent proposal for PIR-Tor [40] might address the networking scalability issues, but it does not mitigate the basic trust and denial of service issues in a centralized approach.

B. Incorporating social trust

The importance of leveraging social network trust relationships to improve the security and privacy properties of systems has been recognized by a large body of previous work [9]–[11], [24], [25], [28], [29], [39], [41], [58], [63], [64]. Recently, Johnson et al. proposed a method to incorporate trust into a Tor-like system [24], [25]. However, their approach relies on central servers, and offers only limited scalability. Nagaraja [41] and Danezis et al. [9] have both proposed anonymity systems over social networks. However, both approaches assume an honest-but-curious attack model and are vulnerable to route capture attacks. In particular, without the security of the reciprocal neighbor policy used in Pisces, a random walk on the social graph that goes to an attacker-controlled peer at any step can be controlled by the attacker for the remainder of the walk.

Designing anonymity systems that are aware of users’ trust relationships is an important step towards defending against the Sybil attack [14], in which a single entity in the network (the attacker) can emulate the behavior of multiple identities and violate security properties of the system. Mechanisms like SybilGuard [64], SybilLimit [63], and SybilInfer [11] aim to leverage social network trust relationships to bound the number of Sybil identities any malicious entity can emulate. These mechanisms are based on the observation that it is costly for an adversary to form trust relationships (also known as attack edges) with honest nodes. When the adversary performs a Sybil attack, he can create an arbitrary number of edges between Sybil identities and malicious entities, but cannot create trust relationships between Sybil identities and honest users. Thus, a Sybil attack in social networks creates two regions in the social network graph, the honest region and the Sybil region, with relatively few edges between them; i.e., the graph features a small cut. This cut can be used to detect and mitigate the Sybil attack.

Recent work has challenged the assumption that it is costly for an attacker to create attack edges with honest nodes in friendship graphs [4], [6], [23], [62], and proposed the use of interaction graphs as a more secure realization of real-world social trust. In this work, we evaluate Pisces with both friendship graphs and topologies based on interaction graphs. Other mechanisms to infer the strength of ties between users [17] may also be helpful in creating resilient social graphs, but these are not the focus of this paper.

C. Decentralized and peer-to-peer approaches

A number of distributed directory services for anonymous communication have been designed using a P2P approach; most have serious problems that prevent them from being deployed. We point out these issues briefly here. First, we note that the well-known Crowds system, which was the first P2P anonymity system, uses a centralized directory service [45] and thus is not fully P2P. The Tarzan system proposes a gossip-based distributed directory service that does not scale well beyond 10,000 nodes [16].

MorphMix was the first scalable P2P anonymity system [46]. It uses random walks on unstructured topologies for circuit construction and employs a witness scheme that aims to detect routing table manipulation. The detection mechanism can be bypassed by an attacker who is careful in his choices of fake routing tables [54]. With or without the evasion technique, the attacker can manipulate routing tables to capture a substantial fraction of circuits. Despite over a decade of research, decentralized mechanisms to secure random walks in unstructured topologies have remained an open problem. Pisces overcomes this problem by having peers sign their routing tables for a given time slot and ensuring that nodes observe enough copies to detect cheaters quickly, before many route captures can occur, and with certainty.

Recently, several protocols have been proposed using P2P systems built on distributed hash tables (DHTs), including AP3 [35], Salsa [42], NISAN [43], and Torsk [32]. These four protocols are vulnerable to information leak attacks [36], [60], as the lookup process and circuit construction techniques expose information about the requesting peer’s circuits. These attacks can lead to users being partially or completely deanonymized. ShadowWalker, which is also based on a structured topology, employs random walks on the DHT topology for circuit construction [37]. The routing tables are checked and signed by shadow nodes, such that both a node in the random walk and all of its shadows would need to be attackers for a route capture attack to succeed. The protocol was found to be partially broken, but then also fixed, by Schuchard et al. [48]. Although it is not seriously vulnerable to attacks like those found against other protocols, ShadowWalker remains vulnerable to a small but non-trivial fraction of route captures (roughly the same as Tor for reasonable parameters). As we show in Section IV, Pisces’s use of social trust means that it can outperform ShadowWalker (and Tor) for route captures when the attacker has a bounded number of attack edges in the social network.

Beyond these approaches, Mittal et al. [38] briefly considered the use of the reciprocal neighbor policy for anonymous communication. However, their protocol is only applicable to constant-degree topologies and utilizes central points of trust. Moreover, their evaluation was very preliminary. In this paper, we present a complete design for a decentralized anonymity system based on the reciprocal neighbor policy. Since our design is not limited to constant-degree topologies, we explore the advantages that come from applying the technique to unstructured social network graphs. We also present the first fully decentralized protocols for achieving these policies and present analysis, simulation, and experimental results demonstrating the security and performance properties of the Pisces approach.

III. PISCES PROTOCOL

In this section, we first describe our design goals, threat model, and system model. We then outline the core problem of securing random walks and describe the role of the reciprocal neighbor policy in solving this problem in the context of social networks. Finally, we explain how Pisces securely implements this policy.

A. Design goals

We now present the key design goals for our system.

1. Trustworthy anonymity: we target an architecture that is able to leverage a user’s social trust relationships to improve the security of anonymous communication. Current mechanisms for incorporating social trust are either centralized or are limited in applicability to an honest-but-curious attacker model.

2. Decentralized design: the design should not have any central entities. Central entities are attractive targets for attackers, in addition to being a single point of failure for the entire system. The design should also mitigate route capture and information leak attacks.

3. Scalable anonymity: the design should be able to scale to millions of users and relays with low communication overhead. Since anonymity is defined as the state of being unidentifiable in a group [44], architectures that can support millions of users provide the additional benefit of increasing the overall anonymity of users.

B. Threat model

In this work, we consider a colluding adversary who can launch Byzantine attacks against the anonymity system. The adversary can perform passive attacks such as logging information for end-to-end timing analysis [30], as well as active attacks such as deviating from the protocol and selectively denying service to some circuits [5]. We assume the existence of effective mechanisms to defend against the Sybil attack, such as those based on social networks [11], [63]. Existing Sybil defense mechanisms are not perfect and allow the insertion of a bounded number of Sybil identities in the system. They also require the number of attack edges to be bounded by g = O(h / log h), where h is the number of honest nodes in the system. We use this as our primary threat model. For comparative analysis, we also evaluate our system under an ideal Sybil defense that does not allow the insertion of any Sybil identities.

C. System Model and Assumptions

Pisces is a fully decentralized protocol and does not assume any PKI infrastructure. Each node generates a local public-private key pair. An identity in the system equates to its public key. Existing Sybil defense mechanisms can be used to validate node identities. We assume that identities in the system can be blacklisted; i.e., an adversarial node cannot whitewash its identity by rejoining the system with a different public key. This is a reasonable assumption, since (a) mechanisms such as SybilInfer/SybilLimit only allow the insertion of a bounded number of Sybil identities, and (b) replacing deleted attack edges is expensive for the attacker, particularly in a social network graph based on interactions. We assume loose time synchronization amongst nodes. Existing services such as NTP [34] can provide time synchronization on the order of hundreds of milliseconds in wide area networks [20].

Finally, in this work, we leverage mechanisms for building efficient communication structures over unstructured social networks, such as Whanau [28] and X-Vine [39]. These mechanisms embed a structure into social network topologies to provide a distributed hash table for efficient communication [47], [51]. In particular, we use Whanau, since it provides the best security guarantees amongst the current state of the art. Whanau guarantees that, with high probability, it is possible to securely look up any object present in the DHT. It is important to point out that Whanau only provides availability, but not integrity [28]. This means that if a user performs redundant lookups for a single key, multiple conflicting results may be returned; Whanau guarantees that the correct result will be included in the returned set, but leaves the problem of identifying which result is correct to the application layer. Therefore, Whanau cannot be used in conjunction with current protocols that provide anonymous communication using structured topologies [37], since these protocols require integrity guarantees from the DHT layer itself. We emphasize that the only property we assume from Whanau is secure routing; in particular, we do not assume any privacy or anonymity properties in its lookup mechanisms [36], [60].

D. Problem Overview

Random walks are an integral part of many distributed anonymity systems, from Tor [13] to ShadowWalker [37]. In a random walk based circuit construction, an initiator I of the random walk first selects a random node A from its neighbors in some topology (in our case, the social network graph). The initiator sets up a single-hop circuit with node A and uses the circuit to download a list of node A’s neighbors (containing the IP addresses and public keys of the neighbors). Node I can then select a random node B from the downloaded list of node A’s neighbors and extend the circuit through A onto node B. This process can be repeated to set up a circuit of length l.
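The walk-based circuit construction described above can be sketched as follows. This is a minimal illustration only: the graph is a plain adjacency dict, and the telescoping circuit setup, key exchange, and neighbor-list download are elided behind a local lookup.

```python
import random

def random_walk(graph, initiator, length, rng=random):
    """Sketch of the peer-discovery walk: at each hop the initiator
    obtains the current node's neighbor list and picks the next hop
    uniformly at random. `graph` maps node -> list of neighbors."""
    circuit = [initiator]
    current = initiator
    for _ in range(length):
        neighbors = graph[current]       # in Pisces, downloaded over the circuit
        current = rng.choice(neighbors)  # uniform choice of next hop
        circuit.append(current)
    return circuit

g = {"I": ["A", "C"], "A": ["I", "B"], "B": ["A", "C"], "C": ["I", "B"]}
print(random_walk(g, "I", 3))  # e.g. ['I', 'A', 'B', 'C']
```

Each returned hop is a neighbor of the previous one, so the walk always follows edges of the (social) graph.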

Random walks are vulnerable to active route capture attacks in which an adversary biases the peer discovery process towards colluding malicious nodes. First, malicious nodes can exclude honest nodes from their neighbor list to bias the peer discovery process. Second, malicious nodes can modify the public keys of honest nodes in their neighbor list. When an initiator of the random walk extends a circuit from a malicious node to a neighboring honest node, the malicious node can simply emulate the honest neighbor. The malicious node can repeat this process for further circuit extensions as well. Finally, the malicious nodes can add more edges between each other in the social network topology to increase the percentage of malicious nodes in their neighbor lists. To secure the random walk process, we use a reciprocal neighbor policy that limits the benefit to the attacker of attempting to bias the random walks. We propose a protocol that securely realizes this policy through detection of violations.

E. Reciprocal Neighbor Policy

We now discuss the key primitive we leverage for securing random walks: the reciprocal neighbor policy. The main idea of this policy is to consider undirected versions of structured or unstructured topologies and then entangle the routing tables of neighboring nodes with each other. In other words, if a malicious node X does not correctly advertise an honest node Y in its neighbor list, then Y also excludes X from its neighbor list in a tit-for-tat manner. The reciprocal neighbor policy ensures that route capture attacks based on incorrect advertisement of honest nodes during random walks serve to partially isolate malicious nodes behind a small cut in the topology, reducing the probability that they will be selected in a random walk. In particular, this policy mitigates the first two types of route capture attacks described above, namely the exclusion of honest nodes and the modification of honest nodes’ public keys. However, the adversary can still bias the peer discovery process by simply inserting a large number of malicious nodes into its routing tables. Thus, the reciprocal neighbor policy described so far would only be effective for topologies in which node degrees are bounded and homogeneous, such as structured peer-to-peer topologies like Chord [51] and Pastry [47]. However, node degrees in unstructured social network topologies are highly heterogeneous, presenting an avenue for attack.

Handling the node degree attack: The addition of edges amongst colluding malicious nodes in a topology increases the probability that a malicious node is selected in a random walk. To prevent this node degree attack, we propose to perform random walks using the Metropolis-Hastings modification [19], [33]; the transition matrix used for our random walks is as follows:

P_ij = min(1/d_i, 1/d_j)    if i → j is an edge in G
P_ij = 1 − Σ_{k ≠ i} P_ik   if j = i
P_ij = 0                    otherwise                (1)

where d_i denotes the degree of vertex i in G. Since the transition probabilities to neighbors may not always sum to one, nodes add a self loop to the transition probabilities to address this. The Metropolis-Hastings modification ensures that attempts to add malicious nodes to the neighbor table decrease the probability of malicious nodes being selected in a random walk. We will show that the Metropolis-Hastings modification, along with the reciprocal neighbor policy, is surprisingly effective at mitigating active attacks on random walks. A malicious node’s attempts to bias the random walk process by launching route capture attacks reduce its own probability of being selected as an intermediate node in future random walks, nullifying the effect of the attack.
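As a sanity check on Eq. (1), the following sketch computes one row of the Metropolis-Hastings transition matrix for an undirected graph; `mh_transition_row` is an illustrative helper of ours, not part of the Pisces protocol itself.

```python
def mh_transition_row(graph, i):
    """Metropolis-Hastings transition probabilities out of node i (Eq. 1):
    P_ij = min(1/d_i, 1/d_j) for each edge i -> j, with the remaining
    probability mass placed on a self-loop P_ii so the row sums to 1."""
    di = len(graph[i])
    row = {j: min(1 / di, 1 / len(graph[j])) for j in graph[i]}
    row[i] = 1 - sum(row.values())  # self-loop absorbs leftover mass
    return row

# A high-degree node is penalized: a walk at a degree-1 neighbor of a
# degree-3 node moves to it with probability min(1, 1/3) = 1/3 only.
g = {"X": ["A", "B", "C"], "A": ["X"], "B": ["X"], "C": ["X"]}
print(mh_transition_row(g, "A"))  # X gets 1/3, the self-loop gets 2/3
```

This degree bias is what blunts the attack of adding many edges among colluding nodes: inflating one's degree lowers the probability of being chosen from any given neighbor.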

F. Securing Reciprocal Neighbor Policy

We now present our protocol for securely implementing the reciprocal neighbor policy.

Intuition: Our key idea is to have neighbors periodically check each other’s neighbor lists. Suppose that node X and node Y are neighbors. If node X’s neighbor list doesn’t include node Y, then the periodic check will reveal this and enable node Y to implement the tit-for-tat removal of node X from its routing table. Additionally, the neighbor lists can be signed by each node, so that a dishonest node can be caught with two different signed lists and blacklisted. To handle churn, we propose that all nodes keep their neighbor lists static for the duration of a regular interval (t) and update the list with joins and leaves only between intervals. Here we rely on our assumption of loose time synchronization. The use of static neighbor lists ensures that we can identify conflicting neighbor lists from a node for the same time interval, which would be a clear indication of malicious behavior. The duration of the time interval for which the lists remain static determines the trade-off between the communication overhead for securing the reciprocal neighbor policy and the unreliability of circuit construction due to churn.
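The equivocation check described above can be sketched as follows. This is an assumption-laden toy: the protocol uses public-key signatures over the per-interval list, which we model here with a keyed HMAC digest, and both function names are ours.

```python
import hashlib
import hmac

def sign_list(key: bytes, interval: int, neighbors: list) -> bytes:
    """Stand-in for a digital signature over (interval, neighbor list)."""
    msg = f"{interval}:{','.join(sorted(neighbors))}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

def detect_equivocation(observed):
    """observed: (interval, neighbors, signature) tuples gathered from
    different verifiers. Returns the intervals for which a node issued
    two conflicting signed lists, i.e., proof of misbehavior."""
    seen = {}
    cheated = set()
    for interval, neighbors, sig in observed:
        fingerprint = (tuple(sorted(neighbors)), sig)
        if interval in seen and seen[interval] != fingerprint:
            cheated.add(interval)
        seen.setdefault(interval, fingerprint)
    return cheated

key = b"node-X-secret"
a = (7, ["A", "B"], sign_list(key, 7, ["A", "B"]))
b = (7, ["B", "C"], sign_list(key, 7, ["B", "C"]))  # conflicting list, same interval
print(detect_equivocation([a, b]))  # {7}
```

Because each list is bound to its time interval before signing, any two valid-but-different lists for the same interval convict the signer without requiring trust in either verifier.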

Setting up static neighbor list certificates: A short time prior to the beginning of a new time interval, each node sets up a new neighbor list that it will use in the next time interval:

1) Liveness check: In the first round, nodes exchange messages with their trusted neighbors to check for liveness and reciprocal trust. A reciprocal exchange of messages ensures that both neighbors are interested in advertising each other in the next time interval (and are not in each other's local blacklists). Nodes wait for a time duration to receive these messages from all neighbors, and after the timeout, construct a preliminary version of their next neighbor list, comprising the node identities of all nodes that responded in the first communication round.

2) Degree exchange: Next, the nodes broadcast the length of their preliminary neighbor list to all their neighbors. This step is important since Metropolis-Hastings random walks require the node degrees of neighboring nodes to determine their transition probabilities.

3) Final list: After receiving these broadcasts from all the neighbors, a node creates a final neighbor list and digitally signs it with its private key. The final list includes the IP address, public key, and node degree of each neighbor, as well as the time interval for the validity of the list. Note that a neighbor may go offline between the first and second step, before the node has a chance to learn its node degree, in which case it can simply be omitted from the final list.

4) Local integrity checks: At the beginning of every new time interval, each node queries all its neighbors and downloads their signed neighbor lists. When a node A receives a neighbor list from B, it performs local integrity checks, verifying that B's neighbor entry for A contains the correct IP address, public key, and node degree. Additionally, it verifies that the length of the neighbor list is at most as long as was broadcast in Phase 2. (Note that intentionally broadcasting a higher node degree is disadvantageous to B, as it will reduce the probability of B being chosen by a random walk.) If any local integrity check fails, A places B in its permanent local blacklist, severing its social trust relationship with B and refusing all further communication. If all the checks succeed, then these neighbor lists serve as a cryptographic commitment from these nodes: the presence of any conflicting neighbor lists for the same time interval issued by the same node is clear evidence of misbehavior. If B's neighbor list omits A entirely, or if B simply refuses to send its neighbor list to A, B is placed on a temporary blacklist, and A will refuse further communication with B for the duration of the current time period, preventing any circuits from being extended from A to B. (Effectively, A performs a selective denial-of-service against B; see Section IV-C for more discussion of this.) The blacklist only lasts for the duration of the current round, since the omission could have resulted from a temporary communication failure.
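As an illustration, the local integrity checks of step 4 might look as follows (a simplified Python sketch; the data layout and names are ours, and signature verification over the list is elided):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    ip: str
    pubkey: str
    degree: int

@dataclass
class NeighborList:
    issuer: str        # issuer's public key (its identity)
    interval: int      # time interval for which the list is valid
    entries: dict      # neighbor pubkey -> Entry
    # a real list also carries the issuer's signature over all fields

def local_check(my_key, my_entry, lst, broadcast_len):
    """A's checks on B's signed list. Returns 'ok', 'permanent'
    (provable misbehavior -> permanent local blacklist), or
    'temporary' (A omitted -> blacklist for this interval only)."""
    if my_key not in lst.entries:
        return 'temporary'      # omission may be a transient failure
    e = lst.entries[my_key]
    if (e.ip, e.pubkey, e.degree) != (my_entry.ip, my_entry.pubkey, my_entry.degree):
        return 'permanent'      # wrong entry for A: provable misbehavior
    if len(lst.entries) > broadcast_len:
        return 'permanent'      # longer than the degree broadcast in phase 2
    return 'ok'
```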

Duplicate detection: Next, we need to ensure that B uses the same neighbor list during random walks as it presented to its neighbors. Our approach is to use the Whanau DHT to check for the presence of several conflicting neighbor lists signed by the same node for the same time period. After performing the local checks, A will store a copy of B's signed neighbor list in the Whanau, using B's identity (namely, its public key) as the DHT key. Then, when another node C performs a random walk that passes through B, it will receive a signed neighbor list from B. It will then perform a lookup in the DHT for any stored neighbor lists under B's key. If it discovers a different list for the same period with a valid signature, then it can notify B's neighbors about the misbehavior, causing them to immediately blacklist B.
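The duplicate check itself reduces to comparing lists stored under the same key and time interval; a minimal Python sketch (an in-memory dict stands in for the Whanau DHT, and signature verification is omitted):

```python
def store_and_check(dht, issuer, interval, signed_list):
    """Store a signed neighbor list under (issuer, interval) and return a
    previously stored conflicting list, if any, as evidence of misbehavior.

    dht: plain dict standing in for Whanau; signed_list: any comparable value.
    """
    stored = dht.setdefault((issuer, interval), [])
    for other in stored:
        if other != signed_list:
            return other        # two distinct lists for one interval: proof
    stored.append(signed_list)
    return None
```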

One challenge is that the Whanau lookups are not anonymous and may reveal to external observers the fact that C is performing a random walk through B. This information leak, linking C and B, can then be used to break C's anonymity [36], [60]. To address this problem, we introduce the concept of testing random walks that are not actually used for anonymous communication but are otherwise indistinguishable from regular random walks. Whanau lookups to check for misbehavior are performed for testing random walks only, since information leaks in that case will not reveal private information. The lookups are performed after the random walk to ensure that testing walks and regular walks cannot be distinguished. If each node performs a small number of testing walks within each time period, any misbehavior will be detected with high probability.

Blacklisting: When C detects a conflicting neighbor list issued by B, it immediately notifies all of B's neighbors (as listed in the neighbor list stored in the DHT), presenting the two lists as evidence of misbehavior. B's neighbors will thereafter terminate their social relationships with B, blacklisting it. Note that the two conflicting lists form incontrovertible evidence that B was behaving maliciously, since honest nodes never issue two neighbor lists in a single time interval. This evidence can be broadcast globally to ensure that all nodes blacklist B, as any node can verify the signatures on the two lists, and thus B will not be able to form connections with any honest nodes in the system. Moreover, honest nodes will know not to select B in any random walk, effectively removing it from the social graph entirely.

Proactive vs. reactive security: Our system relies on detecting malicious behavior and blacklisting nodes. Thus, as described so far, Pisces provides reactive security. To further strengthen random walk security in the scenario where the adversary is performing route capture for the first time, we propose an extension to Pisces that aims to provide proactive security. We propose a discover-but-wait strategy, in which users build circuits for anonymous communication, but impose a delay between building a circuit and actually using it for anonymous communication. If misbehavior is detected by a testing random walk within the delay period, the circuit will be terminated as B's neighbors blacklist it; otherwise, if a circuit survives the timeout duration, then it can be used for anonymous communication.

Performance optimization: Using all hops of a random walk for anonymous communication has significant performance limitations. First, the latency experienced by the user scales linearly with the random walk length. Second, long circuit lengths reduce the overall throughput that a system can offer to a user. Inspired by prior work [37], we propose the following performance optimization. Instead of using all hops of a random walk for anonymous communication, the initiator can use the random walk as a peer discovery process, and leverage the k-th hop and the last hop to build a two-hop circuit for anonymous communication. In our evaluation, we find that values of k close to half the random walk length provide a good trade-off between anonymity and performance.
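With this optimization, circuit construction reduces to selecting two relays from a completed walk; a trivial Python sketch (names are ours):

```python
def two_hop_relays(walk, k):
    """Select the k-th hop and the final hop of a completed random walk
    as the two relays of the anonymous circuit (k is 1-indexed and must
    not coincide with the last hop)."""
    if not 1 <= k < len(walk):
        raise ValueError("k must fall strictly inside the walk")
    return walk[k - 1], walk[-1]
```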

IV. EVALUATION

In this section, we evaluate Pisces with theoretical analysis as well as experiments using real-world social network topologies. In particular, we (a) show the security benefits provided by the reciprocal neighbor policy, (b) evaluate the security, performance, and overhead of our protocol that implements the policy, and (c) evaluate the overall anonymity provided by Pisces. We consider four datasets for our experiments, which were processed in a manner similar to the evaluation done in SybilLimit [63] and SybilInfer [11]: (i) a Facebook friendship graph from the New Orleans regional network [59], containing 50,150 nodes and 772,843 edges; (ii) a Facebook wall post interaction graph from the New Orleans regional network [59], containing 29,140 users and 161,969 edges; (iii) a Facebook interaction graph from a moderate-sized regional network [62], containing about 380,564 nodes and about 3.24 million edges; (iv) a Facebook friendship graph from a moderate-sized regional network [62], containing 1,033,805 nodes and about 13.7 million edges.

A. Reciprocal Neighbor Policy

To demonstrate the effectiveness of the reciprocal neighbor policy for implementing trust-based anonymity, let us assume for now that there is a mechanism to securely achieve the policy, i.e., that if a node X does not advertise a node Y in its neighbor list, then Y also excludes X. In this scenario, we are interested in characterizing the probability distribution of random walks.


Fig. 3. Attack Model.

Theorem 1: Node degree attack: Given h honest nodes and m malicious nodes (including Sybil nodes) that have g edges (attack edges) to the honest nodes in an undirected and connected social network, the stationary probability of random walks starting at an honest node and terminating at a malicious node cannot be biased by adding edges amongst malicious nodes. Moreover, this stationary probability is independent of the topology created amongst malicious nodes (as long as the social graph is connected).

Proof: Let us denote by πi the stationary probability of random walks (independent of the initial state of the random walk) for node i, and let Pij denote the transition probability from node i to node j. Let n denote the total number of nodes in the social network (n = h + m). Since the transition probabilities between nodes in the Metropolis-Hastings random walks are symmetric,

Pij = Pji = min(1/degree(i), 1/degree(j)),

observe that πz = 1/n for all z is a solution to the equation πi · Pij = πj · Pji. Since social networks are non-bipartite as well as undirected graphs, the solution to the above equation (π = 1/n) must be the unique stationary distribution for the random walk [2]. Thus the stationary probability of random walks terminating at any node in the system is uniform and independent of the number of edges amongst malicious nodes, or the topology amongst malicious nodes in the system (as long as the graph remains connected).

We validate Theorem 1 using simulation results on the Facebook wall post interaction graph. Figure 1(a) depicts the probability of a Pisces random walk terminating at a malicious node as a function of random walk length for g = 30000 (2900 malicious nodes) and g = 60000 (7300 malicious nodes). We can see that the random walk quickly reaches its stationary distribution, and at the stationary distribution, the probability of a random walk terminating at one of the malicious nodes is 0.1 and 0.25 respectively (which is the adversary's fair share). Figures 1(b) and (c) depict the probability of a random walk terminating at one of the malicious nodes under the node degree attack, for g = 30000 and g = 60000 respectively. We can see that adding edges amongst malicious nodes does not help the adversary (even for transient-length random walks).
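Theorem 1 can also be checked numerically: for any connected, non-bipartite graph, iterating the Metropolis-Hastings chain from a point mass converges to the uniform distribution regardless of the edges inside any subset of nodes (a small NumPy sketch on a toy four-node graph of our own choosing):

```python
import numpy as np

def mh_matrix(adj):
    """Metropolis-Hastings transition matrix for an undirected graph.

    adj: symmetric 0/1 NumPy adjacency matrix without self loops."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                P[i, j] = min(1.0 / deg[i], 1.0 / deg[j])
        P[i, i] = 1.0 - P[i].sum()  # self loop absorbs leftover mass
    return P

# Toy graph: a triangle (nodes 0, 1, 2) with a pendant node 3 attached to 0.
adj = np.array([[0, 1, 1, 1],
                [1, 0, 1, 0],
                [1, 1, 0, 0],
                [1, 0, 0, 0]])
P = mh_matrix(adj)

# Start from a point mass at node 1 and iterate the chain.
dist = np.zeros(4)
dist[1] = 1.0
for _ in range(200):
    dist = dist @ P
# dist is now (approximately) uniform: 1/4 per node.
```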

Theorem 2: Global blacklisting: Suppose that x ≤ m malicious nodes sacrifice y1 ≤ g attack edges, and that these malicious nodes originally had y2 ≤ g attack edges. The stationary probability of a random walk terminating at malicious nodes is reduced in proportion to x. The transient distribution of random walks terminating at malicious nodes is reduced as a function of y2.

Proof: If x malicious nodes perform the route capture attack and are globally blacklisted, these nodes become disconnected from the social trust graph. It follows from our analysis of Theorem 1 that the stationary distribution of the random walk is uniform over all connected nodes in the graph. Thus, the stationary probability of random walks terminating at malicious nodes is reduced from m/(m+h) to (m−x)/(m−x+h).

To characterize the transient distribution of the random walk, we model the process as a Markov chain. Let us denote the set of honest nodes by H, and the set of malicious nodes by M. The probability of an l hop random walk ending in the malicious region, P(l), is given by:

P(l) = P(l − 1) · PMM + (1 − P(l − 1)) · PHM   (2)

The terminating condition for the recursion is P(0) = 0, which reflects that the initiator is honest. We can estimate the probabilities PHM and PMH as the forward and backward conductance [26] between the honest and the malicious nodes, denoted by φF and φB respectively. Thus we have that:

Fig. 1. Probability of the l'th hop being compromised (sampling bias) under an increasing node degree attack [Facebook wall post graph]: (a) without attack, (b) g = 30000 attack edges, (c) g = 60000 attack edges. For short random walks, this is a losing strategy for the adversary. For longer random walks, the adversary does not gain any advantage.

Fig. 2. Probability of the l'th hop being compromised (sampling bias) under the route capture attack with global blacklisting [Facebook wall post graph]: (a) l = 1, (b) l = 5, (c) l = 25. As more edges to the honest nodes are removed, the attacker's loss is higher.

P(l) = P(l − 1) · (1 − φB) + (1 − P(l − 1)) · φF
     = P(l − 1) · (1 − φB − φF) + φF   (3)

P(l) = φF · [1 + (1 − φB − φF) + (1 − φB − φF)^2 + ... + (1 − φB − φF)^(l−1)]   (4)
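The recursion of Equation 2 and the unrolled geometric form of Equation 4 can be checked against each other directly, for arbitrary illustrative values of φF and φB (a short Python sketch; the function names are ours):

```python
def p_compromised(l, phi_f, phi_b):
    """Probability that an l-hop walk from an honest initiator ends in the
    malicious region: P(l) = P(l-1)*(1 - phi_b - phi_f) + phi_f, P(0) = 0."""
    p = 0.0
    for _ in range(l):
        p = p * (1.0 - phi_b - phi_f) + phi_f
    return p

def p_compromised_series(l, phi_f, phi_b):
    """The same quantity via the unrolled geometric series of Equation 4."""
    r = 1.0 - phi_b - phi_f
    return phi_f * sum(r ** i for i in range(l))
```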

We note that if an adversary connects a chain of Sybils (say of degree 2) to an attack edge, a random walk starting from an honest node and traversing the attack edge to enter the malicious region has a non-trivial probability of coming back to the honest region via the attack edge (Pisces allows backward transitions along edges). Our analysis models the probability of returning to the honest region using the notion of backward conductance.

With g edges between honest and malicious nodes, we can estimate the forward conductance φF as follows:

φF = (Σ_{x∈H} Σ_{y∈M} πx · Pxy) / πH = (Σ_{x∈H} Σ_{y∈M} Pxy) / |H| = O(g/h)   (5)

Similarly, with g edges between honest and malicious nodes, the backward conductance φB is estimated as:

φB = φF · |H| / |M| = O(g/h) · h/m = O(g/m)   (6)

Thus, we have that φF = O(g/h) and φB = O(g/m). If malicious nodes exclude y edges to honest nodes from their fingertables, application of the RNP ensures that the honest nodes also exclude the y edges from their fingertables (local blacklisting). Thus, route capture attacks result in the deletion of attack edges, which reduces both forward and backward transition probabilities. Observe that the probability of the first hop being in the malicious region is equal to φF, which is reduced under attack. We will now show this for a general value of l. Following Equation 4 and using Σ_{i=0}^{m} x^i = (1 − x^{m+1})/(1 − x) for 0 < x < 1, we have that:

P(l) = φF · (1 − (1 − φB − φF)^l) / (1 − (1 − φB − φF))
     = (φF / (φF + φB)) · (1 − (1 − φB − φF)^l)   (7)

Using φB = (h/m) · φF, we have that:

P(l) = (m/n) · (1 − (1 − φB − φF)^l)
P(l) = (m/n) · (1 − (1 − (n/m) · φF)^l)   (8)

Differentiating P(l) with respect to φF, we have that:

d/dφF (P(l)) = l · (1 − (n/m) · φF)^(l−1)   (9)

Note that (1 − (n/m) · φF) = (1 − φB − φF) ≥ 0. This implies d/dφF (P(l)) ≥ 0. Thus, P(l) is an increasing function of φF,

Fig. 4. Probability of end-to-end timing analysis under the route capture attack with global blacklisting [Facebook wall graph]

and since the reduction of the number of attack edges reduces φF, it also leads to a reduction in the transient distribution of the random walk terminating at malicious nodes. Thus, it follows that a reduction in the number of remaining attack edges y2 reduces the transient distribution of random walks terminating at malicious nodes.

Figure 2 depicts the probability of random walks terminating at malicious nodes as a function of the number of attack edges as well as the fraction of deleted edges when honest nodes use a global blacklisting policy. We can see that sacrificing attack edges so as to perform route capture attacks is a losing strategy for the attacker. Moreover, the decrease is similar for all random walk lengths; this is because even the stationary distribution of the random walk terminating at malicious nodes is reduced.

Anonymity Implication: To de-anonymize the user without the help of the destination node (e.g., the website to which the user connects anonymously), both the first hop and the last hop of the random walk need to be malicious, to observe the connecting user and her destinations, respectively. End-to-end timing analysis [21], [61] means that controlling these two nodes is sufficient for de-anonymization. Figure 4 depicts the probability of such an attack being successful as a function of the number of attack edges and the fraction of deleted edges using the global blacklisting policy. We can see that the probability of attack is a decreasing function of the fraction of deleted edges. Thus we conclude that route capture attacks are a losing strategy against our approach.

So far, we validated our analysis using simulations assuming an ideal Sybil defense. We also validated our analysis using a more realistic Sybil defense that permits a bounded number (set to 10 [63]) of Sybils per attack edge, which we show in Figure 5.

B. Securing Reciprocal Neighborhood Policy

We now discuss the security and performance of our protocol that implements the reciprocal neighbor policy.

Security proof sketch: Suppose that a malicious node A aims to exclude an honest node B from its neighbor list. To pass node B's local integrity checks, node A has to return a neighbor list to node B that correctly advertises node B. Since random walks for anonymous communication are indistinguishable from testing random walks, there is

Fig. 5. Probability of end-to-end timing analysis under the route capture attack with global blacklisting, using 10 Sybils per attack edge [Facebook wall graph]

Fig. 6. Probability of detecting a route capture [Facebook wall post interaction graph]. The attack model includes 10 Sybils per attack edge.

a probability that the adversary will advertise a conflicting neighbor list that does not include node B to the initiator of a testing random walk. The initiator of the testing random walk will insert the malicious neighbor list into the Whanau DHT, and node B can perform a robust lookup for node A's key to obtain the conflicting neighbor list. Since Whanau only provides availability, node B can check the integrity of the results by verifying node A's signature. Since honest nodes never advertise two conflicting lists within a time interval, node B can infer that node A is malicious.

Performance Evaluation: We analyze the number of testing random walks that each node must perform to achieve a high probability of detecting a malicious node that attempts to perform a route capture attack. Nodes must perform enough testing walks such that a high percentage of compromised nodes (which are connected to honest nodes) have been probed in a single time slot. First, we consider a defense strategy where honest nodes only insert the terminal hop of the testing random walks into Whanau (Strategy 1). Intuitively, from the coupon collector's problem, log n walks per node should suffice to catch a malicious node with high probability. Indeed, from Figure 6, we can see that six testing walks per time interval suffice to catch a malicious node performing route capture attacks with high probability. The honest nodes can also utilize all hops of the testing random walks to check for conflicts (Strategy 2), in which case only two or three testing walks are required per time interval (at the cost of increased communication overhead for the DHT operations).

Next, we address the question of how to choose the duration of the time interval (t). The duration of the time slot

Fig. 7. Unreliability in circuit construction [Facebook wall post interaction graph].

Fig. 8. Circuit build times in Tor as a function of circuit length

governs the trade-off between communication overhead and the reliability of circuit construction. A large value of the time slot interval results in a smaller communication overhead but higher unreliability in circuit construction, since nodes selected in the random walk are less likely to still be online. On the other hand, a smaller value of the time interval provides higher reliability in circuit construction at the cost of increased communication overhead, since a fixed number of testing walks must be performed within the duration of the time slot. We can see this trade-off in Figure 7. We consider two churn models for our analysis: (a) nodes have a mean lifetime of 24 hours (reflecting the behavior of Tor relays [27]), and (b) nodes have a mean lifetime of 1 hour (reflecting the behavior of conventional P2P networks). For the two scenarios, using a time slot duration of 3 hours and one of 5 minutes, respectively, results in a 2-3% probability of getting an unreliable random walk for up to 25 hops.

Overhead: There are three main sources of communication overhead in our system. First is the overhead of setting up the neighbor lists, which requires about d^2 KB of communication, where d is the node degree. The second source of overhead is the testing random walks, where nodes are required to perform about six such walks of length 25. The third source of overhead comes from participation in the Whanau DHT. Typically, key churn is a significant source of overhead in Whanau, requiring all of its routing tables to be rebuilt. However, in our scenario, only the values corresponding to the keys change quickly, but not the keys themselves, requiring only a modest amount of heartbeat traffic [28]. Considering the Facebook wall post topology, we estimate the mean communication overhead per node per time interval to be only about 6 MB. We also evaluate the

latency of constructing long onion routing circuits through experiments over the live Tor network. We used the Torflow utility to build Tor circuits with varying circuit lengths; Figure 8 depicts our experimental results. Using these results, we estimate that 25-hop circuits would take about 1 minute to construct.

C. Anonymity

Earlier, we considered the probability of end-to-end timing analysis as our metric for anonymity. This metric considers the scenario where the adversary has exactly de-anonymized the user. However, in random walk based anonymous communication, the adversary may sometimes have only probabilistic knowledge of the initiator. To quantify all possible sources of information leaks, we now use the entropy metric to quantify anonymity [12], [49]. The entropy metric considers the probability distribution of nodes being possible initiators, as computed by the attackers. In this paper, we will restrict our analysis to Shannon entropy, since it is the most widely used mechanism for analyzing anonymity. There are other ways of computing entropy, such as guessing entropy [31] and min entropy [50], which we will consider in the full version of this work. Shannon entropy is computed as:

H(I) = Σ_{i=1}^{n} −pi · log2(pi)   (10)

where pi is the probability assigned to node i of being the initiator of a circuit. Given a particular observation o, the adversary can first compute the probability distribution of nodes being potential initiators of circuits, pi|o, and then the corresponding conditional entropy H(I|o). We can model the entropy distribution of the system as a whole by considering the weighted average of entropy for each possible observation, including the null observation.

H(I|O) = Σ_{o∈O} P(o) · H(I|o)   (11)
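Equations 10 and 11 translate directly into code (a small Python sketch; the helper names are ours):

```python
import math

def shannon_entropy(probs):
    """H(I) in bits for an initiator probability distribution (Eq. 10)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_entropy(observations):
    """Weighted average H(I|O) over observations (Eq. 11).

    observations: iterable of (P(o), conditional distribution p_{i|o})
    pairs; the null observation is just another entry."""
    return sum(p_o * shannon_entropy(cond) for p_o, cond in observations)
```

For instance, a uniform distribution over 2^b candidates yields exactly b bits of entropy, the optimal value for an anonymity set of that size.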

We first consider the scenario where an initiator performs an l-hop random walk to communicate with a malicious destination, and the nodes in the random walk are all honest, i.e., the adversary is external to the system. For this scenario, we will analyze the expected initiator anonymity under the conservative assumption that the adversary has complete knowledge of the entire social network graph.

Malicious destination: A naive way to compute the initiator entropy for this scenario is to consider the set of nodes that are reachable from the terminus of the random walk in exactly l hops (the adversary's observation), and assign a uniform probability of being a potential initiator to all nodes in that set. However, such an approach does not consider the mixing characteristics of the random walk; l-hop random walks starting at different nodes may in fact have heterogeneous probabilities of terminating at a given node.

Fig. 9. Expected entropy as a function of random walk length using (a) Facebook wall post graph, (b) Facebook link graph, (c) anonymous interaction graph, (d) anonymous link graph, and (e) CDF of entropy for the Facebook wall graph and a malicious destination.

We now outline a strategy that explicitly considers the mixing characteristics of the trust topology. Let the terminus of an l hop random walk be node j. The goal of the adversary is to compute the probability pi of a particular node being the initiator of the circuit:

pi = P^l_ij / Σ_x P^l_xj   (12)

where P^l denotes the l-hop transition matrix for the random walk process. Note that even for moderate size social graphs, the explicit computation of P^l is infeasible in terms of both memory and computational constraints. This is because even though P is a sparse matrix, iterative multiplication of P by itself results in a matrix that is no longer sparse. To make the problem feasible, we propose to leverage the time reversibility of our random walk process. We have previously modeled the random walk process as a Markov chain (Theorem 2 in the Appendix). Markov chains that satisfy the following property are known as time-reversible Markov chains [2]:

πi · Pij = πj · Pji (13)

Both the conventional random walk and the Metropolis-Hastings random walk satisfy the above property and are thus time-reversible Markov chains. It follows from time reversibility [2] that:

πi · P^l_ij = πj · P^l_ji  =⇒  P^l_ij = (πj / πi) · P^l_ji   (14)

Thus it is possible to compute P^l_ij using P^l_ji. Let Vj be the initial probability vector starting at node j. Then the probability of an l hop random walk starting at node j and ending at node i can be computed as the i'th element of the vector Vj · P^l. Observe that Vj · P^l can be computed without computing the matrix P^l:

Vj · P^l = (Vj · P) · P^(l−1)   (15)

Since P is a sparse matrix, Vj · P can be computed in O(n) time, and Vj · P^l can be computed in O(nl) time. Finally, we can compute the probabilities of nodes being potential initiators of circuits using equation (12), and the corresponding entropy gives us the initiator anonymity. We average the resulting entropy over 100 randomly chosen terminal nodes j to compute the expected anonymity.
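The repeated sparse vector-matrix product of Equation 15 can be sketched as follows (a Python sketch using a dict-of-lists sparse representation of our own choosing; for the Metropolis-Hastings walk the stationary distribution is uniform, so by Equation 14 the distribution computed from j directly gives the P^l_ij values needed in Equation 12):

```python
def walk_distribution(P_sparse, j, l):
    """Distribution of an l-hop walk started at node j, via l sparse
    vector-matrix products; the dense matrix P^l is never materialized.

    P_sparse: dict mapping node i to a list of (k, P[i][k]) nonzero entries.
    """
    v = {j: 1.0}
    for _ in range(l):
        nxt = {}
        for i, p_i in v.items():
            for k, p_ik in P_sparse[i]:
                nxt[k] = nxt.get(k, 0.0) + p_i * p_ik
        v = nxt
    return v

# Toy example: Metropolis-Hastings transitions on a triangle (all degrees 2).
P = {0: [(1, 0.5), (2, 0.5)],
     1: [(0, 0.5), (2, 0.5)],
     2: [(0, 0.5), (1, 0.5)]}
dist = walk_distribution(P, 0, 2)  # {0: 0.5, 1: 0.25, 2: 0.25}
```

Each iteration touches only the nonzero transitions, matching the O(n) per-step cost claimed above for sparse social graphs.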

Figure 9(a)-(d) depicts the expected initiator anonymity as a function of random walk length for different social network topologies. We can see that longer random walks result in an increase in anonymity. This is because for short random walks of length l in restricted topologies such as trust networks, not every node can reach the terminus of the random walk in l hops. Secondly, even for nodes that can reach the terminus of the random walk in l hops, the probabilities of such a scenario happening can be highly heterogeneous. Furthermore, we can see that conventional random walks converge to the optimal entropy in about 10 hops for all four topologies. In contrast, the Metropolis-Hastings random walks used in Pisces take longer to converge. This is because random walks in Pisces have slower mixing properties than conventional random walks. However, we can see that even the Metropolis-Hastings random walk starts to converge after 25 hops in all scenarios.

To get an understanding of the distribution of the entropy, we plot the CDF of entropy over 100 random walk samples in Figure 9(e). We can see that the typical anonymity offered by moderately long random walks is high. For example, using a Metropolis-Hastings random walk of length 25, 95% of users get an entropy of at least 11 bits. So far, we have observed that Metropolis-Hastings random walks need to be longer than conventional random walks for an equivalent level of anonymity against a malicious destination. Next, we will see the benefit of using Metropolis-Hastings random walks in Pisces, since they can be secured against insider attacks.

Insider adversary: We now analyze the anonymity of the system with respect to an insider adversary (malicious participants). We first consider an adversary that has g attack edges going to honest nodes, with g = O(h / log h), and 10 Sybils per attack edge [63]. When both the first and the last hop of a random walk are compromised, the initiator entropy is 0 due to end-to-end timing analysis. Let Mi be the event where the first compromised node is at the i-th hop and the last hop is also compromised. Suppose that the previous hop of the first compromised node is node A. Under this scenario, the adversary can localize the initiator to the set of nodes that can reach node A in i − 1 hops. If we denote the initiator anonymity under this scenario as H(I|Mi), then from equation (11) it follows that the overall system anonymity is:

H(I|O) = Σ_{i=1}^{l} P(Mi) · H(I|Mi) + (1 − Σ_{i=1}^{l} P(Mi)) · log2 n   (16)

We compute P(Mi) using simulations, and H(I|Mi) using the expected anonymity computations discussed above. Figure 10(a) depicts the expected entropy as a function of the number of attack edges. We find that Pisces provides close to optimal anonymity. Moreover, as the length of the random walk increases, the anonymity does not degrade. In contrast, without any defense, the anonymity decreases with an increase in the random walk length (not shown in the figure), since at every step in the random walk, there is a chance of the random walk being captured by the adversary. At g = 3000, the anonymity provided by a conventional 10-hop random walk without any defenses (used in systems such as Drac and Whanau) is 14.1 bits, while Pisces provides close to optimal anonymity at 14.76 bits. For uniform probability distributions, this represents an increase in the size of the anonymity set by a factor of 1.6. It is also interesting to see that the advantage of using Pisces increases as the number of attack edges increases. To further investigate this, we consider the attack model with a perfect Sybil defense and vary the number of attack edges. Figure 10(b) depicts the anonymity as a function of the number of attack edges. We can see that at 60,000 attack edges, the expected anonymity without defenses is 7.5 bits, as compared to more than 13 bits with Pisces (the anonymity set size increases by a factor of 45).

Comparison with ShadowWalker: ShadowWalker [37] is a state-of-the-art approach for scalable anonymous communication that organizes nodes into a structured topology such as Chord and performs secure random walks on such topologies. We now compare our approach with ShadowWalker. To compute the anonymity provided by ShadowWalker, we use the fraction f of malicious nodes in the system as an input to the analytic model of ShadowWalker [37], and use ShadowWalker parameters that provide maximum security. Figure 11(a) depicts the comparative results between Pisces (using l = 25) and ShadowWalker. We can see that Pisces significantly outperforms ShadowWalker. At g = 1000 attack edges, Pisces provides about two bits higher entropy than ShadowWalker, and this difference increases to six bits at g = 3000 attack edges.1 This difference arises because Pisces directly performs random walks on the social network topology, limiting the impact of Sybil attackers, while ShadowWalker is designed to secure random walks only on structured topologies. Arranging nodes in a structured topology loses information about trust relationships between users, resulting in poor anonymity for ShadowWalker.2 For comparison, we also consider the attack model with a perfect Sybil defense and vary the number of attack edges. Figure 11(b) depicts the results for this scenario. We can see that even in this scenario, where trust relationships lose meaning since the adversary is randomly distributed, Pisces continues to provide anonymity comparable to ShadowWalker. Pisces's entropy is slightly lower since social networks are slower mixing than structured networks, requiring longer random walks than ShadowWalker and thereby giving more observation points to the adversary.

Performance optimization: We now analyze the anonymity provided by our two-hop optimization, which uses the k-th hop and the last hop of the random walk for anonymous communication. To analyze anonymity in this scenario, let us redefine Mi (i ≠ k) as the event where the first compromised node is at the i-th hop, the last node is also compromised, but the k-th node is honest. Let Mk be the event where the k-th hop and the last hop are compromised (regardless of whether other nodes are compromised or not); the definition of Ml remains the same as before, i.e., only the last hop is compromised. We can compute the system anonymity as:

H(I|O) = \sum_{i=1}^{k-1} P(M_i) \cdot H(I|M_i) + \sum_{i=k+1}^{l} P(M_i) \cdot H(I|M_k) + \Big(1 - \sum_{i=1}^{l} P(M_i)\Big) \cdot \log_2 n \qquad (17)
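As a sanity check on Eq. (17), the expected entropy can be computed directly from the per-event probabilities and conditional entropies. The sketch below is illustrative only: the inputs P(M_i) and H(I|M_i) are hypothetical toy values, not numbers from our evaluation.

```python
import math

def system_entropy(P_M, H_M, n, k, l):
    """H(I|O) for the two-hop optimization, following Eq. (17).
    P_M[i] = P(M_i); H_M[i] = H(I|M_i); n = number of nodes;
    k = intermediate circuit hop; l = random walk length."""
    # First compromised node before the k-th hop: event-specific entropy.
    h = sum(P_M[i] * H_M[i] for i in range(1, k))
    # First compromised node after the k-th hop: entropy of the M_k event.
    h += sum(P_M[i] for i in range(k + 1, l + 1)) * H_M[k]
    # No useful observation: the initiator keeps the full log2(n) bits.
    h += (1 - sum(P_M[i] for i in range(1, l + 1))) * math.log2(n)
    return h

# Hypothetical toy numbers for a 3-hop walk over n = 16 nodes:
P_M = {1: 0.1, 2: 0.2, 3: 0.1}
H_M = {1: 2.0, 2: 1.0, 3: 1.0}
print(system_entropy(P_M, H_M, n=16, k=2, l=3))  # 0.2 + 0.1 + 0.6*4 ≈ 2.7
```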

Figure 12 depicts the anonymity for our two-hop optimization for different choices of k. We see an interesting trade-off here. Small values of k are not optimal: even though the first hop is more likely to be honest, when the last hop is compromised, the initiator is easily localized. On the

¹ At such high numbers of attack edges, ShadowWalker may even have difficulty in securely maintaining its topology, which could further lower anonymity.

² This observation is also applicable to Tor. Pisces provides 5 bits higher entropy than Tor at g = 3000 attack edges.

Fig. 10. Entropy as a function of the fraction of attack edges using (a) a realistic model of an imperfect Sybil defense (10 Sybils per attack edge) and (b) a perfect Sybil defense, for the Facebook wall post interaction graph. Note that the "No Defense" strategy models the random walks used in systems such as Drac and Whanau.

Fig. 11. Comparison with ShadowWalker. Entropy as a function of the fraction of attack edges using (a) a realistic model of an imperfect Sybil defense (10 Sybils per attack edge) and (b) a perfect Sybil defense, for the Facebook wall post interaction graph.

Fig. 12. Anonymity using the two-hop performance optimization, Facebook wall graph, 10 Sybils per attack edge. k = 12 provides a good trade-off between anonymity and performance.

other hand, large values of k are also not optimal, since these nodes are far from the initiator in the trust graph and are less trusted. We find that the optimal trade-off points lie in the middle, with k = 12 providing the best anonymity for our optimization. We also note that the anonymity provided by our two-hop optimization is close to that provided by using all 25 hops of the random walk.

Selective denial of service: Next, we evaluate Pisces's anonymity against the selective DoS attack [5]. In this attack, an adversary can cause a circuit to selectively fail whenever he or she is unable to learn the initiator's identity. This forces the user to construct another circuit, which results in a degradation of anonymity. We found that the degradation in initiator anonymity under this attack is less than 1%. The reason Pisces is less vulnerable to selective DoS than other systems such as Tor is its use of social trust: with high probability, most random walks traverse only honest nodes. This result is illustrated

Fig. 13. Anonymity under the selective DoS attack using the Facebook wall graph, 10 Sybils per attack edge. Selective DoS has limited impact.

in Figure 13.

Multiple communication rounds: So far, we had limited our analysis to a single communication round. Next, we analyze system anonymity over multiple communication rounds. Suppose that in communication rounds 1 … z, the adversary's observations are O_1 … O_z. Let us denote a given node's probability of being the initiator after z communication rounds by P(I = i | O_1, …, O_z). Now, after communication round z + 1, we are interested in computing the probability P(I = i | O_1, …, O_{z+1}). Using Bayes' theorem, we have:

P(I = i | O_1, \ldots, O_{z+1}) = \frac{P(O_1, \ldots, O_{z+1} | I = i) \cdot P(I = i)}{P(O_1, \ldots, O_{z+1})} \qquad (18)

The key advantage of this formulation is that we can now leverage the fact that the observations O_1, …, O_{z+1} are independent

Fig. 14. Anonymity degradation over multiple communication rounds, Facebook wall graph, 10 Sybils per attack edge.

given a choice of initiator. Thus we have:

P(I = i | O_1, \ldots, O_{z+1}) = \frac{\prod_{j=1}^{z+1} P(O_j | I = i) \cdot P(I = i)}{P(O_1, \ldots, O_{z+1})}
= \frac{\prod_{j=1}^{z+1} P(O_j | I = i) \cdot P(I = i)}{\sum_{p=1}^{h} P(O_1, \ldots, O_{z+1} | I = p) \cdot P(I = p)}
= \frac{\prod_{j=1}^{z+1} P(O_j | I = i) \cdot P(I = i)}{\sum_{p=1}^{h} \prod_{j=1}^{z+1} P(O_j | I = p) \cdot P(I = p)} \qquad (19)

Finally, assuming a uniform prior over all possible initiators, we have:

P(I = i | O_1, \ldots, O_{z+1}) = \frac{\prod_{j=1}^{z+1} P(O_j | I = i)}{\sum_{p=1}^{h} \prod_{j=1}^{z+1} P(O_j | I = p)} \qquad (20)
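The posterior update in Eq. (20) is straightforward to implement: multiply the per-round likelihoods and renormalize. The sketch below uses hypothetical likelihood values; it is not part of the Pisces protocol, only an illustration of the adversary's inference.

```python
def posterior_after_rounds(likelihoods):
    """Posterior P(I = i | O_1, ..., O_{z+1}) under a uniform prior, per Eq. (20).
    likelihoods: one list per round, giving P(O_j | I = i) for each candidate i.
    Observations are independent given the initiator, so likelihoods multiply."""
    n = len(likelihoods[0])
    joint = [1.0] * n
    for round_lh in likelihoods:
        joint = [j * lh for j, lh in zip(joint, round_lh)]
    total = sum(joint)                    # normalizing constant (the sum over p)
    return [j / total for j in joint]

# Hypothetical likelihoods: candidate 0 best explains each round's observation,
# so repeated rounds concentrate the posterior on it.
posterior = posterior_after_rounds([[0.6, 0.3, 0.1]] * 3)
print(posterior)  # candidate 0's posterior grows from 0.6 to ~0.885
```

This is exactly why anonymity degrades over rounds: each independent observation multiplies into the joint likelihood, sharpening the distribution around the true initiator.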

Figure 14 depicts the expected anonymity as a function of the number of communication rounds. We can see that the entropy provided by Pisces outperforms conventional random walks by more than a factor of two (in bits) after 100 communication rounds (the anonymity set size is increased by a factor of 16).

V. L IMITATIONS AND FUTURE WORK

While Pisces is the first decentralized design that can both scalably leverage social network trust relationships and mitigate route capture attacks, its architecture has some limitations. First, Pisces requires a user's social contacts to participate in the system. To improve the usability of the system, in future work we will investigate the feasibility of leveraging a user's two-hop social neighborhood in the random walk process. Pisces also does not preserve the privacy of users' social contacts. We emphasize that this limitation is shared by a large class of social-network-based systems, including Sybil defense mechanisms such as SybilLimit, SybilInfer, and Whanau. Any distributed anonymity system relying on such Sybil defense mechanisms cannot preserve the privacy of users' social contacts.

Second, users who are not well connected in the social network topology may not benefit from using Pisces. This is because random walks starting from those nodes may take a very long time to converge to the stationary probability distribution (which provides optimal anonymity).

Third, Pisces does not defend against targeted attacks on an individual, in which the adversary aims to massively infiltrate or compromise the user's social circle to increase the probability of circuit compromise. We note that the impact of such an attack is localized to the targeted individual.

Fourth, circuit establishment in Pisces has higher latency than in existing systems, since random walks in Pisces tend to be longer. However, we note that circuits can be established preemptively, so that this latency does not affect the user. In fact, deployed systems such as Tor already build circuits preemptively.

Finally, Pisces currently does not support important constraints such as bandwidth-based load balancing and exit policies. The focus of our architecture was to secure the peer discovery process in unstructured social network topologies, and we will consider the incorporation of these constraints in future work.

VI. CONCLUSION

In this paper, we propose a mechanism for decentralized anonymous communication that can securely leverage a user's trust relationships against a Byzantine adversary. Our key contribution is to show that the appearance of nodes in each other's neighbor lists can be made reciprocal in a secure and efficient manner. Using theoretical analysis and experiments on real-world social network topologies, we demonstrate that Pisces substantially reduces the probability of active attacks on circuit construction. We find that Pisces significantly outperforms approaches that do not leverage trust relationships, providing up to six bits higher entropy than ShadowWalker (and 5 bits higher entropy than Tor) in a single communication round. Also, compared with the naive strategy of using conventional random walks over social networks (as in the Drac system), Pisces provides twice the number of bits of entropy over 100 communication rounds. In conclusion, we argue that the incorporation of social trust will likely be an important consideration in the design of the next generation of deployed anonymity systems.

ACKNOWLEDGMENT

We would like to thank the attendees of HotPETs 2010 and USENIX HotSec 2010 for helpful comments. In particular, this work benefited from conversations with George Danezis, Aaron Johnson, and Paul Syverson. This work is sponsored in part by NSF CAREER Award number CNS-0954133 and by award number CNS-1117866. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the National Science Foundation.

REFERENCES

[1] Tor blog. https://blog.torproject.org/blog/tor-project-infrastructure-updates.

[2] D. Aldous and J. A. Fill. Reversible Markov chains and random walks on graphs. http://www.stat.berkeley.edu/~aldous/RWG/book.html.

[3] O. Berthold, H. Federrath, and S. Kopsell. Web MIXes: A system for anonymous and unobservable Internet access. In International Workshop on Designing Privacy Enhancing Technologies: Design Issues in Anonymity and Unobservability, 2001.

[4] L. Bilge, T. Strufe, D. Balzarotti, and E. Kirda. All your contacts are belong to us: automated identity theft attacks on social networks. In WWW, 2009.

[5] N. Borisov, G. Danezis, P. Mittal, and P. Tabriz. Denial of service or denial of security? In ACM CCS, 2007.

[6] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu. The socialbot network: when bots socialize for fame and money. In ACSAC, 2011.

[7] P. Boucher, A. Shostack, and I. Goldberg. Freedom systems 2.0 architecture. White paper, Zero Knowledge Systems, Inc., 2000.

[8] J. Boyan. The Anonymizer: Protecting user privacy on the web. Computer-Mediated Communication Magazine, 4(9), 1997.

[9] G. Danezis, C. Diaz, C. Troncoso, and B. Laurie. Drac: an architecture for anonymous low-volume communications. In PETS, 2010.

[10] G. Danezis, C. Lesniewski-Laas, M. F. Kaashoek, and R. Anderson. Sybil-resistant DHT routing. In ESORICS, 2005.

[11] G. Danezis and P. Mittal. SybilInfer: Detecting Sybil nodes using social networks. In NDSS, 2009.

[12] C. Diaz, S. Seys, J. Claessens, and B. Preneel. Towards measuring anonymity. In PETS, 2002.

[13] R. Dingledine, N. Mathewson, and P. Syverson. Tor: The second-generation onion router. In USENIX Security, 2004.

[14] J. Douceur. The Sybil attack. In IPTPS, 2002.

[15] H. Federrath. Project: AN.ON - anonymity.online: Protection of privacy on the Internet. http://anon.inf.tu-dresden.de/indexen.html.

[16] M. J. Freedman and R. Morris. Tarzan: A peer-to-peer anonymizing network layer. In ACM CCS, 2002.

[17] E. Gilbert and K. Karahalios. Predicting tie strength with social media. In CHI, 2009.

[18] D. Goodin. Tor at heart of embassy passwords leak. The Register, September 10, 2007.

[19] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 1970.

[20] C.-Y. Hong, C.-C. Lin, and M. Caesar. Clockscalpel: Understanding root causes of Internet clock synchronization inaccuracy. In PAM, 2011.

[21] A. Houmansadr, N. Kiyavash, and N. Borisov. RAINBOW: A robust and invisible non-blind watermark for network flows. In NDSS, 2009.

[22] I2P. I2P anonymous network. http://www.i2p2.de/index.html, 2003.

[23] D. Irani, M. Balduzzi, D. Balzarotti, E. Kirda, and C. Pu. Reverse social engineering attacks in online social networks. In DIMVA, 2011.

[24] A. Johnson and P. Syverson. More anonymous onion routing through trust. In IEEE CSF, 2009.

[25] A. Johnson, P. F. Syverson, R. Dingledine, and N. Mathewson. Trust-based anonymous communication: adversary models and routing algorithms. In ACM CCS, 2011.

[26] R. Kannan, S. Vempala, and A. Vetta. On clusterings: Good, bad and spectral. J. ACM, 51(3), 2004.

[27] J. B. Kowalski. Tor network status. http://torstatus.blutmagie.de/.

[28] C. Lesniewski-Laas and M. F. Kaashoek. Whanaungatanga: A Sybil-proof distributed hash table. In NSDI, 2010.

[29] R. Levien and A. Aiken. Attack-resistant trust metrics for public key certification. In USENIX Security, 1998.

[30] B. N. Levine, M. Reiter, C. Wang, and M. Wright. Timing analysis in low-latency mix systems. In FC, 2004.

[31] J. L. Massey. Guessing and entropy. In IEEE ISIT, 1994.

[32] J. McLachlan, A. Tran, N. Hopper, and Y. Kim. Scalable onion routing with Torsk. In ACM CCS, 2009.

[33] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6), 1953.

[34] D. L. Mills. Computer Network Time Synchronization: The Network Time Protocol. CRC Press, 2006.

[35] A. Mislove, G. Oberoi, A. Post, C. Reis, P. Druschel, and D. S. Wallach. AP3: Cooperative, decentralized anonymous communication. In ACM SIGOPS European Workshop, 2004.

[36] P. Mittal and N. Borisov. Information leaks in structured peer-to-peer anonymous communication systems. In ACM CCS, 2008.

[37] P. Mittal and N. Borisov. ShadowWalker: Peer-to-peer anonymous communication using redundant structured topologies. In ACM CCS, 2009.

[38] P. Mittal, N. Borisov, C. Troncoso, and A. Rial. Scalable anonymous communication with provable security. In USENIX HotSec, 2010.

[39] P. Mittal, M. Caesar, and N. Borisov. X-Vine: Secure and pseudonymous routing using social networks. In NDSS, 2012.

[40] P. Mittal, F. Olumofin, C. Troncoso, N. Borisov, and I. Goldberg. PIR-Tor: Scalable anonymous communication using private information retrieval. In USENIX Security, 2011.

[41] S. Nagaraja. Anonymity in the wild: mixes on unstructured networks. In PETS, 2007.

[42] A. Nambiar and M. Wright. Salsa: A structured approach to large-scale anonymity. In ACM CCS, 2006.

[43] A. Panchenko, S. Richter, and A. Rache. NISAN: Network information service for anonymization networks. In ACM CCS, 2009.

[44] A. Pfitzmann and M. Hansen. A terminology for talking about privacy by data minimization: Anonymity, unlinkability, undetectability, unobservability, pseudonymity, and identity management. http://dud.inf.tu-dresden.de/literatur/AnonTerminology v0.34.pdf, Aug. 2010. v0.34.

[45] M. Reiter and A. Rubin. Crowds: Anonymity for web transactions. ACM TISSEC, 1(1), June 1998.

[46] M. Rennhard and B. Plattner. Introducing MorphMix: Peer-to-peer based anonymous Internet usage with collusion detection. In WPES, 2002.

[47] A. Rowstron and P. Druschel. Pastry: scalable, decentralized object location and routing for large-scale peer-to-peer systems. In MIDDLEWARE, 2001.

[48] M. Schuchard, A. W. Dean, V. Heorhiadi, N. Hopper, and Y. Kim. Balancing the shadows. In WPES, 2010.

[49] A. Serjantov and G. Danezis. Towards an information theoretic metric for anonymity. In PETS, 2002.

[50] V. Shmatikov and M.-H. Wang. Measuring relationship anonymity in mix networks. In ACM WPES, 2006.

[51] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup protocol for Internet applications. IEEE/ACM Trans. Netw., 11(1):17-32, 2003.

[52] P. Syverson, G. Tsudik, M. Reed, and C. Landwehr. Towards an analysis of onion routing security. In Workshop on Design Issues in Anonymity and Unobservability, 2000.

[53] P. F. Syverson, D. M. Goldschlag, and M. G. Reed. Anonymous connections and onion routing. In IEEE Security & Privacy, 1997.

[54] P. Tabriz and N. Borisov. Breaking the collusion detection mechanism of MorphMix. In PETS, 2006.

[55] The Tor Project. Tor metrics portal. http://metrics.torproject.org/.

[56] The Tor Project. Tor metrics portal: Users. https://metrics.torproject.org/users.html.

[57] The Tor Project. Who uses Tor. http://www.torproject.org/about/torusers.html.en.

[58] E. Vasserman, R. Jansen, J. Tyra, N. Hopper, and Y. Kim. Membership-concealing overlay networks. In CCS, 2009.

[59] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi. On the evolution of user interaction in Facebook. In WOSN, 2009.

[60] Q. Wang, P. Mittal, and N. Borisov. In search of an anonymous and secure lookup. In ACM CCS, 2010.

[61] X. Wang, S. Chen, and S. Jajodia. Tracking anonymous peer-to-peer VoIP calls on the Internet. In ACM CCS, 2005.

[62] C. Wilson, B. Boe, A. Sala, K. P. Puttaswamy, and B. Y. Zhao. User interactions in social networks and their implications. In Eurosys, 2009.

[63] H. Yu, P. B. Gibbons, M. Kaminsky, and F. Xiao. SybilLimit: A near-optimal social network defense against Sybil attacks. In IEEE S&P, 2008.

[64] H. Yu, M. Kaminsky, P. Gibbons, and A. Flaxman. SybilGuard: Defending against Sybil attacks via social networks. In SIGCOMM, 2006.

[65] W. Yu, X. Fu, S. Graham, D. Xuan, and W. Zhao. DSSS-based flow marking technique for invisible traceback. In IEEE S&P, 2007.