
    PeerMate: A Malicious Peer Detection Algorithm

    for P2P Systems based on MSPCA

    Xianglin Wei
    Department of Computer Science & Engineering
    PLA University of Science & Technology
    Nanjing, China
    Email: wei [email protected]

    Tarem Ahmed¹
    Department of Computer Science
    International Islamic University Malaysia
    Kuala Lumpur, Malaysia
    Email: [email protected]

    Ming Chen
    Department of Computer Science and Engineering
    PLA University of Science and Technology
    Nanjing, China
    Email: [email protected]

    Al-Sakib Khan Pathan
    Department of Computer Science
    International Islamic University Malaysia
    Kuala Lumpur, Malaysia
    Email: [email protected]

    Abstract: Many reputation management schemes have been proposed to assist peers in choosing the most trustworthy collaborators in a P2P environment where honest peers coexist with malicious ones. While these schemes indeed generally provide some useful information regarding the reliability of peers, they still suffer from various attacks such as slandering, collusion, etc. Consequently, being able to detect the malicious peers plays a critical role in the successful functioning of these mechanisms, and this is our focus in this paper. First, we divide the malicious peers into several categories. Second, we introduce PeerMate, a malicious peer detection algorithm based on Multiscale Principal Component Analysis and Quality of Reconstruction, to detect malicious peers in Reputation-based P2P systems. Finally, we experimentally demonstrate that PeerMate is able to detect malicious peers accurately and efficiently.

    I. INTRODUCTION

    In order to stimulate peers to contribute resources and to

    assist peers in selecting the most trustworthy collaborators,

    several reputation management schemes have been proposed

    [1], [2]. These schemes try to evaluate the transactions per-

    formed by peers and assign reputation values to them to reflect

    their past behavioral features. These reputation values will

    then be the basis for identifying trustworthy peers in reducing

    the blindness of peer selection. Although these schemes have

    been proved to be theoretically attractive, they still have a long

    way to go before practical deployment, as they are still faced

    with various attacks including self-promoting, whitewashing,

    slandering, collusion [3] and Sybil attacks [4].

    As a burgeoning field, malicious peer detection has attracted the attention of many researchers in recent years, as discussed in Section II. Existing methods, however, mostly either concentrate on particular categories of malicious peers or rely on global assumptions. In contrast, we focus in this work on developing a general detection algorithm, which we have named PeerMate. The main differences between PeerMate and existing methods are as follows: first, PeerMate aims at detecting malicious peers of multiple categories rather than some particular categories; second, PeerMate is based only on reputation information, which can be collected in many current Reputation-based P2P (RP2P) systems, rather than relying on global information. Furthermore, the performance of PeerMate is independent of the underlying P2P workload model and of the number of types of malicious peers, as long as they behave differently from the honest peers.

    ¹ Tarem Ahmed is also affiliated with the Department of Electrical and Electronic Engineering, BRAC University, Dhaka, Bangladesh.

    Our primary contributions in this work are threefold: first,

    we aim to detect malicious peers in RP2P systems from the

    signal processing aspect; second, we develop PeerMate to de-

    tect malicious peers based on Multiscale Principal Component

    Analysis (MSPCA) [5]; third, the performance of PeerMate is

    experimentally evaluated through simulations.

    The rest of the paper is organized as follows. Section

    II summarizes the related work. PeerMate is introduced in

    Section III. Section IV presents our experimental results. We conclude in Section V with a discussion of the future potential

    of our work.

    II. RELATED WORK

    Mekouar et al. have proposed a Malicious Detector Algorithm to detect liar peers that send wrong feedback to subvert the reputation system [6]. After each transaction between a pair of peers, both peers are required to generate feedback describing the transaction. If there is an obvious gap between the two pieces of feedback, both are regarded as suspicious. Ji et al. have proposed a group-based metric for protecting P2P networks against Sybil attacks and collusion by dividing the whole network into trust groups based on global structure information [7]. Global structure information, however, is difficult to obtain. Lian et al. have recommended various collusion detection approaches in [3], including a pair-wise detector and a traffic concentration detector, using trace analysis on data from the Maze file sharing application [8]. Recently, an upload entropy scheme has been developed by Liu et al. to prevent collusion and further enhance the robustness of private tracker sites [1]. The threshold of this scheme, however, needs to be set manually. Lee et al. have proposed a simplified clique detection method to detect colluders [9], but their method is restricted to colluders who form a clique.

    International Conference on Computing, Networking and Communications, Network Architecture & P2P Protocol Symposium

    978-1-4673-0009-4/12/$26.00 ©2012 IEEE


    III. PEERMATE

    In this section we first describe the detection context, and

    outline the various categories of malicious peers that we aim to

    detect, before introducing our proposed PeerMate algorithm.

    A. Detection Scenario

    Detection context. Our detection context is derived

    from currently popular RP2P systems, including TVTorrents

    [10], EigenTrust [2] and Maze [8]. The exchange process is

    divided into several time slots (rounds), and the process obeys

    a typical P2P workload model, such as the classic workload

    model described by Gummadi et al. in [11] or the BitTorrent

    workload model [1]. We have found that the effectiveness of

    PeerMate is independent of the underlying workload model,

    and in this paper, we present results using the classic model

    from the literature [11]. In each round, every peer initiates requests following the request model in the literature. Each peer is assigned an initial reputation value. A peer's reputation value increases by Ru when the peer uploads a valid object and decreases by Rd when the peer downloads a valid object, with Ru Rd. Details of the workload model are given in Section IV.
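The reputation update rule above can be sketched in a few lines. This is only an illustration: the numeric values of Ru, Rd and the initial reputation are assumptions (the text does not give them here), and the choice to leave reputations unchanged for an invalid object is also an assumption.

```python
from dataclasses import dataclass

# Hypothetical values; the text names Ru and Rd but gives no numbers here.
R_U = 2.0                    # gain for uploading one valid object (assumed)
R_D = 1.0                    # charge for downloading one valid object (assumed)
INITIAL_REPUTATION = 10.0    # assumed starting value


@dataclass
class Peer:
    reputation: float = INITIAL_REPUTATION


def record_transfer(uploader: Peer, downloader: Peer, valid: bool) -> None:
    """Update both reputations after one transfer.

    Assumption: an invalid (inauthentic) object changes neither reputation.
    """
    if not valid:
        return
    uploader.reputation += R_U
    downloader.reputation -= R_D


alice, bob = Peer(), Peer()
record_transfer(uploader=alice, downloader=bob, valid=True)
record_transfer(uploader=alice, downloader=bob, valid=False)
print(alice.reputation, bob.reputation)  # 12.0 9.0
```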

    Malicious peers. Malicious peers can be divided into various categories according to their behavioral features, and the complexity of these behaviors makes it hard to enumerate all categories comprehensively. Here, we focus on the following categories, both to help us understand the effectiveness of PeerMate and to serve as a benchmark for evaluating forthcoming detection algorithms:

    MP1: free-riding peers that use the resources of the P2P system without providing an appropriate amount of resources, for example BitTyrant and BitThief clients;

    MP2: peers that upload inauthentic objects to harm the community, for example peers controlled by the music industry that inject fake files into KaZaA;

    MP3: peers that collaborate to promote their reputation values by preferentially uploading objects to each other, such as the colluders in Maze, and then use these reputation values to download their desired objects;

    MP4: peers that create Sybil peers to upload to, in order to promote their own reputation values;

    MP5: peers that exploit the resources of the P2P system for malicious purposes such as worm dispatching and denial-of-service [12].

    We recognize that this partition is incomplete, and that there exist other categories of malicious peers with more complex behaviors. For example, some peers may belong to multiple categories at the same time, so that their behavior combines the characteristics of multiple categories. As another example, a peer may pretend to be nice until its reputation becomes high, and then begin to exploit the system. In summary, all these malicious peers behave differently from honest peers. For the sake of simplicity, the malicious peers with these more complex behaviors are called strategic malicious peers, and we discuss them further in Section IV. Peers other than the malicious ones are referred to as honest peers. Furthermore, we assume that the majority of peers are honest.

    B. Reputation Matrix

    Let $N$ be the total number of peers, and let $R^T_p$ be the reputation value of peer $p$ at the end of round $T$, with $1 \le p \le N$. The reputation values of all peers then form a reputation vector $RV^T = (R^T_1, R^T_2, \ldots, R^T_N)$ at the end of the $T$th round. Likewise, for a single peer $p$ with reputation values $R^t_p$, $1 \le t \le T$, over time, we can form a reputation time-series $RS_p = (R^1_p, R^2_p, \ldots, R^T_p)$. We can then form a reputation matrix $R_{T \times N}$ at the end of the $T$th round:

    $$
    R_{T \times N} =
    \begin{pmatrix}
    R^1_1 & R^1_2 & \cdots & R^1_N \\
    R^2_1 & R^2_2 & \cdots & R^2_N \\
    \vdots & \vdots & \ddots & \vdots \\
    R^T_1 & R^T_2 & \cdots & R^T_N
    \end{pmatrix}
    \tag{1}
    $$

    Thus the $i$th column of $R_{T \times N}$ is the reputation time-series of peer $i$, and the $t$th row of $R_{T \times N}$ is the reputation vector for all peers at the end of round $t$. Each entry $R^t_p$ is the reputation value of peer $p$ at the end of the $t$th round. In RP2P systems, $R_{T \times N}$ may be iteratively calculated by each peer as in EigenTrust, or be collected and calculated by a centralized facility such as the tracker servers in private tracker sites or the central server in Maze. Distributed Hash Table (DHT)-based methods also provide a feasible way of collecting reputation information in a distributed manner. Thus it is always possible to calculate $R_{T \times N}$. For the sake of simplicity of notation, we will use $R$ to represent $R_{T \times N}$ from now on. Note that due to network congestion and churn in the overlay network, a few elements in $R$ may be inaccurate or missing. We discuss the effects of this in Subsection IV-C.
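To make the row and column conventions of the reputation matrix concrete, the following sketch stores a toy matrix as a list of rows. The values and the helper names (`reputation_vector`, `reputation_series`, `append_round`) are illustrative assumptions, not part of PeerMate itself.

```python
# Rows are rounds t = 1..T, columns are peers p = 1..N (toy values, T = 3, N = 4).
R = [
    [10.0, 10.0, 10.0, 10.0],   # reputation vector after round 1
    [12.0,  9.0, 11.0,  8.0],   # reputation vector after round 2
    [14.0,  8.0, 12.0,  6.0],   # reputation vector after round 3
]


def reputation_vector(R, t):
    """Row t (1-indexed): reputations of all peers at the end of round t."""
    return R[t - 1]


def reputation_series(R, p):
    """Column p (1-indexed): reputation of peer p over all rounds."""
    return [row[p - 1] for row in R]


def append_round(R, new_vector):
    """Grow the matrix by one row when a new round finishes."""
    if R and len(new_vector) != len(R[0]):
        raise ValueError("one reputation value per peer is required")
    R.append(list(new_vector))


append_round(R, [16.0, 7.0, 13.0, 4.0])
print(reputation_vector(R, 2))   # [12.0, 9.0, 11.0, 8.0]
print(reputation_series(R, 4))   # [10.0, 8.0, 6.0, 4.0]
```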

    C. Fundamental Idea

    As described in Subsection III-A, malicious peers span various types. Since the reputation value of a peer reflects its behavioral features, different behaviors will lead to different reputation values, which will subsequently lead to different reputation time-series in $R$. The behavior of a malicious peer, of whatever type, is however expected to be different from that of an honest peer. Therefore, we will be able to isolate malicious peers from honest ones if we are able to mine the different behavioral features embedded in the deterministic features of the malicious peers' reputation time-series within $R$.

    To extract the deterministic features of the reputation time-series in $R$ while also coping with inaccurate or missing data, PeerMate begins by applying Multiscale Principal Component Analysis (MSPCA) [5] to $R$ to obtain the reconstructed reputation matrix $\hat{R}$. We have found that the variability in $R$ may be captured in a lower-dimensional space, thereby justifying the application of MSPCA in this manner. We then define a Quality of Reconstruction (QR) metric, as described in Subsection III-E, and finally distinguish between malicious and honest peers using a threshold. The setting of this threshold is discussed in Subsection IV-D.
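MSPCA [5] combines a wavelet decomposition of each variable with PCA at each scale. The sketch below covers only the wavelet half of that idea for a single reputation time-series, and it makes several assumptions that are not taken from this paper: the orthonormal Haar wavelet, hard thresholding of detail coefficients, and a series length divisible by 2 to the number of levels. The PCA step and the QR metric are not shown.

```python
import math


def haar_decompose(signal, levels):
    """Return (approximation, details) with details ordered finest scale first."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest scale."""
    signal = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(signal, detail):
            rebuilt.append((s + d) / math.sqrt(2))
            rebuilt.append((s - d) / math.sqrt(2))
        signal = rebuilt
    return signal


def hard_threshold(details, limit):
    """Zero every detail coefficient whose magnitude does not exceed limit."""
    return [[d if abs(d) > limit else 0.0 for d in level] for level in details]


# Toy reputation time-series: two plateaus with small jitter.
series = [10.0, 10.2, 9.9, 10.1, 14.0, 14.1, 13.9, 14.2]
approx, details = haar_decompose(series, levels=2)
smoothed = haar_reconstruct(approx, hard_threshold(details, limit=0.5))
print([round(x, 2) for x in smoothed])
```

Without thresholding, `haar_reconstruct(*haar_decompose(series, 2))` returns the original series up to floating-point error, so any difference between `series` and `smoothed` comes from the discarded small coefficients.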


    [Figure: block diagram. Recoverable labels: "Wavelet coefficient", "Threshold".]


    TABLE I
    SIMULATION PARAMETERS

    Symbol        Meaning                                                  Base value
    $N$           # of peers                                               200
    $O$           # of objects                                             4000
    $\lambda_R$   per-user request rate                                    2 objects/round
    $\lambda_O$   object arrival rate                                      varies
    $P_M$         ratio of # of malicious peers to # of total peers        varies
    $P_h$         probability of strategic malicious peer acting honest    varies

    F. Complexity

    The time complexity of PeerMate is governed by that of

    MSPCA. The time complexity is evaluated to be $O(TN^2)$. The storage cost of PeerMate is $O(TN)$.
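As a rough sense of scale, and ignoring the constant factors hidden by the $O(\cdot)$ notation, plugging in the base setting of $N = 200$ peers (Table I) and the $T = 200$ rounds used in Subsection IV-C gives:

$$
T N^2 = 200 \times 200^2 = 8 \times 10^6,
\qquad
T N = 200 \times 200 = 4 \times 10^4 .
$$

These are orders of magnitude for the leading terms only, not measured operation or memory counts.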

    IV. EXPERIMENTAL RESULTS

    In this section, we conduct simulations to evaluate the performance of PeerMate. We first describe our simulation

    context, then present the results with analyses of the sensitiv-

    ities of the various algorithm parameters.

    A. Simulation Context

    In our simulation, the basic workload model follows the

    typical workload model from literature [11]. The objects arrive

    at a constant rate $\lambda_O > 0$, and their popularity follows the Zipf probability distribution. On average, a client requests a constant number of objects per round, choosing them according to a Zipf distribution with parameter 1.0. We assume that all objects in the system are of equal size, and malicious peers are selected randomly from all peers. Table I presents our parameter values.
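The Zipf-distributed request step can be sketched as follows. This is a simplified illustration, not the full workload model of [11]: it assumes objects are identified by their popularity rank and that requests are drawn with replacement; the function names are hypothetical.

```python
import random


def zipf_weights(num_objects, alpha=1.0):
    """Unnormalised Zipf weights: the object of rank k gets weight 1 / k**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]


def requests_for_round(num_objects, requests_per_round, rng, alpha=1.0):
    """Draw the object ranks one peer requests in one round.

    Simplification: sampling is with replacement, so a peer may draw
    the same object twice.
    """
    ranks = range(1, num_objects + 1)
    weights = zipf_weights(num_objects, alpha)
    return rng.choices(ranks, weights=weights, k=requests_per_round)


rng = random.Random(0)  # fixed seed so the run is repeatable
print(requests_for_round(num_objects=4000, requests_per_round=2, rng=rng))
```

With parameter 1.0 the most popular object is requested twice as often as the second and four times as often as the fourth, which is why low ranks dominate the draws.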

    B. Comparison Benchmark and Evaluation Metrics

    We choose two recently proposed, representative schemes, Upload Entropy (UEntropy) and Interaction Entropy (IEntropy) [1], as our comparison benchmarks. UEntropy and IEntropy both aim to give peers an incentive to share content in private BitTorrent communities, in which peers with the lowest entropy are considered the least trustworthy collaborators, i.e. they are the suspected malicious peers. Therefore, to keep the comparison fair, in the UEntropy and IEntropy schemes the peers with the lowest entropy are treated as suspected malicious peers.

    Let MPS be the set of malicious peers, HPS the set of honest peers, and SMPS the set of suspected malicious peers. We define the True Positive Ratio (TPR) and the False Negative Ratio (FNR) as $\mathrm{TPR} = |\mathrm{SMPS} \cap \mathrm{MPS}| / |\mathrm{MPS}|$ and $\mathrm{FNR} = |\mathrm{SMPS} \cap \mathrm{HPS}| / |\mathrm{SMPS}|$ respectively, where $|\cdot|$ denotes the cardinality of a set and $\cap$ denotes the intersection of two sets. TPR and FNR both lie in $[0, 1]$.
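The two metrics translate directly into set operations. In the sketch below the peer identifiers and the detector output are hypothetical; the counts are chosen only so that the ratios are easy to check by hand. Note that FNR, as defined in the text, is the fraction of suspected peers that are actually honest.

```python
def detection_ratios(suspected, malicious, honest):
    """Return (TPR, FNR) as defined in the text.

    TPR = |SMPS & MPS| / |MPS|, FNR = |SMPS & HPS| / |SMPS|.
    Assumes at least one malicious peer; FNR is taken as 0 when nothing is flagged.
    """
    suspected, malicious, honest = set(suspected), set(malicious), set(honest)
    tpr = len(suspected & malicious) / len(malicious)
    fnr = len(suspected & honest) / len(suspected) if suspected else 0.0
    return tpr, fnr


# Toy population: peers 0..39 malicious, 40..199 honest (N = 200, P_M = 0.2).
malicious = set(range(40))
honest = set(range(40, 200))
# Hypothetical detector output: 39 malicious peers plus 3 honest ones.
suspected = set(range(39)) | {50, 60, 70}

tpr, fnr = detection_ratios(suspected, malicious, honest)
print(f"TPR = {tpr:.3f}, FNR = {fnr:.3f}")  # TPR = 0.975, FNR = 0.071
```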

    C. Without Strategic Malicious Peers

    We first compare the detection performance of PeerMate with the chosen benchmark schemes, UEntropy and IEntropy, with $\lambda_O = 2$, $P_h = 0$, $P_M = 0.2$, a threshold of 0.9, and default values for the other parameters as stated in Table I. We used 200 rounds. The results are presented in Table II. It is clear that PeerMate exhibits the best performance, with the highest TPR and lowest FNR.

    TABLE II
    DETECTION PERFORMANCES

    Scheme      TPR      FNR
    PeerMate    97.5%    7%
    UEntropy    42.5%    57.5%
    IEntropy    0%       100%

    After manual inspection, we found that the 2.5% of malicious peers that could not be distinguished by PeerMate were those belonging to type MP4, the behaviors of which are similar to those of honest peers. In order to detect these malicious peers, one can investigate the behavior of Sybil peers, as they only download content from malicious peers belonging to MP4. The 7% FNR may occur because the objects of these honest peers may be at a lower rank among all objects during the request process.

    D. Threshold Setting

    The threshold plays a critical role in PeerMate and determines the TPR and FNR to some extent. To investigate its impact, we ran PeerMate using a range of possible values for the threshold. The other parameters remained as before: $\lambda_O = 2$, $P_h = 0$ and $P_M = 0.2$. Figure 2 shows the variation in TPR and FNR with the selection of the threshold. We observe that the TPR of PeerMate increases from 55% to 97.5% and the FNR increases from 4.5% to 11.5% as the threshold goes from 0.84 to 0.93. As the objective of PeerMate is to detect as many malicious peers as possible with as low an FNR as possible, we choose a threshold of 0.9 as the default in our experiments.
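A threshold sweep of this kind can be sketched as below. Two things are assumptions here: every peer is taken to have a scalar QR score, and a peer is flagged when its score falls below the threshold. That direction is consistent with TPR and FNR both rising as the threshold rises, but the QR definition itself is not reproduced in this copy. The scores are made up for illustration.

```python
def flag_peers(qr_scores, threshold):
    """Flag peers whose score is below the threshold (assumed direction)."""
    return {peer for peer, score in qr_scores.items() if score < threshold}


def sweep(qr_scores, malicious, thresholds):
    """Return one (threshold, TPR, FNR) row per candidate threshold."""
    honest = set(qr_scores) - set(malicious)
    rows = []
    for th in thresholds:
        flagged = flag_peers(qr_scores, th)
        tpr = len(flagged & malicious) / len(malicious)
        fnr = len(flagged & honest) / len(flagged) if flagged else 0.0
        rows.append((th, tpr, fnr))
    return rows


# Hypothetical scores for six peers; "m1".."m3" are the malicious ones.
qr_scores = {"m1": 0.80, "m2": 0.86, "m3": 0.91,
             "h1": 0.89, "h2": 0.95, "h3": 0.97}
malicious = {"m1", "m2", "m3"}

rows = sweep(qr_scores, malicious, [0.84, 0.88, 0.92])
for th, tpr, fnr in rows:
    print(f"threshold={th:.2f}  TPR={tpr:.2f}  FNR={fnr:.2f}")
```

In this toy data, raising the threshold from 0.84 to 0.92 catches all three malicious peers but also flags one honest peer, which is the same trade-off the text describes.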

    E. With Strategic Malicious Peers

    We study two other more complex scenarios to evaluate the

    performance of PeerMate.

    In order to avoid being detected, many strategic malicious peers will act as honest ones with a certain probability $P_h$ during each round. To study the effect of this on the detection performance of PeerMate, we simulated with a range of values for $P_h$. The other algorithm parameters remained constant at their default values: $\lambda_O = 2$, $P_M = 0.2$ and a threshold of 0.9. Figure 3 presents the results. We observe that the TPR decreases from 97.5% to 65% as $P_h$ goes from 0 to 0.8, and exceeds 75% as long as $P_h$ is below 0.4. The FNR remains within a generally low band, fluctuating between 3% and 11.5%. This experiment indicates that the performance of PeerMate is acceptable as long as $P_h$ remains below 0.4, which is already a high probability of acting honestly.
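The per-round behavior of a strategic malicious peer can be modeled as a Bernoulli draw with success probability Ph. The sketch below shows only that draw; what the peer then does in an honest or malicious round is left out, and the function names are hypothetical.

```python
import random


def acts_honestly(p_h, rng):
    """One draw per round: True with probability p_h."""
    return rng.random() < p_h


def honest_round_fraction(p_h, rounds, rng):
    """Fraction of rounds in which the peer behaved honestly."""
    honest_rounds = sum(acts_honestly(p_h, rng) for _ in range(rounds))
    return honest_rounds / rounds


rng = random.Random(42)  # fixed seed so the run is repeatable
for p_h in (0.0, 0.4, 0.8):
    print(p_h, honest_round_fraction(p_h, rounds=200, rng=rng))
```

As Ph grows, a larger share of the peer's reputation time-series is generated by honest behavior, which gives an intuition for why detection becomes harder at high Ph.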

    F. Mixture Model

    We have also evaluated PeerMate with strategic malicious peers which belong to multiple categories at the same time, and whose behaviors are a combination of the behaviors of multiple categories. Generally speaking, even if a malicious peer exhibits a mixture of different types of malicious behaviors, its specific behavior will still be different from that of an honest peer.

    We have injected mixed malicious peers which act as MP1, ..., MP5 with certain probabilities. We found that the TPR was 95% and the FNR was 9% after 100 rounds. Thus PeerMate is also able to detect malicious peers of mixed types.

    Fig. 2. Variation in detection performance of PeerMate with the threshold. [Two panels, TPR and FNR, each plotted against threshold values from 0.84 to 0.94.]
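One simple way to realize a mixed malicious peer is to let it pick, in every round, which category to imitate according to fixed mixing probabilities. The probabilities below are hypothetical, since the text only says "certain probabilities", and the per-category behavior itself is not modeled.

```python
import random
from collections import Counter

CATEGORIES = ["MP1", "MP2", "MP3", "MP4", "MP5"]


def pick_behaviour(weights, rng):
    """Choose which category a mixed malicious peer imitates in one round."""
    return rng.choices(CATEGORIES, weights=weights, k=1)[0]


rng = random.Random(7)                   # fixed seed so the run is repeatable
weights = [0.3, 0.2, 0.2, 0.2, 0.1]      # hypothetical mixing probabilities
history = Counter(pick_behaviour(weights, rng) for _ in range(100))
print(history)
```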

    G. Discussion

    We see that PeerMate can detect malicious peers of various types with high success rates. Our experiments have also shown that missing data does not affect the detection performance of PeerMate when up to 50% of the data is missing. In addition, we have found that PeerMate is insensitive to the value of $P_M$, the ratio of the number of malicious peers to total peers, when $P_M$ is below 60%. We do not show these results here due to space constraints.

    Each reputation management scheme needs to implement some algorithm to detect malicious peers. This may be proved through a game-theoretic model and will be presented in our future papers.

    PeerMate may be applied to Maze-like and EigenTrust-like systems directly since they share a common context. It can also be applied to other Reputation-based P2P systems if some scheme is implemented to collect and compute the reputation values of all the peers.

    V. CONCLUSION AND FUTURE WORK

    In this paper, we have presented PeerMate, a novel malicious peer detection algorithm for Reputation-based P2P (RP2P) systems, which distinguishes malicious peers from honest ones using Multiscale Principal Component Analysis and Quality of Reconstruction. Experimental results have indicated that PeerMate achieves high detection accuracy and flexibility, has low complexity, and is insensitive to parameter selection.

    As a future task, we are planning to extend PeerMate to

    incorporate adaptive, online anomaly detection algorithms that

    have recently been suggested. Examples include the KOAD

    [15], KEAD [16] and OCNM [17] algorithms.

    ACKNOWLEDGEMENTS

    This work was supported in part by the National Natural Science Foundation of China Grants # 61070173 and 61103225, and the Jiangsu Province Natural Science Foundation Grants # BK2010133 and BK2009058.

    Fig. 3. Variation in detection performance of PeerMate with the value of $P_h$, the probability of a strategic malicious peer acting honest. [Two panels, TPR and FNR, each plotted against $P_h$ from 0 to 0.8.]

    REFERENCES

    [1] Z. Liu, P. Dhungel, D. Wu, C. Zhang, and K. Ross, "Understanding and improving ratio incentives in private communities," in Proc. IEEE Int. Conf. on Distributed Computing Systems (ICDCS), Genoa, Italy, Aug. 2010.

    [2] S. Kamvar, M. Schlosser, and H. Garcia-Molina, "The EigenTrust algorithm for reputation management in P2P networks," in Proc. Int. World Wide Web Conf. (WWW), Budapest, Hungary, May 2003.

    [3] Q. Lian, Z. Zhang, M. Yang, B. Zhao, Y. Dai, and X. Li, "An empirical study of collusion behavior in the Maze P2P file-sharing system," in Proc. IEEE Int. Conf. on Distributed Computing Systems (ICDCS), Toronto, ON, Canada, Jun. 2007.

    [4] J. Douceur, "The Sybil attack," in Proc. Int. Workshop on Peer-to-Peer Systems, Cambridge, MA, USA, Mar. 2007.

    [5] B. Bakshi, "Multiscale PCA with application to multivariate statistical process monitoring," AIChE Journal, vol. 44, no. 7, pp. 1596–1610, Jul. 1998.

    [6] L. Mekouar, Y. Iraqi, and R. Boutaba, "Peer-to-Peer's most wanted: Malicious peers," Computer Networks, vol. 50, no. 4, pp. 545–562, Mar. 2006.

    [7] W. Ji, S. Yang, and B. Chen, "A group-based trust metric for P2P networks: Protection against Sybil attack and collusion," in Proc. IEEE Int. Conf. on Computer Science and Software Engineering, Wuhan, China, Dec. 2008.

    [8] Maze. Portal. [Online]. Available: http://maze.tianwang.com/

    [9] H. Lee, J. Kim, and K. Shin, "Simplified clique detection for collusion-resistant reputation management scheme in P2P networks," in Proc. Int. Symposium on Communications and Information Technologies (ISCIT), Tokyo, Japan, Oct. 2010.

    [10] TV Torrents. Portal. [Online]. Available: http://www.tvtorrents.com/

    [11] K. Gummadi, R. Dunn, S. Saroiu, S. Gribble, H. Levy, and J. Zahorjan, "Measurement, modeling, and analysis of a peer-to-peer file-sharing workload," ACM SIGOPS Operating Systems Review (OSR), vol. 37, no. 5, pp. 314–329, Dec. 2003.

    [12] K. Hoffman, D. Zage, and C. Nita-Rotaru, "A survey of attack and defense techniques for reputation systems," ACM Computing Surveys, vol. 42, no. 1, pp. 1–31, Dec. 2009.

    [13] D. Donoho, I. Johnstone, G. Kerkyacharian, and D. Picard, "Wavelet shrinkage: Asymptopia?" J. of the Royal Stat. Soc., vol. 57, no. 2, pp. 301–369, 1995.

    [14] A. Lakhina, M. Crovella, and C. Diot, "Diagnosing network-wide traffic anomalies," in Proc. ACM SIGCOMM, Portland, OR, Aug. 2004.

    [15] T. Ahmed, M. Coates, and A. Lakhina, "Multivariate online anomaly detection using kernel recursive least squares," in Proc. IEEE Int. Conf. on Computer Communications (INFOCOM), Anchorage, AK, USA, May 2007.

    [16] T. Ahmed, "Online anomaly detection using KDE," in Proc. IEEE Global Communications Conf. (GLOBECOM), Honolulu, HI, USA, Nov. 2009.

    [17] T. Ahmed, X. Wei, S. Ahmed, and A.-S. Pathan, "Intruder detection in camera networks using the one-class neighbor machine," in Proc. American Telecommunications Systems Management Association (ATSMA) Networking and Electronic Commerce Research Conf. (NAEC), Riva del Garda, Italy, Oct. 2011.
