
Seventh IEEE International Conference on Peer-to-Peer Computing (P2P 2007), Galway, Ireland, September 2-5, 2007


Private Peer-to-Peer Overlay for Real-Time Monitoring of a Deployed Internet-Scale Peer-to-Peer Overlay

Qi Zhang, Marco Piumatti, Sandeep K. Singhal

Microsoft Corporation

Abstract

Peer-to-peer system decentralization complicates performance monitoring, and real-time monitoring is particularly challenging. We discuss our experience with monitoring the Peer Name Resolution Protocol (PNRP), a peer-to-peer name resolution/routing protocol and overlay deployed over the Internet. We must collect accurate performance data without adding traffic overhead. Unfortunately, no central node has sufficient monitoring information, and scalability and privacy concerns make it impractical to collect reliable real-time data from active client nodes. Our method obtains information from lightweight nodes (Weather Stations) that are actively deployed on the P2P network and that themselves form a private P2P overlay. We monitor the overall performance of the PNRP protocol on the Internet and produce timely reports supporting network management, security alerting, and system maintenance.

1. Introduction

Peer Name Resolution Protocol (PNRP) [1] is a peer-to-peer name-address mapping protocol that is actively deployed over the Internet. Every PNRP node can publish names into the system and resolve them; PNRP is fully distributed, with name resolution requests dynamically routed among peers in the system.

Decentralization and large scale complicate performance monitoring in peer-to-peer systems, and real-time monitoring is particularly hard. In this paper, we describe our approach for measuring the running status of PNRP. Because the system contains a large number of nodes, we deploy a small number of lightweight “Weather Station” nodes and create a secured P2P overlay among them. A data collection application running on these Weather Stations publishes names, periodically selects names from the registered name pool, and resolves them. Data collected from these Weather Stations supports our performance analysis.

2. Monitoring Challenge for Internet P2P

PNRP is a distributed name resolution protocol that operates using a peer-to-peer overlay running over the Internet. Any node in the PNRP overlay is able to publish name records. However, unlike a traditional client-server system, every node in the peer-to-peer system participates in the lookup process [2, 3, 4, 5] by assisting with routing requests to their final destination.

When deployed over the Internet, the PNRP system must withstand a dynamic and hostile network environment. Node transience, Internet routing irregularities, limited connectivity, and active security attacks are just a few forces that may affect real-world system performance. The health of an active overlay network is tied to its ability to support successful lookups of published names while meeting the latency and bandwidth expectations of the system designers.

Network activity in the P2P system is scattered throughout the network. No central location has sufficient data about system health. It is impractical to collect real-time data directly from a large percentage of end-hosts because of scalability and privacy concerns; data from end-hosts also would be inherently inaccurate because an unknown proportion of lookups are for names that are not actively being published.

3. Monitoring Overlay

To provide a scalable and flexible monitoring solution, we deploy a set of lightweight Weather Station hosts (Figure 1). These Stations actively participate in the Target P2P Overlay as would any other client. They enter and leave the overlay network anonymously and periodically change identity. However, the Weather Stations also establish a private Monitoring P2P Overlay used for coordinating their activities. In addition, these Stations collect data and regularly report it to a central database for analysis.

A data collection application runs on all Weather Stations. Each Station registers some names in the Target Overlay and announces the names to the Monitoring Overlay. Each Station periodically picks a name from the registered name pool and issues resolve requests in the Target Overlay. Data including the resolve latency, hop count, and resolve success or failure is logged to a central database for later analysis.

Seventh IEEE International Conference on Peer-to-Peer Computing

0-7695-2986-0/07 $25.00 © 2007 IEEE DOI 10.1109/P2P.2007.27

With this method, we use data collected from a subset of the system to extrapolate performance of the entire system. To obtain accurate results, we distribute the Weather Stations throughout the network. We also carefully consider the number of Weather Stations. More deployed Weather Stations provide more complete results, but they also generate more data collection and analysis load. Weather Stations’ activity may also disturb the Target Overlay, potentially affecting accuracy of monitoring results. In practice, we keep the number of deployed Weather Stations small.
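The probe cycle described in this section (pick a registered name, resolve it, record latency, hop count, and outcome) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the `ProbeRecord` type, the `Resolver` callback signature, and the function names are all hypothetical, and the real PNRP resolve call and the database logging step are left to the caller.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class ProbeRecord:
    """One logged resolve attempt (hypothetical record layout)."""
    name: str
    success: bool
    latency_ms: float
    hop_count: int


# Hypothetical resolver callback: takes a name, returns (success, hop_count).
# A real Station would issue the resolve request in the Target Overlay here.
Resolver = Callable[[str], Tuple[bool, int]]


def probe_once(name_pool: Sequence[str], resolve: Resolver,
               rng: Optional[random.Random] = None) -> ProbeRecord:
    """Pick one name from the registered pool, resolve it, and time the call."""
    rng = rng or random.Random()
    name = rng.choice(name_pool)
    start = time.perf_counter()
    success, hop_count = resolve(name)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return ProbeRecord(name, success, latency_ms, hop_count)


def run_probes(name_pool: Sequence[str], resolve: Resolver, rounds: int,
               interval_s: float = 0.0) -> List[ProbeRecord]:
    """Run a fixed number of probes, optionally sleeping between them."""
    records = []
    for _ in range(rounds):
        records.append(probe_once(name_pool, resolve))
        if interval_s > 0:
            time.sleep(interval_s)
    return records
```

In this sketch the returned records stand in for the rows that a Station would report to the central database; the probing interval is a parameter because the text does not state how often Stations issue requests.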

4. Data Analysis

Using performance data collected from the Weather Stations, we calculate several real-time metrics. These metrics are compared with simulation data generated within the laboratory to detect unusual behavior, refine protocol testing practices, and take corrective actions. Figure 2 shows some sample performance data.

Resolve Rate. The resolve rate indicates the chance of obtaining a successful name resolution result through PNRP for each resolve request. It is defined as the ratio of the number of successful resolves to the total number of resolve requests.

Average Resolve Latency. Name resolution should complete quickly. The resolve latency is the difference between the time that a resolve request is initiated and the time that a resolve result is received.
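The two metrics above reduce to simple aggregates over the logged resolve attempts. The sketch below assumes each attempt is available as a success flag plus a latency in milliseconds; the function names are hypothetical. Averaging latency over successful resolves only is also an assumption, since the text does not say how failed requests are treated.

```python
from typing import List, Optional, Tuple


def resolve_rate(outcomes: List[bool]) -> Optional[float]:
    """Successful resolves divided by total resolve requests."""
    if not outcomes:
        return None  # no requests logged, so the rate is undefined
    return sum(outcomes) / len(outcomes)


def average_latency_ms(samples: List[Tuple[bool, float]]) -> Optional[float]:
    """Mean latency over successful resolves (assumption: failures excluded)."""
    latencies = [latency for success, latency in samples if success]
    if not latencies:
        return None
    return sum(latencies) / len(latencies)
```

For example, three successes out of four requests give a resolve rate of 0.75.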

Estimated Network Size. The number of hosts in the system may indicate the occurrence of network partitions. We obtain the estimated network size from the number of entries in each Weather Station’s local routing table. We compare estimates across Weather Stations and over time to detect network partitions.
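One possible way to compare estimates across Weather Stations is to flag any Station whose estimate falls far below the median of all Stations. The sketch below is an assumption for illustration only: the text does not specify the comparison rule, and the `ratio` threshold of 0.5 and the function name are hypothetical. The per-Station estimates are taken as given inputs.

```python
from statistics import median
from typing import Dict, List


def suspect_partition(estimates: Dict[str, float], ratio: float = 0.5) -> List[str]:
    """Return Stations whose size estimate is below ratio * median estimate."""
    if not estimates:
        return []
    mid = median(estimates.values())
    # A Station that sees a much smaller network than its peers may be
    # sitting in a partitioned fragment of the overlay.
    return sorted(station for station, size in estimates.items()
                  if size < ratio * mid)
```

The same check can be repeated on successive snapshots to compare estimates over time.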

Resolve Efficiency. A PNRP name resolution may take several hops; some return information about a closer node, while others fail to find any node closer to the target. Hops that are not involved in the successful route are wasted hops. We define resolve efficiency as:

\[
\text{resolve efficiency} =
\begin{cases}
\dfrac{\text{hop count} + 1}{\text{hop count} + \text{wasted hop count} + 1} & \text{(successful resolve)} \\[1ex]
0 & \text{(failed resolve)}
\end{cases}
\]
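Reading the definition as (hop count + 1) / (hop count + wasted hop count + 1) for a successful resolve and 0 for a failed one, the metric can be computed per request as in the sketch below. The function and parameter names are hypothetical.

```python
def resolve_efficiency(success: bool, hop_count: int, wasted_hop_count: int) -> float:
    """Resolve efficiency for a single request, per the definition above."""
    if hop_count < 0 or wasted_hop_count < 0:
        raise ValueError("hop counts must be non-negative")
    if not success:
        return 0.0  # a failed resolve has zero efficiency
    return (hop_count + 1) / (hop_count + wasted_hop_count + 1)
```

A successful resolve with no wasted hops has efficiency 1, and the value falls toward 0 as the share of wasted hops grows.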

Average Hop Count. The average hop count indicates how many hops a resolve request must traverse before a result is available.

Traffic. We collect information about incoming / outgoing traffic (packet types, rates, and so on). We analyze this data over the long term to look for unusual trends or detect emergent security attacks.

5. Summary

In this paper, we discussed performance monitoring for PNRP, an Internet-scale peer-to-peer system. Our approach employs lightweight hosts that participate in the P2P Target Overlay and themselves create a private P2P Monitoring Overlay. Our approach balances complete data collection against the need to avoid artificially perturbing the running system. With the performance data, we determine system anomalies, detect partitions, and monitor security.

6. References

[1] PNRP. http://www.microsoft.com/technet/network/p2p/pnrp.mspx.

[2] H. Balakrishnan, et al. “Looking up data in p2p systems.” Communications of the ACM 46(2), 2003.

[3] I. Stoica, et al. “Chord: A scalable Peer-To-Peer lookup service for internet applications.” SIGCOMM, 149–160, 2001.

[4] A. Rowstron and P. Druschel, “Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer system.” IFIP/ACM Int’l Conf on Dist Systems Platforms (Middleware), 329–350, November 2001.

[5] I. Clarke, et al. “Freenet: A Distributed Anonymous Information Storage and Retrieval System.” Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability. Springer, New York, 2001.

Figure 1. Weather Stations

Figure 2. Performance Data (four panels, each plotted against Time: Resolve Rate, Average Resolve Latency, Resolve Efficiency, Average Hop Count)