Study on Network Size Estimation Schemes
for Peer-to-Peer Networks
2008/02/19, Hosik Cho
Some Questions
• How many people are in this room?
• Why do you think so?
• How many people are on this campus?
• Can you count them all?
• How many nodes are in a P2P network spanning the world?
Contents
• Peer-to-peer networks
• Network size estimation
• Estimation methods
  – Unstructured P2P
  – Structured P2P
• Conclusion
P2P networks
• A peer-to-peer (P2P) overlay network connects peers logically on top of the IP network.
• Unstructured P2P: Gnutella, Freenet
• Structured P2P: Chord, CAN, Pastry, …
• P2P applications
  – File sharing systems (Kazaa, Gnutella)
  – Video over IP (CoolStreaming)
  – Voice over IP (Skype)
P2P networks
• Characteristics
  – Scalable
  – Self-organizing capability
  – Resilience to failure
  – Fully decentralized
• As a result, monitoring the system and obtaining global statistics become much more complex.
Network size estimation
• The network size (N) is needed for
  – Load balancing
  – Restricted broadcasting
  – Determining network parameters
• For unstructured P2P networks, most approaches are based on broadcasting.
• For structured P2P networks, the size can be inferred directly from the density of identifiers.
Related Works
• Unstructured P2P
  – Sample & Collide
  – Hops Sampling
  – Gossip-based aggregation
• Structured P2P
  – Token passing
  – Neighbor sampling
  – Finger sampling
Sample & Collide (1)
• “Birthday paradox”: in a group of 23 people, the probability that at least two of them share a birthday already exceeds 50%.
• The initiator samples nodes uniformly at random until a sample returns a node that has already been selected.
• The number of samples X drawn until this first collision grows like √N (its expected value is about √(πN/2)).
• The system size is estimated as N̂ = X²/2.
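The birthday-paradox figure and the X²/2 rule can be checked with a short Python simulation. This is a minimal sketch under an idealized assumption: `rng.randrange(n)` stands in for a perfectly uniform node sampler, whereas in a real overlay the samples come from the random walk described on the next slide. The helper names `birthday_collision_prob`, `samples_until_collision` and `sample_and_collide_estimate` are hypothetical.

```python
import random


def birthday_collision_prob(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday out of `days`."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def samples_until_collision(n: int, rng: random.Random) -> int:
    """Draw IDs uniformly from range(n) until one repeats; return the number of draws X."""
    seen: set[int] = set()
    while True:
        node = rng.randrange(n)  # idealized uniform sample (assumption)
        if node in seen:
            return len(seen) + 1  # the repeated draw also counts as a sample
        seen.add(node)


def sample_and_collide_estimate(n: int, trials: int, seed: int = 0) -> float:
    """Average the single-collision estimate X**2 / 2 over independent trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = samples_until_collision(n, rng)
        total += x * x / 2
    return total / trials


if __name__ == "__main__":
    print(round(birthday_collision_prob(23), 3))
    print(round(sample_and_collide_estimate(10_000, trials=2_000, seed=1)))
```

A single X²/2 estimate is very noisy (its standard deviation is on the order of N itself), which is why the sketch averages over many independent trials.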
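Sample & Collide (2)
1. The initiator node sets a timer T > 0.
2. It sends the message to a randomly chosen neighbor.
3. The receiving node i picks a random number U uniformly in (0, 1] and decrements T by -log(U)/d_i, where d_i is its degree.
4. If T > 0, node i forwards the message to a randomly chosen neighbor.
5. If T ≤ 0, node i returns its ID to the initiator as a sample.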
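Steps 1 to 5 can be sketched as a continuous-time random walk in Python. This is an illustrative sketch, not the original implementation: the overlay is an in-memory adjacency list, and `ctrw_sample`, `hub_and_ring` and `sample_frequencies` are hypothetical helpers. Because node i holds the timer for an exponential time with rate d_i (mean 1/d_i), high-degree nodes are not oversampled, as they would be by a plain random walk.

```python
import math
import random
from collections import Counter


def ctrw_sample(graph: dict[int, list[int]], initiator: int, timer: float,
                rng: random.Random) -> int:
    """Run steps 1-5: forward the message until timer T expires; return the sample."""
    t = timer                                  # step 1: initiator sets T > 0
    node = rng.choice(graph[initiator])        # step 2: send to a random neighbor
    while True:
        u = 1.0 - rng.random()                 # U uniform in (0, 1], so log(U) is finite
        t -= -math.log(u) / len(graph[node])   # step 3: decrement T by -log(U)/d_i
        if t <= 0:
            return node                        # step 5: T expired, node i is the sample
        node = rng.choice(graph[node])         # step 4: forward to a random neighbor


def hub_and_ring(n: int) -> dict[int, list[int]]:
    """Toy overlay (n >= 4): node 0 links to all others; nodes 1..n-1 also form a ring."""
    graph = {0: list(range(1, n))}
    for i in range(1, n):
        prev_node = i - 1 if i > 1 else n - 1
        next_node = i + 1 if i < n - 1 else 1
        graph[i] = [0, prev_node, next_node]
    return graph


def sample_frequencies(graph: dict[int, list[int]], initiator: int, timer: float,
                       num_samples: int, seed: int = 0) -> dict[int, float]:
    """Fraction of the samples that land on each node."""
    rng = random.Random(seed)
    counts = Counter(ctrw_sample(graph, initiator, timer, rng) for _ in range(num_samples))
    return {node: counts[node] / num_samples for node in graph}


if __name__ == "__main__":
    freqs = sample_frequencies(hub_and_ring(10), initiator=1, timer=10.0, num_samples=10_000)
    print({node: round(f, 3) for node, f in freqs.items()})
```

In the hub-and-ring overlay, node 0 has degree 9 and every other node has degree 3. A plain random walk would therefore land on the hub about 25% of the time. With the degree-scaled timer, each of the 10 nodes should come out near 10%.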
Hops Sampling (1)
• Probabilistic polling approach
• An initiator spreads messages in the network and estimates the system size based on the replies it gets back.
• If hopCount < minHopsReporting, a node sends a response with probability 1.
• Otherwise, it sends a response with probability 1/2^(hopCount − minHopsReporting).
• For example, if minHopsReporting = 2, only 25% (1/2²) of the nodes at distance 4 report back.
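Hops Sampling (2)
1. The initiator node sets hopCount = 0.
2. It sends the message to its neighbors, and each node forwards it further with hopCount incremented by one.
3. If hopCount < minHopsReporting, the node sends a response.
4. Otherwise, it sends a response with probability 1/2^(hopCount − minHopsReporting).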
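The polling rule, together with one way to turn the replies into a size estimate, can be sketched in Python. The sketch makes two assumptions. First, the flood is modelled as a breadth-first search, so each node's hopCount equals its shortest-path distance from the initiator. Second, the initiator weights each reply by the inverse of its reporting probability, 2^(hopCount − minHopsReporting); this inverse-probability estimator is chosen here for illustration. The helpers `ring_with_chords`, `hop_distances` and `hops_sampling_estimate` are hypothetical.

```python
import random
from collections import deque


def ring_with_chords(n: int, extra: int, rng: random.Random) -> dict[int, list[int]]:
    """Toy connected overlay: a ring plus `extra` random undirected links per node."""
    nbrs: dict[int, set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in [(i + 1) % n] + [rng.randrange(n) for _ in range(extra)]:
            if j != i:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return {i: sorted(s) for i, s in nbrs.items()}


def hop_distances(graph: dict[int, list[int]], initiator: int) -> dict[int, int]:
    """hopCount at which the flood first reaches each node (BFS distance)."""
    dist = {initiator: 0}
    queue = deque([initiator])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1   # forward with hopCount + 1
                queue.append(nbr)
    return dist


def hops_sampling_estimate(graph: dict[int, list[int]], initiator: int,
                           min_hops_reporting: int, rng: random.Random) -> float:
    """One poll: each node (initiator included) replies with prob. 1/2^max(0, hop - min)."""
    estimate = 0.0
    for hop in hop_distances(graph, initiator).values():
        excess = max(0, hop - min_hops_reporting)
        if rng.random() < 1.0 / 2 ** excess:   # the reply is sent
            estimate += 2 ** excess            # count it as 1/probability nodes
    return estimate


if __name__ == "__main__":
    rng = random.Random(3)
    overlay = ring_with_chords(2_000, extra=3, rng=rng)
    polls = [hops_sampling_estimate(overlay, 0, 2, rng) for _ in range(10)]
    print(round(sum(polls) / len(polls)))
```

If `min_hops_reporting` is at or above the network diameter, every node replies and the estimate is exact. Lowering it cuts the number of reply messages at the cost of a noisier estimate.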
Gossip-based (1)
• Epidemic-based approach
• If exactly one node of the system holds the value 1 and all the other values are 0, the average is 1/N.
• The initiator takes the value 1 and starts gossiping.
• The reached nodes participate in the process by setting their value to 0.
• At each cycle, each node in the network chooses one of its neighbors, and the two exchange their estimates.
Gossip-based (2)
• Both nodes then set Estimation ← (Estimation + neighbor’s_Estimation)/2, and each node's size estimate is 1/Estimation.
• To provide correct estimates, the algorithm must let a certain number of rounds elapse before computing the size estimate.
• This period is the time required for the gossip to propagate through the whole network and for the values to converge.
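The averaging scheme can be sketched in Python. This is a simplified sketch built on stated assumptions: every node starts with value 0 instead of joining as the gossip reaches it, cycles are synchronous, and each pairwise exchange is atomic. `ring_with_chords` and `gossip_size_estimates` are hypothetical helpers. Pairwise averaging preserves the sum of all values, so every value tends to 1/N and every node's estimate tends to N.

```python
import random


def ring_with_chords(n: int, extra: int, rng: random.Random) -> dict[int, list[int]]:
    """Toy connected overlay: a ring plus `extra` random undirected links per node."""
    nbrs: dict[int, set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in [(i + 1) % n] + [rng.randrange(n) for _ in range(extra)]:
            if j != i:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return {i: sorted(s) for i, s in nbrs.items()}


def gossip_size_estimates(graph: dict[int, list[int]], initiator: int, cycles: int,
                          rng: random.Random) -> dict[int, float]:
    """Pairwise averaging; returns each node's size estimate 1/value."""
    value = {node: 0.0 for node in graph}
    value[initiator] = 1.0                     # exactly one node holds 1, so the mean is 1/N
    order = list(graph)
    for _ in range(cycles):
        rng.shuffle(order)                     # each node initiates one exchange per cycle
        for node in order:
            peer = rng.choice(graph[node])     # pick one neighbor
            avg = (value[node] + value[peer]) / 2
            value[node] = value[peer] = avg    # both keep the average; the sum is unchanged
    return {node: (1.0 / v if v > 0 else float("inf")) for node, v in value.items()}


if __name__ == "__main__":
    rng = random.Random(5)
    overlay = ring_with_chords(200, extra=3, rng=rng)
    estimates = gossip_size_estimates(overlay, initiator=0, cycles=200, rng=rng)
    print(round(min(estimates.values())), round(max(estimates.values())))
```

With too few cycles, nodes far from the initiator still hold values near 0 and report very large (or infinite) estimates. That is the waiting period described above.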
N Estimation in S-P2P
• Assumptions
  – IDs are uniformly distributed.
  – Each node knows the total number of nodes (N) in the system.
  – Nodes do not join and leave frequently.
Basic approaches
[Figure: (a) Token passing, (b) Neighbor sampling]
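Neighbor sampling applies the density idea from the network size estimation slide. The following Python sketch assumes that IDs are real numbers distributed uniformly on a unit ring [0, 1) and that each node knows its k successors; `density_estimate` is a hypothetical helper. With N uniform IDs, the arc from a node to its k-th successor has expected length k/N, so dividing k by the measured arc gives an estimate of N.

```python
import random


def density_estimate(ids: list[float], index: int, k: int) -> float:
    """Estimate N from the arc between node `index` and its k-th successor.

    `ids` must be sorted IDs on the unit ring [0, 1) and 0 < k < len(ids).
    With N uniform IDs the arc has expected length k/N, so N is about k / arc.
    """
    n = len(ids)
    arc = (ids[(index + k) % n] - ids[index]) % 1.0   # wraps around the ring
    return k / arc


if __name__ == "__main__":
    rng = random.Random(11)
    ids = sorted(rng.random() for _ in range(10_000))
    for k in (1, 10, 50):
        print(k, round(density_estimate(ids, 0, k)))
```

A larger k reduces the variance, since the relative error shrinks roughly like 1/√k, at the cost of maintaining more neighbor state.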
N Estimation in S-P2P
• In actually deployed systems,
  – Nodes join and leave frequently.
  – A node must estimate how long a query takes to reach its destination (O(log N) hops).
  – Proximity-based identifiers are adopted for efficient routing, e.g., based on
    • AS number
    • geographic location
Uniformity of Identifiers
[Figure: distribution of identifiers, myth vs. real]
Estimation result (1)
[Figure: estimation results with proximity IDs vs. uniformly distributed IDs]
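A toy Python experiment shows why skewed identifiers hurt density-based estimation. This is not the experiment behind the figure. `clustered_ids` is a hypothetical stand-in for proximity-based IDs that places 90% of the nodes in 10% of the ring, and `density_estimate`, `median_estimate` and `uniform_ids` are also hypothetical helpers.

```python
import random
import statistics


def density_estimate(ids: list[float], index: int, k: int) -> float:
    """k divided by the arc to the k-th successor (IDs sorted on the unit ring)."""
    n = len(ids)
    return k / ((ids[(index + k) % n] - ids[index]) % 1.0)


def median_estimate(ids: list[float], k: int) -> float:
    """Median of all nodes' local size estimates."""
    return statistics.median([density_estimate(ids, i, k) for i in range(len(ids))])


def uniform_ids(n: int, rng: random.Random) -> list[float]:
    """IDs drawn uniformly on the unit ring, as assumed by the basic approaches."""
    return sorted(rng.random() for _ in range(n))


def clustered_ids(n: int, rng: random.Random, cluster_fraction: float = 0.9,
                  cluster_width: float = 0.1) -> list[float]:
    """Hypothetical proximity-style IDs: most nodes crowd into one arc of the ring."""
    in_cluster = int(n * cluster_fraction)
    ids = [rng.uniform(0.0, cluster_width) for _ in range(in_cluster)]
    ids += [rng.uniform(cluster_width, 1.0) for _ in range(n - in_cluster)]
    return sorted(ids)


if __name__ == "__main__":
    rng = random.Random(2)
    n, k = 2_000, 20
    print("uniform:", round(median_estimate(uniform_ids(n, rng), k)))
    print("clustered:", round(median_estimate(clustered_ids(n, rng), k)))
```

With uniform IDs, the median estimate lands close to N. With clustered IDs, most nodes sit in the dense region and overestimate N by roughly that region's density ratio, which is about 9x in this setup.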
Extended approach
• Structured P2P systems maintain fingers, routing tables, contacts, etc.
• This structural information can be used to estimate N more precisely.
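The slide does not specify which structural information the extended approach uses. As one assumed example, the Python sketch below pools the gaps in a node's predecessor and successor lists, so that each estimate rests on 2k gaps instead of k. `successor_estimate` and `pooled_estimate` are hypothetical helpers, and IDs are assumed to be uniform on a unit ring.

```python
import random
import statistics


def successor_estimate(ids: list[float], index: int, k: int) -> float:
    """k gaps: the arc from the node to its k-th successor on the unit ring."""
    n = len(ids)
    return k / ((ids[(index + k) % n] - ids[index]) % 1.0)


def pooled_estimate(ids: list[float], index: int, k: int) -> float:
    """2k gaps: the arc from the k-th predecessor to the k-th successor (needs 2k < n)."""
    n = len(ids)
    return 2 * k / ((ids[(index + k) % n] - ids[(index - k) % n]) % 1.0)


if __name__ == "__main__":
    rng = random.Random(4)
    ids = sorted(rng.random() for _ in range(5_000))
    succ = [successor_estimate(ids, i, 8) for i in range(len(ids))]
    pooled = [pooled_estimate(ids, i, 8) for i in range(len(ids))]
    print("successors only:", round(statistics.mean(succ)), round(statistics.pstdev(succ)))
    print("pred + succ:    ", round(statistics.mean(pooled)), round(statistics.pstdev(pooled)))
```

Doubling the number of gaps lowers the spread of the estimates by roughly a factor of √2. If the lists are already maintained for routing, this costs no extra messages.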
Estimation result (2)
[Figure: estimation results with proximity IDs vs. uniformly distributed IDs]
Conclusion
• For unstructured P2P
  – There is a tradeoff between the quality of the estimate and the associated overhead.
  – A suitable algorithm should be chosen according to the system's objectives and applications.
• For structured P2P
  – The distribution of identifiers may be skewed.
  – Using structural information makes the estimation results more accurate.
References
• D. Psaltoulis, D. Kostoulas, I. Gupta, K. Birman, and A. Demers, “Practical algorithms for size estimation in large and dynamic groups,” PODC 2004.
• D. Kostoulas, D. Psaltoulis, I. Gupta, K. Birman, and A. Demers, “Decentralized schemes for size estimation in large and dynamic groups,” IEEE NCA’05, 2005.
• L. Massoulié, A.-M. Kermarrec, E. Le Merrer, and A.J. Ganesh, “Peer counting and sampling in overlay networks: random walk methods,” Technical report MSR-TR-2005-156, 2005.
• G.S. Manku, M. Bawa, and P. Raghavan, “Symphony: Distributed Hashing in a Small World,” USITS 2003.