Study on Network Size Estimation Schemes for Peer-to-Peer Networks

2008/02/19 Hosik Cho

Some Questions

• How many people are in this room?
• Why do you think that?
• How many people are on this campus?
• Can you count them all?
• How many nodes are in a P2P network across the world?

Contents

• Peer-to-peer networks
• Network size estimation
• Estimation methods
  – Unstructured P2P
  – Structured P2P
• Conclusion

P2P networks

• A peer-to-peer overlay network connects peers in a logical manner on top of IP.
• Unstructured P2P: Gnutella, Freenet
• Structured P2P: Chord, CAN, Pastry, …
• P2P applications
  – File sharing systems (Kazaa, Gnutella)
  – Video over IP (CoolStreaming)
  – Voice over IP (Skype)

P2P networks

• Characteristics
  – Scalable
  – Self-organizing capability
  – Resilience to failure
  – Fully decentralized
• As a consequence, monitoring the system and obtaining global statistics become much more complex.

Network size estimation

• Network size (N) is needed for:
  – Load balancing
  – Restricted broadcasting
  – Determining network parameters
• For unstructured P2P networks, most approaches are based on broadcasting.
• For structured P2P networks, the size can be inferred directly from the density of identifiers.
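To make the density idea concrete, here is a minimal Python sketch. It assumes identifiers are spread uniformly over a ring of 2^id_bits positions and that a node can list its next k successors; the function name and its parameters are hypothetical and do not come from any particular P2P implementation.

```python
def estimate_size_from_density(own_id, successor_ids, id_bits):
    """Estimate N from how densely node IDs populate the identifier ring.

    Assumes uniformly distributed IDs; successor_ids lists the next k nodes
    clockwise from own_id (a hypothetical interface for illustration).
    """
    if not successor_ids:
        raise ValueError("need at least one successor")
    ring_size = 1 << id_bits
    k = len(successor_ids)
    # Arc of the ring covered by the k successors, measured clockwise.
    span = (successor_ids[-1] - own_id) % ring_size
    if span == 0:
        raise ValueError("successors must differ from own_id")
    # k nodes occupy `span` positions, so the whole ring holds about
    # k * ring_size / span nodes.
    return k * ring_size / span


print(estimate_size_from_density(0, [16, 32, 48, 64], 8))  # 16.0
```

With 16 evenly spaced nodes on a 256-position ring, four successors cover a quarter of the ring, so the estimate is exact. With skewed identifiers the same calculation can be far off.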

Related Works

• Unstructured P2P
  – Sample & Collide
  – Hops Sampling
  – Gossip-based aggregation
• Structured P2P
  – Token passing
  – Neighbor sampling
  – Finger sampling

Sample&Collide (1)

• “Birthday paradox”: in a group of 23 people, the probability that at least two of them share a birthday is at least 50%.
• The initiator samples nodes uniformly at random until a sample returns a node that has already been selected.
• The expected number of samples (X) is approximately √(2n), where n is the number of nodes.
• The system size is estimated as X^2/2.
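The estimator can be tried in a small simulation. The Python sketch below assumes ideal uniform sampling, which a real overlay only approximates, and uses hypothetical function names. A single run is noisy, so the sketch also averages the estimate over many independent runs.

```python
import random


def samples_until_collision(n, rng):
    """Draw nodes uniformly from n nodes until one repeats; return the draw count X."""
    seen = set()
    count = 0
    while True:
        node = rng.randrange(n)
        count += 1
        if node in seen:
            return count
        seen.add(node)


def estimate_size(x):
    """Size estimate X^2 / 2 from the number of samples X."""
    return x * x / 2


def average_estimate(n, trials, seed=0):
    """Average the estimate over independent runs to reduce the variance."""
    rng = random.Random(seed)
    total = sum(estimate_size(samples_until_collision(n, rng)) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    print(average_estimate(10_000, 2_000))
```

For a true size of 10,000 the averaged estimate should land close to 10,000, while individual runs can be off by a large factor.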

Sample&Collide (2)

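1. The initiator node sets a timer T > 0.
2. It sends the message to a randomly chosen neighbor.
3. The receiving node picks a random number U, uniform in (0, 1), and decrements T by −log(U)/d_i, where d_i is its degree.
4. If T > 0, the node forwards the message to a randomly chosen neighbor.
5. If T ≤ 0, the node returns its ID to the initiator (the sample).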
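The sampling walk described in these steps can be sketched in Python as follows. The graph is assumed to be an adjacency dictionary mapping each node to a list of neighbors, and the function name, the timer value, and forwarding to a single random neighbor are assumptions made for illustration.

```python
import math
import random


def sample_node(graph, initiator, timer, rng):
    """Return one sampled node using a random walk driven by a timer T."""
    t = timer
    node = rng.choice(graph[initiator])  # hand the message to a neighbor
    while True:
        u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
        # -log(u) is non-negative, so T shrinks at every visited node,
        # and it shrinks more slowly at high-degree nodes.
        t -= -math.log(u) / len(graph[node])
        if t <= 0:
            return node  # this node reports its ID as the sample
        node = rng.choice(graph[node])  # otherwise forward the message


if __name__ == "__main__":
    ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    print(sample_node(ring, 0, 5.0, random.Random(42)))
```

Dividing by the degree compensates for the fact that a plain random walk visits high-degree nodes more often; a larger T gives samples closer to uniform at the cost of more hops.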

HopsSampling (1)

• Probabilistic polling approach
• An initiator spreads messages in the network and estimates the system size based on the replies it gets back.
• If hopCount < minHopsReporting, a response is sent with probability 1.
• Otherwise, the response is sent with probability 1/2^(hopCount − minHopsReporting).
• If minHopsReporting = 2, only 25% of the nodes at distance 4 will report back.
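The reply rule can be written as a short Python function. The name reply_probability is hypothetical; the formula follows the bullets above.

```python
def reply_probability(hop_count, min_hops_reporting):
    """Probability that a node at distance hop_count replies to the initiator."""
    if hop_count < min_hops_reporting:
        return 1.0
    # The probability halves with every hop beyond the threshold.
    return 1.0 / 2 ** (hop_count - min_hops_reporting)


print(reply_probability(4, 2))  # 0.25
```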

HopsSampling (2)

1. The initiator node sets hopCount = 0.
2. It sends the message to its neighbors.
3. If hopCount < minHopsReporting, the receiving node sends a response.
4. Otherwise, it sends a response with a probability that depends on hopCount.
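A small simulation of the polling process might look like the Python sketch below. It assumes the message spreads breadth-first over an adjacency dictionary and that the initiator weights each reply by the inverse of its reply probability; that weighting is one plausible way to turn replies into a size estimate and is not taken from the slides.

```python
import random
from collections import deque


def hops_sampling_estimate(graph, initiator, min_hops_reporting, rng):
    """Estimate the network size from probabilistic replies (illustrative only)."""
    hop = {initiator: 0}
    queue = deque([initiator])
    estimate = 1.0  # the initiator counts itself
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor in hop:
                continue  # already reached by the spreading message
            hop[neighbor] = hop[node] + 1
            queue.append(neighbor)
            excess = max(0, hop[neighbor] - min_hops_reporting)
            p = 0.5 ** excess  # reply probability at this distance
            if rng.random() < p:
                # One reply stands in for 1/p nodes at the same distance.
                estimate += 1.0 / p
    return estimate


if __name__ == "__main__":
    ring = {i: [(i - 1) % 50, (i + 1) % 50] for i in range(50)}
    print(hops_sampling_estimate(ring, 0, 3, random.Random(7)))
```

A larger minHopsReporting gives a more accurate estimate but makes more nodes reply, which is the overhead the probabilistic rule is meant to limit.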

Gossip-based (1)

• Epidemic-based approach
• If exactly one node of the system holds the value 1 and all the other nodes hold 0, the average is 1/N.
• An initiator takes the value 1 and starts gossiping.
• The reached nodes participate in the process by setting their value to 0.
• At each cycle, each node in the network chooses one of its neighbors and the two exchange their estimation parameters.

Gossip-based (2)

• Update rule: Estimation ← (Estimation + neighbor’s_Estimation)/2

• To provide correct estimates, this algorithm must wait for a certain number of rounds to elapse before computing the size estimate.

• This period is the time required for the gossip to propagate through the whole network and for the values to converge.
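The averaging process can be simulated with the Python sketch below. It assumes synchronous cycles over an adjacency dictionary and that each node reports 1/value as its size estimate once the values have converged; the function name and the choice of graph are hypothetical.

```python
import random


def gossip_size_estimates(graph, initiator, rounds, rng):
    """Run gossip averaging and return every node's size estimate (1 / value)."""
    value = {node: 0.0 for node in graph}
    value[initiator] = 1.0  # exactly one node starts with the value 1
    for _ in range(rounds):
        for node in graph:
            peer = rng.choice(graph[node])
            # Both nodes adopt the average, so the network-wide sum stays 1.
            average = (value[node] + value[peer]) / 2.0
            value[node] = value[peer] = average
    return {node: (1.0 / v if v > 0 else float("inf")) for node, v in value.items()}


if __name__ == "__main__":
    n = 32
    full = {i: [j for j in range(n) if j != i] for i in range(n)}
    estimates = gossip_size_estimates(full, 0, 60, random.Random(3))
    print(min(estimates.values()), max(estimates.values()))
```

Stopping after too few rounds leaves some values far from 1/N, which is why the estimate is only read after the waiting period.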

N Estimation in S-P2P

• Assumptions
  – IDs are uniformly distributed.
  – Each node knows the total number of nodes (N) in the system.
  – Nodes do not leave and join frequently.

Basic approaches

Figure: (a) Token passing, (b) Neighbor sampling

N Estimation in S-P2P

• In actual deployed systems:
  – Nodes join and leave frequently.
  – A node must estimate how long a query takes to reach its destination (O(log N)).
  – Proximity-based identifiers are adopted for efficient routing.
    • AS number
    • Geographic location

Uniformity of Identifiers

Figure: identifier distribution, myth vs. real

Estimation result (1)

Figure: estimation results for proximity IDs and uniformly distributed IDs

Extended approach

• Structured P2P systems maintain fingers, routing tables, contacts, etc.

• Estimate N more precisely using structural information.

Estimation result (2)

Figure: estimation results for proximity IDs and uniformly distributed IDs

Conclusion

• For unstructured P2P
  – There is a tradeoff between the quality of the estimate and the associated overhead.
  – A proper algorithm should be applied according to its objectives and applications.
• For structured P2P
  – The distribution of identifiers may be skewed.
  – Use of structural information will make the estimation results more accurate.

References

• D. Psaltoulis, D. Kostoulas, I. Gupta, K. Birman, and A. Demers, “Practical algorithms for size estimation in large and dynamic groups,” PODC 2004.

• D. Kostoulas, D. Psaltoulis, I. Gupta, K. Birman, and A. Demers, “Decentralized schemes for size estimation in large and dynamic groups,” IEEE NCA’05, 2005.

• L. Massoulie, A.-M. Kermarrec, E. Le Merrer, and A.J. Ganesh, “Peer counting and sampling in overlay networks: random walk methods,” Technical report MSR-TR-2005-156, 2005.

• G.S. Manku, M. Bawa, and P. Raghavan, “Symphony: Distributed Hashing in a Small World,” USITS 2003.
