UNIVERSITY OF JYVÄSKYLÄ. Resource Discovery in Unstructured P2P Networks. Distributed Systems Research Seminar on 22.3.2007. Mikko Vapa, research student, P2P Computing Group, Department of Mathematical Information Technology. http://www.mit.jyu.fi/cheesefactory

UNIVERSITY OF JYVÄSKYLÄ

Resource Discovery in Unstructured P2P Networks

Distributed Systems Research Seminar on 22.3.2007

Mikko Vapa, research student

P2P Computing Group

Department of Mathematical Information Technology

http://www.mit.jyu.fi/cheesefactory

Resource Discovery

Resource Discovery Problem

• In the peer-to-peer (P2P) resource discovery problem, any node in the network can possess resources and can also query for resources held by other nodes

[Figure: a network of four nodes (Node 1 to Node 4). Node 1 asks where a resource is.]

A Simple Solution for the Problem

• The most studied P2P network, Gnutella, for example, used a Breadth-First Search (BFS) flooding algorithm, which sends the query to all neighbors

• Problems: all resources in the network can be found, but the network gets congested and many of the packets are useless

[Figure: Node 1 asks where a resource is and the query is flooded through the four-node network as six Query packets. Node 2 and Node 4 answer "I have it!" and two Reply packets travel back.]
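The flooding behaviour described on this slide can be illustrated with a minimal sketch. It is not Gnutella's actual implementation: the function name `bfs_flood`, the adjacency-dict graph format, and the time-to-live parameter `ttl` are assumptions made for this example. Every forwarded copy of the query is counted as one packet, including copies that arrive at a node that has already seen the query, which is where the "useless packets" come from.

```python
from collections import deque

def bfs_flood(graph, source, has_resource, ttl):
    """Flood a query from `source`; return (replying nodes, query packets sent).

    graph: dict mapping node -> list of neighbor nodes (assumed format).
    has_resource: set of nodes that hold the requested resource.
    ttl: maximum number of hops a query may travel (assumed parameter).
    """
    seen = {source}
    frontier = deque([(source, None, ttl)])  # (node, node it got the query from, hops left)
    replies, packets = [], 0
    while frontier:
        node, sender, hops_left = frontier.popleft()
        if hops_left == 0:
            continue
        for neighbor in graph[node]:
            if neighbor == sender:
                continue  # do not send the query straight back
            packets += 1  # every forwarded copy costs one packet
            if neighbor in seen:
                continue  # duplicate query: wasted packet, dropped by the receiver
            seen.add(neighbor)
            if neighbor in has_resource:
                replies.append(neighbor)
            frontier.append((neighbor, node, hops_left - 1))
    return replies, packets

# Hypothetical four-node ring; nodes 2 and 4 hold the resource.
ring = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
found, sent = bfs_flood(ring, source=1, has_resource={2, 4}, ttl=3)
print(found, sent)
```

In this toy ring two of the five packets reach a node that has already seen the query, so even a tiny network wastes a noticeable share of its traffic.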

Near-Optimal Solution: Steiner Minimum Tree Problem

• Optimal paths for resource discovery can be found by using a non-distributed algorithm, which requires global knowledge of the topology and the resources

• Precisely, this problem can be formulated as the task of finding a Steiner Minimum Tree (SMT) in a graph
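The formula that accompanied this slide did not survive extraction. As a hedged reminder, the standard textbook formulation of the Steiner minimum tree is given below. The notation is an assumption and is not taken from the slides: G = (V, E) is the network graph, w(e) is the cost of edge e (for example one query packet per edge), q is the querying node and R is the set of nodes holding the resource. The second expression is the usual k-Steiner relaxation, in which only k of the resource nodes have to be reached.

```latex
% Assumed notation (not from the slides): G = (V, E), edge weights w,
% querying node q, resource nodes R.
\[
T^{*} \;=\; \operatorname*{arg\,min}_{\substack{T \subseteq G \ \text{a tree} \\ R \cup \{q\} \subseteq V(T)}}
\;\sum_{e \in E(T)} w(e)
\]
% k-Steiner variant: the tree must contain q and at least k resource nodes.
\[
T_{k}^{*} \;=\; \operatorname*{arg\,min}_{\substack{T \subseteq G \ \text{a tree},\; q \in V(T) \\ |V(T) \cap R| \,\ge\, k}}
\;\sum_{e \in E(T)} w(e)
\]
```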

MST k-Steiner Minimum Tree Algorithm

• The MST k-Steiner Minimum Tree algorithm was developed for finding an approximate solution

MST k-Steiner Minimum Tree Algorithm

[Figure: worked example of the algorithm on a weighted graph with resource nodes r1 to r5 and intermediate nodes m1 to m3. Panels: Graph G; Graph GR after step (1); Tree TR after step (2); Graph H after step (3); Tree T after step (4); Tree T after step (5).]

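Time complexity: [expression lost in extraction; surviving symbols: O, E, E, log, 2], where E = number of edges in graph G

Worst-case approximation ratio: [expression lost in extraction; surviving symbols: 2, R], where R = available resources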
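The panel captions on this slide (Graph GR after step (1), Tree TR after step (2), Graph H after step (3), Tree T after steps (4) and (5)) resemble the five steps of the classical MST-based Steiner tree heuristic of Kou, Markowsky and Berman. The sketch below implements that classical heuristic only; treating it as the structure behind the slide is an assumption, and the selection of k resource nodes that the k-Steiner variant needs is not shown. All function names and the dict-of-dicts graph format are hypothetical, and the graph is assumed to be connected with sortable node labels.

```python
import heapq
from itertools import combinations

def dijkstra(graph, source):
    """Shortest-path distances and predecessors from `source`."""
    dist, prev, heap = {source: 0}, {}, [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev

def kruskal(nodes, edges):
    """Minimum spanning tree of edges given as (weight, u, v) tuples."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

def mst_steiner_tree(graph, terminals):
    """Approximate Steiner tree connecting `terminals`, as (weight, u, v) edges."""
    terminals = list(terminals)
    paths = {t: dijkstra(graph, t) for t in terminals}
    # Step 1: complete distance graph GR over the terminals.
    gr_edges = [(paths[a][0][b], a, b) for a, b in combinations(terminals, 2)]
    # Step 2: minimum spanning tree TR of GR.
    tr = kruskal(terminals, gr_edges)
    # Step 3: subgraph H, replacing each TR edge by a shortest path in G.
    h_edges = set()
    for _, a, b in tr:
        prev, node = paths[a][1], b
        while node != a:
            p = prev[node]
            h_edges.add((graph[p][node], *sorted((p, node))))
            node = p
    h_nodes = {n for _, u, v in h_edges for n in (u, v)}
    # Step 4: minimum spanning tree T of H.
    t_edges = kruskal(h_nodes, h_edges)
    # Step 5: repeatedly remove leaves that are not terminals.
    while True:
        degree = {}
        for _, u, v in t_edges:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        leaves = {n for n, d in degree.items() if d == 1 and n not in terminals}
        if not leaves:
            return t_edges
        t_edges = [e for e in t_edges if e[1] not in leaves and e[2] not in leaves]

# Hypothetical example: three terminals joined through one intermediate node m.
example = {"r1": {"m": 1, "r2": 3}, "r2": {"m": 1, "r1": 3},
           "r3": {"m": 1}, "m": {"r1": 1, "r2": 1, "r3": 1}}
tree = mst_steiner_tree(example, ["r1", "r2", "r3"])
print(sorted(tree))
```

In the example the direct r1-r2 edge of weight 3 is never used, because routing both terminals through m is cheaper.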

Efficiency = Found Replies / Query Packets

• The MST k-Steiner Minimum Tree algorithm shows that the paths used by current local search algorithms for peer-to-peer networks are far from optimal

[Figure: Efficiency of the Algorithms. Efficiency versus % of Resources for Steiner, HDS and BFS.]

[Figure: Gnutella topology of ~75000 nodes. Efficiency (logarithmic axis, 0.001 to 1) versus % of Resources for k-Steiner, HDS, RWSA and BFS.]

Highest Degree Search

K-Steiner Minimum Tree

[Figure: two panels, Highest Degree Search and K-Steiner Minimum Tree, showing the query paths for one query.]

The K-Steiner tree algorithm locates 9 resource instances with 11 query packets. For this query, the approximate solution is also the optimal solution. HDS uses almost twice as many query packets for this query.
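As a small worked example, the efficiency measure defined earlier in the deck (found replies divided by query packets) can be applied to the numbers quoted for this query. The function name `efficiency` is hypothetical, and the slide gives no exact packet count for HDS, so none is computed here.

```python
def efficiency(found_replies, query_packets):
    """Efficiency = found replies / query packets."""
    if query_packets <= 0:
        raise ValueError("query_packets must be positive")
    return found_replies / query_packets

# K-Steiner on this query: 9 resource instances located with 11 query packets.
k_steiner_efficiency = efficiency(9, 11)
print(f"{k_steiner_efficiency:.2f}")
```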

Hops

• MST k-Steiner does not use the shortest paths to locate resources

[Figure: Hops Used by the Algorithms. Hops versus % of Resources for Steiner and BFS.]

Branching Factor = Average Number of Children of Each Node Having Children in a Search Tree

• MST k-Steiner starts as a one-search-direction algorithm, but changes to a multiple-search-direction algorithm when more resources are being located

[Figure: Branching Factor of the Algorithms. Branching Factor versus % of Resources for BFS and Steiner.]
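The branching factor definition on this slide can be made concrete with a short sketch. The representation of the search tree as a child-to-parent mapping and the name `branching_factor` are assumptions made for the example. Leaves are ignored, because only nodes that have children enter the average.

```python
from collections import Counter

def branching_factor(parent_of):
    """Average number of children over the nodes that have at least one child.

    parent_of: dict mapping each non-root node of the search tree to its parent.
    """
    children_per_node = Counter(parent_of.values())
    if not children_per_node:
        return 0.0  # a tree consisting of the root only
    return sum(children_per_node.values()) / len(children_per_node)

# One search direction: a chain root -> a -> b has branching factor 1.
chain = {"a": "root", "b": "a"}
# Several search directions: root -> a, b, c has branching factor 3.
star = {"a": "root", "b": "root", "c": "root"}
# Mixed: root -> a, b and a -> c.
mixed = {"a": "root", "b": "root", "c": "a"}
print(branching_factor(chain), branching_factor(star), branching_factor(mixed))
```

A value of 1 corresponds to a search that proceeds in a single direction, and larger values indicate that the query fans out.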

MST k-Steiner Minimum Tree Algorithm

• Ways to improve MST k-Steiner:

– Conducting an extensive survey of related work in graph theory and k-Steiner Minimum Trees, and modifying the problem to support multiple resource instances on the same node (Prize Collecting Steiner Tree problem with Quota)

– Getting the results published: Vapa M., Auvinen A., Ivanchenko Y., Kotilainen N., Vuori J., “K-Steiner Minimum Tree Is An Upper Bound for Unstructured Peer-to-Peer Resource Discovery Algorithms”, submitted to Euro-Par 2007

– Now all the tools are available for discovering the theoretical limit of peer-to-peer technology in terms of the total traffic induced on a telecommunication network by a given peer-to-peer network, compared to the client-server approach

– However, real-world applicability of a “Distributed k-Steiner minimum tree resource discovery algorithm” seems to be impossible, because all caching in P2P networks is likely to be useless (wide namespace, dynamic peers, dynamic topology and possibly changing content)

Distributed Resource Discovery

• Distributed resource discovery needs to be solved using a distributed algorithm, and therefore the k-Steiner Minimum Tree cannot be used directly

• In distributed resource discovery, each node has to forward the query based on local knowledge only

[Figure: Node 1 asks where a resource is and sends a Query to Node 2. Node 2 answers "I have it!" with a Reply, but wonders to whom it should forward the query further, because the topology beyond its own connections is unknown.]

NeuroSearch

Our Solution: NeuroSearch

• The NeuroSearch resource discovery algorithm uses neural networks and evolution to adapt its behavior to a given environment:

– a neural network for deciding whether to pass the query further down the connection or not

– evolution for breeding and finding the best neural network in a large class of local search algorithms

[Figure: an incoming Query reaches a node, which decides for each Neighbor Node whether to forward the query.]

NeuroSearch’s Inputs

• The internal structure of the NeuroSearch algorithm

• Multiple layers enable the algorithm to express non-linear behavior

• With enough neurons the algorithm can universally approximate any decision function
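The diagram of the internal structure did not survive extraction, so the following is only a minimal sketch of a multi-layer feed-forward decision function, not the NeuroSearch network itself. The layer sizes 7, 16, 4 and 1 are an assumption based on reading a later chart legend ("NeuroSearch 16:4 7 inputs") as 7 inputs and two hidden layers. The tanh activation, the bias handling, the zero decision threshold and all function names are likewise assumptions, and the input vector is a placeholder because the real inputs are not listed in the surviving text.

```python
import math
import random

def make_network(layer_sizes, rng):
    """Random weights; each neuron gets one extra weight that acts as a bias."""
    return [[[rng.gauss(0, 1) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def forward(network, inputs):
    """Propagate the inputs through every layer and return the single output."""
    activations = list(inputs)
    for layer in network:
        activations = [
            math.tanh(sum(w * a for w, a in zip(neuron, activations + [1.0])))
            for neuron in layer
        ]
    return activations[0]

def should_forward(network, inputs):
    """Decide for one neighbor connection whether the query is passed on."""
    return forward(network, inputs) > 0.0

# Assumed shape: 7 inputs, hidden layers of 16 and 4 neurons, 1 output.
network = make_network([7, 16, 4, 1], random.Random(0))
placeholder_inputs = [0.0] * 7  # the real input features are not given here
output = forward(network, placeholder_inputs)
print(should_forward(network, placeholder_inputs))
```

The non-linear activations between the layers are what allow such a network to represent decision rules that a single weighted sum could not.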

NeuroSearch’s Training Program

• The neural network weights define how the neural network behaves, so they must be adjusted to the right values

• This is done using an iterative optimization process based on evolution and Gaussian mutation

[Figure: flowchart of the training program, with the following boxes:]

– Define the P2P network conditions

– Define the fitness requirements for the algorithm

– Create candidate algorithms randomly

– Select the best ones for the next generation

– Breed a new population

– Iterate thousands of generations

– Finally select the best algorithm for these conditions

– Compare the best one against other local search algorithms
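The training loop outlined on this slide can be sketched as a simple evolutionary algorithm with Gaussian mutation. This is an illustration under stated assumptions, not the authors' training program: the population size, number of parents, mutation strength `sigma`, and the toy fitness function are all hypothetical. In the real setting the fitness would come from simulating queries in the defined P2P network conditions.

```python
import random

def evolve(fitness, n_weights, generations=200, population_size=20,
           n_parents=5, sigma=0.1, seed=0):
    """Return the best weight vector found; higher fitness is better."""
    rng = random.Random(seed)
    # Create candidate algorithms (weight vectors) randomly.
    population = [[rng.gauss(0, 1) for _ in range(n_weights)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Select the best ones for the next generation.
        parents = sorted(population, key=fitness, reverse=True)[:n_parents]
        # Breed a new population: copy a random parent and add Gaussian noise.
        offspring = [[w + rng.gauss(0, sigma) for w in rng.choice(parents)]
                     for _ in range(population_size - n_parents)]
        population = parents + offspring  # parents survive unchanged
    # Finally select the best candidate.
    return max(population, key=fitness)

def toy_fitness(weights):
    """Stand-in fitness (assumption): best when all weights are zero."""
    return -sum(w * w for w in weights)

best = evolve(toy_fitness, n_weights=5)
print(round(toy_fitness(best), 4))
```

Because the selected parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.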

An Example

Typical Query Pattern of NeuroSearch

NeuroSearch uses 26 query packets to locate 11 resource instances. There is a total of 17 resource instances available, so locating 9 resource instances would have been enough to reach 50% of the resource instances.
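The arithmetic behind this example can be checked in a few lines. The helper name `instances_needed` is hypothetical, and rounding up is the assumption that makes 50% of 17 instances equal to 9.

```python
import math

def instances_needed(fraction, total_instances):
    """Smallest whole number of instances that reaches the given fraction."""
    return math.ceil(fraction * total_instances)

needed = instances_needed(0.5, 17)   # half of 17 is 8.5, so 9 are needed
surplus = 11 - needed                # NeuroSearch located 11 instances
replies_per_packet = 11 / 26         # found replies / query packets
print(needed, surplus, round(replies_per_packet, 2))
```

The query therefore located two instances more than the 50% goal required.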

Efficiency of the Algorithms in a 100-Node Power-Law Topology

[Figure: Efficiency versus % of Resources for Optimal, Steiner, HDS, NeuroSearch and BFS, annotated with a "Ranking List" and the labels "NeuroSearch 2004" and "NeuroSearch 2003".]

• Highest Degree Search is currently the best known local search algorithm for the power-law distributed scenario
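The slides name Highest Degree Search (HDS) without spelling out its rule. A commonly described form sends the query, one hop at a time, to the unvisited neighbor with the most connections. The sketch below follows that description as an assumption. It is deliberately simplified: it has no backtracking and simply stops at a dead end, and the function name, graph format and `max_packets` limit are hypothetical.

```python
def highest_degree_search(graph, source, has_resource, wanted, max_packets):
    """Walk towards high-degree nodes; return (found nodes, query packets sent).

    graph: dict mapping node -> list of neighbor nodes (assumed format).
    wanted: number of resource instances to locate before stopping.
    """
    visited = {source}
    current = source
    found, packets = [], 0
    while len(found) < wanted and packets < max_packets:
        candidates = [n for n in graph[current] if n not in visited]
        if not candidates:
            break  # dead end: this simplified variant does not backtrack
        # Forward the query to the unvisited neighbor with the highest degree.
        current = max(candidates, key=lambda n: len(graph[n]))
        packets += 1
        visited.add(current)
        if current in has_resource:
            found.append(current)
    return found, packets

# Hypothetical topology with one hub, node "b".
hub_graph = {"a": ["b", "c"], "b": ["a", "c", "d", "e"],
             "c": ["a", "b"], "d": ["b"], "e": ["b"]}
hits, used = highest_degree_search(hub_graph, "a", {"c"}, wanted=1, max_packets=10)
print(hits, used)
```

The walk visits the hub first and reaches node "c" on the second packet, which illustrates why a single search direction keeps the packet count low but tends to need more hops.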

Ideal Algorithm

• NeuroSearch is close to HDS in performance, but different in nature:

– NeuroSearch uses a maximum number of hops far lower than that of one-search-direction algorithms, resulting in low latency for searching

• Ideal would be to find an algorithm that:

– Has low maximum hops

– Has high efficiency independent of how many resources need to be located

– Sustains these properties in many P2P scenarios

[Figure: Hops Used by the Algorithms. Hops versus % of Resources for HDS, NeuroSearch 16:4 (7 inputs), NeuroSearch 10:10 (27 inputs), NeuroSearch 16:4 (23 inputs), NeuroSearch 30:20 (23 inputs), Steiner and BFS.]

Future Work

• Now the first versions of NeuroSearch are ready and analyzed

• Ways to enhance NeuroSearch include:

– History-based inputs to allow more accurate decisions

– Studying the scalability factors affecting NeuroSearch when the P2P network size grows

– Analysis of the behavior in dynamic conditions

– Speeding up the optimization process by parallelizing the evolutionary algorithm using distributed computing

– The computational cost is demanding, and replacing the optimization algorithm does not help (see: Neri, Kotilainen, Vapa, “An Adaptive Global-Local Memetic Algorithm to Discover Resources in P2P Networks”, to be published in EvoCOMNET 2007)

• A less flexible approximator could replace the neural network

References

• M. Vapa, A. Auvinen, Y. Ivanchenko, N. Kotilainen, J. Vuori, “K-Steiner Minimum Tree Is An Upper Bound for Unstructured Peer-to-Peer Resource Discovery Algorithms”, submitted to Euro-Par 2007.

• F. Neri, N. Kotilainen, M. Vapa, “An Adaptive Global-Local Memetic Algorithm to Discover Resources in P2P Networks”, to be published in EvoCOMNET 2007.

• M. Vapa, N. Kotilainen, H. Kainulainen, J. Vuori, “Resource Discovery in P2P Networks Using Evolutionary Neural Networks”, International Conference on Advances in Intelligent Systems – Theory and Applications (AISTA 2004), 15.-18.11.2004.