An overlay network for resource discovery in Grids

Manfred Hauswirth (joint work with Roman Schmidt)
Ecole Polytechnique Fédérale de Lausanne (EPFL)
School of Computer and Communication Sciences

Funded by the European integrated project DIP, “Data, Information, and Process Integration with Semantic Web Services”, Contract no. 507483

© 2005 Manfred Hauswirth

Outline of the presentation

• Approaches to resource discovery in Grids
• Using P2P systems for resource discovery in Grids
• Detour: The basics of scalable data access structures and overlay networks
• Our proposal: Using the P-Grid overlay network for resource discovery
• Experimental evaluation
• Conclusions


Approaches to resource discovery in Grids - 1

Centralized: Condor
  • primarily targets optimal CPU utilization
  • centralized matchmaker matches resource requests with offers
  • efficient for small grids in LANs, but does not scale to larger sizes

Hierarchical: Monitoring and Discovery Service (MDS) used in Globus
  • based around the WSRF (Web Services Resource Framework) standards
  • provides a registry similar to UDDI
  • query and subscription (trigger) interfaces
  • support of global-scale grids: the hierarchical organization and query routing have hot spots and single points of failure


Approaches to resource discovery in Grids - 2

Decentralized / P2P: Iamnitchi et al.
  • unstructured P2P network similar to Gnutella combined with Freenet-style query forwarding
  • less traffic than pure Gnutella but no lookup guarantees

Decentralized / P2P: Gupta et al.
  • based on a range-query-enhanced version of CAN
  • ranges are hashed and indexed ⇒ simple key search operations are not supported or are highly inefficient (both are needed) ⇒ separate indexes
  • CAN does not support efficient updates (update ⇒ new responsible peer)
  • search efficiency is only guaranteed for uniform partitioning of the key space


Are overlay networks usable for resource discovery in Grids?

Grid community
  • often uses inefficient versions of existing P2P approaches

P2P community
  • does not address the specific needs of Grid computing
    • exact search is fast, other search predicates do not exist or are inefficient
    • frequent updates of resource state are required, but updates are either not supported or inefficient
  • some assumptions are inadequate
    • Grids normally do not have very large numbers of nodes and data
    • node population is rather stable

Advantages of P2P approach
  • no dedicated nodes required
  • no “single point of failure” (node, network)
  • implicit load distribution and balancing
  • no dedicated infrastructure needed: “the system is the directory”

Resource discovery based on overlay networks seems an interesting approach for global-scale / large-scale Grids; for smaller Grids, other approaches may be more applicable


Detour: Data access structures

Search tree (prefix tree)

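[Figure: binary prefix tree with root ???, inner nodes 0?? and 1??, then 00?, 01?, 10?, 11?, and leaves 000 to 111; each leaf holds extra data. A search for key 101 (101?) descends one level per step until the leaf is reached (101!).]

N objects ⇒ log2(N) steps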
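
The lookup in the search tree can be sketched in a few lines of Python. This is only an illustration of the prefix-tree idea on the slide, assuming 3-bit keys and a placeholder payload; the names `TrieNode`, `insert`, and `lookup` are hypothetical and not taken from any P-Grid code.

```python
class TrieNode:
    """Node of a binary prefix tree; leaves carry the extra data."""
    def __init__(self):
        self.children = {}   # "0" or "1" -> TrieNode
        self.data = None     # payload stored at the end of a key


def insert(root, key, data):
    """Walk down the tree bit by bit, creating nodes as needed."""
    node = root
    for bit in key:
        node = node.children.setdefault(bit, TrieNode())
    node.data = data


def lookup(root, key):
    """Return (data, steps); data is None if the key is not stored."""
    node, steps = root, 0
    for bit in key:
        node = node.children.get(bit)
        if node is None:
            return None, steps
        steps += 1
    return node.data, steps


# Build the tree from the slide: 8 objects with keys 000 to 111.
root = TrieNode()
for i in range(8):
    key = format(i, "03b")
    insert(root, key, f"extra data for {key}")

data, steps = lookup(root, "101")   # 3 steps for 8 objects, i.e. log2(8)
```

With N = 8 stored objects, every lookup touches one node per key bit, which is the log2(N) bound stated on the slide.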


Detour: Scalable data access structures - 1

Assume number of data objects >> storage of one node

Distributed storage

Given a data access structure
  • Size of data access structure = number of data objects
  • Size of data access structure >> storage of one node

Problem: where to store?


Detour: Scalable data access structures - 2

[Figure: the prefix tree (root ???, inner nodes 0??, 1??, 00?, 01?, 10?, 11?, leaves 000 to 111) distributed over peer 1 to peer 4, annotated with a "Napster" bottleneck.]


Detour: Scalable data access structures - 3

Associate each peer with a complete path

[Figure: the prefix tree (root ???, inner nodes 0??, 1??, 00?, 01?, 10?, 11?, leaves 000 to 111) with peer 1 to peer 4 placed below the leaves.]


Detour: Scalable data access structures - 4

• Associate each peer with a complete path

[Figure: a single path of the tree (???, 1??, 10?, leaves 100 and 101) shown together with peer 1 to peer 4; two annotations read "knows more about this part of the tree".]
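
One plausible reading of "knows more about this part of the tree" is that a peer keeps, for every level of its own path, a reference for the sibling subtree it does not cover. This matches the routing tables on the "P-Grid queries" slide, where a peer with path 10 holds references for the prefixes 0 and 11. The Python sketch below only computes those prefixes; the function name is hypothetical, and how the actual references are chosen is not shown on the slides.

```python
def complementary_prefixes(path: str) -> list[str]:
    """For each level of `path`, return the prefix with that level's bit flipped.

    These are the parts of the key space the peer is not responsible for
    and therefore needs a reference to some other peer for.
    """
    flip = {"0": "1", "1": "0"}
    return [path[:i] + flip[path[i]] for i in range(len(path))]


refs_for_10 = complementary_prefixes("10")    # ["0", "11"]
refs_for_00 = complementary_prefixes("00")    # ["1", "01"]
```

A path of length k yields exactly k prefixes, so the routing state per peer grows with the depth of the tree rather than with the number of keys.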


Detour: The result is P-Grid

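[Figure: two partial views of the tree, one along the path ???, 1??, 11? with leaves 110 and 111 and one along ???, 1??, 10? with leaves 100 and 101, each shown with peer 1 to peer 4. A search for key 101 (101?) is passed on with a "Message to peer 3" until it is resolved (101!).]

• Peers cooperate in search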


Detour: P-Grid queries

[Figure: virtual binary tree with root ???, inner nodes 0?? and 1??, and leaves 00?, 01?, 10?, 11?; peers 1, 6, 2, 3, 4, 5 are placed below the leaves with the routing tables listed here.]

Peer   Routing table (prefix : peer)   Stores data with key prefix
1      1 : 3,  01 : 2                  00
6      1 : 5,  01 : 2                  00
2      1 : 4,  00 : 6                  01
3      0 : 2,  11 : 5                  10
4      0 : 6,  11 : 5                  10
5      0 : 6,  10 : 4                  11

Example query for key 100: query(6, 100) → query(5, 100) → query(4, 100), which is resolved at peer 4
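
The query forwarding on this slide can be replayed with a small Python sketch. The routing tables are transcribed from the figure (each peer's path plus its "prefix : peer" references); the dictionary layout and the function name `route` are illustrative assumptions, not P-Grid's implementation, and each peer holds only one reference per prefix here.

```python
# Routing state transcribed from the slide: peer -> own path and references.
PEERS = {
    1: {"path": "00", "refs": {"1": 3, "01": 2}},
    6: {"path": "00", "refs": {"1": 5, "01": 2}},
    2: {"path": "01", "refs": {"1": 4, "00": 6}},
    3: {"path": "10", "refs": {"0": 2, "11": 5}},
    4: {"path": "10", "refs": {"0": 6, "11": 5}},
    5: {"path": "11", "refs": {"0": 6, "10": 4}},
}


def route(start, key):
    """Return the peers visited while resolving `key`, starting at `start`."""
    visited = [start]
    peer = start
    # A peer is responsible for the key if its path is a prefix of the key.
    while not key.startswith(PEERS[peer]["path"]):
        # Otherwise forward to the reference whose prefix matches the key.
        peer = next(
            target
            for prefix, target in PEERS[peer]["refs"].items()
            if key.startswith(prefix)
        )
        visited.append(peer)
    return visited


hops = route(6, "100")   # [6, 5, 4], as in query(6, 100) on the slide
```

Each hop extends the matched prefix by at least one bit, so a query needs at most as many hops as the responsible peer's path is long.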


P-Grid overview - 1

• Efficient search in O(log n) steps (n nodes) even for skewed distributions
• Exact search, substring search, and efficient range queries [IEEE P2P 2005] (simple XPath is already supported as well [ODBASE 2005])
  • 2 range-query algorithms: min-max, shower
• Efficient, epidemic update algorithm for highly unreliable environments [ICDCS 2003]


P-Grid overview - 2

• Load-balancing of memory and replication load (availability)
• Prefix-preserving hash function for key generation
  • s1 < s2 ⇒ h(s1) < h(s2)
  • ⇒ clustering of similar information
• P-Grid’s trie only exists virtually; in fact the system is “flat” and all nodes are equal
• Self-organized construction of the index
• Individual P-Grids can be split and merged
• Available from http://www.p-grid.org/ under a modified GPL
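
The order-preserving property s1 < s2 ⇒ h(s1) < h(s2) can be illustrated with a deliberately simple stand-in: a fixed-width binary encoding of non-negative integers. This is not P-Grid's hash function, only a minimal Python sketch of the property; the function name and the 16-bit key length are assumptions.

```python
def order_preserving_key(value: int, bits: int = 16) -> str:
    """Map an integer to a fixed-length binary key that keeps its order.

    Because all keys have the same length, comparing the key strings
    lexicographically gives the same result as comparing the integers.
    """
    if not 0 <= value < 2 ** bits:
        raise ValueError("value out of range for the chosen key length")
    return format(value, f"0{bits}b")


k_small = order_preserving_key(1024)   # "0000010000000000"
k_large = order_preserving_key(3500)   # "0000110110101100"
in_order = k_small < k_large           # True: the order of the values is kept
```

Because the order survives, a range of values maps to a contiguous range of keys in the trie, which is what makes range queries over the overlay feasible.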


Our approach: Using P-Grid for resource discovery

Instead of resources and their states, job requirements are advertised, for example:

    CPU_cycles=3500,disk=50MB,mem=1024MB,advertiser=http://need.cpu.com/job42

Providers actively look for jobs (exact search or range queries) and accept the ones they want to
  • ⇒ fewer updates required
  • ⇒ resource provider is in control

Specific problems to address
  • key distributions may be highly skewed, for example, if most job advertisements are at the maximum of the possible values and the frequency then decreases sharply
  • but uniform distributions also have to be supported
  • robustness, scalability and efficiency
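
To make the job-advertisement idea concrete, the following Python sketch parses the advertisement string shown on this slide and lets a provider decide whether it could take the job. The parsing rules, the function names, and the provider capacities are assumptions for illustration; the slides do not specify a wire format beyond the single example.

```python
def parse_advertisement(text: str) -> dict:
    """Parse 'name=value,...' pairs; numeric values (optionally in MB) become ints."""
    ad = {}
    for field in text.split(","):
        name, _, value = field.partition("=")
        if value.endswith("MB"):
            ad[name] = int(value[:-2])   # drop the unit, keep the number
        elif value.isdigit():
            ad[name] = int(value)
        else:
            ad[name] = value             # e.g. the advertiser URL
    return ad


def can_accept(capacity: dict, job: dict) -> bool:
    """A provider can take a job if it meets every numeric requirement."""
    return all(
        capacity.get(name, 0) >= required
        for name, required in job.items()
        if isinstance(required, int)
    )


job = parse_advertisement(
    "CPU_cycles=3500,disk=50MB,mem=1024MB,advertiser=http://need.cpu.com/job42"
)
# Hypothetical provider capacities, in the same units as the advertisement.
provider = {"CPU_cycles": 4000, "disk": 200, "mem": 2048}
accepted = can_accept(provider, job)   # True
```

In this model the provider, not the job owner, runs the check, which reflects the slide's point that the resource provider stays in control and does not have to publish its changing state.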


Experimental evaluation on PlanetLab

PlanetLab: world-wide testbed for distributed applications
  • approx. 450 nodes
  • wide range of network connectivity (T1, DSL, etc.)
  • large number of experiments in parallel

• 250 peers, each running on a dedicated PlanetLab node
• 2500 unique data keys (Pareto and uniformly distributed); each peer selects 10; average replication factor was set to 5 ⇒ 18750 keys in the system; each peer is responsible for 50-100 keys
• Each node performs a query with a random lower bound for each distribution, with 2 different algorithms, and for each of the answer set sizes (50, 100, 150, 200, 400, and 800), i.e., a total of 250 * 2 * 2 * 6 = 6000 queries
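
The query count follows from enumerating every combination of peer, key distribution, range-query algorithm, and answer set size. The Python sketch below reproduces that arithmetic; the labels for the distributions and algorithms come from the slides, while the variable names and the tuple layout are assumptions.

```python
from itertools import product

PEERS = range(250)                                # one query source per peer
DISTRIBUTIONS = ("pareto", "uniform")             # key distributions
ALGORITHMS = ("min-max", "shower")                # range-query algorithms
ANSWER_SET_SIZES = (50, 100, 150, 200, 400, 800)  # requested result sizes

# One query per combination of the four experiment dimensions.
queries = list(product(PEERS, DISTRIBUTIONS, ALGORITHMS, ANSWER_SET_SIZES))
total = len(queries)   # 250 * 2 * 2 * 6 = 6000
```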


Experimental results: Message latency (hops)


Experimental results: Latency (time)


Experimental results: Message costs


Conclusions

• Overlay networks for resource discovery could be applicable in very large scale Grids
• Base overlay technologies exist, but a more in-depth investigation of applicability is necessary (latency, updates, etc.)
• Job advertisements instead of resource advertisements may also be interesting for other Grid discovery approaches, to strengthen the autonomy and control of the resource provider
• P-Grid overlay was tested under worst-case conditions as infrastructure for discovery, with promising results
  • We can expect much better results in Grid environments, which are more stable
• More cooperation between Grid and P2P communities may be necessary and fruitful


Thank you!

Questions?