Peer to Peer

Toward a Synergy Between P2P and Grids

Domenico Talia and Paolo Trunfio • University of Calabria

IEEE Internet Computing, July • August 2003. Published by the IEEE Computer Society. 1089-7801/03/$17.00 © 2003 IEEE

Peer-to-peer (P2P) networks and grids are distributed computing models that enable decentralized collaboration by integrating computers into networks in which each can consume and offer services. P2P is a class of self-organizing systems or applications that takes advantage of distributed resources (storage, processing, information, and human presence) available at the Internet’s edges. A grid is a geographically distributed computation platform comprising a set of heterogeneous machines that users can access through a single interface.

Both are hot research topics because they offer promising paradigms for developing efficient distributed systems and applications. Unlike the classic client–server model, in which roles are well separated, P2P and grid networks can assign each node a client or server role according to the operations they are to perform on the network, even if some nodes act more as server than as client in current implementations.

In analyzing both models, we discover that grids are, in essence, P2P systems. Although many aspects of today’s grids are based on hierarchical services, this is an implementation detail that should be removed in the near future. As grids used for complex applications increase from tens to thousands of nodes, we should decentralize their functionalities to avoid bottlenecks. The P2P model could thus help to ensure grid scalability: designers could use the P2P philosophy and techniques to implement nonhierarchical decentralized grid systems.

In spite of current practices and thoughts, the grid and P2P models share several features and have more in common than we perhaps generally recognize. As Ian Foster and Adriana Iamnitchi point out (dsl.cs.uchicago.edu), a broader recognition of key commonalities could accelerate progress in both communities. It is time to consider how to integrate these two models. A synergy between the two research communities, and the two computing models, could start with identifying the similarities and differences between them.

Basics

In the past few years, P2P has attracted enormous media attention and gained popularity by supporting two main classes of applications:

• file sharing, in which peers share files with each other (Napster and Gnutella for music, for example)

• highly parallel computing, in which an (inherently) parallel application runs on available nodes (SETI@home and FightAIDS@home, for example).

Apart from these well-known systems, the P2P model is emerging as a new distributed paradigm because of its potential to harness the computing, storage, and communication power of hosts in the network to make their underutilized resources available to others. P2P shares this goal with the Grid, which was designed to provide access to remote computing resources for high-performance applications, data-intensive applications, or both.

Although originally intended for advanced scientific applications, grid computing has emerged as a paradigm for coordinated resource sharing and problem solving in dynamic, multi-institutional, virtual organizations in industry and business. Grid computing can be seen as an answer to drawbacks such as overloading, failure, and low QoS, which are inherent to centralized service provisioning in client–server systems. Such problems can occur in the context of high-performance computing, for example, when a large set of remote users accesses a supercomputer.

Grid nodes typically make their own resources available at the same time they are accessing resources on other nodes. The grid model thus removes the definite distinction between client and server machines. However, current grid environments delegate specific management or coordination functions to certain nodes that are required to take “major responsibility.” Some recently developed P2P systems also require nodes to act as servers, at least when joining the network.

P2P comprises several kinds of applications with different design goals, such as anonymity (typically in file-sharing applications), scalability (typically in highly parallel computing applications), or availability (in both application classes). Moreover, P2P systems are based on several different designs:

• systems such as Napster use centralized resource indexes,

• systems such as Gnutella use flooding-based search,

• some experimental systems such as Gridella use structures with distributed resource indexes, and

• hybrid networks, such as the super-peer model (described later), combine the P2P and client–server models.
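To make the contrast between the first two designs concrete, the following is a minimal Python sketch, not the actual Napster or Gnutella protocol. The overlay in `NEIGHBORS`, the file placement in `FILES`, and the function names are all hypothetical, chosen only for illustration. A centralized index answers a query with a single lookup, whereas a flooding search forwards the query hop by hop and only finds resources within its time-to-live (TTL).

```python
from collections import deque

# Hypothetical overlay: peer -> neighbors, and peer -> files it holds.
NEIGHBORS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
FILES = {"A": set(), "B": {"x.mp3"}, "C": set(), "D": set(), "E": {"y.mp3"}}


def build_index(files):
    """Centralized design: every peer registers its files with one index."""
    index = {}
    for peer, names in files.items():
        for name in names:
            index.setdefault(name, set()).add(peer)
    return index


def central_lookup(index, name):
    """A query is a single lookup on the shared index."""
    return sorted(index.get(name, ()))


def flood_search(start, name, ttl):
    """Flooding design: forward the query to neighbors until the TTL runs out."""
    seen = {start}
    hits = []
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if name in FILES[peer]:
            hits.append(peer)
        if remaining == 0:
            continue  # this copy of the query may not travel any further
        for nxt in NEIGHBORS[peer]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, remaining - 1))
    return sorted(hits)
```

In this toy overlay, a flood from peer A with a TTL of 2 does not reach peer E, while the central index finds the file immediately. The price of the index is a single point of failure and a potential bottleneck.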

As mentioned before, the identification of similarities and differences between grid and P2P systems is a good starting point for finding a convergence.

Similarities and Differences

In analyzing the P2P and grid models, we must consider several significant aspects and issues. Here we discuss some of the main issues that determine features of distributed computing models. The techniques that the P2P and grid models use to handle those issues are key to finding a common foundation.

Security

Security is a central theme in grids, and several efforts are devoted to integrating relevant mechanisms for authentication, authorization, integrity, and confidentiality in grid platforms. Nevertheless, such mechanisms are designed mainly for “closed communities,” in which designers have devoted some effort to letting users participate without accounts or trust relationships. By their nature, such security mechanisms allow anonymity of neither users nor resources.

In contrast, P2P systems originate in “open communities,” in which users share more generic goals (such as retrieving music from the Internet), rather than specific objectives (such as participating in high-energy physics simulations). For this reason, security mechanisms in the most widespread P2P systems generally don’t address authentication and content validation, but rather offer protocols that assure anonymity and censorship resistance. Although the two models currently handle security differently, it should be interesting to analyze how to exploit the approaches to create a security model for P2P grids.

Connectivity

Grids generally include powerful machines that are statically connected through high-performance networks with high levels of availability. On the other hand, the number of accessible nodes is generally low because access to grid resources is bound to rigorous accounting mechanisms.

Conversely, P2P systems are composed mainly of common desktop computers that are connected intermittently to the network, remaining available for a limited time with reduced reliability. The number of nodes connected in a P2P network at a given time is much greater than in a grid. Thus, the grid connectivity approach is still too stiff for new nodes and user access and accounting; it could benefit from the more flexible connectivity models used in P2P networks today.

Access Services

Access to remote resources was the main motivation for building grids, and it remains the primary goal today. Grid toolkits provide secure services for submitting batch jobs or executing interactive applications on remote machines; they also include mechanisms for efficiently sharing and moving data across nodes.

Current P2P systems do not support mechanisms for explicitly allocating remote cycles and storage, but they do provide protocols for sharing and exchanging data among nodes. P2P job-submission models and P2P job scheduling might thus be very attractive topics for research into applying the P2P approach to grid scheduling and job management.

Resource Discovery and Presence Management

Resource discovery in grid environments is based mainly on centralized or hierarchical models. In the Globus Toolkit (www.globus.org/toolkit), for instance, a user or an application can directly gain information about a given node’s resources by querying a server application running on it or running on a node that retrieves and publishes information about a given organization’s node set. Because such information systems are built to address the requirements of organizational-based grids, they do not deal with more dynamic, large-scale distributed environments, in which useful information servers are not known a priori. The number of queries in such environments quickly makes a client–server approach ineffective.

Resource discovery includes, in part, the issue of presence management (discovery of the nodes that are currently available in a grid) because global mechanisms are not yet defined for it. On the other hand, the presence-management protocol is a key element in P2P systems: each node periodically notifies the network of its presence, discovering its neighbors at the same time. Future grid systems should implement a P2P-style decentralized resource discovery model that can support grids as open resource communities.
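The periodic-announcement idea can be sketched in a few lines of Python. This is an illustrative model only, not the protocol of any particular P2P system. The `Node` class, the `broadcast` helper, and the 30-second timeout are assumptions introduced here. Each node records when it last heard from each peer and treats a peer as absent once its announcements stop arriving. Time is passed in explicitly so the behavior is deterministic.

```python
class Node:
    """A peer that learns about its neighbors from their periodic announcements."""

    def __init__(self, name, timeout=30.0):
        self.name = name
        self.timeout = timeout   # assumed expiry: seconds of silence before a peer is dropped
        self.last_seen = {}      # peer name -> time of its latest announcement

    def receive_announcement(self, sender, now):
        # Hearing an announcement both confirms presence and discovers the sender.
        self.last_seen[sender] = now

    def neighbors(self, now):
        """Peers whose latest announcement is still within the timeout."""
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )


def broadcast(sender, nodes, now):
    """Deliver the sender's presence announcement to every other node."""
    for node in nodes:
        if node is not sender:
            node.receive_announcement(sender.name, now)


# Three nodes announce at t=0; only a and b announce again at t=20.
a, b, c = Node("a"), Node("b"), Node("c")
everyone = [a, b, c]
for n in everyone:
    broadcast(n, everyone, now=0.0)
for n in (a, b):
    broadcast(n, everyone, now=20.0)
```

At time 40, node a still lists b as a neighbor but has dropped c, whose last announcement is older than the timeout. No central registry is involved; each node keeps its own view of the network.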

Fault Tolerance

The dynamic nature of grids necessitates some level of fault tolerance, especially for highly distributed code, such as parameter-sweep applications, which can fork numerous similar, independent jobs on many nodes.

Beyond simple checkpointing and restarting, reliability and fault tolerance are largely unexplored in grid models and tools. The Globus information system allows fault detection, for instance, but developers must implement fault tolerance at the application level. For greater reliability, designers of fault-tolerance mechanisms and policies for grids should consider using decentralized P2P algorithms, which avoid centralized services that can represent critical failure points.
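As an illustration of what application-level fault tolerance can look like for a parameter sweep, here is a small Python sketch. It is a simple retry scheme, not a decentralized P2P algorithm and not part of any grid toolkit. The `run_sweep` function, the `run_job` callback, and the use of `RuntimeError` to signal a node failure are assumptions made for this example. Because the jobs are independent, a failed job can simply be resubmitted to a different node.

```python
def run_sweep(params, nodes, run_job, max_attempts=3):
    """Run one independent job per parameter, retrying failures on other nodes.

    run_job(node, param) is a caller-supplied function (hypothetical here) that
    returns a result or raises RuntimeError when the node fails.
    """
    results = {}
    failed = []
    for i, param in enumerate(params):
        for attempt in range(max_attempts):
            # Rotate through the node list so each retry lands on another node.
            node = nodes[(i + attempt) % len(nodes)]
            try:
                results[param] = run_job(node, param)
                break
            except RuntimeError:
                continue
        else:
            # Every attempt failed; report the parameter instead of losing it.
            failed.append(param)
    return results, failed


def flaky_square(node, param):
    """Toy job: node 'n1' is assumed to be down, every other node works."""
    if node == "n1":
        raise RuntimeError("node unavailable")
    return param * param
```

With nodes `["n0", "n1", "n2"]`, the job that first lands on the failed node is retried on the next one, so the sweep still completes. If only the failed node is available, every parameter ends up in the `failed` list.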

Where We Should Go

Despite the interest in P2P and grid networks, few noteworthy research efforts are currently devoted to finding commonalities and synergies between them. In a significant exception, Fox and colleagues have sketched a P2P architecture for grid-connected resources (www.communitygrids.iu.edu), but much more remains to be done by members of both communities.

We believe a P2P approach is needed both to

• implement grid tools and services, and

• design and develop grid applications that must access and coordinate remote resources and services.

Two core Globus Toolkit components, the monitoring and discovery service (MDS) and the replica management service, could be effectively redesigned using a P2P approach, for example. If we view current grids as federations of smaller grids managed by diverse organizations, we can rethink the Globus MDS for a large-scale grid by adopting the super-peer network model (www-db.stanford.edu/~byang/pubs/superpeer.pdf). In this approach, each super peer operates as a server for a set of clients and as an equal among other super peers. This topology provides a useful balance between the efficiency of centralized search and the autonomy, load balancing, and robustness of distributed search. In a grid information service based on the super-peer model, each participating organization would configure one or more of its nodes to operate as super peers. Nodes within each organization would exchange monitoring and discovery messages with a reference super peer, and super peers from different organizations would exchange messages in a P2P fashion.
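The super-peer arrangement described above can be sketched as follows. This is a toy Python model under stated assumptions, not the Globus MDS or any published super-peer protocol. The `SuperPeer` class, the `connect` helper, and the `cpus` attribute are hypothetical, and queries are forwarded only one hop, to directly known super peers.

```python
class SuperPeer:
    """Server for the nodes of one organization, peer among other super peers."""

    def __init__(self, org):
        self.org = org
        self.resources = {}   # node name -> attributes reported by that node
        self.peers = []       # super peers of other organizations

    def register(self, node, attrs):
        """A node reports its resources to its reference super peer."""
        self.resources[node] = attrs

    def local_query(self, min_cpus):
        """Search only this organization's index (the client-server part)."""
        return [
            (self.org, node)
            for node, attrs in sorted(self.resources.items())
            if attrs["cpus"] >= min_cpus
        ]

    def query(self, min_cpus):
        """Answer locally, then ask the other super peers (the P2P part)."""
        hits = self.local_query(min_cpus)
        for peer in self.peers:
            hits.extend(peer.local_query(min_cpus))
        return hits


def connect(super_peers):
    """Make every super peer aware of all the others."""
    for sp in super_peers:
        sp.peers = [other for other in super_peers if other is not sp]


org_a, org_b = SuperPeer("orgA"), SuperPeer("orgB")
connect([org_a, org_b])
org_a.register("a1", {"cpus": 4})
org_a.register("a2", {"cpus": 16})
org_b.register("b1", {"cpus": 32})
```

A query submitted to `org_a` for nodes with at least 8 CPUs returns matches from both organizations, yet no single server holds the whole index. Each organization keeps control of its own resource information.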

Grid applications should be designed according to a decentralized model. This can require additional effort to develop because of the current lack of P2P–grid middleware, but P2P–grid tools and services could greatly simplify such tasks in the future we envision.

Aligning Technologies

The grid community recently initiated a development effort to align grid technologies with Web services: the Open Grid Services Architecture (OGSA) lets developers integrate services and resources across distributed, heterogeneous, dynamic environments and communities. The OGSA model adopts the Web Services Description Language (WSDL) to define the concept of a grid service using principles and technologies from both the grid and Web services communities. Web services and the OGSA both seek to enable interoperability between loosely coupled services, independent of implementation, location, or platform.

OGSA provides an opportunity to integrate P2P and the Grid. The architecture defines standard mechanisms for creating, naming, and discovering persistent and transient grid-service instances. It will be an interesting challenge to determine how to use OGSA-oriented grid protocols to build P2P applications. By implementing service instances in a P2P manner within such a framework, developers can provide P2P service configuration and deployment on the grid infrastructure. A peer could thus invoke a grid service by exchanging a specified sequence of messages with a service instance, which might invoke another grid service published by another peer through an associated grid service interface.

Developers and users could exploit the many contact points between P2P and grid networks by recognizing P2P’s relevance to corporations and public organizations rather than viewing it as just a home computing technology. They also could exploit P2P protocols and models to face grid-computing issues such as scalability, connectivity, and resource discovery. A synergy between P2P and grids could lead to new highly distributed systems in which each computer contributes to solving a problem or implementing a system while also using services offered by other computers in the network. Enterprises, public institutions, and private companies could find it both useful and profitable to develop distributed applications on a world-wide Grid.

Domenico Talia is a professor of computer science at the University of Calabria. His research interests include grid computing, parallel systems, and data mining. He received a Laurea degree in physics from the University of Calabria. Talia is a member of the IEEE Computer Society and the ACM. Contact him at [email protected].

Paolo Trunfio is a PhD student in computer engineering at the University of Calabria. His research interests include grid computing, peer to peer, and parallel systems. He received a Laurea degree in computer engineering from the University of Calabria. Contact him at [email protected].