
[IEEE 2012 Ninth International Conference on Wireless and Optical Communications Networks - (WOCN) - Indore, India (2012.09.20-2012.09.22)] 2012 Ninth International Conference on Wireless


A Framework for Pure Peer-to-Peer Computing System

Jigyasu Dubey Department of Information Technology

Shri Vaishnav Institute of Technology & Science, Indore, India

[email protected]

Vrinda Tokekar Institute of Engineering & Technology

Devi Ahilya Vishwavidyalaya Indore, India

[email protected]

Abstract- Peer-to-peer (P2P) computing is an approach to building a high-performance computing system by aggregating the idle computing cycles of PCs available on the Internet. In this paper we propose a general-purpose pure P2P computing framework for large-scale scientific computation problems. Peer groups are the fundamental building blocks of the proposed framework. We propose four peer groups (administrator, query manager, task distributor, and task processor) and define the role and functionality of each in the framework. The framework provides a conceptual blueprint for a pure P2P computing system that lets general users process their large computation problems.

Keywords- Peer, P2P, Distributed Computing, P2P Computing, Group

I. INTRODUCTION

Peer-to-peer (P2P) technology supports building a virtual computing system over the Internet that is dedicated to large-scale computation problems. It is a way to harvest computing resources and enable network collaboration [1]. P2P computing systems are usually formed by desktop PCs that are connected to the Internet and participate in P2P activities for arbitrary periods of time. Owing to the fast-growing processing, storage, and network capabilities of PCs, a global P2P computing system could in effect become the world's fastest virtual supercomputer. Compared with other forms of distributed computing, such as grids [2], a P2P computing system features a high degree of diversity and a dynamically changing set of peers: every peer may join or leave the system at any time by its own decision. The biggest advantage of P2P computing is that it can achieve processing capability comparable to that of a supercomputer at lower cost. Currently, many users participate in distributed computing projects such as distributed.net [3] and SETI@home [4] [5].

At present, most P2P computing systems are application specific and are based on different P2P architectures. They can be classified by the type of architecture used: centralized (hybrid) or distributed (pure). SETI@home is the most popular and successful example of a P2P computing system based on a hybrid P2P network architecture. It relies strongly on its server to distribute jobs to each participating peer and to collect the results after processing is done. This centralization limits the reliability and scalability of the system; decentralized systems are required to improve both. The pure P2P architecture is decentralized in nature, and in such a network scalability is improved by dividing the computational resources into groups. SETI@home, distributed.net, and other existing P2P computing systems are each dedicated to a specific task, and general users are not allowed to submit their own jobs to them. Hence a new system is required in which general users can submit their own jobs for processing and take advantage of the spare resources available on the Internet.

Here we propose a pure P2P computing framework for large-scale scientific computation problems. In the framework, peers can join and leave dynamically during the lifetime of a computation. The framework is redundant, so that the dynamic nature of peers does not affect the results. The proposed framework is scalable: peers are organized into groups, and communication occurs only between peer groups rather than between individual peers. The framework is heterogeneous, in that peers running on different platforms can participate. We target an open environment, one that is accessible to the average citizen and does not require membership in any organization.

II. RELATED WORK

In recent years, peer-to-peer (P2P) technology has received growing attention from the research community as well as from industry. Decentralization is one of the main concepts that make P2P networks attractive. Initially, P2P networks were used only for file-sharing applications such as Napster [6] and BitTorrent [7]. Nowadays P2P networks are also used to develop large-scale distributed computing applications. distributed.net [3] and SETI@home [4] [5] are well-known examples of such systems. SETI@home is the most successful of these, with a large number of contributors attracted by its exciting theme, the “search for extraterrestrial intelligence” [4] [5]. SETI@home has a centralized website for the distribution of jobs and the collection of results, and it handles astronomical data only. To increase reliability and scalability, decentralized systems are required.

Jerome Verbeke, Neelakanth Nadgir et al. [8] presented a decentralized P2P computing framework for large-scale computation problems, named JNGI. In this framework the computational resources are divided into groups according to their functionality. The design limits communication to small peer groups, which enables the framework to scale to a very large number of peers. Jerome Verbeke et al. [9] proposed adding a new type of group, called similarity groups, to the JNGI system. In a similarity group all the peers share common characteristics such as CPU speed or memory size. Their results show that quantitative similarity groups increase the performance of a computation, whereas a qualitative criterion increases the homogeneity of the computation but not its performance. However, peer grouping based on geographic location also needs to be considered in order to improve reliability.

N. A. Al-Dmour and W. J. Teahan [10] present a decentralized P2P computing system, ParCop. It supports master-worker style applications. ParCop is developed in Java without using any native methods. It supports only those applications in which the problem can be divided into small tasks. Lo et al. [11] proposed a system named Cluster Computing on the Fly (CCOF), which harvests CPU cycles from ordinary users (desktop PCs). They also proposed a wave scheduler that exploits the large blocks of idle time at night to provide a higher quality of service for deadline-driven jobs, using a geography-based overlay to organize hosts by time zone. It offers a stronger guarantee of ongoing available cycles and is therefore useful for deadline-driven tasks. The system provides higher computation performance; however, because it uses peers from the same night-time zone, which belong to the same geographic location, the reliability of the system decreases. The authors of [12] propose a generic peer-to-peer computing system (P2PCS) to process complex, large-scale scientific computation problems. The system uses the CPU cycles of desktop PCs connected to the network to perform the computations. However, to improve the reliability and scalability of P2PCS, peer grouping and peer selection criteria need to be considered.

978-1-4673-1989-8/12/$31.00 ©2012 IEEE

III. FRAMEWORK DESIGN

In designing a P2P computing framework, one needs to address reliability and scalability from the beginning. We design the framework with the JXTA protocols in mind. The advantage of designing around JXTA is its concept of peer groups. Using peer groups as the fundamental building block improves scalability, because communication between individual peers is restricted and only communication between peer groups is allowed. Selecting the criteria for grouping computing resources is the first step in designing a pure P2P computing system. Various criteria can be used: the physical location of peers; qualitative criteria such as operating platform and technology; and quantitative criteria such as CPU speed and memory size. If all the peers in a group belong to the same geographic location, then a natural disaster or power failure in that area brings down the whole group, which affects the working of the system. The grouping criteria must therefore also take physical location into account. Fig. 1 gives an overview of the proposed pure P2P computing system; G1, G2, G3, ... represent different geographic locations.

Figure 1. Overview of proposed pure P2P computing system

Our P2P computing framework, as shown in Fig. 2, contains four peer groups: the administrator group, the task distributor group, the task processor group, and the query manager group. The administrator group is the top-level group in the framework and coordinates the overall activities of the system. The task distributor group is responsible for distributing tasks, submitted by a task submitter, to a task processor group. The task processor group is responsible for performing the computation of a particular task. The query manager group keeps the status of all peer groups in the framework and provides status information to the administrator group and the task distributor group.

Figure 2. Architecture of proposed system

A. Administrator Group

This is the top-level group in the framework. It coordinates the overall activities of the system. It handles requests from new peers to join the framework by assigning each peer a role and redirecting it to the appropriate peer group. It also responds to a job submitter's request by providing the ID of a task distributor group. Fig. 3 describes the activities of the administrator group in the framework.


Figure 3. Administrator group communicating with the query manager

B. Task Distributor Group

In the proposed framework this group is responsible for distributing a task, submitted by the job submitter, to a task processor group. After distributing a job it also collects the results from the task processor group and passes them to the job submitter. Task distributor peers within a group periodically exchange their latest results, so that if a task distributor becomes unavailable its results are not lost. To make sure that a submitted task is completed correctly, the task is replicated to another task processor group in the framework. The peers in this group are chosen so that they all belong to different geographic locations. Multiple instances of the task distributor group may exist in the framework.
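The paper does not say how replicated results are compared. As a minimal sketch, assuming that each replica group returns a hashable result and that a result is accepted once a quorum of groups agree on it (the function name, the `quorum` parameter, and the majority rule are all illustrative assumptions, not part of the framework), the cross-check might look like this:

```python
from collections import Counter


def accept_replicated_result(results, quorum=2):
    """Return the result reported by at least `quorum` task processor groups.

    `results` maps a (hypothetical) group ID to the value that group returned.
    Returns None when no value reaches the quorum, in which case the task
    distributor would presumably resubmit the task.
    """
    if not results:
        return None
    value, votes = Counter(results.values()).most_common(1)[0]
    return value if votes >= quorum else None
```

With three replicas of which two agree, the agreed value is accepted; with two replicas that disagree, nothing is accepted and the task would have to be rerun.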

C. Task Processor Group

In the proposed framework the task processor group is responsible for performing the computation of a particular job. The code for the computation is provided by the task distributor group that owns the task processor group. Multiple task processor groups exist in the framework, and each is dedicated to a particular type of job. Every task processor group is governed by a task distributor group.

D. Query Manager Group

The query manager group keeps the status of all the peer groups in the framework. At random time intervals it updates the status information and sends alerts about the status of the various groups to the administrator group. It also records the status of each peer in the framework.

E. Peer Joining Process

Fig. 4 describes how new peers join the framework. New peers always join the system at the lowest level of the hierarchy, in a task processor group. A new peer that wants to share its computing resources first sends a request to the administrator group. The administrator group assigns the new peer its role in the framework and redirects it to the appropriate task processor group. To decide which task processor group to assign, the administrator group uses the geographic property of the peer and the status information about existing task processor groups received from the query manager. The status information contains current information such as the workload on the different task processor groups, the total number of peers in each task processor group, and the number of task processor groups active in the framework. After the initial redirection to the appropriate task processor group, no further communication is needed between the peer and the administrator group.

Figure 4. Peer joining process
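The joining decision can be sketched in a few lines. The paper names the inputs (the peer's geographic property, group workload, and group size) but not the selection rule, so the rule below is an assumption: among groups that still have room, prefer the one with the fewest members from the new peer's zone, then the one with the highest workload per member. The `ProcessorGroupStatus` record and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessorGroupStatus:
    """Status the query manager is assumed to report for one task processor group."""
    group_id: str
    workload: int          # pending tasks (hypothetical metric)
    max_size: int          # assumed capacity of the group
    peer_zones: list = field(default_factory=list)  # zone of each current member


def assign_processor_group(peer_zone, statuses):
    """Choose a task processor group for a new peer, or None if all are full."""
    open_groups = [s for s in statuses if len(s.peer_zones) < s.max_size]
    if not open_groups:
        return None  # the administrator group would have to create a new group

    def rank(status):
        same_zone = status.peer_zones.count(peer_zone)
        load_per_peer = status.workload / max(len(status.peer_zones), 1)
        # Fewer same-zone members first, then the busiest group, then a stable tie-break.
        return (same_zone, -load_per_peer, status.group_id)

    return min(open_groups, key=rank).group_id
```

Ranking by zone first keeps each group geographically mixed, which is the property the framework relies on for reliability; workload only breaks ties between equally diverse groups.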

F. Peer Failure

Task processor peers join the system at the lowest level of the hierarchy and are not exposed to the upper levels. Hence the failure of a task processor peer does not affect the framework. Task distributor peers sit at the middle level of the hierarchy. When a task distributor or query manager peer fails, one of the task processor peers of category ‘A’, as defined in [13], replaces it. Administrator peers exist at the top level of the hierarchy. When an administrator peer fails, one of the task distributor peers of category ‘A’ [13] replaces it.

Figure 5. Peers in lower groups join the upper group when a peer failure occurs in that group.
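A minimal sketch of this replacement rule follows. The peer categories come from the authors' earlier work and are treated here as opaque labels; the list-based group representation, the `categories` lookup, and the function name are assumptions made for illustration.

```python
def replace_failed_peer(failed_peer, group, lower_group, categories):
    """Drop `failed_peer` from `group` and promote a category 'A' peer from `lower_group`.

    `group` and `lower_group` are lists of peer IDs and are modified in place.
    `categories` maps a peer ID to its category label.
    Returns the promoted peer ID, or None if no category 'A' peer is available.
    """
    group.remove(failed_peer)
    for candidate in lower_group:
        if categories.get(candidate) == "A":
            lower_group.remove(candidate)  # safe: we return right after mutating
            group.append(candidate)
            return candidate
    return None
```

The same function covers both cases in the text: a failed task distributor or query manager peer is replaced from a task processor group, and a failed administrator peer is replaced from a task distributor group.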

G. Task Submission and Result Retrieval Process

To submit a task, the task submitter must first send a request to the administrator group. The administrator group uses the status information received from the query manager to check which task distributor group has the required code available and the lowest workload, and redirects the task submitter to it by responding with that group's ID. After getting the task distributor group ID, the task submitter submits the task to that group. It is then the responsibility of the task distributor group to distribute the task for computation. The task distributor uses the status information to determine which task processor group is idle or has the lowest workload and assigns the task to that group. Fig. 6 describes the task submission and result retrieval process.

Figure 6. Task submission & result retrieval process

The job submitter sends a job execution request to the administrator group. One of the peers in the administrator group responds to the request by providing a task distributor group ID. The job submitter then submits the job to that task distributor group. A task distributor peer in the group breaks the job into smaller jobs and distributes them among the peers of the appropriate task processor group. After processing, the task processor peers send their results back to the task distributor group. Having successfully submitted the job, the job submitter queries the task processing group at random intervals about the status of the job. When processing is complete, the job submitter collects the results directly from the task distributor group. There may be many instances of each peer group in the framework.
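The administrator group's choice of a task distributor group (code availability first, then lowest workload) can be sketched as below. The shape of the status information, a dictionary keyed by group ID with `codes` and `workload` entries, is a hypothetical stand-in for whatever the query manager actually reports.

```python
def select_distributor_group(task_type, statuses):
    """Return the ID of the least-loaded task distributor group that has the code.

    `statuses` maps a group ID to {"codes": set of task types, "workload": int}.
    Returns None when no group holds code for `task_type`.
    """
    candidates = [
        (info["workload"], group_id)
        for group_id, info in statuses.items()
        if task_type in info["codes"]
    ]
    if not candidates:
        return None
    # Tuples compare by workload first; the group ID breaks ties deterministically.
    return min(candidates)[1]
```

A task distributor could apply the same pattern one level down when it picks an idle or lightly loaded task processor group.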

IV. RELIABILITY

In the proposed framework every function is performed by a peer group. It is the responsibility of the group members to carry out their group's function; if a group member is lost, a member of the lower peer group that belongs to category ‘A’, as specified in [13], takes over that member's responsibility. Let

RPJ = rate of peers joining the framework
RPL = rate of peers leaving the framework
RGJ = rate of getting peers from other groups
RGL = rate of giving peers to other groups
GMAX = maximum group size
G = number of groups

Then:

Effective no. of peers joining the system per unit time = RPJ + RGJ - RPL - RGL

Effective no. of peers joining a peer group = (RPJ + RGJ - RPL - RGL) / G

A new group is created in the system only when the number of peers in a group exceeds GMAX, thus

No. of groups created by a group = (RPJ + RGJ - RPL - RGL) / (G * GMAX)
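A worked example may help make these rates concrete. The sketch below reads the last expression as the per-group net inflow divided by the maximum group size, that is, division by both G and GMAX; that reading is an assumption, chosen because it is the one that yields a count of groups. All numeric values in the usage note are invented for illustration.

```python
def group_growth(rpj, rgj, rpl, rgl, groups, gmax):
    """Evaluate the three rate expressions for one unit of time.

    Returns (net peers joining the system,
             net peers joining each group,
             new groups created per existing group).
    """
    net = rpj + rgj - rpl - rgl
    per_group = net / groups
    new_groups = per_group / gmax
    return net, per_group, new_groups
```

For example, with RPJ = 50, RGJ = 10, RPL = 15, RGL = 5, G = 4, and GMAX = 20, the system gains 40 peers per unit time, each group gains 10, and each group spawns half a new group per unit time, so a new group appears roughly every two time units per existing group.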

To form the peer groups, the framework also considers the location property of each peer. The members of a peer group are chosen so that they do not all belong to the same geographic area; in every peer group the member peers are located in different geographic locations. The advantage is that if electricity is down in a particular geographic region, all the peers in that region go down, but the peer groups keep working because their peers in other locations remain available. Such an outage can be caused by a natural disaster in the area, an electric grid failure, or a communication link failure. It is also possible that, because of a failure at an ISP's Internet server, the peers in a particular location become inaccessible to the framework. If the members of a peer group are chosen from different geographic locations, the group is more likely to stay alive than if its peers are chosen without regard to location. To take the location property into account, the framework divides the geographic area into zones. For example, a country may be one geographic zone in the framework.
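One simple way to obtain geographically mixed groups is to interleave peers zone by zone and then cut the interleaved list into groups. The paper does not prescribe an algorithm, so the round-robin scheme below is only an illustrative assumption, as are the function name and the `(peer_id, zone)` input format.

```python
from collections import defaultdict
from itertools import zip_longest


def form_groups(peers, group_size):
    """Build peer groups whose members come from as many zones as possible.

    `peers` is a list of (peer_id, zone) pairs; peer IDs must not be None.
    Peers are taken one zone at a time in round-robin order, then the
    resulting sequence is split into consecutive groups of `group_size`.
    """
    by_zone = defaultdict(list)
    for peer_id, zone in peers:
        by_zone[zone].append(peer_id)
    interleaved = [
        peer_id
        for row in zip_longest(*by_zone.values())
        for peer_id in row
        if peer_id is not None
    ]
    return [interleaved[i:i + group_size] for i in range(0, len(interleaved), group_size)]
```

With two peers in each of three zones and a group size of three, every group receives exactly one peer per zone, so the loss of any single zone removes only one member from each group.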

V. PERFORMANCE EVALUATION

This section measures the performance of the proposed framework. For evaluation purposes we partially implemented the functionality of the framework in Java using the JXTA-JXSE 2.5 libraries. The experiments were conducted on a local network in which 15 PCs are interconnected by a 100 Mbps Fast Ethernet switch. Each PC has an Intel® Core™ i5 650 CPU @ 3.20 GHz with 4 GB RAM, running Microsoft Windows 7.

In our experiments we use one PC as the administrator peer, one as the query manager, one as the task distributor, and the remaining PCs as task processor peers. To assess the framework we developed two Java applications: one searches for the prime numbers from 1 to a given number, and the other finds the sum of all numbers from 1 to a given number. In the first experiment we run the prime search between 1 and 500000. This range is broken into sub-ranges of equal size. The time is measured from the moment the task distributor starts to distribute sub-ranges to the task processors to the moment it receives the result for the last sub-range. The experiment is repeated for 1, 3, 5, and 10 task processor peers. Table 1 shows the results of the prime search experiment. With the same setup we conduct a second experiment to find the sum of all numbers between 1 and 1000000, again repeated for 1, 3, 5, and 10 task processors. Table 2 shows the results of the second experiment.
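The decomposition used in the first experiment can be illustrated with a short sketch. It is written in Python rather than the Java of the actual implementation, and it is not the authors' code: the function names are hypothetical, and trial division is assumed as the primality test because the paper does not name one.

```python
import math


def split_range(low, high, parts):
    """Split the inclusive range [low, high] into `parts` near-equal sub-ranges."""
    total = high - low + 1
    base, extra = divmod(total, parts)
    ranges, start = [], low
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        if size == 0:
            continue
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def count_primes(low, high):
    """Count primes in the inclusive range [low, high] by trial division."""
    count = 0
    for n in range(max(low, 2), high + 1):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count
```

A task distributor would hand each tuple from `split_range(1, 500000, k)` to one of k task processors and add up the returned counts. Sub-ranges of equal size are not necessarily equal in cost: with trial division, the sub-ranges holding larger numbers take longer, which may partly explain why adding processors helps less than linearly.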

The graph in Fig. 7 shows that, for a large task, the time required to complete the task decreases as the number of task processors increases. The graph in Fig. 8 shows that the framework performs poorly for small computations as the number of task processors increases. For a small computation, the processing time increases as task processors are added beyond three (Table 2), because the communication overhead is high compared with the computation time.

Table 1
Task: search for all the prime numbers between 1 and 500000

No. of task processors    Time taken to complete task
1                         14.58 minutes
3                         7.47 minutes
5                         7 minutes
10                        6.52 minutes

Figure 7. Performance of the framework for the prime number search application

Table 2
Task: find the sum of numbers between 1 and 1000000

No. of task processors    Time taken to complete task
1                         2.04 minutes
3                         1.30 minutes
5                         1.34 minutes
10                        1.35 minutes

Figure 8. Performance of the framework for the sum of numbers between 1 and 1000000.
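The measurements in Tables 1 and 2 can be summarized as speedup and parallel efficiency relative to the single-processor run. The sketch below assumes the reported times are decimal minutes, which the tables do not state; if they are minutes and seconds instead, the values would need converting first. The function name is hypothetical.

```python
def speedup_table(times):
    """Compute (speedup, efficiency) per processor count.

    `times` maps the number of task processors to the measured completion time
    and must contain an entry for 1 processor, which serves as the baseline.
    """
    base = times[1]
    return {p: (base / t, base / t / p) for p, t in times.items()}


# Values copied from Table 1 and Table 2, read as decimal minutes (an assumption).
prime_search = {1: 14.58, 3: 7.47, 5: 7.0, 10: 6.52}
sum_numbers = {1: 2.04, 3: 1.30, 5: 1.34, 10: 1.35}
```

Under that assumption, `speedup_table(prime_search)` gives a speedup of roughly 2.2 on 10 processors, an efficiency of about 22 percent, while `speedup_table(sum_numbers)` peaks at 3 processors and then declines slightly, which is consistent with communication overhead dominating the small task.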

VI. CONCLUSION

The proposed framework has the features required for a reliable, scalable, efficient, and dynamic general-purpose pure P2P computing system. Communication in the framework is limited to communication between peer groups, which enables the framework to scale up to a large number of peers. Reliability is one of the important requirements in distributed computing. As described earlier, the peers in any peer group are spread over different geographic locations, so the loss of the peers in a single geographic location does not affect the functioning of the framework; this improves its reliability. Spreading the member peers of a peer group over different geographic areas has a positive effect on resource availability but a negative effect in the form of transmission delay. The proposed framework does not consider economic aspects in assigning tasks to different workers. The framework is limited to processing coarse-grained parallel tasks and is suitable for large computations; for small computations the communication overhead is large compared with the computation time.

REFERENCES

[1] Jigyasu Dubey, Dr. (Mrs.) Vrinda Tokekar, Anand Rajavat, “A Study of P2P Computing Networks”, Published in Proceedings of International Conference on Computer Engineering and Technology (ICCET’10), pp 623-627, 13-14 Nov. 2010, Jodhpur, India.

[2] Zhikun ZHAO, Feng YANG, Yinglei XU, “PPVC: A P2P Volunteer Computing System” Published in proceedings of 2nd IEEE International Conference on Computer Science and Information Technology, 2009. ICCSIT, pp 51-55, 2009.

[3] distributed.net: Project RC5, RSA Labs Cryptographic Challenge. http://www.distributed.net/rc5/

[4] D. Anderson, J. Cobb, E. Korpela, M. Lebofsky, and D. Werthimer. “SETI@home: An Experiment in public-resource computing”, Communications of the ACM, pp 45:56–61, 2002.

[5] Korpela, E., Werthimer, D., Anderson, D., Cobb, J., and Lebofsky, “SETI@home: Massively Distributed Computing for SETI,” In Journal on Computing in Science and Engineering, Volume 3, Issue 1, pp. 78 – 83, 2001.

[6] http://www.napster.com/

[7] http://www.bittorrent.com/

[8] Jerome Verbeke, Neelakanth Nadgir, Greg Ruetsch, Ilya Sharapov, “Framework for Peer-to-Peer Distributed Computing in a Heterogeneous, Decentralized Environment,” In Proceedings of the 3rd International Workshop on Grid Computing, pp.1–12, Year 2002.

[9] Jean-Baptiste Ernst-Desmulier, Julien Bourgeois and Francois Spies, Jerome Verbeke, “Adding New Features In A Peer-to-Peer Distributed Computing Framework,” In Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing (Euromicro-PDP’05), pp.34 – 41, 2005.

[10] N. A. Al-Dmour, W. J. Teahan, “ParCop: A Decentralized Peer-to-Peer Computing system”, Published in Proceedings of Third International Symposium on/Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks, 2004. pp 162 – 168.

[11] Virginia Lo, Daniel Zappala, Dayi Zhou, Yuhong Liu, and Shanyu Zhao, “Cluster Computing on the Fly: P2P Scheduling of Idle Cycles in the Internet”, In Proc. of 3rd International Workshop on Peer-to-Peer System (IPTPS 2004). San Diego,Feb. 2004.

[12] Jigyasu Dubey, Dr. (Mrs.) Vrinda Tokekar, “P2PCS – A Pure Peer-to-Peer Computing System for Large Scale Computation Problems”, in proceedings of IEEE International conference on Computational Intelligence and Communication Networks (CICN), 07 - 09 Oct 2011, pp- 582 – 585, MIR Labs Gwalior, India

[13] Jigyasu Dubey, Vrinda Tokekar, “Identification of Reliable Peer group in Peer-to-Peer Computing Systems”, Published in Proceedings of Third International Conference on Advances in Communication, Network, and Computing CNC 2012, LNICST pp. 233–237, 2012