

Future Generation Computer Systems 23 (2007) 116–122
www.elsevier.com/locate/fgcs

A global and parallel file system for grids

Félix García-Carballeira*, Jesús Carretero, Alejandro Calderón, J. Daniel García, Luis M. Sanchez

Computer Architecture Group, Computer Science Department, Universidad Carlos III de Madrid, Leganes, Madrid, Spain

Received 20 December 2005; received in revised form 18 April 2006; accepted 7 June 2006
Available online 2 August 2006

Abstract

Data management is one of the most important problems in grid environments. Most of the efforts in data management in grids have been focused on data replication. Data replication is a practical and effective method to achieve efficient data access in grids. However, existing data replication schemes do not provide a grid file system. One important challenge facing grid computing is the design of a grid file system. The Global Grid Forum defines a Grid File System as a human-readable resource namespace for management of heterogeneous distributed data resources that can span multiple autonomous administrative domains. This paper describes a new Grid File System, designed according to the Global Grid Forum recommendations, that integrates heterogeneous data storage resources in grids using standard grid technologies: GridFTP and the Resource Namespace Service, both defined by the Global Grid Forum. To obtain high performance, we apply the parallel I/O techniques used in traditional parallel file systems.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Data grids; Parallel I/O; Data declustering; High performance I/O; GridFTP; RNS

1. Introduction

Currently there is great interest in the grid computing concept. Usually this concept denotes a distributed computational infrastructure in the field of engineering and advanced science [8]. A grid is composed of geographically dispersed resources that join to form a virtual computer. The resources (computers, networks, storage devices, etc.) that define the grid are heterogeneous and reside in different domains. This kind of system differs from other distributed environments such as clusters or local area networks in several aspects:

(1) They are located in several administrative domains.

(2) The communication network used is the Internet. This feature allows us to build a grid with resources placed, for example, in Europe, America or Asia.

(3) The different resources of the grid have a high degree of heterogeneity and must be accessible from any other part of the grid.

(4) The grid must be transparent to the users. They should not know where their programs are being executed or their data are being stored.

* Corresponding author. Tel.: +34 916249060; fax: +34 916249129.
E-mail address: [email protected] (F. García-Carballeira).
URL: http://arcos.inf.uc3m.es (F. García-Carballeira).

0167-739X/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.future.2006.06.004

Many applications in grids require access to large amounts of data. For this kind of application, most of the efforts in data management in grids have been focused on data replication. Data replication is a practical and effective method to achieve efficient data access. File replicas are multiple copies of a file spread across the grid to improve data access. However, none of the data replication schemes in the literature provides a global file system for accessing files in a data grid environment. One important challenge facing grid computing is a true global file system for grid applications running in different administrative domains.

The Global Grid Forum [11] defines a Grid File System as a human-readable resource namespace for management of heterogeneous distributed data resources that can span multiple autonomous administrative domains, and that can include:

• A logical resource namespace across multiple administrative domains.

• Standard interfaces.

• A virtual namespace of a WAN file system.

• Independence of physical data access/transport, authentication mechanisms.


Furthermore, a grid file system must provide high performance data access. Parallel I/O has been the traditional technique for improving data access in clusters and multiprocessors. Parallelism in file systems is obtained by using several independent servers and striping data among these nodes to allow parallel access to files. There are many parallel file systems for these kinds of platforms; however, very few parallel file systems have been provided for grid environments.

The main contribution of this paper is to describe a grid file system based on the Expand Parallel File System [12] designed by the authors. Expand is a parallel file system for clusters that uses standard servers and protocols to build parallel and distributed partitions where files are distributed. We have extended it to provide a parallel file system for grids. The new version of Expand for grids uses standard grid technologies: GridFTP for data access, and the Resource Namespace Service for naming. This system allows the integration of existing heterogeneous servers that span multiple administrative domains in grids, providing parallel I/O services through standard interfaces like POSIX and MPI-IO. The new version of this parallel file system has been implemented using Globus [10], one of the most important middleware packages for building grid applications.

The rest of the paper is organized as follows: Section 2 presents related work. Section 3 describes the standard grid technologies used for implementing the new version of Expand. Section 4 presents the main design aspects of Expand for grids. Section 5 shows some evaluation results. Finally, Section 6 presents some conclusions and future work.

2. Related work

In the context of replication, most work has focused on data availability and high performance data access. File level and dataset level replication and replica management have been studied in several works [17,19,23,2,22]. The use of replication raises two main problems. First, it makes intensive use of resources, not only in storage but also in management. Second, it is not appropriate for applications that modify the same set of information, because some resources must be used in collaborative environments. A way to improve the performance of I/O is parallel I/O. The use of parallelism in file systems is based on the fact that a distributed and parallel system consists of several nodes with storage devices. Parallelism in file systems is obtained using several independent server nodes supporting one or more secondary storage devices. Data are striped among these nodes and devices to allow parallel access to different files, and parallel access to the same file. The usage of parallel file systems and parallel I/O libraries has been studied in a great number of systems and platforms: Vesta [6], ParFiSys [5], Galley [14], PVFS [4], GPFS [20], and MAPFS [16]. Armada [15] is a parallel file system for computational grids, but it does not use standard grid services.

There are several high-performance file systems that support more than a thousand clients, for example Lustre [13] and the Google File System [9]. GFARM [24] is a file system designed for file sharing and high-performance distributed and parallel data computing in a Grid across administrative domains. However, GFARM implements an I/O server running on every file system node and does not use standard grid services.

3. Grid technologies used

The main objective of the Grid parallel file system described in this paper is to use standard grid technologies. Using standard technologies allows an easy deployment of the parallel file system in practically any grid environment. With this aim, we use services defined by the Global Grid Forum Recommendation, one of them implemented in the Globus Toolkit. We use two main technologies for the grid parallel file system: the GridFTP protocol for transferring data, and the Resource Namespace Service (RNS) for building the directory service. The following sections describe these elements.

3.1. GridFTP

GridFTP [1,10] is a data transfer protocol defined by the Global Grid Forum Recommendation that provides secure and high performance data movement in grid systems. The Globus Toolkit provides the most commonly used implementation of that protocol, though others do exist (primarily tied to proprietary internal systems). This protocol extends the standard FTP protocol and includes the following features:

• Grid Security Infrastructure (GSI) support.

• Third-party control and data transfer.

• Parallel data transfer using multiple TCP streams.

• Striped data transfer using multiple servers.

• Partial file transfer and support for reliable and restartable data transfer.

Data transfer in our parallel file system is based on this protocol and the implementation provided by the Globus Toolkit. Access to this protocol in Globus is provided via two libraries: the Globus FTP control library and the Globus FTP client library. The first library provides the low-level services needed to implement FTP clients and servers. The API provided by this library is protocol specific. The Globus FTP client library provides a convenient way of accessing files on remote FTP servers.

3.2. Resource namespace service

The Resource Namespace Service (RNS) is a specification of the Grid File System Working Group (GFS-WG) of the Global Grid Forum that allows the construction of a uniform, global, hierarchical namespace. It is a web service described by an RNS WSDL [18].

The GFS-WG proposes an RNS profile for use with Grid File Systems. RNS is a three-tier naming architecture (see Fig. 1) which consists of human interface names, logical reference names, and endpoint references. RNS allows two levels of indirection. The first level is realized by mapping human interface names directly to endpoint references. The second level of indirection appears when human interface names are mapped to logical references (logical names), which in turn are mapped to endpoint references. This second level of indirection has the advantage of using a logical name to represent a logical reference; therefore, logical names may be referenced and resolved independently of the hierarchical namespace. This means that logical names may be used as globally unique logical resource identifiers and be referenced directly by both the RNS namespace and other services.

Fig. 1. RNS three-tier naming architecture.

RNS is composed of two main components: virtual directories and junctions. Directories are virtual because they do not have any corresponding representation outside of the namespace. A junction is an RNS entry that links a reference to an existing resource into the global namespace. There are two types of junction: referral junctions and logical reference junctions. A referral junction is used to graft RNS namespaces. A logical reference junction contains a unique logical name. For this type of junction, the RNS service returns an endpoint reference (EPR). This endpoint is resolved using the Resource Endpoint Resolution Service, or RNS resolver.

RNS provides several types of operations: operations for querying namespace entry information; operations for creating, removing, renaming, and updating entries; and operations for managing properties or status of an entry.

The RNS resolver is a service independent of RNS that holds the mapping between logical names and endpoint references or addresses. For each logical name, the RNS resolver may store several endpoint references. The RNS resolver has operations for resolving names to endpoint references, and operations for creating, removing, and updating logical references.
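The two levels of indirection can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class and method names (`Resolver`, `Namespace`, `add_direct`, `add_logical`, `lookup`), the `lfn:` logical-name prefix and the example URLs are hypothetical, endpoint references are reduced to plain strings, and none of this is the RNS WSDL interface or the authors' prototype.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class LogicalName:
    """Marks a namespace entry that must go through the resolver."""
    value: str


class Resolver:
    """Stand-in for the RNS resolver: logical name -> endpoint references."""

    def __init__(self) -> None:
        self._eprs: Dict[str, List[str]] = {}

    def create(self, logical: str, eprs: List[str]) -> None:
        self._eprs[logical] = list(eprs)

    def resolve(self, logical: str) -> List[str]:
        return list(self._eprs[logical])


class Namespace:
    """Stand-in for the RNS namespace: human interface name -> junction target."""

    def __init__(self, resolver: Resolver) -> None:
        self._resolver = resolver
        self._entries: Dict[str, Union[str, LogicalName]] = {}

    def add_direct(self, path: str, epr: str) -> None:
        # First level: the name maps straight to an endpoint reference.
        self._entries[path] = epr

    def add_logical(self, path: str, logical: str) -> None:
        # Second level: the name maps to a logical name held by the resolver.
        self._entries[path] = LogicalName(logical)

    def lookup(self, path: str) -> List[str]:
        target = self._entries[path]
        if isinstance(target, LogicalName):
            return self._resolver.resolve(target.value)
        return [target]


resolver = Resolver()
resolver.create("lfn:data1", ["gsiftp://host1/a", "gsiftp://host2/a"])
ns = Namespace(resolver)
ns.add_logical("/grid/data", "lfn:data1")
ns.add_direct("/grid/raw", "gsiftp://host3/raw")
```

In this model a logical name can be resolved through `resolver.resolve` without touching the namespace at all, which is the independence property described above.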

We have implemented a prototype web service for Globus that incorporates the RNS specification and that has been used in Expand to provide a global namespace in grids.

4. Expand design for grid environments

To provide a global and parallel file system for grid computing, the authors have modified the Expand Parallel File System to use the standard grid technologies described in the previous sections: the GridFTP protocol and RNS.

Fig. 2. Expand architecture for grid environments.

Fig. 2 shows the architecture of Expand. This figure shows how Expand can be used for data management in cluster environments and how it can be used to access several sites using the GridFTP protocol. File data are striped by Expand among several servers using different protocols, using blocks of different sizes as the striping unit. Processes in clients use an Expand library to access an Expand distributed partition. Expand offers an interface based on POSIX system calls. This interface, however, is not appropriate for parallel applications using striped patterns with small access size [14]. For parallel applications, we use ROMIO [21] to support the MPI-IO interface [3], implementing the appropriate Expand ADIO.

The next sections describe data distribution, file structure, naming, metadata management, and parallel access to files in Expand, using the GridFTP protocol and the RNS service specification.

4.1. Data distribution and files

Expand combines several GridFTP servers (see Fig. 2) in order to provide a generic distributed partition. The use of GridFTP allows us to use servers located in different administrative domains. Each server provides one or more directories that are combined to build a distributed partition through the grid. All files in the system are striped across all GridFTP servers to facilitate parallel access, with each server storing conceptually a subfile of the parallel file.

Fig. 3. The file structure and directory mapping in Expand.

A file consists of several subfiles, one for each GridFTP server. All subfiles are fully transparent to the Expand users. On a distributed partition, the user can create striped files with cyclic layout. In these files, blocks are distributed across the partition following a round-robin pattern. This structure is shown in Fig. 3.
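The round-robin placement can be written as a small address translation. The following is a minimal sketch, assuming that block 0 is stored on a configurable first server (`base_node`, defaulting to 0) and that the per-subfile metadata header is not counted in the returned offset; the names `BlockLocation` and `locate` are illustrative and do not come from the Expand sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLocation:
    server: int          # index of the server (subfile) holding the byte
    subfile_offset: int  # byte offset inside that subfile, header excluded


def locate(offset: int, stripe_size: int, num_servers: int,
           base_node: int = 0) -> BlockLocation:
    """Map a logical file offset to (server, offset in subfile) for a cyclic layout."""
    block = offset // stripe_size                # logical block number
    server = (base_node + block) % num_servers   # round-robin placement
    local_block = block // num_servers           # earlier blocks on the same server
    return BlockLocation(server, local_block * stripe_size + offset % stripe_size)
```

With an 8 kB striping unit and 4 servers, bytes 0 to 8191 land on server 0, the next 8 kB on server 1, and the fifth block wraps around to server 0 again, right after that server's first block.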

4.2. Naming and metadata management

Partitions in Expand are defined using a small configuration file. For example, the following configuration file defines two partitions:

    /xpn1 8 4
    gsiftp://host1/export/home1
    gsiftp://host2/export/home2
    gsiftp://host3/export/home3
    gsiftp://host4/home

    /xpn2 4 2
    nfs://server1/users
    nfs://server2/export/home/users

This configuration file defines two Expand partitions. The first partition uses 4 servers (host1, host2, host3, and host4) and, by default, a striping unit of 8 kB. The partition in this case uses the GridFTP protocol and the Globus Toolkit to access files located at different sites. The second partition uses 2 NFS servers and a striping unit of 4 kB. This type of partition is appropriate for cluster data access. The path /xpn1 is the root path for the first partition, and /xpn2 is the root path for the second partition. So, the Expand file /xpn1/dir/data.txt is mapped to the following subfiles:

    gsiftp://host1/export/home1/dir/data.txt
    gsiftp://host2/export/home2/dir/data.txt
    gsiftp://host3/export/home3/dir/data.txt
    gsiftp://host4/home/dir/data.txt
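The mapping from an Expand path to its subfiles follows directly from the configuration file. The sketch below assumes that each partition starts with a header line of the form "root stripe-size-in-kB server-count" followed by one URL per server, which is how the example reads; the parser and the names `Partition`, `parse_config` and `subfiles` are hypothetical and are not the Expand implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Partition:
    root: str        # root path of the partition, e.g. "/xpn1"
    stripe_kb: int   # default striping unit in kB
    servers: List[str] = field(default_factory=list)  # one base URL per server


def parse_config(text: str) -> Dict[str, Partition]:
    """Parse partition definitions: a header line followed by one URL per server."""
    partitions: Dict[str, Partition] = {}
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    i = 0
    while i < len(lines):
        root, stripe_kb, count = lines[i].split()
        n = int(count)
        partitions[root] = Partition(root, int(stripe_kb), lines[i + 1:i + 1 + n])
        i += 1 + n
    return partitions


def subfiles(partitions: Dict[str, Partition], path: str) -> List[str]:
    """Return the subfile URLs that back an Expand path."""
    for root, part in partitions.items():
        if path.startswith(root + "/"):
            rel = path[len(root):]  # keeps the leading "/"
            return [url + rel for url in part.servers]
    raise ValueError(f"no partition contains {path}")


CONFIG = """
/xpn1 8 4
gsiftp://host1/export/home1
gsiftp://host2/export/home2
gsiftp://host3/export/home3
gsiftp://host4/home

/xpn2 4 2
nfs://server1/users
nfs://server2/export/home/users
"""

partitions = parse_config(CONFIG)
xpn1_subfiles = subfiles(partitions, "/xpn1/dir/data.txt")
```

Here `xpn1_subfiles` holds the four gsiftp URLs listed above, one per server of the first partition.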

Each subfile of an Expand file (see Fig. 3) has a small header at the beginning of the subfile. This header stores the file's metadata. This metadata includes the following information: the stripe size; the base node, which identifies the NFS server where the first block of the file resides; and the file distribution pattern used. At the moment, we only use files with cyclic layout.

All subfiles have a header for metadata, although only one node, called the master node (described below), stores the current metadata. The master node can be different from the base node. To simplify the naming process and reduce potential bottlenecks, Expand does not use a metadata manager such as the one used in PVFS [4]. Fig. 3 shows how directory mapping is done in Expand. The naming service is provided by a prototype web service that incorporates the RNS specification. Fig. 4 shows the naming and data access process.

The metadata of a file resides in the header of a subfile stored in a GridFTP server. This server is the master node of the file, similar to the mechanism used in the Vesta Parallel File System [6]. To obtain the master node of a file we use the mechanism implemented in Expand: the file name is hashed into the number of the node.

The hash function used in the current prototype is:

\[
\mathrm{Server}(\mathit{name\_file}) = \left( \sum_{i=1}^{\mathrm{strlen}(\mathit{name\_file})} \mathit{name\_file}[i] \right) \bmod \mathit{numServers}
\]
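The hash can be transcribed almost literally. This is a minimal sketch, assuming that the characters of the name are summed as UTF-8 byte values (mirroring a C string) and that the whole name passed in is hashed; whether the prototype hashes the full path or only the last component is not stated, and the function name `master_node` is illustrative.

```python
def master_node(name_file: str, num_servers: int) -> int:
    """Sum the byte values of the file name and reduce modulo the server count."""
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    return sum(name_file.encode("utf-8")) % num_servers
```

Because the sum ignores character order, names that are permutations of each other (for example `ab` and `ba`) map to the same server.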

Because the determination of the master node is based on the file name, when a user renames a file, the master node for this file changes. The algorithm used in Expand to rename a file is the following:

    rename(oldname, newname) {
        oldmaster = hash(oldname)
        newmaster = hash(newname)
        move the metadata from oldmaster to newmaster
    }
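The rename steps can be exercised against an in-memory model. In the sketch below, which is an assumption-laden illustration and not the Expand code, `headers[k]` is a dictionary standing in for the subfile headers held by server k, the hash is the byte-sum function given above, and only the metadata move is modelled (renaming the subfiles themselves on each server is left out).

```python
from typing import Dict, List


def master_node(name: str, num_servers: int) -> int:
    """Byte-sum hash of the file name, modulo the number of servers."""
    return sum(name.encode("utf-8")) % num_servers


def rename(headers: List[Dict[str, dict]], oldname: str, newname: str) -> None:
    """Move the current metadata of a file from its old master node to the new one."""
    n = len(headers)
    oldmaster = master_node(oldname, n)
    newmaster = master_node(newname, n)
    metadata = headers[oldmaster].pop(oldname)  # raises KeyError if the file is unknown
    headers[newmaster][newname] = metadata


# Four servers; file "a" hashes to node 1, and its new name "ab" hashes to node 3.
headers: List[Dict[str, dict]] = [{} for _ in range(4)]
headers[master_node("a", 4)]["a"] = {"stripe_kb": 8, "base_node": 0}
rename(headers, "a", "ab")
```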

Expand offers two different interfaces. The first interface is based on POSIX system calls. This interface, however, is not appropriate for parallel applications using stripe patterns with small access size [14]. Parallel applications can also use Expand with MPI-IO [3]. Expand has been integrated inside ROMIO [21] and can be used with MPICH. Portability in ROMIO is achieved using an abstract-device interface for I/O (ADIO).

Fig. 4. Naming and data access process.

4.3. Parallel access and authentication

All file operations in Expand use a virtual filehandle. This virtual filehandle is the reference used in Expand for all operations. When Expand needs to access a subfile, it uses the appropriate filehandle. For GridFTP, Expand uses the appropriate handle managed by the GridFTP Client Library provided by Globus. To enhance I/O, user requests are split by the Expand library into parallel subrequests sent to the involved servers. When a request involves k GridFTP servers, Expand issues k requests in parallel to the servers, using threads to parallelize the operations. The same criterion is used in all Expand operations. A parallel operation on k servers is divided into k individual operations that use the Globus GridFTP Client Library to access the corresponding subfile. This process is shown in Fig. 4.
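The request splitting can be sketched as follows. This is a simplified illustration under explicit assumptions: in-memory byte strings stand in for the subfiles on each server, so no Globus call appears; the metadata header is ignored; block 0 is assumed to live on server 0; and the names `split_request` and `parallel_read` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def split_request(offset: int, length: int, stripe_size: int,
                  num_servers: int) -> Dict[int, List[Tuple[int, int, int]]]:
    """Split a byte range into per-server (subfile_offset, length, logical_offset) pieces."""
    per_server: Dict[int, List[Tuple[int, int, int]]] = {}
    end = offset + length
    while offset < end:
        block = offset // stripe_size
        within = offset % stripe_size
        chunk = min(stripe_size - within, end - offset)
        server = block % num_servers
        sub_off = (block // num_servers) * stripe_size + within
        per_server.setdefault(server, []).append((sub_off, chunk, offset))
        offset += chunk
    return per_server


def parallel_read(subfiles: List[bytes], offset: int, length: int,
                  stripe_size: int) -> bytes:
    """Read a range from in-memory subfiles, one worker thread per involved server."""
    plan = split_request(offset, length, stripe_size, len(subfiles))
    out = bytearray(length)

    def read_from(server: int) -> None:
        # Each server fills disjoint slices of the output buffer.
        for sub_off, chunk, logical in plan[server]:
            start = logical - offset
            out[start:start + chunk] = subfiles[server][sub_off:sub_off + chunk]

    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        list(pool.map(read_from, plan))  # list() re-raises any worker exception
    return bytes(out)
```

A request that touches k servers produces k entries in the plan and therefore k concurrent workers, matching the "k requests in parallel" behaviour described above.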

Access control and authentication are guaranteed in Expand for grid environments, because GridFTP uses the Grid Security Infrastructure (GSI) provided by Globus. GSI uses public key cryptography as the basis for its functionality. It provides secure communication (authenticated and perhaps confidential) between elements of a computational Grid; it provides security across organizational boundaries, thus prohibiting a centrally-managed security system; and it supports single sign-on for users of the Grid, including delegation of credentials for computations that involve multiple resources and/or sites.

5. Performance evaluation

The main motivation for our performance tests is to study the feasibility of using grid services for implementing a parallel file system. With this aim, the evaluation has been made in two scenarios. The first scenario evaluates Expand with grid services in a typical grid computing environment. The second analyzes the use of this system for parallel applications using MPI running in a cluster.

To analyze the system in a typical grid scenario, we have defined a grid benchmark that consists of 500 jobs scheduled on 4 workstations. Each job accesses a random number of files (between 1 and 10 files) chosen from among 1000 files. The size of each file is 500 MB. This benchmark has been tested on different systems:

• All files are stored in one GridFTP server (1 Site GridFTP in the figure) and they are accessed using globus-url-copy, the command line tool provided by Globus.

• The files are distributed among 4 GridFTP servers (4 sites Distributed Replicas in the figure). Each server stores 250 files and they are accessed using globus-url-copy.

• All files are replicated in the 4 GridFTP servers (4 sites Full replication in the figure). Each server stores 1000 files that are accessed sequentially using globus-url-copy.

• All files are replicated in the 4 GridFTP servers (4 sites Full replication-parallel access in the figure). Each server stores 1000 files, but in this case each file is accessed in parallel using the 4 servers.

• Using Expand with the GridFTP protocol (GridExpand in the figures) and different numbers of servers for the distributed partition (1, 2, and 4). The files are accessed using POSIX system calls.
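The job mix of the grid benchmark can be reproduced with a short generator. This is only a sketch of the workload description: the text does not say whether a job may pick the same file twice, so sampling without replacement is an assumption here, as are the fixed seed and the name `make_jobs`.

```python
import random
from typing import List


def make_jobs(num_jobs: int = 500, num_files: int = 1000,
              max_files_per_job: int = 10, seed: int = 0) -> List[List[int]]:
    """Return one list of file ids per job; each job reads 1..max_files_per_job files."""
    rng = random.Random(seed)
    return [
        rng.sample(range(num_files), rng.randint(1, max_files_per_job))
        for _ in range(num_jobs)
    ]


jobs = make_jobs()
```

Each entry of `jobs` can then be handed to whichever access method is under test, for example one transfer per file id.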

For the second scenario, we have used the FLASH I/O benchmark [7]. This benchmark uses the parallel HDF5 interface, which in turn uses the MPI-IO interface. The FLASH I/O benchmark performs three separate performance tests: checkpoint, plotfile without corners, and plotfile with corners. This benchmark is intensive in write operations and allows us to analyze file system behaviour. For this benchmark, we have compared the performance of Expand using GridFTP, Expand using NFS, and PVFS.

Fig. 5. Performance results for the grid benchmark.

Fig. 6. Performance results for the Flash-IO benchmark.

In the evaluation we have used 4 workstations for running the processes and 4 workstations for the GridFTP servers, all of them with the Globus Toolkit 4 installed.

Fig. 5 shows the results obtained for the grid benchmark. The best results are obtained for 4 sites Full replication-parallel access and for Expand with 4 servers. This result shows that parallel I/O is an effective technique for improving data access in grids. Furthermore, Expand does not require full replication to obtain these results.

Fig. 6 shows the performance obtained for the Flash-IO benchmark for 4 and 8 processes. As we can see, the performance is better for Expand using the NFS protocol (XPN-NFS) and for PVFS than for Expand using the GridFTP protocol (XPN-GridFTP). Although GridFTP does not provide good results for this benchmark, the experiment demonstrates that it is feasible to use this approach for providing a grid file system. The use of this system is appropriate for building grid file systems spanning multiple administrative domains. For this kind of environment, Expand with NFS and PVFS cannot be applied.

6. Conclusions and future work

In this paper we have described a new parallel file system for grids according to the Global Grid Forum Recommendations. This system is based on the Expand Parallel File System designed by the authors. The new version of Expand for grids uses standard grid technologies: GridFTP for data access, and the Resource Namespace Service for naming. It allows the integration of existing heterogeneous servers spanning multiple administrative domains in grids, providing parallel I/O services through standard interfaces like POSIX and MPI-IO. The performance results demonstrate that it is feasible to use this system for providing a grid file system. Ongoing and future work addresses fault tolerance support and the study of new schemes for data allocation and prefetching algorithms for data grids.

Acknowledgment

This work has been supported by the Spanish Ministry of Education and Science under contract TIN2004-02156.

References

[1] B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke, Secure, efficient data transport and replica management for high performance data-intensive computing, in: Proceedings of the Eighteenth IEEE Symposium on Mass Storage Systems and Technologies-volume 00, April 17–20, 2001, pp. 13–28.

[2] H. Bell, D.G. Cameron, L. Capozza, A.P. Millar, K. Stockinger, F. Zini, OptorSim - A Grid simulator for studying dynamic data replication strategies, International Journal of High Performance Computing Applications 17 (4) (2003).

[3] A. Calderon, F. Garcia, J. Carretero, J.M. Perez, J. Fernandez, An implementation of MPI-IO on Expand: A parallel file system based on NFS servers, in: 9th PVM/MPI European Users Group, Johannes Kepler University Linz, Austria, September 29–October 2, 2002, pp. 306–313.

[4] P.H. Carns, W.B. Ligon III, R.B. Ross, R. Thakur, PVFS: A parallel file system for linux clusters, Tech. Rep. ANL/MCS-P804-0400, 2000.


[5] J. Carretero, F. Perez, P. de Miguel, F. Garcia, L. Alonso, Performance increase mechanisms for parallel and distributed file systems, in: Parallel Computing: Special Issue on Parallel I/O Systems, no. 3, Elsevier, 1997, pp. 525–542.

[6] P. Corbett, S. Johnson, D. Feitelson, Overview of the Vesta parallel file system, ACM Computer Architecture News 21 (5) (1993) 7–15.

[7] FLASH I/O benchmark routine - Parallel HDF 5, http://flash.uchicago.edu/~zingale/flash_benchmark_io/.

[8] I. Foster, C. Kesselman (Eds.), The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999.

[9] S. Ghemawat, H. Gobioff, S. Leung, The Google file system, in: Proceedings of the 19th ACM Symposium on Operating Systems Principles, 2003.

[10] The Globus Toolkit, http://www.globus.org.

[11] Global Grid Forum, The GGF file system architecture workbook, http://www.globalgridfoum.net, 2002.

[12] F. Garcia, A. Calderon, J. Carretero, J.M. Perez, J. Fernandez, The design of the Expand parallel file system, International Journal of High Performance Computing Applications 17 (1) (2003) 21–37.

[13] Cluster File Systems, Inc., Lustre: A scalable, high-performance file system, http://www.lustre.org.

[14] N. Nieuwejaar, D. Kotz, The Galley parallel file system, in: Proceedings of the 10th ACM International Conference on Supercomputing, May 1996.

[15] R. Oldfield, D. Kotz, Armada: A parallel file system for computational Grids, in: International Symposium on Cluster Computing and the Grid, Brisbane, Australia, May 2001, IEEE Computer Society Press, 2001, pp. 194–201.

[16] M.S. Perez, J. Carretero, F. Garcia, J.M. Pena, V. Robles, MAPFS: A flexible multiagent parallel file system for clusters, Future Generation Computer Systems 22 (2006) 620–632.

[17] K. Ranganathan, I. Foster, Identifying dynamic replication strategies for high performance data grids, in: Proceedings of International Workshop on Grid Computing, Denver, November 2002, pp. 75–86.

[18] M. Pereira, O. Tatebe, L. Luan, T. Anderson, J. Xu, Resource Namespace Service specification, November 2005, http://www.globalgridforum.net.

[19] A.S. Tosun, H. Ferhatosmanoglu, Optimal parallel I/O using replication, in: Proceedings of International Workshop on Parallel Processing, ICPP, Vancouver, Canada, 2002, pp. 506–513.

[20] F. Schmuck, R. Haskin, GPFS: A shared-disk file system for large computing clusters, in: Proceedings of the Conference on File and Storage Technologies, FAST'02, 28–30 January 2002, Monterey, CA, pp. 231–244.

[21] W. Gropp, R. Thakur, E. Lusk, An abstract-device interface for implementing portable parallel-I/O interfaces, in: Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation, October 1996, pp. 180–187.

[22] M. Tang, B.-S. Lee, X. Tang, C. Ueo, The impact of data replication on job scheduling performance in the Data Grid, Future Generation Computer Systems 22 (2006) 254–268.

[23] O. Tatebe, Worldwide fast file replication on Grid datafarm, in: Proceedings of the 2003 Computing in High Energy and Nuclear Physics, CHEP03, March 2003.

[24] O. Tatebe, N. Soda, Y. Morita, S. Matsuoka, S. Sekiguchi, Gfarm v2: A Grid file system that supports high-performance distributed and parallel data computing, in: Proceedings of the 2004 Computing in High Energy and Nuclear Physics, CHEP04, Switzerland, September 2004.

Félix García-Carballeira received the M.S. degree in Computer Science in 1993 at the Universidad Politécnica de Madrid, and the Ph.D. degree in Computer Science in 1996 at the same university. From 1996 to 2000 he was an associate professor in the Department of Computer Architecture at the Universidad Politécnica de Madrid. He is currently an associate professor in the Computer Science Department at the Universidad Carlos III de Madrid. His research interests include high performance computing and parallel file systems. He is coauthor of 9 books and he has published some 70 articles in journals and conferences.

Jesús Carretero got his Computer Science degree and his Ph.D. at the Universidad Politécnica de Madrid. Since 1989, he has been teaching Operating Systems and Computer Architecture in several universities. During 1997 and 1998, he held a visiting scholar position at Northwestern University, in Chicago. He has been a full professor at the Universidad Carlos III de Madrid, Spain, since 2001. His research interest is focused on Parallel and Distributed Systems, especially data storage systems, Real-Time Systems and Multimedia Techniques. He is the author of several educational books and he has published papers in several major journals in this area, such as Parallel Computing and Journal of Parallel and Distributed Computing.

Alejandro Calderón got his M.S. in Computer Science at the Universidad Politécnica de Madrid in 2000 and his Ph.D. in 2005 at the Universidad Carlos III de Madrid. He is an associate professor in the Department of Computer Science at the Carlos III University of Madrid, Spain. His research interests include high performance computing and parallel file systems. Alejandro has participated in the implementation of MiMPI, a multithread implementation of MPI, and the Expand parallel file system.

J. Daniel García got his Computer Science degree at the Universidad Politécnica de Madrid in 2001 and his Ph.D. in 2005 at the Universidad Carlos III de Madrid. He has been an assistant professor at the Universidad Carlos III de Madrid since 2002, teaching Computer Architecture and Operating Systems. His research interest is focused on Parallel and Distributed Systems, especially data storage systems, and Real-Time Systems.

Luis M. Sanchez got his Computer Science degree at the Universidad Carlos III de Madrid in 2003. He has been an assistant professor at the Universidad Carlos III de Madrid since 2003, teaching Computer Architecture and Operating Systems. His research interest is focused on Parallel and Distributed Systems, especially data storage systems.