StoRM latest performance test results – Alberto Forti, Otranto, Jun 8 2006


Page 1: StoRM latest performance test results

Alberto Forti

Otranto, Jun 8 2006

Page 2: StoRM: Storage Resource Manager

Description: StoRM is a disk-based storage resource manager. It implements the SRM v2.1.1 standard interface. StoRM is designed to support guaranteed space reservation and direct access (native POSIX I/O calls), as well as other I/O libraries such as RFIO. Security is based on user identity (VOMS certificates).

StoRM is designed to take advantage of high-performance distributed file systems such as GPFS. Standard POSIX file systems such as ext3 and XFS are also supported, and new plug-ins can be developed easily.

Page 3: StoRM: Functionalities – SRM Interface v2.1.1

Dynamic management of (disk) storage resources (files and space):

Introduces the concepts of file lifetime (volatile with a fixed lifetime, durable, or permanent), file pinning (to ensure a file is not deleted while in use), and space pre-allocation (to ensure the requested disk space is available for the whole lifetime of the application from the beginning).

Files are no longer permanent entities on the storage but dynamic ones that can appear and disappear according to the user's specifications (when the lifetime expires, a file becomes eligible for deletion by a garbage collector without further notice).

More relevant functionalities (already implemented and tested in StoRM) are listed below; a sketch of the put cycle follows the list.

Data Transfer:

srmPrepareToPut() creates a file and, on request, allocates disk space.

srmPrepareToGet() pins a file and forbids its deletion by the SRM.

srmCopy() asynchronously creates a file on one side, pins it on the other side, and executes the transfer.

srm*Status() checks the status of submitted asynchronous operations.

srmPutDone() tells the SRM that the file has been written and can then be deleted if, e.g., its lifetime expires.

Space Management: srmReserveSpace(), srmGetSpaceMetaData().

These calls allocate big chunks of disk space to be used subsequently by independent files (similar to allocating space for a single file, but done all at once).

Directory functions: srmLs() with recursive option, srmRm(), srmRmdir(), srmMkdir().
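To make the put cycle concrete, here is a minimal sketch in Python, assuming a hypothetical SrmClient wrapper around the SRM v2.1.1 SOAP interface (the module, class, method names, endpoint, and paths are illustrative, not a real API; globus-url-copy is the real GridFTP client):

```python
import subprocess
import time

from srm_sketch import SrmClient  # hypothetical wrapper, not a real package

srm = SrmClient("httpg://storm.example.org:8443")  # illustrative endpoint

# srmPrepareToPut: create the file and, on request, allocate 1 GB of space.
req = srm.prepare_to_put("srm://storm.example.org/dteam/out.dat",
                         size=1 * 1024**3)

# The call is asynchronous: poll srm*Status() until a transfer URL is ready.
status = srm.status_of_put(req)
while status.state == "SRM_REQUEST_QUEUED":
    time.sleep(1)
    status = srm.status_of_put(req)

# Move the data with GridFTP to the returned TURL (e.g. a gsiftp:// URL).
subprocess.run(["globus-url-copy", "file:///tmp/out.dat", status.turl],
               check=True)

# srmPutDone: the file is written; from now on its lifetime rules apply.
srm.put_done(req)
```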

Page 4: StoRM – Grid scenario

[Diagram: Grid scenario across Site A and Site B. At Site A, worker nodes (WN) mount GPFS for direct access; StoRM fronts the GPFS storage together with GridFTP and I/O servers; the CE handles job submission and data management access; Site B's SE/CE is reached via GridFTP.]

Page 5: Simple-minded use case (I)

Analysis job without StoRM (i.e. without SRM v2.1): The job locates the physical path of the input data on the SE (e.g. via file catalogues).

The job copies the input datasets from the SE onto the local disk of the WN (e.g. via GridFTP for a remote SE or RFIO for a local SE).

The job processes the dataset and writes output data onto the local disk.

The job copies the locally produced data (e.g. ntuples) back to the SE.
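As a rough illustration, the copy-in/copy-out cycle above might look like the following (hostnames, paths, and the analysis executable are illustrative; globus-url-copy is the real GridFTP client):

```python
import subprocess

# Copy the input from the SE to the WN local disk (GridFTP for a remote SE).
subprocess.run(["globus-url-copy",
                "gsiftp://se.example.org/data/input.root",
                "file:///tmp/job/input.root"], check=True)

# Process locally: reads the input, writes the output to the local disk.
subprocess.run(["./analysis", "/tmp/job/input.root", "/tmp/job/ntuple.root"],
               check=True)

# Copy the output back to the SE; fails if the SE is full or over quota.
subprocess.run(["globus-url-copy",
                "file:///tmp/job/ntuple.root",
                "gsiftp://se.example.org/data/ntuple.root"], check=True)
```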

DRAWBACKS! If the local disk fills up during the job lifetime (e.g. other jobs running on the same WN exhaust the available space), the job will fail.

If the SE fills up (or the quota is exceeded) during job execution, the data cannot be copied back and the job will fail.

No dynamic usage of the SE disk space is possible: files stay permanently resident on disk and are only cleaned up by a central administrator or by the owner at some point.

Page 6: Simple-minded use case (II) – Analysis with StoRM (i.e. with SRM v2.1)

The job locates the physical path of the input data on the SE (e.g. via file catalogues). If the file is already on the local SE:

The job pins the file on the SE through an SRM prepareToGet call, to ensure it is not deleted while the job runs.

If the file is available from a remote SE: the job executes an srmCopy v2 call to transfer the data from the remote SE to the local SE (assuming a local disk-based cache storage is always available), assigning the local copy a fixed lifetime according to the estimated lifetime of the job itself.

The job opens the input files on the local SE (using UNIX-like, i.e. POSIX, system calls – no need for additional protocols embedded in the application).

The job creates output files for writing on the local SE (SRM prepareToPut call), pre-allocating the estimated disk space for the job output, and opens them.

The job processes the input datasets and writes the output data files from/to the local SE.

The job unpins the input data files and the output data files (releaseFile and PutDone SRM v2 calls).

The job does not need to rely on the availability of the local WN disk, and no further copies are needed (a sketch of this workflow follows below).

NB: more advanced scenarios include higher layers, such as FTS, for data transfer. SRM v2 is not a replacement for FTS!
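A minimal sketch of the analysis workflow above, again assuming the hypothetical SrmClient wrapper from page 3 (endpoints, paths, sizes, and method names are illustrative):

```python
from srm_sketch import SrmClient  # hypothetical wrapper, as on page 3

srm = SrmClient("httpg://storm.local-site.example:8443")

# Pin the input on the local SE so it is not deleted while the job runs.
get_req = srm.prepare_to_get("srm://storm.local-site.example/data/input.root",
                             pin_lifetime=3600)       # seconds, job estimate
in_turl = srm.wait_for_turl(get_req)                  # file:// TURL on GPFS

# Create the output file, pre-allocating the estimated size up front.
put_req = srm.prepare_to_put("srm://storm.local-site.example/data/ntuple.root",
                             size=200 * 1024**2)      # illustrative estimate
out_turl = srm.wait_for_turl(put_req)

# Process with plain POSIX I/O: GPFS is mounted on the WN, no extra protocol.
with open(in_turl.removeprefix("file://"), "rb") as src, \
     open(out_turl.removeprefix("file://"), "wb") as dst:
    dst.write(src.read())                             # stand-in for the analysis

# Unpin the input and declare the output complete.
srm.release_files(get_req)
srm.put_done(put_req)
```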

Page 7: Early StoRM testbed at CNAF T1 (pre-Mumbai)

[Diagram: testbed layout. A StorageTEK FlexLine FLX680 disk array (20 x 2 TB partitions, 40 TB total) is connected through a SAN fabric (4 x 2 Gb/s and 8 x 2 Gb/s FC links, Qlogic 2340 HBAs) to 4 Sun Microsystems SunFire V20Z storage servers running GPFS and GridFTPd; StoRM, the CE, and the worker nodes connect over Ethernet, with the worker nodes mounting the GPFS cluster.]

Framework: the disk storage was composed of roughly 40 TB, provided by 20 logical partitions of one dedicated StorageTEK FlexLine FLX680 disk array, aggregated by GPFS.

Write test: srmPrepareToPut() with implicit reserveSpace for 1 GB files, then globus-url-copy from a local source to the returned TURL. 80 simultaneous client processes (sketched below).

Read test: srmPrepareToGet() followed by globus-url-copy from the returned TURL to a local file (1 GB files). 80 simultaneous client processes.

Results: measured sustained read and write throughputs of 4 Gb/s and 3 Gb/s, respectively.

The two tests are meant to validate the functionality and robustness of the srmPrepareToPut() and srmPrepareToGet() functions provided by StoRM, as well as to measure the read and write throughputs of the underlying GPFS file system.
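A rough sketch of what one such test driver might look like (the 80-process write test), again with the hypothetical SrmClient wrapper; globus-url-copy is the real GridFTP client, everything else is illustrative:

```python
import subprocess
import time
from multiprocessing import Pool

from srm_sketch import SrmClient  # hypothetical wrapper, as on page 3

ENDPOINT = "httpg://storm.cnaf.example:8443"  # illustrative endpoint

def write_one(i: int) -> float:
    """One client: srmPrepareToPut (implicit space reservation), then GridFTP."""
    srm = SrmClient(ENDPOINT)
    req = srm.prepare_to_put(f"srm://storm.cnaf.example/test/file{i:02d}.dat",
                             size=1 * 1024**3)      # 1 GB, reserved implicitly
    turl = srm.wait_for_turl(req)
    t0 = time.time()
    subprocess.run(["globus-url-copy", "file:///data/source_1GB.dat", turl],
                   check=True)
    srm.put_done(req)
    return 1 * 1024**3 / (time.time() - t0)         # bytes/s for this client

if __name__ == "__main__":
    with Pool(80) as pool:                          # 80 simultaneous clients
        rates = pool.map(write_one, range(80))
    # Rough aggregate, assuming the transfers overlap fully.
    print(f"aggregate write throughput ~ {8 * sum(rates) / 1e9:.2f} Gb/s")
```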

Page 8: Latest functionalities stress test

The early testbed is no longer available due to the storage needs of T1 operations.

Current (small) testbed: 2 StoRM instances.

Server1: dual PIII 1 GHz, 512 MB RAM, Fast Ethernet (100 Mb/s).

Server2: dual Xeon 2.4 GHz, 2 GB RAM, Fast Ethernet (100 Mb/s).

Each machine mounts a small GPFS volume and runs the StoRM server(s), MySQL, GridFTP, and other services.

Description: stress tests for each functionality provided. A large number of requests (1000-10000) was performed at different submission rates (10-30 requests per second).

For each test we evaluated the execution rate and the number of failures (a sketch of such a harness is given below).

NB: the performance reported here is strictly tied to the underlying systems used (CPU, disks, memory) and would scale up significantly with more performant hardware.
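For concreteness, a minimal sketch of a rate-controlled stress harness of this kind, assuming the same hypothetical SrmClient wrapper as on the previous pages (the mkdir method and the endpoint are illustrative):

```python
import threading
import time

from srm_sketch import SrmClient  # hypothetical wrapper, as on page 3

def stress(call, n_requests: int, rate_hz: float):
    """Submit n_requests at rate_hz; report execution rate and failures."""
    failures = 0
    lock = threading.Lock()

    def worker(i):
        nonlocal failures
        try:
            call(i)
        except Exception:
            with lock:
                failures += 1

    threads = []
    start = time.time()
    for i in range(n_requests):
        # Pace submissions independently of how fast the server executes.
        if (delay := start + i / rate_hz - time.time()) > 0:
            time.sleep(delay)
        t = threading.Thread(target=worker, args=(i,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    elapsed = time.time() - start
    print(f"executed at {n_requests / elapsed:.1f} Hz, {failures} failures")

srm = SrmClient("httpg://storm.cnaf.example:8443")  # illustrative endpoint
# e.g. the Mkdir test below: 2000 requests submitted at ~30 Hz
stress(lambda i: srm.mkdir(f"srm://storm.cnaf.example/test/dir{i}"),
       n_requests=2000, rate_hz=30)
```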

Page 9: Functionalities stress test: results

Synch Functions:

Mkdir: 2000 requests with a submission rate of ~30 Hz. Executed at ~10 Hz (the rate depends on the underlying GPFS, in this case a toy one on local disks) without failures.

Rmdir: 2000 requests with submission rate of ~30 Hz. Executed at ~30 Hz without failures.

Rm: 1000 requests for files of 1 MB with submission rate of ~30 Hz. Executed at ~30 Hz without failures.

SrmLs: 1000 requests on single files submitted at ~60 Hz, executed at ~20 Hz. A single ls on a directory with 1000 files: 6 s; 3 simultaneous: 10 s; 12 simultaneous: 30 s; … No failures.

Page 10: Functionalities stress test: results

Asynch Functions:

PrepareToPut: 10000 requests submitted at a 10 Hz rate (10 requests per second). Executed at a 10 Hz rate by StoRM.

PrepareToGet: 1000 requests with a submission rate of 30 Hz, executed at 20 Hz.

SrmCopy: 10000 requests for 10 MB files submitted at a rate of 10 Hz, with 50 GridFTP transfers at once (the transfer pool threads in StoRM were set to 50, but can be tuned via the StoRM option files). Poor data throughput (just Fast Ethernet connectivity), but perfect enqueuing of the copies, which all executed slowly at 1 Hz according to the available bandwidth (100 Mb/s).

Page 11: Test with official srm-client v2.1.x

To check interoperability, we performed tests with the srm clients distributed by the SRM working group.

Results: Clients are able to communicate with our server.

Most functionalities execute successfully (e.g. SrmRm, SrmMkdir, SrmReserveSpace, SrmPrepareToPut, SrmPrepareToGet, SrmGetRequestSummary, SrmCopy).

We have some problems with the interpretation of some specific parameters (e.g. the date-time in SrmLs, the recursive flag in SrmRmdir, …).

We believe that with some work and interaction with the client developers, StoRM will achieve full interoperability.

Page 12: StoRM: Conclusions

The results obtained, in terms of rates and efficiencies, are very encouraging, even on the cheap and old hardware used for our latest tests.

The system can be scaled up to N StoRM servers working in parallel on the same (or different) GPFS volume(s), with a centralized persistent database (MySQL at the moment, but support for other vendors, e.g. Oracle, can easily be added in future releases), similarly to Castor-SRM or dCache.

After a fruitful debugging phase, which will now be extended over the next 2-3 weeks, StoRM will be ready for deployment to production environments. The production release candidate will be ready by the end of June.

The StoRM Project is a collaboration between INFN/CNAF (3 developers) and EGRID/ICTP (3 developers).