© 2008 Open Grid Forum File Catalog Development in Japan e-Science Project GFS-WG, OGF24 Singapore Hideo Matsuda Osaka University


© 2008 Open Grid Forum

File Catalog Development in Japan e-Science Project

GFS-WG, OGF24 Singapore
Hideo Matsuda
Osaka University


Japan e-Science Project

• 3.5-year project, starting in September 2008

• Sponsored by MEXT (the Ministry of Education, Culture, Sports, Science and Technology), Japan

• Two major sub-projects
  • System Software (Leader: Yutaka Ishikawa, Univ. Tokyo)
  • Grid Software (Leader: Ken-ichi Miura, NII)


Overview of e-Science Grid Software Project

[Figure: project architecture. Labeled components: Data Sharing (nation-wide distributed FS, file catalog); Computation (workflow, job submission, application management); DB Federation (DB access control, AuthN info management); Application I/F (app control script API, app monitoring); Middleware Evaluation, Grid Operation Infrastructure / Application Evaluation. Grid middleware and DBs at laboratories and an information technology center serve end users.]


Nation-wide Distributed File System

• Goal: Development of distributed file system technology that spans the nation with performance comparable to a local file server

• Research Topics:
  • Optimal automatic placement of file replicas based on Gfarm 2.0
  • Fault tolerance with file replicas

[Figure: Optimal Replica Placement. Clients access a Virtual Distributed File System that spans File Servers 1 to 3 and their storage; a file and its replicas are placed across the servers.]


File Catalog Service

Goal: Development of a file catalog service that is interoperable between heterogeneous Grid environments.

• Current file catalog systems (LFC (EGEE gLite), MCAT (SRB), etc.) do not interoperate with each other.

• Development of a standardized file catalog based on the RNS (Resource Namespace Service) specification.

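[Figure: file catalog lookup. The client sends (1) a logical file name to the File Catalog System, receives (2) the physical file location (EPR), and then performs (3) file access with GridFTP against an EGEE gLite file server, an SRB or iRODS file server, or the Japan e-Science distributed file system.]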


File Catalog in e-Science

• A File Catalog can be used not only for file-location management but also for metadata in e-Science, since metadata is often described hierarchically in many sciences.

[Figure: two example hierarchies. High Energy Physics: CMS, ATLAS; 20071003, 20080110; run1, run2; track1, track2. Molecular Biology: Genome (Human Genome, Plant Genome, Bacterial Genome; gb|AY157024) and Proteome (Functional Analysis, Structure Analysis; sp|P37231, pdb|1FM6).]
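As a rough illustration of how a hierarchical namespace can double as metadata, the sketch below splits a catalog path into named fields and rebuilds the path from them. The level names (`experiment`, `date`, `run`, `track`), the helper functions, and the sample path are assumptions made for illustration; none of the catalogs mentioned here defines them.

```python
from typing import Dict, List

# Hypothetical level names for a high-energy-physics hierarchy (an assumption).
HEP_LEVELS: List[str] = ["experiment", "date", "run", "track"]


def path_to_metadata(path: str, levels: List[str]) -> Dict[str, str]:
    """Map each component of a catalog path to the level name at the same depth."""
    parts = [p for p in path.split("/") if p]
    if len(parts) > len(levels):
        raise ValueError(
            f"path has {len(parts)} components, expected at most {len(levels)}"
        )
    return dict(zip(levels, parts))


def metadata_to_path(meta: Dict[str, str], levels: List[str]) -> str:
    """Rebuild the catalog path from metadata, stopping at the first missing level."""
    parts = []
    for level in levels:
        if level not in meta:
            break
        parts.append(meta[level])
    return "/" + "/".join(parts)


# Hypothetical path: experiment / date / run / track.
meta = path_to_metadata("/ATLAS/20071003/run1/track2", HEP_LEVELS)
print(meta)
print(metadata_to_path(meta, HEP_LEVELS))
```

Because the mapping is purely positional, two catalogs can only exchange such metadata if they agree on the meaning of each level, which is the interoperability concern this talk raises.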


Metadata Management using File Catalog

• Currently, metadata are mainly stored in File Catalogs using their hierarchical namespace functionality.
  • gLite: LFC, Fireman
  • iRODS (SRB): ICAT
  • Globus: RLS
  • NAREGI: Gfarm

• It is not easy to exchange metadata between different Grid middleware.


Resource Namespace Service (1)

• RNS lets you map any resource into a single, hierarchical namespace

• Resources are referred to in the form of an EndpointReference (WS-Addressing)

• The RNS Specification is published as GFD-R-P.101

• RNS implementations are available from U. Virginia and U. Tsukuba.

http://www.ogf.org/documents/GFD.101.pdf


Resource Namespace Service (2)

• Hierarchical namespace management that provides name-to-resource mapping

• Basic Namespace Components
  • Virtual Directory
    • Non-leaf node in the hierarchical namespace tree
  • Junction
    • Name-to-resource mapping that interconnects a reference to any existing resource into the hierarchical namespace

[Figure: example namespace tree rooted at /grid, with virtual directories (ogf, jp, data, gfs) and junctions (file1 to file4) that map to EPR1 and EPR2. EPR: Endpoint Reference.]
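The two namespace components can be sketched as a small in-memory tree. This is only a model of the idea, not the RNS web-service interface: the class and function names (`VirtualDirectory`, `Junction`, `add_junction`, `lookup`), the sample path, and the sample address are hypothetical, and an EPR is reduced to a plain address string.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Junction:
    """Leaf entry: a name bound to a resource reference (here only an address string)."""
    epr_address: str


@dataclass
class VirtualDirectory:
    """Non-leaf node: holds named children, either directories or junctions."""
    entries: Dict[str, Union["VirtualDirectory", Junction]] = field(default_factory=dict)


Entry = Union[VirtualDirectory, Junction]


def _split(path: str) -> List[str]:
    return [p for p in path.split("/") if p]


def add_junction(root: VirtualDirectory, path: str, epr_address: str) -> None:
    """Create intermediate virtual directories as needed, then bind the last name."""
    *dirs, name = _split(path)
    node = root
    for d in dirs:
        child = node.entries.setdefault(d, VirtualDirectory())
        if not isinstance(child, VirtualDirectory):
            raise ValueError(f"{d!r} is a junction, not a directory")
        node = child
    node.entries[name] = Junction(epr_address)


def lookup(root: VirtualDirectory, path: str) -> Entry:
    """Walk the tree one path component at a time; KeyError if a name is missing."""
    node: Entry = root
    for part in _split(path):
        if not isinstance(node, VirtualDirectory):
            raise KeyError(path)
        node = node.entries[part]
    return node


# Hypothetical path and address, for illustration only.
root = VirtualDirectory()
add_junction(root, "/grid/jp/gfs/file1", "gsiftp://host.example.org/data/file1")
entry = lookup(root, "/grid/jp/gfs/file1")
print(entry.epr_address)
```

Keeping junctions as leaves that hold only a reference is what lets the namespace point at resources it does not own, such as files on another middleware's server.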


Development of File Catalog System (Plan)

• RNS can interconnect a reference to any existing resource into a hierarchical namespace

• Most Grid middleware supports GridFTP for data transfer

• Use RNS as a standardized File Catalog

• Use a GridFTP URL “gsiftp://.../” as the address of the Endpoint Reference.

[Figure: planned system. The client sends (1) a query to RNS, receives (2) an EPR list (including addresses), and then performs (3) access with the GridFTP protocol to a gLite file server (SRM), an iRODS file server, a Japan e-Science file server, or a Globus GridFTP server.]
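The plan of carrying a GridFTP URL in the EPR address can be sketched with the Python standard library. The snippet builds minimal EndpointReference documents and filters an EPR list for `gsiftp` addresses. The namespace URI is the W3C WS-Addressing 1.0 one, which is an assumption here (an RNS deployment may use a different WS-Addressing version); the host names and the helper functions `make_epr` and `gridftp_addresses` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse

# W3C WS-Addressing 1.0 namespace (assumed; a deployment may use another version).
WSA_NS = "http://www.w3.org/2005/08/addressing"


def make_epr(address: str) -> str:
    """Serialize a minimal EndpointReference whose Address is the given URL."""
    epr = ET.Element(f"{{{WSA_NS}}}EndpointReference")
    ET.SubElement(epr, f"{{{WSA_NS}}}Address").text = address
    return ET.tostring(epr, encoding="unicode")


def gridftp_addresses(epr_documents: List[str]) -> List[str]:
    """Return the Address of every EPR that uses the gsiftp scheme."""
    found = []
    for doc in epr_documents:
        address = ET.fromstring(doc).findtext(f"{{{WSA_NS}}}Address")
        if address and urlparse(address).scheme == "gsiftp":
            found.append(address)
    return found


# Hypothetical reply for one logical name: two replicas plus a non-GridFTP entry.
eprs = [
    make_epr("gsiftp://gridftp1.example.org/data/file1"),
    make_epr("gsiftp://gridftp2.example.org/data/file1"),
    make_epr("https://catalog.example.org/services/other"),
]
print(gridftp_addresses(eprs))
```

A client would then hand one of the returned URLs to its GridFTP tool of choice; which replica to pick is left open in this sketch.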


Comparison with gLite LFC

Comments from Erwin Laure (OGF22 GFS-WG)

• add EPR: RNS is missing the detailed attributes of the replicas.

• query EPR: The attributes of a namespace entry should be defined, allowing specialized queries and lookups.

• RNS lacks bulk operations, sessions, and transactions. Adopting them may improve performance.

• Access control and VO management have not been introduced yet either.


Comparison with iRODS

Comments from Reagan Moore (OGF23 GFS-WG)

• Applications now manipulate structured information. iRODS can generate and manipulate structured information with micro-services.

• Multiple standards for describing structured information.


Summary

• A standardized File Catalog is useful for federating heterogeneous Data Grids.

• A File Catalog Profile needs to be established for interoperation between different File Catalogs (and for its standardization).