© 2008 Open Grid Forum
File Catalog Development in Japan e-Science Project
GFS-WG, OGF24 Singapore
Hideo Matsuda
Osaka University
2
Japan e-Science Project
• 3.5-year project, starting in September 2008
• Sponsored by MEXT (the Ministry of Education, Culture, Sports, Science and Technology), Japan
• Two major sub-projects
  • System Software (Leader: Yutaka Ishikawa, Univ. Tokyo)
  • Grid Software (Leader: Ken-ichi Miura, NII)
3
Overview of e-Science Grid Software Project
[Figure: components of the e-Science Grid software, connecting End users, Laboratories, and an Info. Tech. Center through Grid Middleware and DBs.]
• Application I/F: App control script API, App monitoring
• Computation: Workflow, Job Submission, Application mgmt
• Data Sharing: Nation-wide distributed FS, File catalog
• DB Federation: DB access control, AuthN info mgmt
• Middleware Evaluation: Grid Operation Infrastructure / Application Evaluation
4
Nation-wide Distributed File System
• Goal: Development of distributed file system technology that spans the nation while delivering performance comparable to a local file server
• Research Topics:
  • Optimal automatic placement of file replicas based on Gfarm 2.0
  • Fault tolerance with file replicas
[Figure: Optimal Replica Placement. Clients access a Virtual Distributed File System built over File Servers 1, 2, and 3 (each with its own storage); file replicas are placed on the file servers.]
5
File Catalog Service
• Goal: Development of an interoperable file catalog service across heterogeneous Grid environments.
  • Current file catalog systems (LFC (EGEE gLite), MCAT (SRB), etc.) are not interoperable with each other.
  • Development of a standardized file catalog based on the RNS (Resource Namespace Service) specification.
[Figure: A Client, the File Catalog System, and three kinds of file servers: EGEE gLite File Server, SRB or iRODS File Server, Japan e-Science Distributed File System.]
(1) Logical File Name
(2) Physical File Location (EPR)
(3) File Access with GridFTP
6
File Catalog in e-Science
• A File Catalog can be used not only for file-location management but also for metadata in e-Science, since metadata is often described with a hierarchical representation in many sciences.
[Figure: Example hierarchical namespaces in two domains, listed level by level.]
High Energy Physics: ATLAS, CMS / 20071003, 20080110 / run1, run2 / track1, track2
Molecular Biology: Genome, Proteome / Human Genome, Plant Genome, Bacterial Genome, Functional Analysis, Structure Analysis / gb|AY157024, sp|P37231, pdb|1FM6
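To make the idea of hierarchical metadata concrete, here is a minimal Python sketch that treats catalog entries as path-like names and answers two typical questions: what sits directly under a node, and which entries match a per-level pattern. The labels come from the High Energy Physics example, but the exact parent/child arrangement, the `ENTRIES` list, and the helper names `children` and `match` are assumptions made for illustration; this is not the API of any existing file catalog.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical catalog entries. Only the individual labels appear in the
# slide; how they nest under each other is assumed here.
ENTRIES = [
    "/ATLAS/20071003/run1/track1",
    "/ATLAS/20071003/run1/track2",
    "/ATLAS/20071003/run2/track1",
    "/CMS/20080110/run1/track1",
]


def children(entries, directory):
    """Return the sorted names found directly under `directory`."""
    base = PurePosixPath(directory)
    names = set()
    for entry in entries:
        path = PurePosixPath(entry)
        if base in path.parents:
            # Keep only the first path component below `directory`.
            names.add(path.relative_to(base).parts[0])
    return sorted(names)


def match(entries, pattern):
    """Return entries whose components match `pattern` level by level.

    A `*` in the pattern matches exactly one level of the hierarchy.
    """
    want = PurePosixPath(pattern).parts
    hits = []
    for entry in entries:
        have = PurePosixPath(entry).parts
        if len(have) == len(want) and all(
            fnmatchcase(h, w) for h, w in zip(have, want)
        ):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    print(children(ENTRIES, "/ATLAS/20071003"))
    print(match(ENTRIES, "/ATLAS/*/run1/*"))
```

Because each level of the path carries one piece of metadata (experiment, date, run, track), a directory listing doubles as a metadata query, which is why the hierarchical namespace of a file catalog can serve both purposes.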
7
Metadata Management using File Catalog
• Currently metadata are mainly stored in File Catalogs using their hierarchical namespace functionality.
  • gLite: LFC, Fireman
  • iRODS (SRB): ICAT
  • Globus: RLS
  • NAREGI: Gfarm
• It is not easy to exchange metadata across different Grid middleware.
8
Resource Namespace Service (1)
• RNS lets you map any resource into a single, hierarchical namespace
• Resources are referred to in the form of an EndpointReference (WS-Addressing)
• RNS Specification is published as GFD-R-P.101
• RNS implementation is available from U.Virginia and U.Tsukuba.
http://www.ogf.org/documents/GFD.101.pdf
9
Resource Namespace Service (2)
• Hierarchical namespace management that provides name-to-resource mapping
• Basic Namespace Component
  • Virtual Directory
    • Non-leaf node in hierarchical namespace tree
  • Junction
    • Name-to-resource mapping that interconnects a reference to any existing resource into hierarchical namespace
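[Figure: Example namespace tree rooted at /grid, with nodes ogf and jp, then data and gfs, and leaf entries file1, file2, file3, file4; junctions refer to EPR1 and EPR2.]
EPR: Endpoint Reference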
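The two namespace components can be sketched as a small in-memory tree in Python. This is only an illustration of the virtual directory / junction distinction under stated assumptions: the class and function names (`VirtualDirectory`, `Junction`, `add_junction`, `resolve`) are hypothetical and are not the operations defined in the RNS specification, the EPR is reduced to a plain string, and the path `/grid/ogf/data/file1` is an assumed arrangement of the labels in the example tree.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Junction:
    """Leaf entry: maps a name to an existing resource."""
    epr: str  # stand-in for a WS-Addressing EndpointReference


@dataclass
class VirtualDirectory:
    """Non-leaf node: holds named child entries."""
    entries: Dict[str, Union["VirtualDirectory", Junction]] = field(
        default_factory=dict
    )


def _split(path):
    """Split '/a/b/c' into ['a', 'b', 'c'], ignoring empty components."""
    return [part for part in path.split("/") if part]


def add_junction(root, path, epr):
    """Attach a junction at `path`, creating virtual directories as needed."""
    *dirs, leaf = _split(path)
    node = root
    for name in dirs:
        child = node.entries.setdefault(name, VirtualDirectory())
        if not isinstance(child, VirtualDirectory):
            raise ValueError(f"{name!r} is a junction, not a virtual directory")
        node = child
    node.entries[leaf] = Junction(epr)


def resolve(root, path):
    """Walk the tree and return the EPR stored in the junction at `path`."""
    node = root
    for name in _split(path):
        if not isinstance(node, VirtualDirectory) or name not in node.entries:
            raise KeyError(path)
        node = node.entries[name]
    if not isinstance(node, Junction):
        raise KeyError(f"{path} is a virtual directory, not a junction")
    return node.epr


if __name__ == "__main__":
    root = VirtualDirectory()
    add_junction(root, "/grid/ogf/data/file1", "EPR1")
    add_junction(root, "/grid/ogf/data/file2", "EPR2")
    print(resolve(root, "/grid/ogf/data/file1"))
```

Keeping the two node types separate mirrors the slide: only junctions carry a reference to a resource, while virtual directories exist purely to structure the namespace.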
10
Development of File Catalog System (Plan)
• RNS can interconnect a reference to any existing resource into a hierarchical namespace
• Most Grid middleware provides GridFTP for data transfer
• Use RNS as a standardized File Catalog
• Use the GridFTP URL “gsiftp://.../” as the address of the Endpoint Reference.
[Figure: A Client, RNS, and four kinds of file servers: gLite File Server (SRM), iRODS File Server, Japan e-Science File Server, Globus GridFTP Server.]
(1) query
(2) EPR list (including address)
(3) Access with GridFTP protocol
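Steps (1) and (2) of this plan can be sketched in Python as follows. The sketch builds Endpoint References whose address is a GridFTP URL and returns them from a toy in-memory catalog. Several things are assumptions for illustration: the WS-Addressing 1.0 namespace URI is used although the slides do not say which WS-Addressing version applies, the host names, logical file name, and the `CATALOG`, `make_epr`, `query`, and `transfer_urls` names are hypothetical, and step (3), the actual GridFTP transfer, is left to a real GridFTP client and is not performed here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# WS-Addressing 1.0 namespace; which version the catalog would use is assumed.
WSA = "http://www.w3.org/2005/08/addressing"


def make_epr(gridftp_url):
    """Build an EndpointReference element whose Address is a gsiftp URL."""
    if urlsplit(gridftp_url).scheme != "gsiftp":
        raise ValueError(f"not a GridFTP URL: {gridftp_url}")
    epr = ET.Element(f"{{{WSA}}}EndpointReference")
    ET.SubElement(epr, f"{{{WSA}}}Address").text = gridftp_url
    return epr


def address_of(epr):
    """Extract the address string from an EndpointReference element."""
    return epr.findtext(f"{{{WSA}}}Address")


# Hypothetical catalog content: one logical file name with two replicas.
CATALOG = {
    "/grid/jp/data/file1": [
        make_epr("gsiftp://fs1.example.org/data/file1"),
        make_epr("gsiftp://fs2.example.org/data/file1"),
    ],
}


def query(logical_name):
    """Steps (1) and (2): look up a logical name and return its EPR list."""
    return CATALOG.get(logical_name, [])


def transfer_urls(logical_name):
    """Return the GridFTP URLs a client could hand to a GridFTP tool in step (3)."""
    return [address_of(epr) for epr in query(logical_name)]


if __name__ == "__main__":
    for url in transfer_urls("/grid/jp/data/file1"):
        print(url)
```

Since the address is an ordinary GridFTP URL, the client needs no middleware-specific logic after the lookup, which is the point of using GridFTP as the common access protocol.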
11
Comparison with gLite LFC
Comments from Erwin Laure (OGF22 GFS-WG)
• add EPR: RNS is missing the detailed attributes of the replicas.
• query EPR: The attributes of a namespace entry should be defined, allowing specialized queries and lookups.
• RNS lacks bulk operations, sessions, and transactions. Adopting them may improve performance.
• Access control and VO management have also not been introduced yet.
12
Comparison with iRODS
Comments from Reagan Moore (OGF23 GFS-WG)
• Applications now manipulate structured information. iRODS can generate and manipulate structured information with micro-services.
• Multiple standards for describing structured information.
13
Summary
• A standardized File Catalog is useful for federating heterogeneous Data Grids.
• A File Catalog Profile needs to be established for interoperation of different File Catalogs (and for its standardization).