DataGrid is a project funded by the European Union

Virtual Observatory as a Data Grid – WP2 Data Management

Peter Kunszt, CERN, Peter.Kunszt@cern.ch

EU DataGrid WP2 Manager

EU DataGrid Data Management Services

Virtual Observatory as a Data Grid – 1 July 2003 – WP2 Data Management

Talk Outline

Introduction to EU DataGrid Work Package 2

WP2 Service Design and Interactions

• Replication Services

• Spitfire

• Security

Conclusions and outlook

WP2 Members

Diana Bosio, James Casey, Akos Frohner, Leanne Guy, Peter Kunszt, Erwin Laure, Levi Lucio, Heinz Stockinger, Kurt Stockinger – CERN

Giuseppe Andronico, Federico DiCarlo, Andrea Domenici, Flavia Donno, Livio Salconi – INFN

William Bell, David Cameron, Gavin McCance, Paul Millar, Caitriona Nicholson – PPARC, University of Glasgow

Joni Hahkala, Niklas Karlsson, Ville Nenonen, Mika Silander, Marko Niinimäki – Helsinki Institute of Physics

Olle Mulmo, Gian Luca Volpato – Swedish Research Council

EU DataGrid Project Objectives

DataGrid is a project funded by the European Union whose objective is to build and exploit the next generation computing infrastructure, providing intensive computation and analysis of shared large-scale databases.

Enable data-intensive sciences by providing worldwide Grid test beds to large distributed scientific organisations ("Virtual Organisations", VOs).

Start (kick-off): Jan 1, 2001. End: Dec 31, 2003.

Applications/End Users Communities : HEP, Earth Observation, Biology

Specific Project Objectives:

• Middleware for fabric & grid management

• Large scale testbed

• Production quality demonstrations

• To collaborate with and complement other European and US projects

• Contribute to Open Standards and international bodies (GGF, Industry & Research Forum)

DataGrid Main Partners

CERN – International (Switzerland/France)

CNRS - France

ESA/ESRIN – International (Italy)

INFN - Italy

NIKHEF – The Netherlands

PPARC - UK

Assistant Partners

Industrial Partners

• Datamat (Italy)

• IBM-UK (UK)

• CS-SI (France)

Research and Academic Institutes

• CESNET (Czech Republic)

• Commissariat à l'énergie atomique (CEA) – France

• Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI)

• Consiglio Nazionale delle Ricerche (Italy)

• Helsinki Institute of Physics – Finland

• Institut de Fisica d'Altes Energies (IFAE) – Spain

• Istituto Trentino di Cultura (IRST) – Italy

• Konrad-Zuse-Zentrum für Informationstechnik Berlin – Germany

• Royal Netherlands Meteorological Institute (KNMI)

• Ruprecht-Karls-Universität Heidelberg – Germany

• Stichting Academisch Rekencentrum Amsterdam (SARA) – Netherlands

• Swedish Research Council – Sweden

Project Schedule

Project started on 1/Jan/2001

TestBed 0 (early 2001)

International test bed 0 infrastructure deployed

Globus 1 only - no EDG middleware

Successful Project Review by EU: March 2002

TestBed 1 ( 2002 )

Successful 2nd Project Review by EU: February 2003

TestBed 2 (Now)

Some complete re-writes of components. Builds on TestBed 1 experience.

TestBed 3 (October 2003)

Project ends on 31/Dec/2003, possibly with an extension of a couple of months to wrap up and document results (no additional funding)

EDG Highlights

All EU deliverables (40, >2000 pages) submitted

• in time for the review according to the contract technical annex

First test bed delivered with real production demos

All deliverables (code & documents) available via www.edg.org

• http://cern.ch/eu-datagrid/Deliverables/default.htm

• requirements, surveys, architecture, design, procedures, testbed analysis etc.

Project re-orientation last year in August: From R&D Testbed to ‘production grid’

Working Areas

[Figure: Working Areas diagram with blocks labelled Applications, Middleware, Infrastructure, Management and Testbed]

The DataGrid project is divided into 12 Work Packages distributed over four Working Areas.

Work Packages

WP1: Work Load Management System

WP2: Data Management

WP3: Grid Monitoring / Grid Information Systems

WP4: Fabric Management

WP5: Storage Element

WP6: Testbed and demonstrators

WP7: Network Monitoring

WP8: High Energy Physics Applications

WP9: Earth Observation

WP10: Biology

WP11: Dissemination

WP12: Management

Trying hard to have a real GRID..

Testbed 0 :

• Grid technology was not mature enough

• Configuration and deployment issues

• Stability problems

• Obscure errors

Project reorientation: Stability, Stability, Stability – TB 1

• TB1 revealed a set of design bugs in Globus

• GASS Cache issue – fixed by Condor (rewritten)

• MyProxy issues – could never be used

• MDS did not scale – had to set up fake local info system

Reengineering of essential components – TB 2

• New resource broker

• R-GMA instead of MDS as info system

• Concrete Support channels (VDT)

• New configuration tool LCFG-ng (from U of Edinburgh!)

Grid middleware architecture hourglass

Current Grid architectural functional blocks:

OS, Storage & Network services

Basic Grid Services (GLOBUS 2.2)

High Level Grid Services (EU DataGrid middleware)

Common application layer: HEP Application Services (LCG)

Specific application layer: CMS, ATLAS, CMS, LHCb; Earth Observation and Biomed

EU DataGrid WP2: Data Management Work Package

Responsible for

Transparent data location and secure access

Wide-area replication

Data access optimization

Metadata access

NOT responsible for (but it has to be done)

Data storage (WP5)

Proper Relational Database bindings (Spitfire)

Remote I/O (GFAL)

Security infrastructure (VOMS)

WP2 Service Paradigms

Choice of technology:

• Java-based servers using Web Services

o Tomcat, Oracle 9iAS, soon WebSphere

• Interface definitions in WSDL

• Client stubs for many languages (Java, C, C++)

o Axis, gSOAP

• Persistent service data in Relational Databases

o MySQL, Oracle, soon DB2

Modularity

• Modular service design for pluggability and extensibility

• No vendor specific lock-ins

Evolvable

• Easy adaptation to evolving standards (OGSA, WSDL 1.2)

• Largely independent of underlying OS, RDBMS – works on Windows too!

Replication Services: Basic Functionality

[Figure: components shown are the Replica Manager, the Replica Location Service, the Replica Metadata Catalog and two Storage Elements]

Files have replicas stored at many Grid sites on Storage Elements.

Each file has a unique Grid ID (GUID). Locations corresponding to the GUID are kept in the Replica Location Service.

Users may assign aliases to the GUIDs. These are kept in the Replica Metadata Catalog.

The Replica Manager provides atomicity for file operations, assuring consistency of SE and catalog contents.
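The relationship between aliases, GUIDs and replica locations described above can be illustrated with a small in-memory sketch. This is not the EDG API: the class `ReplicaCatalogs`, the function `replicate`, the `lfn:` alias and the `example.org` hosts are all hypothetical names chosen for illustration. It assumes the simplest possible notion of atomicity, namely that a new location is only recorded in the catalog once the copy to the target Storage Element has succeeded.

```python
import uuid


class ReplicaCatalogs:
    """Toy stand-ins for the two catalogs (hypothetical, in-memory only)."""

    def __init__(self):
        self.rmc = {}  # Replica Metadata Catalog: alias -> GUID
        self.rls = {}  # Replica Location Service: GUID -> set of locations

    def register_file(self, pfn, alias=None):
        """Assign a fresh GUID to a file and record its first location."""
        guid = str(uuid.uuid4())
        self.rls[guid] = {pfn}
        if alias is not None:
            self.rmc[alias] = guid
        return guid

    def locate(self, alias):
        """Resolve alias -> GUID -> all known locations."""
        return sorted(self.rls[self.rmc[alias]])


def replicate(catalogs, guid, source_pfn, target_pfn, copy):
    """Copy a replica, then register it.

    If `copy` raises, the catalog is left untouched, so catalog and
    storage contents stay consistent.
    """
    if source_pfn not in catalogs.rls.get(guid, set()):
        raise KeyError(f"{source_pfn} is not a known replica of {guid}")
    copy(source_pfn, target_pfn)          # may raise; nothing registered yet
    catalogs.rls[guid].add(target_pfn)    # only reached after a successful copy


if __name__ == "__main__":
    cat = ReplicaCatalogs()
    src = "gsiftp://se1.example.org/data/f1"
    dst = "gsiftp://se2.example.org/data/f1"
    guid = cat.register_file(src, alias="lfn:run42")
    replicate(cat, guid, src, dst, copy=lambda s, t: None)  # no-op copy
    print(cat.locate("lfn:run42"))
```

A real Replica Manager would also have to handle failures after the copy (for example, removing the copied file if the catalog update fails); the sketch leaves that out.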

Higher Level Replication Services

[Figure: components shown are the Replica Manager, Replica Location Service, Replica Optimization Service, Replica Metadata Catalog, SE Monitor, Network Monitor, Replica Subscription Service and two Storage Elements]

The Replica Manager may call on the Replica Optimization service to find the best replica among many based on network and SE monitoring.

The Replica Subscription Service issues Replication commands automatically, based on a set of subscription rules defined by the user.

Hooks for user-defined pre- and post-processing for replication operations are available.
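As a rough illustration of how a "best replica" might be chosen from monitoring data, the sketch below ranks replicas by a simple additive cost. The cost model (network cost plus Storage Element load per host), the function names `best_replica` and `host_of`, and the example hosts are assumptions made for this sketch; the slides do not specify how the Replica Optimization Service computes its ranking.

```python
from urllib.parse import urlparse


def host_of(pfn):
    """Extract the storage host from a replica URL."""
    return urlparse(pfn).hostname


def best_replica(replicas, network_cost, se_load):
    """Return the replica with the lowest estimated access cost.

    network_cost and se_load map a host name to a number supplied by
    (hypothetical) network and SE monitors; lower is better.
    """
    if not replicas:
        raise ValueError("no replicas to choose from")
    return min(
        replicas,
        key=lambda pfn: network_cost[host_of(pfn)] + se_load[host_of(pfn)],
    )


if __name__ == "__main__":
    replicas = [
        "gsiftp://se1.example.org/data/f1",
        "gsiftp://se2.example.org/data/f1",
    ]
    network_cost = {"se1.example.org": 5.0, "se2.example.org": 1.0}
    se_load = {"se1.example.org": 0.5, "se2.example.org": 2.0}
    print(best_replica(replicas, network_cost, se_load))
```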

Interactions with other Grid components

[Figure: components shown are the Replica Manager, Replica Location Service, Replica Optimization Service, Replica Metadata Catalog, SE Monitor, Network Monitor, Information Service, Resource Broker, User Interface or Worker Node, Replica Subscription Service, Virtual Organization Membership Service and two Storage Elements]

Applications and users interface to data through the Replica Manager either directly or through the Resource Broker. Management calls should never go directly to the SE.

Replication Services Status

Current Status

• All components are deployed right now – except for the RSS

• Initial tests show that expected performance can be met

• Need proper testing in a ‘real user environment’ – EDG2; LCG1

Features for next release

• Currently Worker Nodes need outbound connectivity – Replica Manager Service needed. Needs proper security delegation mechanism.

• Logical collections support

• Service-level authorization

• GUI

Spitfire: Grid-enabling RDBMS

Capabilities:

• Simple Grid enabled front end to any type of local or remote RDBMS through secure SOAP-RPC

• Sample generic RDBMS methods may easily be customized with little additional development, providing WSDL interfaces

• Browser integration

• GSI authentication

• Local authorization mechanism

Status: current version 2.1

• Used by EU DataGrid Earth Observation and Biomedical applications.

Next Step: OGSA-DAI interface

Spitfire added value : Security

Grid security

• TrustManager deals with GSI proxy certificates

• Support for VOMS certificate extensions

• Secure Java, C/C++, Perl clients

Local Authorization

• Mapping through Gridmap file supported

• Fine-grained authorization hooks: a mapping service is provided to map VOMS extensions (group, role, capability) to DB roles which, depending on the DB, may use row-level authorization mechanisms (GRANT/DENY).

Installation kit

• Easy installation and configuration of all security options
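The mapping from VOMS extensions to database roles mentioned under Local Authorization could look roughly like the following. This is only a sketch under stated assumptions: the `VomsAttribute` type, the `db_role_for` function, the rule table and the role names are hypothetical, and the matching rule (first match wins, an unset field in a rule acts as a wildcard) is an illustrative choice rather than Spitfire's documented behaviour.

```python
from typing import NamedTuple, Optional


class VomsAttribute(NamedTuple):
    """The (group, role, capability) triple carried by a VOMS extension."""
    group: str
    role: Optional[str] = None
    capability: Optional[str] = None


def db_role_for(attr, role_map, default=None):
    """Return the DB role of the first rule that matches `attr`.

    A rule field set to None matches any value (assumed convention).
    """
    for pattern, db_role in role_map:
        if (
            pattern.group == attr.group
            and pattern.role in (None, attr.role)
            and pattern.capability in (None, attr.capability)
        ):
            return db_role
    return default


# Hypothetical rule table: more specific rules come first.
ROLE_MAP = [
    (VomsAttribute("/eo", role="admin"), "db_admin"),
    (VomsAttribute("/eo"), "db_reader"),
]

if __name__ == "__main__":
    print(db_role_for(VomsAttribute("/eo", role="admin"), ROLE_MAP))
    print(db_role_for(VomsAttribute("/eo", role="user"), ROLE_MAP))
```

The returned DB role would then be the one under which the database applies its own GRANT/DENY rules.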

Spitfire customization

Spitfire started as a ‘proof of technology’ for Web Services and Java.

Customizable into specific services dealing with persistent data

• All WP2 services are in this sense ‘Spitfire’ services (see later)

Test platform for latest available codebase

• Gained experience with WSDL, JNDI, Tomcat, Axis, gSOAP

• Next things to try : JBOSS (for JMS, JMX)

Experimental add-ons

• Secure browser using JSP (proxy certificates for Mozilla, Netscape, IE, ...)

• Distributed query agent drop-in

• Todo: OGSA-DAI interface as far as possible

RLS Architecture (evolved!)

A hierarchical RLS topology: LRCs update RLIs, RLIs may forward information

[Figure: hierarchical topology of RLI and LRC nodes, with the following legend]

RLIs indexing over the full namespace (all LRCs are indexed) receiving updates directly

RLI receiving updates from other RLIs

LRC sending updates to all Tier 1 RLIs

EDG Grid Catalogs (1/2)

Replica Location Service (RLS)

• Local Replica Catalog (LRC)

o Stores GUID to Physical File Name (PFN) mappings

o Stores attributes on PFNs

o Local Replica Catalogs in Grid: one per Storage Element (per VO)

o Tested to 1.5M entries

• Replica Location Index (RLI)

o Allows fast lookup of which sites store GUID -> PFN mappings for a given GUID

o Replica Location Indices in the Grid: normally one per Site (per VO), which indexes all LRCs in the Grid

o Being deployed as part of EDG 2.1 in July. In the process of integration into other components

o Tested to 10M entries in an RLI
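The division of labour between the two catalog types can be sketched as a two-step lookup: the index says which sites know a GUID, and each site's local catalog returns the actual PFNs. The classes and the `find_replicas` helper below are hypothetical in-memory stand-ins, not the EDG client API, and the update step simply replaces a site's entries wholesale, which is an assumption of this sketch.

```python
class LocalReplicaCatalog:
    """Per-site catalog: GUID -> physical file names held at this site."""

    def __init__(self, site):
        self.site = site
        self.mappings = {}

    def add(self, guid, pfn):
        self.mappings.setdefault(guid, set()).add(pfn)

    def lookup(self, guid):
        return sorted(self.mappings.get(guid, ()))


class ReplicaLocationIndex:
    """Index: GUID -> names of sites whose LRC holds a mapping for it."""

    def __init__(self):
        self.index = {}

    def update_from(self, lrc):
        """Replace everything previously indexed for lrc.site."""
        for sites in self.index.values():
            sites.discard(lrc.site)
        for guid in lrc.mappings:
            self.index.setdefault(guid, set()).add(lrc.site)

    def sites_for(self, guid):
        return sorted(self.index.get(guid, ()))


def find_replicas(rli, lrcs_by_site, guid):
    """Ask the index for candidate sites, then query only those LRCs."""
    pfns = []
    for site in rli.sites_for(guid):
        pfns.extend(lrcs_by_site[site].lookup(guid))
    return pfns


if __name__ == "__main__":
    cern = LocalReplicaCatalog("CERN")
    ral = LocalReplicaCatalog("RAL")
    cern.add("guid-1", "gsiftp://se.cern.example/f1")
    ral.add("guid-1", "gsiftp://se.ral.example/f1")
    rli = ReplicaLocationIndex()
    rli.update_from(cern)
    rli.update_from(ral)
    print(find_replicas(rli, {"CERN": cern, "RAL": ral}, "guid-1"))
```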

EDG Grid Catalogs (2/2)

Replica Metadata Catalog (RMC)

• Stores Logical File Name (LFN) to GUID mappings – user-defined aliases

• Stores attributes on LFNs and GUIDs

• One logical Replica Metadata Catalog in Grid (per VO)

o Single point of synchronization – current assumption in EDG model

o Bottleneck? Move to replicated distributed database

No Application Metadata Catalog provided – see Spitfire

• But Replica Metadata Catalog has support for small level of application metadata – O(10)

RMC usage not as well understood as Replica Location Service

• Architectural changes likely

• Use cases required

Typical Location of Services in LCG-1

[Figure: typical deployment in LCG-1. The sites CNAF, RAL, CERN and IN2P3 are each shown with a Replica Location Index, a Local Replica Catalog and a Storage Element; the diagram also shows a single Replica Metadata Catalog and a further Storage Element.]

Catalog Implementation Details

Catalogs implemented in Java as Web Services, and hosted in a J2EE application server

• Uses Tomcat4 or Oracle 9iAS for application server

• Uses Jakarta Axis for Web Services container

• Java and C++ client APIs currently provided using Jakarta Axis (Java) and gSOAP (C++)

Catalog data stored in a Relational Database

• Runs with either Oracle 9i or MySQL

Catalog APIs exposed as a Web Service using WSDL

• Easy to write a new client if we don’t support your language right now

Vendor neutral approach taken to allow different deployment options

Quality of Service

Quality of Service depends upon both the server software and architecture used as well as the software components deployed on it

Features required for high Quality of Service

• High Availability

• Manageability

• Monitoring

• Backup and Recovery with defined Service Level Agreements

Approach

• Use vendor solutions for availability and manageability where available

• Use common IT-DB solutions for monitoring and recovery

• Components architected to allow easy deployment in high-availability environment

A variety of solutions with different characteristics are possible

Tradeoffs in different solutions

[Figure: deployment options plotted against two axes, Manageability and Availability]

Single Instance MySQL/Tomcat

Clustered Oracle 9i/Tomcat

Clustered Oracle 9i/9iAS

Single Instance Oracle 9i/9iAS

System Architecture – High Availability

Standard n-tier architecture

• Front end application layer load-balancer

o Oracle 9iAS Web Cache

• Cluster of stateless application servers

o Oracle 9iAS J2EE container

• Clustered database nodes

o Oracle 9i/RAC

• Shared SAN storage

o Fibre Channel storage

[Figure: n-tier deployment diagram showing External LAN, Internal LAN and Storage Network]

Security: Infrastructure for Java-based Web Services

Trust Manager

• Mutual client-server authentication using GSI (i.e. PKI X.509 certificates) for all WP2 services

• Supports everything transported over SSL

Authorization Manager

• Supports coarse grained authorization: Mapping user->role->attribute

• Fine grained authorization through policies, role and attribute maps

• Web-based Admin interface for managing the authorization policies and tables
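The coarse-grained user->role->attribute mapping can be pictured as two table lookups. The sketch below is an assumption-laden illustration: the function `is_authorized`, the table layout, the certificate subject names and the attribute names are all hypothetical and are not taken from the Authorization Manager itself.

```python
def is_authorized(user_dn, required_attribute, user_roles, role_attributes):
    """Grant access if any of the user's roles carries the attribute.

    user_roles:      certificate subject (DN) -> iterable of role names
    role_attributes: role name -> iterable of attribute names
    """
    return any(
        required_attribute in role_attributes.get(role, ())
        for role in user_roles.get(user_dn, ())
    )


# Hypothetical tables, as an administrator might maintain them.
USER_ROLES = {
    "/O=Grid/CN=Alice Example": ["vo-admin"],
    "/O=Grid/CN=Bob Example": ["vo-member"],
}
ROLE_ATTRIBUTES = {
    "vo-admin": {"catalog:read", "catalog:write"},
    "vo-member": {"catalog:read"},
}

if __name__ == "__main__":
    print(is_authorized("/O=Grid/CN=Bob Example", "catalog:write",
                        USER_ROLES, ROLE_ATTRIBUTES))
```

Unknown users and unknown roles simply yield no access, which keeps the default closed.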

Status:

• Fully implemented, authentication is enabled on the service level

• Delegation implementation needs to be finished

• Authorization needs more integration, waiting for deployment of VOMS

Conclusions and outlook

Re-focus on production has been a good but painful choice

• from hype to understanding the implications of wanting to run a production Grid

• reengineering of several components was necessary

• however, the project was not well prepared for this change – the timelines had to be constantly revised in the last year

The second generation Data Management services have been designed and implemented based on the Web Service paradigm

Flexible, extensible service framework

Deployment choices: robust, highly available commercial products supported (e.g. Oracle) as well as open-source (MySQL, Tomcat)

First experiences with these services show that their performance meets the expectations

Real-life usage will show their strengths and weaknesses on the LCG-1 and EDG 2.0 testbeds during the rest of this year.

Proceed with standardization efforts: DAI, RLS

Carry over the experience into the next project : EGEE