
Data Management GridPP and EDG Gavin McCance University of Glasgow May 9, 2002


Data Management
GridPP and EDG

Gavin McCance
University of Glasgow

May 9, 2002

http://www.gridpp.ac.uk/datamanagement
http://cern.ch/grid-data-management


GridPP, 9 May, 2002 Gavin McCance 2/32

Overview

Status of data management work
Products delivered to 1.2
GDMP 3.0
Reptor: replica manager
Spitfire
Optor: grid simulation

What’s currently available and future plans


WP2: Data Management

Replication
  Replica catalogue
  Replica manager

Query Optimisation*
  Grid replica optimisation

Meta-data management*
  Secure, transparent access to meta-data

Service discovery

*Direct UK involvement

Work is done within the EDG WP2 team (based in CERN)


General Status

Deliverables on target
  Major software released for 1.2

UK manpower based at Glasgow:
  2.5 RAs: me, Will Bell, Paul Millar (50%)
  1 PhD student, David Cameron
  1 more student to come in September


File Replication

Requires: replica catalogue or replica location service
  Keeps track of the mapping between logical file name and physical file names

Requires: replica manager or replica management service
  High-level tool to actually do the replication and manage what files are being replicated

[Figure: a logical file name (LFN) for File-1 and copies of File-1 at Paris, Glasgow and Chicago]
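The mapping a replica catalogue keeps can be pictured with a small in-memory sketch. This is only an illustration, not the Globus catalogue or any EDG interface: the class name, method names and the `example.org` host names are all made up for the example.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Toy catalogue: one logical file name (LFN) maps to many physical file names (PFNs)."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        # Record that a physical copy of the logical file exists at pfn.
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn, pfn):
        # Forget one physical copy; a missing entry is silently ignored.
        self._replicas[lfn].discard(pfn)

    def lookup(self, lfn):
        # Return all known physical copies, sorted for stable output.
        return sorted(self._replicas.get(lfn, ()))


catalogue = ReplicaCatalogue()
for site in ("paris", "glasgow", "chicago"):
    # Hypothetical storage element host names, one per site in the figure.
    catalogue.register("lfn:File-1", f"gsiftp://se.{site}.example.org/data/File-1")

print(catalogue.lookup("lfn:File-1"))
```

A replica manager would sit on top of such a catalogue, copying files between sites and keeping the entries up to date.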


File Replication

Current replication functionality provided by GDMP 3.0 – new for 1.2 release!

Used for mirroring of storage elements

Implements subscription based replication model with security, and updates the Globus replica catalogue


GDMP 3.0

1. Site ‘B’ subscribes to site A’s files

2. ‘A’ produces new file – ‘B’ will be notified of this

3. ‘B’ then starts transfer of new files from ‘A’

4. Replica catalogue at ‘B’ is updated to reflect new file replica.

Site A Site B
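The four steps can be sketched as a tiny in-process simulation. This is an assumption-laden illustration, not GDMP code: the `Site` class, its method names and the use of a single shared dictionary as the replica catalogue are invented here, and no security or real file transfer is modelled.

```python
class Site:
    """Toy site that can produce files and subscribe to another site's files."""

    def __init__(self, name, catalogue):
        self.name = name
        self.files = {}             # file name -> contents held at this site
        self.subscribers = []       # sites to notify when a new file appears
        self.catalogue = catalogue  # shared dict: file name -> set of site names

    def subscribe_to(self, producer):
        # Step 1: this site subscribes to the producer's files.
        producer.subscribers.append(self)

    def produce(self, filename, data):
        # Step 2: a new file appears and every subscriber is notified.
        self.files[filename] = data
        self.catalogue.setdefault(filename, set()).add(self.name)
        for subscriber in self.subscribers:
            subscriber.on_new_file(self, filename)

    def on_new_file(self, producer, filename):
        # Step 3: the subscriber pulls the new file from the producer.
        self.files[filename] = producer.files[filename]
        # Step 4: the catalogue is updated to reflect the new replica.
        self.catalogue.setdefault(filename, set()).add(self.name)


catalogue = {}
site_a = Site("A", catalogue)
site_b = Site("B", catalogue)
site_b.subscribe_to(site_a)
site_a.produce("run42.dat", b"events")
print(sorted(catalogue["run42.dat"]))
```

Note that the transfer is started by the subscriber, matching step 3, rather than pushed by the producer.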


GDMP 3.0

Changes w.r.t. 2.*:
  New security model: host certificates
  Server delegation, i.e. accounts on SE not necessarily required
  Client-only install possible
  Basic space management
  Stand-alone server option
  ‘unsubscribe’ option


GDMP 3.0 status

Final version of GDMP released for 1.2
  In future, GDMP will be absorbed into the Replica Manager Service, which will offer richer functionality

SRPM, RPM, tarball, User Guide, Quick Config for EDG SEs:
  http://cmsdoc.cern.ch/cms/grid/


Replica Location Service

Current Globus replica catalogue is LDAP based
  To be replaced with new ‘GIGGLE’ framework Replica Location Service
  Joint EDG WP2 / Globus / PPDG project

Trade-offs: global consistency, space, query / update overhead, reliability


RLS model…

Reliable local state

Relaxed global consistency

Soft state updates to global index nodes permit graceful behaviour in the face of network problems

Secure access

Implemented as web service


[Figure: a hierarchy of RLIs above LRCs, with each LRC attached to a Storage Element]

Hierarchical indexing. The higher-level RLI contains pointers to lower-level RLIs or LRCs.

RLI = Replica Location Index

LRC = Local Replica Catalog
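A minimal sketch of the idea, assuming a simple time-to-live scheme for soft state: each LRC (or lower-level RLI) periodically reports the logical file names it knows, and an index entry that is not refreshed simply expires. The class names, method names and the explicit `now` argument are inventions for this illustration, not the RLS interface.

```python
class LocalReplicaCatalog:
    """Reliable local state: the LFN -> PFN mappings for one site."""

    def __init__(self, name):
        self.name = name
        self.mappings = {}  # LFN -> set of PFNs

    def add(self, lfn, pfn):
        self.mappings.setdefault(lfn, set()).add(pfn)

    def lfns(self, now=None):
        # Local state does not expire, so the time argument is ignored.
        return set(self.mappings)


class ReplicaLocationIndex:
    """Index of pointers to LRCs or lower-level RLIs, held as soft state."""

    def __init__(self, name, ttl_seconds):
        self.name = name
        self.ttl = ttl_seconds
        self._entries = {}  # child name -> (set of LFNs, expiry time)

    def soft_state_update(self, child, now):
        # A child refreshes its entry; without refreshes the entry times out.
        self._entries[child.name] = (child.lfns(now), now + self.ttl)

    def lfns(self, now):
        # LFNs reachable through this index, counting only unexpired entries.
        live = set()
        for names, expiry in self._entries.values():
            if expiry > now:
                live |= names
        return live

    def locate(self, lfn, now):
        # Names of the children that may know about this LFN.
        return sorted(name for name, (names, expiry) in self._entries.items()
                      if expiry > now and lfn in names)


lrc = LocalReplicaCatalog("lrc-glasgow")
lrc.add("lfn:File-1", "pfn-glasgow-1")
site_index = ReplicaLocationIndex("rli-uk", ttl_seconds=60)
top_index = ReplicaLocationIndex("rli-top", ttl_seconds=60)
site_index.soft_state_update(lrc, now=0)
top_index.soft_state_update(site_index, now=0)
print(top_index.locate("lfn:File-1", now=30))
```

If the network is down and updates stop arriving, the index forgets the stale pointers on its own, while each LRC keeps its local mappings intact.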


Scalable, reliable

LFN Namespace partitioned among RLIs

Redundant RLIs for reliability

Lossy compression
  Higher-level RLIs may lose accuracy about mappings
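The slide does not say which compression scheme is used. One well-known way to summarise a set of names lossily is a Bloom filter, sketched below purely as an assumption: it never misses a name that was added, but it may occasionally report a name that was not, which is the kind of lost accuracy a higher-level index could tolerate. The sizes and hash construction are arbitrary choices for the example.

```python
import hashlib


class BloomFilter:
    """Compact, lossy summary of a set of strings."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array, stored as one Python integer

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for position in self._positions(item):
            self.bits |= 1 << position

    def might_contain(self, item):
        # True for every added item; rarely True for an item never added.
        return all((self.bits >> position) & 1 for position in self._positions(item))


summary = BloomFilter()
for lfn in ("lfn:File-1", "lfn:File-2"):
    summary.add(lfn)
print(summary.might_contain("lfn:File-1"))
```

A query that gets a positive answer from such a summary would still have to be confirmed against the lower-level RLI or LRC.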


RLS status

Currently Alpha for developers
  http://cern.ch/grid-data-management/replica-location-service/RLS.html

New version will be progressively integrated with other replication software.
  Testbed deployment in September release


Replica Management Service

Web Service under development (Reptor)
  Will absorb GDMP functionality and extend it
  Will use the Replica Location Service

Two facets
  Core Replica Management API
  Optimisation API


Core Reptor API

Similar to GDMP API
  registerEntry
  copyFile
  copyAndRegisterFile
  replicateFile
  deleteFile
  listReplicas
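The slide gives only the operation names. The sketch below guesses at plausible semantics with an in-memory toy so that the relationship between the calls is visible; the argument lists and behaviour are assumptions, not the Reptor specification.

```python
class ToyReplicaManager:
    """In-memory stand-in: 'storage' plays the SEs, 'catalogue' the replica catalogue."""

    def __init__(self):
        self.storage = {}    # PFN -> file contents
        self.catalogue = {}  # LFN -> set of PFNs

    def registerEntry(self, lfn, pfn):
        # Record an existing physical file under a logical name.
        if pfn not in self.storage:
            raise FileNotFoundError(pfn)
        self.catalogue.setdefault(lfn, set()).add(pfn)

    def copyFile(self, source_pfn, dest_pfn):
        # Plain copy, with no catalogue update.
        self.storage[dest_pfn] = self.storage[source_pfn]

    def copyAndRegisterFile(self, lfn, source_pfn, dest_pfn):
        # Copy, then register the new copy under the logical name.
        self.copyFile(source_pfn, dest_pfn)
        self.registerEntry(lfn, dest_pfn)

    def replicateFile(self, lfn, dest_pfn):
        # Pick any registered replica as the source (here: first in sorted order).
        source_pfn = sorted(self.catalogue[lfn])[0]
        self.copyAndRegisterFile(lfn, source_pfn, dest_pfn)

    def deleteFile(self, lfn, pfn):
        # Remove one replica from both the catalogue and the storage.
        self.catalogue.get(lfn, set()).discard(pfn)
        self.storage.pop(pfn, None)

    def listReplicas(self, lfn):
        return sorted(self.catalogue.get(lfn, ()))


manager = ToyReplicaManager()
manager.storage["pfn-a"] = b"data"  # pretend a file already sits on a storage element
manager.registerEntry("lfn:File-1", "pfn-a")
manager.replicateFile("lfn:File-1", "pfn-b")
print(manager.listReplicas("lfn:File-1"))
```

In this reading, the higher-level calls are compositions of the lower-level ones, which keeps the storage and the catalogue consistent.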


Interactions with SE

Defined file types:

Physical file attribute    File type
Master                     permanent
Secondary copy             permanent, durable or volatile


RMS Current Status

Testbed can use GDMP for 1.2
  Defined Reptor API currently wraps the Globus Replica Manager

Will be developed progressively
  Full version on testbed in September
  Technical reports: http://cern.ch/grid-data-management/publications.html


Grid Query Optimisation

Best place for a job?

Joint WP1 / WP2 question…

Approach: 2-Phase Optimisation:
  Phase 1: Find suitable CE for job execution given distribution of files it will access
  Phase 2: Re-optimise file access during job execution (due to the dynamic nature of the Grid, resource status changes over time)


Optimisation API

initFilePrefetch(LFN[], CE, protocol[], fraction)

cancelFilePrefetch(LFN[], CE)

getBestFile(LFN[], protocol[], fraction)

getNetworkCosts(SE1, SE2, filesize, protocol) from WP7

getIOCosts(SE, PFN) from WP5
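How getBestFile might combine the two cost sources can be sketched as follows. This is a simplified illustration under stated assumptions: the cost tables are made-up numbers standing in for getNetworkCosts (WP7) and getIOCosts (WP5), the total cost is taken to be their sum, and the `protocol[]` and `fraction` arguments from the slide are left out.

```python
# Hypothetical replica locations: LFN -> list of (storage element, PFN).
REPLICAS = {
    "lfn:File-1": [("se-paris", "pfn-paris-1"), ("se-chicago", "pfn-chicago-1")],
}

# Hypothetical stand-in for getNetworkCosts: (source SE, destination SE) -> cost.
NETWORK_COST = {
    ("se-paris", "se-glasgow"): 4.0,
    ("se-chicago", "se-glasgow"): 9.0,
}

# Hypothetical stand-in for getIOCosts: cost of reading the file from each SE.
IO_COST = {"se-paris": 3.0, "se-chicago": 1.0}


def get_best_file(lfns, local_se):
    """For each LFN, pick the replica with the lowest network + I/O cost."""
    best = {}
    for lfn in lfns:
        def total_cost(replica):
            source_se, _pfn = replica
            return NETWORK_COST[(source_se, local_se)] + IO_COST[source_se]

        _se, pfn = min(REPLICAS[lfn], key=total_cost)
        best[lfn] = pfn
    return best


print(get_best_file(["lfn:File-1"], "se-glasgow"))
```

With these numbers the Paris replica costs 7.0 in total against 10.0 for Chicago, so Paris is chosen even though Chicago has the cheaper I/O.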


Grid Replica Optimisation

Controlled intelligent replication to optimise grid over the longer term

Collect getBestFile requests

‘Intelligence’ based on algorithms

Test replication algorithms on data-centric grid simulator


Optor – replica optimiser simulation

Simulate prototype Grid

Input site policies and experiment data files.

Introduce replication algorithm: files are always replicated to the local storage. If necessary, the oldest files are deleted.
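A minimal sketch of this basic algorithm, assuming that "oldest" means the file replicated to the site longest ago and that storage is measured in arbitrary size units. The class and method names are invented here and this is not Optor's code.

```python
from collections import OrderedDict


class LocalStorage:
    """Local storage that always takes a replica and evicts the oldest files to make room."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()  # file name -> size, oldest replica first

    def used(self):
        return sum(self.files.values())

    def access(self, name, size):
        """Return True if the file was already local, False if it had to be replicated."""
        if name in self.files:
            return True
        if size > self.capacity:
            raise ValueError("file is larger than the whole storage")
        # Delete the oldest files until the new replica fits.
        while self.used() + size > self.capacity:
            self.files.popitem(last=False)
        self.files[name] = size
        return False


store = LocalStorage(capacity=10)
hits = [store.access(name, 4) for name in ("a", "b", "a", "c")]
print(hits, list(store.files))
```

Every miss corresponds to a network transfer in the simulation, so counting the False results gives a rough measure of the traffic a given algorithm causes.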


Optor first results

Even a basic replication algorithm significantly reduces network traffic and program running times.

New economics-based algorithms under investigation!

http://ppewww.ph.gla.ac.uk/ScotGRID/Optor


Meta-data Management

Spitfire v1.1.0 delivered
  A grid-enabled database service

Grid-enabled front end to any type of RDBMS

Examples:
  Grid meta-data: replica catalogue, service registry
  Application meta-data: experimental data catalogues, calibration data


V1.1.0 XSQL Spitfire

CURRENT (v1.1.0) is based on XSQL templates on the server, e.g.

<role="Read-only"/>
<query>
  SELECT FILENAME FROM HFS_DATASET
  WHERE RNNO={@run} AND TRIGGER={@trig} AND STATUS={@stat}
</query>

File URL = http://filecat1.atlas.cern.ch/hfs/findDataSet.xsql


V1.1.0 Spitfire client

Any HTTP client – either your own app, or a web-browser form

POST an HTML FORM to http://filecat1.atlas.cern.ch/hfs/findDataSet.xsql with parameters run=25555, trig=highlumi, stat=good

The operation is performed on the database, and the result is sent back to the client…
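For the "your own app" case, the form POST can be built with the Python standard library. The sketch below only constructs the request and does not send it; the helper name is made up, and a real call would also need the SSL and client-certificate setup that the service expects, which is not shown.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(url, **params):
    """Build a form-encoded POST carrying the template parameters."""
    body = urlencode(params).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_query_request(
    "http://filecat1.atlas.cern.ch/hfs/findDataSet.xsql",
    run=25555,
    trig="highlumi",
    stat="good",
)
print(request.get_method(), request.data)
```

Each form field name matches a `{@...}` placeholder in the server-side template, so the client never sends SQL itself.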


Security Mechanism

[Figure: flow of a request through the security mechanism]

HTTP + SSL request + client certificate arrives at the Servlet Container (SSLServletSocketFactory, TrustManager)
  Is certificate signed by a trusted CA? (Trusted CAs)
  Has certificate been revoked? (Revoked Certs repository)

Security Servlet / Authorization Module
  Does user specify role? If not, find default
  Role ok? (Role repository)
  Map role to connection id (Connection mappings)

Translator Servlet
  Receives request and connection ID
  Connection Pool
  RDBMS
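The checks named in the diagram can be read as one decision function. The sketch below is an interpretation, not Spitfire code: the order of the checks, the dictionary layout and all names in it are assumptions made for illustration, and real certificate validation is reduced to simple lookups.

```python
def authorise(cert, requested_role, config):
    """Return the database connection id for a request, or raise PermissionError."""
    # Is the certificate signed by a trusted CA?
    if cert["issuer"] not in config["trusted_cas"]:
        raise PermissionError("certificate not signed by a trusted CA")
    # Has the certificate been revoked?
    if cert["subject"] in config["revoked"]:
        raise PermissionError("certificate has been revoked")
    # Does the user specify a role? If not, find the default one.
    role = requested_role or config["default_role"].get(cert["subject"])
    # Is the role allowed for this user?
    if role is None or role not in config["roles"].get(cert["subject"], set()):
        raise PermissionError("role not permitted for this user")
    # Map the role to a connection id for the translator servlet.
    return config["connections"][role]


# Hypothetical configuration standing in for the repositories in the diagram.
CONFIG = {
    "trusted_cas": {"Example Grid CA"},
    "revoked": {"/CN=old user"},
    "roles": {"/CN=some user": {"Read-only", "Writer"}},
    "default_role": {"/CN=some user": "Read-only"},
    "connections": {"Read-only": "conn-ro", "Writer": "conn-rw"},
}

user_cert = {"subject": "/CN=some user", "issuer": "Example Grid CA"}
print(authorise(user_cert, None, CONFIG))
```

Because the role decides which pooled connection is used, the database's own access control does the final enforcement.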


V1.1.0

V1.1.0 available for 1.2 release now!

SRPM, RPM, tarball installation

User / Admin / Quick Install guides
  http://cern.ch/hep-proj-spitfire


New Spitfire client (dev)

Users can use either this or v1.1.0 static (XSQL template based) functionality
A database client API has been defined
Will implement as grid service using standard web service technologies


Client side API to access remote database

DB Admin
  Create(), Drop(), Alter() Table, Database

DB Core functionality
  Insert(), Update(), Delete(), Select()

DB Role admin
  Secure, role-based authorisation

DB Information
  Schema, Quotas, Disk space


Extra functionality

To be developed..

Distributed querying

Replication of meta-data

Automated expiration and cleanup

Discussions with UK DBTF and GGF Database Group


Service Index

How do I find a specific grid service?
  E.g. replica location server, image database, information service

XML Service description
  What, where, attributes, how to contact.

Scalable architectures for querying this developed
  Service index web service
  W. Hoschek’s thesis and paper (WP2@CERN)
  API developed


More Info

More information available at…

http://www.gridpp.ac.uk/datamanagement

http://cern.ch/grid-data-management