
Applying Data Grids to Support Distributed Data Management

Storage Resource Broker

Reagan W. Moore, Ian Fisk, Bing Zhu
University of California, San Diego

[email protected]
http://www.npaci.edu/DICE/

Data Management Systems

• Data sharing - data grids
  – Federation across administration domains
  – Latency management
  – Sustained data transfers

• Data publication - digital libraries
  – Discovery
  – Organization

• Data preservation - persistent archives
  – Technology management
  – Authenticity

Consistent Data Environments

• Storage Resource Broker combines the functionality of data grids, digital libraries, and persistent archives within a single data environment

• SRB provides
  – Metadata consistency
  – Latency management functions
  – Technology evolution management

Metadata Consistency

• Storage Resource Broker uses a logical name space to assign global identifiers to digital entities
  – Files, SQL command strings, database tables, URLs

• State information that characterizes the result of operations on the digital entities is mapped onto the logical name space

• Consistency of state information is managed as update constraints on the mapping (see the sketch below)
  – Write locks, synchronization flags, schema extension

• SRB state information is managed in the MCAT metadata catalog
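To make the mapping concrete, here is a minimal Python sketch of a logical name space; the structures and field names are illustrative assumptions, not the MCAT schema.

```python
# Hypothetical logical name space: a global identifier maps to its physical
# replicas plus the state information recorded by operations on the entity.
catalog = {
    "srb:/home/demo/results.dat": {
        "replicas": [
            "hpss://sdsc.example.edu/archive/results.dat",
            "file://ucsd.example.edu/cache/results.dat",
        ],
        "state": {"write_lock": None, "out_of_sync": set()},
    }
}

def write(logical_name, replica, perform_io):
    """Update one replica under the catalog's consistency constraints."""
    entry = catalog[logical_name]
    state = entry["state"]
    assert state["write_lock"] is None, "write lock held"
    state["write_lock"] = replica        # update constraint on the mapping
    perform_io(replica)                  # the actual physical write
    # Flag the remaining replicas as needing synchronization.
    state["out_of_sync"] = set(entry["replicas"]) - {replica}
    state["write_lock"] = None
```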

SRB Latency Management

• Replication - server-initiated I/O
• Caching - client-initiated I/O
• Streaming - parallel I/O
• Remote proxies - staging
• Data aggregation - containers
• Prefetch

[Diagram: the mechanisms above applied along the path from source, across the network, to destination]

SRB 2.0 - Parallel I/O

• Client-directed parallel I/O - client/server
  – Thread-safe client
  – Client decides the number of threads to use
  – Each thread is responsible for a data segment and connects to the server independently (sketched below)
  – Utilities srbpput and srbpget

• Sustains 80% to 90% of available bandwidth using 4 parallel I/O streams and a window size of 800 kBytes
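As a rough illustration of the client-directed pattern (an assumption-laden sketch, not the srbpget implementation), the code below lets the caller choose the thread count and gives each thread one data segment; a private local file handle stands in for each thread's independent connection to the SRB server.

```python
# Sketch of client-directed parallel I/O: the client picks the thread count,
# each thread fetches one segment over its own "connection" (here, a private
# file handle standing in for an independent socket to the server).
import os
from concurrent.futures import ThreadPoolExecutor

def fetch_segment(path, offset, length):
    # Private handle per thread, mimicking an independent server connection.
    with open(path, "rb") as f:
        f.seek(offset)
        return offset, f.read(length)

def parallel_get(path, num_threads=4):
    size = os.path.getsize(path)
    seg = max(1, (size + num_threads - 1) // num_threads)  # segment per thread
    buf = bytearray(size)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(fetch_segment, path, off, min(seg, size - off))
                   for off in range(0, size, seg)]
        for fut in futures:                                # reassemble in place
            offset, data = fut.result()
            buf[offset:offset + len(data)] = data
    return bytes(buf)
```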

SRB 2.0 - Parallel I/O (cont1)

• Server-directed parallel I/O - client/server
  – Server plans and decides the number of threads to use
  – Separate "control" and "data transfer" sockets
  – Client listens on the "control" socket and spawns threads to handle data transfer
  – Always a one-hop data transfer between client and server
  – Similar to HPSS (see the sketch below)
• Works seamlessly with HPSS Mover protocol
• Also works for other file systems
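The toy loopback demo below mimics the control/data split; the 12-byte announcement format and the use of ephemeral ports are assumptions for illustration, not the SRB or HPSS Mover wire format.

```python
# Toy demo of server-directed parallel I/O (hypothetical protocol): the client
# listens on a control socket; the server decides the stream count, opens one
# data socket per segment, and announces (offset, length, port); the client
# spawns one thread per announcement for a one-hop transfer.
import socket, struct, threading

DATA = bytes(range(256)) * 64          # 16 KiB demo payload

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf

def serve_segment(offset, length):
    """Server side: offer one segment of DATA on an ephemeral data socket."""
    ls = socket.create_server(("127.0.0.1", 0))
    def run():
        conn, _ = ls.accept()
        conn.sendall(DATA[offset:offset + length])
        conn.close(); ls.close()
    threading.Thread(target=run, daemon=True).start()
    return ls.getsockname()[1]

def server_side(ctrl_port, num_streams=4):
    """The server plans the transfer and announces each stream to the client."""
    seg = (len(DATA) + num_streams - 1) // num_streams
    with socket.create_connection(("127.0.0.1", ctrl_port)) as ctrl:
        for off in range(0, len(DATA), seg):
            ln = min(seg, len(DATA) - off)
            ctrl.sendall(struct.pack("!III", off, ln, serve_segment(off, ln)))

def pull_segment(port, offset, length, out):
    """Client thread: one-hop pull of a single segment from its data socket."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        out[offset:offset + length] = recv_exact(s, length)

def client_side():
    ls = socket.create_server(("127.0.0.1", 0))
    threading.Thread(target=server_side, args=(ls.getsockname()[1],),
                     daemon=True).start()
    ctrl, _ = ls.accept()
    out, threads = bytearray(len(DATA)), []
    while len(hdr := recv_exact(ctrl, 12)) == 12:
        off, ln, port = struct.unpack("!III", hdr)
        t = threading.Thread(target=pull_segment, args=(port, off, ln, out))
        t.start(); threads.append(t)
    for t in threads:
        t.join()
    assert bytes(out) == DATA           # all segments reassembled

if __name__ == "__main__":
    client_side()
```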

SRB 2.0 - Parallel I/O (cont2)

• Parallel I/O - server/server
  – Copy, replicate, and staging operations
  – Always used in third-party transfer operations
    • Server-to-server data transfer; the client is not involved
  – Uses up to 4 threads, depending on file size (a possible policy is sketched below)
  – 7-10x improvement for large files across the country
  – Up to 39 MB/sec across campus (PC RAID disk, Gbit Ethernet)
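The slide does not give the exact scaling rule; a plausible policy of the "up to 4 threads depending on file size" kind might look like this, with the per-stream threshold purely an assumption.

```python
# Hypothetical stream-count policy: one stream per SEG_MIN bytes, capped at 4.
SEG_MIN = 32 * 1024 * 1024     # assumed per-stream threshold (not from the slide)

def num_streams(file_size, cap=4):
    return max(1, min(cap, 1 + file_size // SEG_MIN))
```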

Federated SRB server model

[Diagram: a read application presents a logical name or an attribute condition to an SRB agent; the MCAT performs (1) logical-to-physical mapping, (2) identification of replicas, and (3) access & audit control; peer-to-peer brokering passes the request to the federated SRB server, which spawns an SRB agent for data access; parallel data access to replicas R1 and R2 returns the data in steps 5/6.]

SRB 2.0 - Bulk operations

• Uploading and downloading large numbers of small files
  – Multi-threaded
• Bulk registration
  – 500 files in one call
  – Fill an 8 MB buffer before sending (sketched below)
  – Use of containers
• New Sbload and Sbunload utilities
  – Over 100 files per second registration
  – 3-10x or greater speedup
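A minimal sketch of the batching idea behind Sbload (a hypothetical helper, not the actual utility): pack small files into an 8 MB buffer and ship each batch, up to 500 files, with a single send and one bulk registration call.

```python
# Pack small files into one buffer so each network call carries many files
# and each catalog call registers a whole batch.
import os

BUF_LIMIT = 8 * 1024 * 1024   # fill 8 MB before sending
REG_LIMIT = 500               # files registered per catalog call

def bulk_upload(paths, send):
    buf, names = bytearray(), []
    for p in paths:
        data = open(p, "rb").read()
        if names and (len(buf) + len(data) > BUF_LIMIT or len(names) == REG_LIMIT):
            send(bytes(buf), names)   # one network call, one bulk registration
            buf, names = bytearray(), []
        names.append((os.path.basename(p), len(buf), len(data)))
        buf += data
    if names:
        send(bytes(buf), names)       # flush the final partial batch
```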

SDSC Storage Resource Broker & Meta-data Catalog

[Architecture diagram: applications reach the SRB servers through access APIs (Unix shell; Java and NT browsers; OAI; WSDL; GridFTP; C and C++ libraries; Linux I/O; DLL/Python). The servers provide the logical name space, latency management, data transport, metadata transport, and consistency management / authorization-authentication, coordinated by a prime server. A storage abstraction spans archives (HPSS, ADSM, UniTree, DMF), databases (DB2, Oracle, Postgres), file systems (Unix, NT, Mac OS X), and HRM; a catalog abstraction spans the MCAT databases (DB2, Oracle, Sybase, SQLServer).]

SRB Archival Tape Library System

• SRB archival storage system, in addition to HPSS, UniTree, and ADSM
  – A distributed pool of disk caches for the front end
  – A tape library system for the back end

• STK silo for tape storage and tape mount

• 3590 tape drives

• I/O always performed on the disk cache
  – Always stage data to cache (see the sketch below)
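The cache rule reduces to a simple stage-on-miss policy; a sketch assuming hypothetical cache and tape-library interfaces:

```python
# Hypothetical stage-on-miss rule: all reads are served from the disk cache;
# a miss first stages the file from the tape library into the cache.
def read(logical_name, cache, tape_library):
    if logical_name not in cache:                            # cache miss
        cache[logical_name] = tape_library.stage(logical_name)  # tape -> disk
    return cache[logical_name]                               # I/O always on disk cache
```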

CMS Experiment

• Ian Fisk - user-level application
  – Installed SRB servers at CERN, Fermilab, and UCSD under a user account
• Remotely invoked data replication
  – From UCSD, invoked data replication from CERN to Fermilab and to UCSD
  – Data transfers automatically used four parallel I/O streams with the default window size of 800 kBytes
• Observed
  – Sustained data transfer at 80% to 90% of available bandwidth
  – Transferred over 1 TB of data per day using multiple sessions

Future plans

• SRB 2.1 - grid-oriented features, SRB-G (5/31/03)
  – Add a GridFTP driver - access data through a GridFTP server
  – Upgrade to GSI 2.2 (GSI 1.1 in the current version)
  – Provide an encrypted data transfer facility, using GSI encryption, between servers and between server and client
    • Explore network encryption as a digital entity property
  – WSDL services interface for SRB, including data movement, replication, access control, metadata ingestion and retrieval, and container support
• SRB 2.2 - federated MCATs (8/30/03)
  – Peer-to-peer MCATs
  – Mount-point-like interface: /sdsc/…, /caltech/…

Next CMS Experiments

• Sustained transfer
  – Use a 4 MB window size
• Bulk data registration
  – In tests with the DOE ASCI project, sustained registration of 400 files per second
• Peer-to-peer federation
  – Prototype the ability to initiate data and metadata exchanges between MCAT catalogs

For More Information

Reagan W. Moore
San Diego Supercomputer Center

[email protected]

http://www.npaci.edu/DICE

http://www.npaci.edu/DICE/SRB/index.html

http://www.npaci.edu/dice/srb/mySRB/mySRB.html