CHEP 2003

Experience with Deploying Storage Resource Managers to Achieve Robust File Replication

Arie Shoshani
Alex Sim
Junmin Gu

Scientific Data Management Group
Lawrence Berkeley National Laboratory
http://sdm.lbl.gov/srm





• File replication problem – motivation

• What are Storage Resource Managers

• General Analysis Scenario and the use of SRMs

• SRM functionality

• SRMs use for file replication – robustness

• Advantages of using SRMs for file replication

• File monitoring tool

• Analysis of file replication



• Multi-File Replication – why is it a problem?

• Tedious task – many files, repetitious

• Lengthy task – long transfer time, can take days

• Error prone – need to monitor scripts

• Error recovery – need to restart file transfers

• Stage and archive from MSS – limited concurrency, downtime, transient failures

• Use of FTP – large windows, concurrent transfer

• Security – both for local MSS and the network

• Firewalls – transfer from/to MSS must be internal to the site


What are Storage Resource Managers?

• Grid architecture needs to include reservation & scheduling of:
  • Compute resources
  • Storage resources
  • Network resources

• Storage Resource Managers (SRMs) role in the data grid architecture
  • Shared storage resource allocation & scheduling
  • Especially important for data intensive applications
  • Often files are archived on a mass storage system (MSS)
  • Wide area networks – minimize transfers
  • Large scientific collaborations (100’s of nodes, 1000’s of clients) – opportunities for file sharing
  • File replication and caching may be used
  • Need to support non-blocking (asynchronous) requests


General Analysis Scenario

[Diagram. Legible labels: client ... client; "A set of logical files"; "Execution plan and site-specific" (label cut off); "Client’s site" with a Storage Resource Manager; "Requests for data placement and remote computation"; Site 1, Site 2, ... Site N with Storage Resource Manager and Compute Resource Manager boxes; "result files".]


SRM is a Service

• SRM functionality
  • Manage space
    • Negotiate and assign space to users
    • Manage “lifetime” of spaces
  • Manage files on behalf of a user
    • Pin files in storage till they are released
    • Manage “lifetime” of files
    • Manage action when pins expire (depends on file types)
  • Manage file sharing
    • Policies on what should reside on a storage resource at any one time
    • Policies on what to evict when space is needed
  • Get files from remote locations when necessary
    • Purpose: to simplify client’s task
  • Manage multi-file requests
    • A brokering function: queue file requests, pre-stage when possible
  • Provide grid access to/from mass storage systems
    • HPSS (LBNL, ORNL, BNL), Enstore (Fermi), JasMINE (Jlab), Castor
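The pinning, lifetime, and eviction ideas above can be illustrated with a toy disk cache. This is a minimal sketch, not the SRM implementation: the class names, the `pin_lifetime` parameter, and the least-recently-used eviction rule are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CachedFile:
    size: int
    last_used: float
    pin_expiry: float = 0.0  # the file counts as pinned until this time


@dataclass
class DiskCache:
    """Toy model of disk space managed on behalf of users (illustrative only)."""
    capacity: int
    files: Dict[str, CachedFile] = field(default_factory=dict)

    def used(self) -> int:
        return sum(f.size for f in self.files.values())

    def is_pinned(self, name: str, now: float) -> bool:
        return self.files[name].pin_expiry > now

    def add(self, name: str, size: int, pin_lifetime: float,
            now: Optional[float] = None) -> bool:
        """Bring a file into the cache and pin it, evicting unpinned files if needed."""
        now = time.time() if now is None else now
        # Assumed eviction policy: unpinned files only, least recently used first.
        evictable = sorted(
            (n for n in self.files if not self.is_pinned(n, now)),
            key=lambda n: self.files[n].last_used,
        )
        reclaimable = sum(self.files[n].size for n in evictable)
        if self.used() - reclaimable + size > self.capacity:
            return False  # remaining files are pinned; the request has to wait
        while self.used() + size > self.capacity:
            del self.files[evictable.pop(0)]
        self.files[name] = CachedFile(size, now, now + pin_lifetime)
        return True

    def release(self, name: str) -> None:
        """The client is done with the file: drop the pin so it can be evicted."""
        self.files[name].pin_expiry = 0.0
```

A pin that is never released still expires after `pin_lifetime`, which is one way to keep abandoned requests from holding space indefinitely.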



Types of SRMs

• Types of storage resource managers
  • Disk Resource Manager (DRM)
    • Manages one or more disk resources
  • Tape Resource Manager (TRM)
    • Manages access to a tertiary storage system (e.g. HPSS)
  • Hierarchical Resource Manager (HRM = TRM + DRM)
    • An SRM that stages files from tertiary storage into its disk cache

• SRMs and file transfers
  • SRMs DO NOT perform file transfer
  • SRMs DO invoke file transfer service if needed (GridFTP, FTP, HTTP, …)
  • SRMs DO monitor transfers and recover from failures
    • TRM: from/to MSS
    • DRM: from/to network
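The split between invoking a transfer service and recovering from its failures can be sketched as a retry wrapper around an external transfer call. This is a minimal sketch under assumptions: `TransientError`, `transfer_with_recovery`, and the exponential backoff policy are hypothetical and do not come from the slides.

```python
import time
from typing import Callable


class TransientError(Exception):
    """A failure expected to clear up on its own (hypothetical type)."""


def transfer_with_recovery(
    transfer: Callable[[str, str], None],  # external service, e.g. a GridFTP wrapper
    source_url: str,
    target_url: str,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Invoke a transfer service and retry on transient failures.

    Returns the number of attempts used. Re-raises the last error once
    max_attempts is exhausted. The manager itself moves no bytes.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            transfer(source_url, target_url)
            return attempt
        except TransientError:
            if attempt == max_attempts:
                raise
            # Assumed policy: wait 1x, 2x, 4x, ... base_delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real delays.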


Uniformity of Interface – Compatibility of SRMs

[Diagram. Legible labels: Grid Middleware; Enstore; JASMine.]


SRMs use in STAR for Robust Multi-file replication

[Diagram. Legible labels: HRM-Client Command-line Interface ("Get list of files"); HRM-COPY (thousands of files); SRM-GET (one file at a time); HRM (performs writes) at LBNL, which archives files; HRM (performs reads), which stages files; Network transfer by GridFTP GET (pull mode).]

• Recovers from staging failures

• Recovers from file transfer failures

• Recovers from archiving failures


Detailed sequence of actions for each file being replicated

[Diagram. Participants: HRM-Client Command-line Interface (runs anywhere); LBNL HRM (performs writes); BNL HRM (performs reads).

Client actions: get list of files from directory; srmCopy {(sourceURL=hpss.bnl.gov/xyz/file_x, targetURL=hpss.lbnl.gov/uvw/file_y)}; request files.

Numbered steps legible in the diagram: 2 srmGet (sourceURL); 3 Allocate Space; 4 Space; 5 File staged (BNL’s diskURL); 6 GridFTP GET (pull mode); 7 Transfer Complete; 11 ReleaseSpace.

Callbacks: file on disk; file on tape.]
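The per-file flow can be sketched as a stage, pull, archive pipeline that always releases its space at the end. This is a sketch under stated assumptions: only some step numbers survive in the diagram, so the ordering below is inferred, and `replicate_file`, `ReplicationLog`, and the three callables are hypothetical stand-ins for the real HRM operations.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReplicationLog:
    """Collects the events that a monitoring tool could later display."""
    events: List[str] = field(default_factory=list)

    def record(self, event: str) -> None:
        self.events.append(event)


def replicate_file(
    source_url: str,
    target_url: str,
    stage: Callable[[str], str],          # source side: tape -> disk, returns a disk URL
    pull: Callable[[str], str],           # target side: network pull, returns a local path
    archive: Callable[[str, str], None],  # target side: disk -> tape
    log: ReplicationLog,
) -> None:
    """Replicate one file; space is released whether or not a step fails."""
    log.record("allocate space")
    try:
        disk_url = stage(source_url)
        log.record("file staged")
        local_path = pull(disk_url)
        log.record("transfer complete")
        archive(local_path, target_url)
        log.record("file on tape")
    finally:
        log.record("release space")
```

In a fuller version each of the three callables would itself be wrapped in retry logic, since staging, transfer, and archiving can each fail independently.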




Web-Based File Monitoring Tool

Shows:
- Files already transferred
- Files during transfer
- Files to be transferred

Also shows for each file:
- Source URL
- Target URL
- Transfer rate
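The three file states and the per-file transfer rate could be derived from simple per-file records. This is a minimal sketch, assuming each record carries a size plus optional start and finish times; the `FileStatus` fields and the MB/s unit are illustrative choices, not the tool's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FileStatus:
    source_url: str
    target_url: str
    size_bytes: int
    started: Optional[float] = None   # seconds; None if the transfer has not started
    finished: Optional[float] = None  # seconds; None if the transfer has not finished

    def state(self) -> str:
        if self.finished is not None:
            return "transferred"
        if self.started is not None:
            return "in transfer"
        return "pending"

    def rate_mb_per_s(self) -> Optional[float]:
        """Average rate for a completed transfer, or None if not yet known."""
        if self.started is None or self.finished is None:
            return None
        if self.finished <= self.started:
            return None
        return self.size_bytes / 1e6 / (self.finished - self.started)


def summarize(files: List[FileStatus]) -> Dict[str, int]:
    """Count files per state, as a status page might show."""
    counts = {"transferred": 0, "in transfer": 0, "pending": 0}
    for f in files:
        counts[f.state()] += 1
    return counts
```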


Tracking multi-file replication performance

[Plot: per-file event timeline. Eight time-axis ticks from 20020103123100 to 20020103123800; legible event labels include "File replication request start" and "Staging_started_at BNL".]

Helped discover hard-to-find bug


File tracking helps to identify bottlenecks

Shows that archiving is the bottleneck
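One way to find such a bottleneck from tracked events is to average the time each file spends in each phase. This is a sketch under assumptions: the timestamp layout `YYYYMMDDhhmmss` is inferred from tick labels such as 20020103123100, and the event names and phase boundaries are hypothetical.

```python
from datetime import datetime
from statistics import mean
from typing import Dict, List, Tuple

FMT = "%Y%m%d%H%M%S"  # assumed layout of timestamps such as 20020103123100

# (phase name, start event, end event); the event names are hypothetical.
PHASES = [
    ("staging", "staging_started", "staging_finished"),
    ("transfer", "transfer_started", "transfer_finished"),
    ("archiving", "archiving_started", "archiving_finished"),
]


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two timestamp strings."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


def slowest_phase(records: List[Dict[str, str]]) -> Tuple[str, Dict[str, float]]:
    """Return the phase with the largest mean duration, plus all the means.

    Each record maps an event name to a timestamp string for one file.
    """
    averages = {
        name: mean(seconds_between(r[start], r[end]) for r in records)
        for name, start, end in PHASES
    }
    return max(averages, key=averages.get), averages
```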


File tracking shows recovery from transient failures

Total: 45 GB


File tracking shows network slowdown and recovery

Total: 53 GB


Conclusion: Key advantages of using SRMs for file replication

• All HRM communications are part of HRM functionality
  • No changes required to HRMs
• Can replicate files from multiple sites
  • In a single command to one target
• Recovers from transient failures
  • For staging and archiving from MSS
  • For network
• Uses disk buffers to keep multiple files
  • Pre-stage in case of slow network
  • Hold files in case of slow archiving
• Concurrent transfers
  • Concurrent staging, concurrent archiving from/to MSS
  • Concurrent transfers over the network
  • Concurrency limited by parameter setup
• Automatic cleanup of buffers (garbage collection)
• Can replicate files between different MSSs (Enstore, Jasmine, HPSS, Castor, …)
• On-line monitoring, summary generated
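Concurrency capped by a configuration parameter can be sketched with a bounded worker pool. This is a minimal sketch, not how the HRM is built: `replicate_all`, `replicate_one`, and the `max_concurrent` parameter are hypothetical names, and a thread pool is just one convenient way to bound the number of transfers in flight.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def replicate_all(
    urls: Iterable[str],
    replicate_one: Callable[[str], None],
    max_concurrent: int = 4,  # stands in for the "parameter setup" that caps concurrency
) -> Dict[str, str]:
    """Run replicate_one over all URLs with at most max_concurrent in flight."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {pool.submit(replicate_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()
                results[url] = "done"
            except Exception as exc:
                # One failed file should not stop the rest of the batch.
                results[url] = f"failed: {exc}"
    return results
```

The returned dictionary doubles as the raw material for a summary report at the end of a batch.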


Final note

BNL–LBNL file replication for STAR has been in production for 9 months (nearly daily use to replicate 1000s of files per day)

More on SRMs: Thursday at 1:30 pm (Category 3)
