

Distributed Storage and Data Management in PetaShare for Collaborative Research

Ismail Akturk, Mehmet Balman, Xinqi Wang and Tevfik Kosar
Center for Computation and Technology at Louisiana State University, Baton Rouge, LA, 70803

[Figure: PetaShare architecture. Client access through FUSE (Petafs), Parrot (Petashell) and iCommands (Pcommands); iRODS and iCAT servers with disk storage and metadata, interconnected over LONI, with an asynchronous replication module.]

Petafs (FUSE):

    local $ petafs -m lsu
    local $ ls ~/petashare
    lsu
    local $ cd ~/petashare/lsu/tempZone/home/team1
    local $ cp /tmp/srcFile ./shareFile
    local $ ls
    shareFile
    local $ petafs -u lsu

Petashell (Parrot):

    local $ petashell
    pshell ~$ cd /petashare/uno/tempZone/home/team1
    pshell ~$ ls
    shareFile
    pshell ~$ vi shareFile
    "Hello PetaShare"
    pshell ~$ exit
    local $

Pcommands (iCommands):

    local $ ppwd
    /tempZone/home/team1
    local $ pls
    C- /tempZone/home/team1 shareFile
    local $ pget shareFile ~/localFile
    local $ cat ~/localFile
    "Hello PetaShare"
    local $

PetaShare supports both the native iRODS metadata system, for fast access to the data archive, and a semantic-enabled cross-domain metadata scheme that gives an integrated view over archives spanning multiple disciplines.
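iRODS attaches metadata to data objects as attribute-value-unit (AVU) triples. The minimal sketch below, assuming a simple in-memory store, shows how attribute-based queries can find objects that live in collections from different disciplines. The class `AVUStore`, its methods, and the example paths and attributes are hypothetical; this is not the iCAT or PetaShare API.

```python
from collections import defaultdict
from typing import Optional


class AVUStore:
    """Hypothetical in-memory catalog of attribute-value-unit (AVU) triples,
    loosely modeled on the iRODS metadata model; not the iCAT API."""

    def __init__(self) -> None:
        # object path -> list of (attribute, value, unit) triples
        self._avus: dict[str, list[tuple[str, str, str]]] = defaultdict(list)

    def add(self, path: str, attribute: str, value: str, unit: str = "") -> None:
        """Attach one AVU triple to a data object."""
        self._avus[path].append((attribute, value, unit))

    def query(self, attribute: str, value: Optional[str] = None) -> list[str]:
        """Return sorted paths carrying the attribute (and value, if given)."""
        return sorted(
            path
            for path, triples in self._avus.items()
            if any(a == attribute and (value is None or v == value)
                   for a, v, _ in triples)
        )


store = AVUStore()
store.add("/tempZone/home/coastal/surge.nc", "region", "Gulf of Mexico")
store.add("/tempZone/home/bio/sample1.fasta", "region", "Gulf of Mexico")
store.add("/tempZone/home/bio/sample1.fasta", "organism", "Crassostrea virginica")
print(store.query("region", "Gulf of Mexico"))
# ['/tempZone/home/bio/sample1.fasta', '/tempZone/home/coastal/surge.nc']
```

A single shared attribute ("region" here) is what lets one query span a coastal-modeling collection and a bioinformatics collection.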

PetaShare provides very lightweight client tools, based on FUSE, Parrot, and iCommands technologies, that enable easy, transparent, and scalable user-level access to data stored in distributed resources. These are:

▪ Petafs (Virtual File System)
▪ Petashell (Shell Interface)
▪ Pcommands (Customized Commands)
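In the example sessions on this poster, Petafs mounts a site under `~/petashare/lsu/...`, Petashell exposes `/petashare/uno/...`, and Pcommands work with the bare iRODS path `/tempZone/home/team1`. The sketch below assumes that a client-side path is simply a mount root, then a site name, then the iRODS logical path, and splits it into those two parts. The helper `split_petashare_path` is hypothetical and not part of the PetaShare tools.

```python
from pathlib import PurePosixPath


def split_petashare_path(
    path: str, mount_root: str = "/petashare"
) -> tuple[str, str]:
    """Split a client-side path into (site, iRODS logical path).

    Hypothetical helper: assumes paths look like <mount_root>/<site>/<zone>/...
    The caller must pass already-expanded paths ("~" is not expanded here).
    """
    parts = PurePosixPath(path).parts
    root_parts = PurePosixPath(mount_root).parts
    # Require the mount root, a site name, and at least one logical component.
    if parts[:len(root_parts)] != root_parts or len(parts) <= len(root_parts) + 1:
        raise ValueError(f"not under {mount_root}/<site>/: {path}")
    site = parts[len(root_parts)]
    logical = "/" + "/".join(parts[len(root_parts) + 1:])
    return site, logical


print(split_petashare_path("/petashare/uno/tempZone/home/team1"))
# ('uno', '/tempZone/home/team1')
```

For a Petafs mount, pass the expanded mount point, e.g. `mount_root="/home/alice/petashare"` (a hypothetical home directory).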

PetaShare leverages the 40 Gigabit per second (Gbps) Louisiana Optical Network Initiative (LONI) infrastructure for its interconnections, fully exploiting high-bandwidth, low-latency optical network technologies.
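As a rough sense of scale for the 40 Gbps figure, the sketch below computes the ideal time to move a given volume at full line rate. It assumes decimal units (1 TB = 10^12 bytes) and ignores protocol overhead, contention, and disk throughput, so real transfers will be slower.

```python
def ideal_transfer_seconds(num_bytes: float, link_gbps: float = 40.0) -> float:
    """Lower bound on transfer time at full line rate (no overhead modeled)."""
    bits = num_bytes * 8
    return bits / (link_gbps * 1e9)


TB = 10**12  # decimal terabyte, as used for storage capacities
print(ideal_transfer_seconds(1 * TB))  # 200.0
# Hours to stream the full 250 TB disk pool once, at best:
print(ideal_transfer_seconds(250 * TB) / 3600)
```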

PetaShare is based on an evolved version of iRODS that provides a globally unified namespace across geographically distributed storage resources, as well as a metadata management interface.
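A globally unified namespace means one logical path names a data object no matter which institution's storage holds the bytes. The sketch below is a hypothetical in-memory model of that idea, not iRODS internals: logical paths map to one or more (resource, physical path) replicas, and listing a collection works on logical names only. The `Namespace` and `Replica` classes, resource names such as `lsu-disk`, and the physical paths are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    resource: str       # hypothetical storage resource name, e.g. "lsu-disk"
    physical_path: str  # where the bytes live on that resource


@dataclass
class Namespace:
    """Hypothetical catalog mapping logical paths to physical replicas."""
    entries: dict[str, list[Replica]] = field(default_factory=dict)

    def register(self, logical: str, replica: Replica) -> None:
        self.entries.setdefault(logical, []).append(replica)

    def replicas(self, logical: str) -> list[Replica]:
        return list(self.entries.get(logical, []))

    def list_collection(self, collection: str) -> list[str]:
        """Names of objects directly inside a logical collection."""
        prefix = collection.rstrip("/") + "/"
        return sorted(
            path[len(prefix):]
            for path in self.entries
            if path.startswith(prefix) and "/" not in path[len(prefix):]
        )


ns = Namespace()
ns.register("/tempZone/home/team1/shareFile", Replica("lsu-disk", "/data/lsu/a1b2"))
ns.register("/tempZone/home/team1/shareFile", Replica("uno-disk", "/vault/uno/a1b2"))
print(ns.list_collection("/tempZone/home/team1"))  # ['shareFile']
print(len(ns.replicas("/tempZone/home/team1/shareFile")))  # 2
```

Users only ever see the logical side; which physical copy serves a request is a separate decision.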

The initial implementation and deployment of PetaShare involves six institutions in Louisiana.

PetaShare manages 250 Terabytes of disk storage distributed across these institutions as well as 400 Terabytes of tape storage.

PetaShare treats storage resources and data-access tasks as first-class entities, just like computational resources and compute tasks, rather than as mere side effects of computation.
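One way to make data access a first-class scheduling concern is to weigh where a job's inputs already reside when choosing where to run it. The sketch below is a simplified, hypothetical placement heuristic, not the PetaShare scheduler: it picks the compute site that minimizes the number of input bytes that would have to be transferred. The function `pick_site`, the file names, sizes, and the site `siteC` are assumptions for illustration.

```python
def pick_site(
    inputs: dict[str, int],
    location: dict[str, set[str]],
    sites: list[str],
) -> str:
    """Choose the compute site that minimizes input bytes to stage in.

    inputs:   logical file name -> size in bytes
    location: logical file name -> sites that already hold a replica
    sites:    candidate compute sites; ties go to the earliest listed
    """

    def bytes_to_move(site: str) -> int:
        # Count only inputs with no replica at this site.
        return sum(
            size
            for name, size in inputs.items()
            if site not in location.get(name, set())
        )

    return min(sites, key=bytes_to_move)


GB = 10**9
inputs = {"mesh.dat": 50 * GB, "forcing.nc": 5 * GB}
location = {"mesh.dat": {"lsu"}, "forcing.nc": {"lsu", "uno"}}
print(pick_site(inputs, location, ["uno", "lsu", "siteC"]))  # lsu
```

A real data-aware scheduler would also weigh link bandwidth, queue length, and storage space; this sketch deliberately uses transferred bytes alone.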

Along with data storage resources, key technologies being developed in the PetaShare project include:

▪ Data-aware Storage Systems
▪ Data-aware Schedulers
▪ Cross-domain Metadata Scheme
▪ Advanced Buffering and Data Aggregation
▪ Asynchronous Replication for Metadata Servers
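The general idea of asynchronous replication for a metadata server can be sketched as follows: the primary commits an update locally and returns at once, while a background thread later applies queued updates to the replicas. This is a minimal, hypothetical model built on Python's `threading` and `queue` modules; the `MetadataPrimary` class and its methods are assumptions, not the PetaShare replication module, and it omits failure handling and conflict resolution.

```python
import queue
import threading


class MetadataPrimary:
    """Hypothetical primary catalog that ships updates to replicas asynchronously."""

    def __init__(self, replicas: list[dict[str, str]]) -> None:
        self.store: dict[str, str] = {}
        self.replicas = replicas
        self._log: queue.Queue = queue.Queue()  # pending (key, value) updates
        threading.Thread(target=self._ship, daemon=True).start()

    def put(self, key: str, value: str) -> None:
        """Commit locally, enqueue for replication, and return without waiting."""
        self.store[key] = value
        self._log.put((key, value))

    def _ship(self) -> None:
        # Background worker: apply queued updates to every replica in FIFO order.
        while True:
            key, value = self._log.get()
            for replica in self.replicas:
                replica[key] = value
            self._log.task_done()

    def flush(self) -> None:
        """Block until all queued updates have been applied to the replicas."""
        self._log.join()


replica: dict[str, str] = {}
primary = MetadataPrimary([replica])
primary.put("/tempZone/home/team1/shareFile", "owner=team1")
primary.flush()
print(replica)  # {'/tempZone/home/team1/shareFile': 'owner=team1'}
```

The trade-off is that replicas may briefly lag the primary; `flush` exists only so a caller (or a test) can wait for them to catch up.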

The NSF-funded PetaShare project aims to handle the underlying data sharing, archival, and retrieval mechanisms transparently, and to make data available on demand to scientists for analysis and visualization in applications such as:

• Coastal & Environmental Modeling
• Geospatial Analysis
• Bioinformatics
• Medical Imaging
• Fluid Dynamics
• Petroleum Engineering
• Numerical Relativity
• High Energy Physics

[Figure: Long-term data archival. iRODS and iCAT servers backed by tape storage, with metadata maintained through an asynchronous replication module.]

ACKNOWLEDGMENTS

This project is sponsored in part by the National Science Foundation, the Department of Energy, and the Louisiana Board of Regents.

For further information, please visit:
http://www.petashare.org
http://www.loni.org
http://fuse.sf.net
http://www.cctools.org
http://www.irods.org

[Figure: Semantic metadata store. Components include a command-line interface, web server, browser, Protege, a query parser, and metadata insertion and query paths; also shown: data migration, replication, and load balancing.]
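Replication also makes room for load balancing: when several storage resources hold a copy, a read can be directed to the least busy one. The sketch below is a hypothetical greedy policy that uses the count of active transfers as its load metric; the functions `choose_replica` and `assign_reads`, the resource names, and the metric itself are assumptions, not PetaShare's actual policy.

```python
def choose_replica(replica_resources: list[str], active_transfers: dict[str, int]) -> str:
    """Return the resource with the fewest active transfers (first wins ties).

    Resources missing from active_transfers are treated as idle (load 0).
    """
    if not replica_resources:
        raise ValueError("no replicas available")
    return min(replica_resources, key=lambda r: active_transfers.get(r, 0))


def assign_reads(
    n: int, replica_resources: list[str], active_transfers: dict[str, int]
) -> list[str]:
    """Greedily route n reads, counting each as a new active transfer."""
    load = dict(active_transfers)  # work on a copy; the caller's dict is untouched
    chosen = []
    for _ in range(n):
        r = choose_replica(replica_resources, load)
        load[r] = load.get(r, 0) + 1
        chosen.append(r)
    return chosen


print(assign_reads(4, ["lsu-disk", "uno-disk"], {"lsu-disk": 3, "uno-disk": 1}))
# ['uno-disk', 'uno-disk', 'lsu-disk', 'uno-disk']
```

Starting from uneven loads, the greedy choice first fills the idler resource and then alternates, so both end with four active transfers.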