Globus – Part II

Sathish Vadhiyar


Globus Information Service

MDS

Metacomputing Directory Service, also known as Monitoring and Discovery Service

For publishing and accessing system and application data

Access to MDS information can be restricted using GSI

Interacts with local information services – hour-glass mechanism

Provides caching to minimize transfers of up-to-date information and lessen network overhead
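The caching idea can be illustrated with a small time-to-live (TTL) cache placed in front of an information provider. This is a minimal Python sketch, not MDS code: the class name `TTLCache`, the 30-second lifetime and the key format are assumptions chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Serve cached provider results until they are older than `ttl` seconds.

    Hypothetical sketch of the MDS caching idea; not part of any Globus API.
    """

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str, fetch: Callable[[], object]) -> object:
        now = self.clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # still fresh: no call to the provider, no network traffic
        value = fetch()  # missing or stale: ask the information provider again
        self._entries[key] = (now, value)
        return value


# Usage with a fake clock so the example is deterministic.
calls = []


def load_provider():
    calls.append(1)
    return {"load": 0.42}


fake_time = [0.0]
cache = TTLCache(ttl=30.0, clock=lambda: fake_time[0])
cache.get("host1/load", load_provider)   # t=0: fetched from the provider
fake_time[0] = 10.0
cache.get("host1/load", load_provider)   # t=10: served from the cache
fake_time[0] = 45.0
cache.get("host1/load", load_provider)   # t=45: stale, fetched again
print(len(calls))  # prints 2
```

The injectable clock is only there to make the example repeatable; in practice the default monotonic clock would be used.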

MDS

Integrates existing systems while providing a uniform and extensible data model

Uniform API

Adopts its data representation, API, query language and protocol from the LDAP directory service

Uses 2 protocols:
  - GRIP (Grid Resource Information Protocol) – for providing information about entities
  - GRRP (Grid Resource Registration Protocol) – for registering entities

LDAP query language supports:
  - Search
  - Enquiry
  - Subscription
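To make the search operation concrete, the sketch below evaluates LDAP-style filters over in-memory entries. It is a simplification under stated assumptions: filters are written as nested Python tuples instead of the RFC 4515 string syntax, only AND, OR, NOT, equality and presence (`*`) are supported, values are compared case-insensitively, and the attribute names and hosts are illustrative rather than the actual MDS schema.

```python
from typing import Dict, List, Tuple

Entry = Dict[str, List[str]]  # lower-case attribute name -> list of values
Filter = Tuple  # ("&", f, ...), ("|", f, ...), ("!", f) or ("=", attr, value)


def matches(entry: Entry, flt: Filter) -> bool:
    """Return True if `entry` satisfies the (simplified) LDAP filter `flt`."""
    op = flt[0]
    if op == "&":
        return all(matches(entry, sub) for sub in flt[1:])
    if op == "|":
        return any(matches(entry, sub) for sub in flt[1:])
    if op == "!":
        return not matches(entry, flt[1])
    if op == "=":
        attr, wanted = flt[1].lower(), flt[2]
        values = entry.get(attr, [])
        if wanted == "*":  # presence test: attribute has at least one value
            return bool(values)
        return any(v.lower() == wanted.lower() for v in values)
    raise ValueError(f"unsupported filter operator: {op!r}")


def search(entries: List[Entry], flt: Filter) -> List[Entry]:
    return [e for e in entries if matches(e, flt)]


hosts = [
    {"objectclass": ["computer"], "hn": ["node1.example.org"], "os": ["Linux"]},
    {"objectclass": ["computer"], "hn": ["node2.example.org"], "os": ["Solaris"]},
]
# Corresponds to the string filter (&(objectclass=computer)(!(os=Solaris)))
result = search(hosts, ("&", ("=", "objectclass", "computer"), ("!", ("=", "os", "Solaris"))))
print([e["hn"][0] for e in result])  # prints ['node1.example.org']
```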

MDS Architecture

GIIS – Grid Index Information Service

GRIS – Grid Resource Information Service
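The relationship between the two services can be sketched as follows: each GRIS answers queries about its own resource and registers with a GIIS (the role played by GRRP), and the GIIS answers a client query (the role played by GRIP) by fanning it out to the registered GRIS instances. This is an in-memory Python illustration under simplifying assumptions; the real services communicate over the network using LDAP-based protocols, and the class layout and host names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Entry = Dict[str, str]
Predicate = Callable[[Entry], bool]


@dataclass
class GRIS:
    """Simplified Grid Resource Information Service for one resource."""
    host: str
    entries: List[Entry]

    def query(self, pred: Predicate) -> List[Entry]:
        return [e for e in self.entries if pred(e)]


@dataclass
class GIIS:
    """Simplified Grid Index Information Service aggregating many GRIS."""
    registered: Dict[str, GRIS] = field(default_factory=dict)

    def register(self, gris: GRIS) -> None:
        # Stands in for a GRRP registration message from the GRIS.
        self.registered[gris.host] = gris

    def query(self, pred: Predicate) -> List[Entry]:
        # Stands in for a GRIP query: forward to every registered GRIS and merge.
        results: List[Entry] = []
        for gris in self.registered.values():
            results.extend(gris.query(pred))
        return results


giis = GIIS()
giis.register(GRIS("node1.example.org", [{"hn": "node1.example.org", "os": "Linux"}]))
giis.register(GRIS("node2.example.org", [{"hn": "node2.example.org", "os": "Solaris"}]))
linux_hosts = giis.query(lambda e: e.get("os") == "Linux")
print([e["hn"] for e in linux_hosts])  # prints ['node1.example.org']
```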

MDS

Support for multiple information service providers – information providers are specified on a per-attribute basis

MDS data:
  - System information: architecture, OS
  - Network information
  - Load status

Additional information sent to the GIIS by the GRAM reporter:
  - Job status
  - Queue information

Information can be viewed through a web browser or web client commands
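Specifying information providers per attribute means each attribute of an entry can come from a different source. A minimal Python sketch, assuming a hypothetical `PROVIDERS` table: the system attributes are read with the standard `platform` and `os` modules, while the `queue-length` provider is a placeholder for data that, in MDS, would come from the GRAM reporter.

```python
import os
import platform
from typing import Callable, Dict

# Hypothetical table: attribute name -> the provider that produces its value.
PROVIDERS: Dict[str, Callable[[], str]] = {
    "architecture": platform.machine,            # system information
    "os": platform.system,                       # system information
    "cpu-count": lambda: str(os.cpu_count()),    # "None" if the count is unknown
    "queue-length": lambda: "0",                 # placeholder for GRAM-reported data
}


def build_entry(providers: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Assemble one entry by asking each attribute's own provider."""
    return {attr: provider() for attr, provider in providers.items()}


entry = build_entry(PROVIDERS)
print(sorted(entry))  # prints ['architecture', 'cpu-count', 'os', 'queue-length']
```

With this layout, supporting a new attribute only requires adding one table entry; the entry-building code does not change.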

MDS

Contains entries, each associated with one or more attribute:value pairs

Each entry is associated with a distinguished name (DN)

Object classes are associated with entries to define their object types
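An entry can be modelled as a distinguished name plus attribute:value pairs, including its object classes. The Python sketch below parses a DN into its components and derives the parent entry's DN, which is how entries form a tree. It is simplified: it assumes no escaped commas and no multi-valued RDNs (both allowed by RFC 4514), and the DN, object class and attributes are made-up examples, not real MDS data.

```python
from typing import Dict, List, Tuple


def parse_dn(dn: str) -> List[Tuple[str, str]]:
    """Split a DN into (attribute, value) pairs, most specific component first."""
    rdns = []
    for part in dn.split(","):
        attr, _, value = part.strip().partition("=")
        rdns.append((attr.strip().lower(), value.strip()))
    return rdns


def parent_dn(dn: str) -> str:
    """DN of the entry one level up the directory tree."""
    return ", ".join(f"{attr}={value}" for attr, value in parse_dn(dn)[1:])


# A hypothetical MDS-style entry: DN, object classes, and attribute:value pairs.
dn = "hn=node1.example.org, ou=CS, o=Example University, c=US"
entry: Dict[str, object] = {
    "dn": dn,
    "objectclass": ["computer"],
    "attributes": {"os": ["Linux"], "architecture": ["x86_64"]},
}

print(parse_dn(dn)[0])   # prints ('hn', 'node1.example.org')
print(parent_dn(dn))     # prints ou=CS, o=Example University, c=US
```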

Distinguished name example

Another Example

Distinguished names for Networks

Globus Data Grid

Data Grid

Challenges:
  - Terabytes to petabytes of data
  - Query management over such huge data volumes
  - Cache management
  - Providing gigabit/sec QoS
  - Coscheduling data transfers and computation
  - Selection of dataset replicas
  - Maximizing use of scarce storage, computation and network resources

Data Grid Motivation

Application requirements:

1. A reliable, secure, high-performance data transfer protocol

2. Management of multiple copies of files and collections of files

Data Grid Architecture

GridFTP

Secure file transfer over the Grid

Multiple data channels for parallel transfers – multiple TCP streams are used in parallel to improve aggregate bandwidth

Partial file transfers

Third-party (direct server-to-server) transfers, provided by adding GSSAPI security to the third-party data transfers already in the FTP standard – the transfer between 2 servers is mediated by a third-party client

GSSAPI operations authenticate the third party to the source and destination machines of the data transfer
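Parallel streams and partial transfers both rest on moving a file as independent byte ranges. This Python sketch (an illustration, not GridFTP code) splits a file of `size` bytes into one contiguous (offset, length) range per TCP stream and reassembles blocks that may arrive out of order; requesting a single range on its own corresponds to a partial file transfer. The even split is an assumption made for simplicity.

```python
from typing import Iterable, List, Tuple


def split_ranges(size: int, streams: int) -> List[Tuple[int, int]]:
    """Divide [0, size) into `streams` contiguous (offset, length) ranges."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first streams
        ranges.append((offset, length))
        offset += length
    return ranges


def reassemble(size: int, blocks: Iterable[Tuple[int, bytes]]) -> bytes:
    """Place each (offset, data) block at its offset, whatever order it arrived in."""
    buf = bytearray(size)
    for offset, data in blocks:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


data = bytes(range(10))
ranges = split_ranges(len(data), 3)
print(ranges)  # prints [(0, 4), (4, 3), (7, 3)]
# Simulate the three streams finishing in reverse order.
arrived = [(off, data[off:off + n]) for off, n in reversed(ranges)]
print(reassemble(len(data), arrived) == data)  # prints True
```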

GridFTP contd…

Authenticated data channels – both GSI and Kerberos security

Reusable data channels

Striped data transfers

2 libraries:
  - globus_ftp_control library – implements the control channel API
  - globus_ftp_client library – implements the GridFTP API

Plugin mechanisms for fault tolerance, performance monitoring, and extended data processing
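The plugin idea can be sketched as hooks that a transfer client calls at fixed points. In this Python illustration a restart plugin asks for a retry after a fault and a performance plugin records transfer times. The hook names (`on_start`, `on_complete`, `on_fault`), the classes and the URL are hypothetical; this is not the actual Globus plugin interface.

```python
import time
from typing import Callable, List, Tuple


class Plugin:
    """Base class: every hook is optional."""

    def on_start(self, url: str) -> None:
        pass

    def on_complete(self, url: str, seconds: float) -> None:
        pass

    def on_fault(self, url: str, err: Exception, attempt: int) -> bool:
        return False  # True asks the client to retry the transfer


class RestartPlugin(Plugin):
    """Fault tolerance: allow up to `max_restarts` retries after failures."""

    def __init__(self, max_restarts: int):
        self.max_restarts = max_restarts

    def on_fault(self, url: str, err: Exception, attempt: int) -> bool:
        return attempt <= self.max_restarts


class PerfPlugin(Plugin):
    """Performance monitoring: record how long each successful transfer took."""

    def __init__(self) -> None:
        self.samples: List[Tuple[str, float]] = []

    def on_complete(self, url: str, seconds: float) -> None:
        self.samples.append((url, seconds))


def transfer_with_plugins(url: str, do_transfer: Callable[[], None], plugins: List[Plugin]) -> int:
    """Run `do_transfer`, consulting plugins; return the number of attempts used."""
    attempt = 0
    while True:
        attempt += 1
        for p in plugins:
            p.on_start(url)
        start = time.monotonic()
        try:
            do_transfer()
        except OSError as err:
            votes = [p.on_fault(url, err, attempt) for p in plugins]  # notify every plugin
            if any(votes):
                continue
            raise
        for p in plugins:
            p.on_complete(url, time.monotonic() - start)
        return attempt


failures = [OSError("reset"), OSError("timeout")]


def flaky_transfer() -> None:
    if failures:
        raise failures.pop(0)  # fail twice, then succeed


perf = PerfPlugin()
attempts = transfer_with_plugins("gsiftp://example.org/data/file1", flaky_transfer, [RestartPlugin(2), perf])
print(attempts, len(perf.samples))  # prints 3 1
```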

Globus Replica Management Architecture

Replica management:
  - For better access performance or availability
  - Mainly for access to “published” resources – read-only model

Functions:

Architecture:
  - Lower-level replica catalog API
  - Higher-level replica management API

Replica catalog

Provides a mapping between logical names of files/locations and physical objects on storage systems

Stores 3 kinds of entries:
  - Logical collections – user-defined collections of files (file aggregation)
  - Location entries – physical locations of files
  - Logical files – globally unique names

Replica catalog API provides operations on the replica catalog

Replica management API provides session management, catalog creation, file maintenance and access control

Implemented with LDAP
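The three kinds of catalog entries can be modelled as follows. In this minimal Python sketch a logical collection is a set of logical file names, each location entry records which of those files a physical site holds (so partial replicas are allowed), and a lookup maps a logical file to physical URLs. The class and method names, the `prefix/filename` URL layout and the site names are assumptions for illustration; the Globus catalog itself stores such entries in LDAP.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class ReplicaCatalog:
    # logical collection name -> logical file names in it
    collections: Dict[str, Set[str]] = field(default_factory=dict)
    # logical collection name -> {location URL prefix -> logical files stored there}
    locations: Dict[str, Dict[str, Set[str]]] = field(default_factory=dict)

    def create_collection(self, name: str, files: Iterable[str]) -> None:
        self.collections[name] = set(files)
        self.locations[name] = {}

    def add_location(self, collection: str, url_prefix: str, files: Iterable[str]) -> None:
        held_files = set(files)
        unknown = held_files - self.collections[collection]
        if unknown:
            raise KeyError(f"not in collection {collection!r}: {sorted(unknown)}")
        self.locations[collection][url_prefix] = held_files  # complete or partial replica

    def lookup(self, collection: str, logical_file: str) -> List[str]:
        """Physical URLs of every replica of `logical_file`."""
        return sorted(
            f"{prefix}/{logical_file}"
            for prefix, held in self.locations[collection].items()
            if logical_file in held
        )


rc = ReplicaCatalog()
rc.create_collection("climate-1998", ["jan.nc", "feb.nc"])
rc.add_location("climate-1998", "gsiftp://siteA.example.org/data", ["jan.nc", "feb.nc"])
rc.add_location("climate-1998", "gsiftp://siteB.example.org/mirror", ["jan.nc"])
print(rc.lookup("climate-1998", "feb.nc"))  # prints ['gsiftp://siteA.example.org/data/feb.nc']
```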

Replica management

Globus Replica Management integrates the Globus Replica Catalog (for keeping track of replicated files) and GridFTP (for moving data), and provides replica management capabilities for data grids.

The globus_replica_management library provides client functions that allow files to be registered with the replica management service, published to replica locations, and moved among multiple locations.

Goal: managing the copying and placement of files in a distributed computing system so as to improve the performance of data analysis
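The register/publish pattern can be sketched in a few lines of Python. Here a dictionary stands in for the physical storage systems and a `copy` callable stands in for a GridFTP transfer; `ReplicaManager`, `register` and `publish` are hypothetical names, not the globus_replica_management functions. One design choice worth noting: a new location is registered only after the copy succeeds, so the catalog never points at a replica that does not exist.

```python
from typing import Callable, Dict, List

Storage = Dict[str, bytes]  # physical URL -> contents (stand-in for real storage systems)


class ReplicaManager:
    def __init__(self, storage: Storage, copy: Callable[[str, str], None]):
        self.storage = storage
        self.copy = copy  # stand-in for a GridFTP transfer between two URLs
        self.catalog: Dict[str, List[str]] = {}  # logical file name -> physical URLs

    def register(self, lfn: str, url: str) -> None:
        """Record an existing physical file as a replica of `lfn`."""
        if url not in self.storage:
            raise FileNotFoundError(url)
        self.catalog.setdefault(lfn, []).append(url)

    def publish(self, lfn: str, dest_url: str) -> None:
        """Copy an existing replica to `dest_url`, then register the new copy."""
        source = self.catalog[lfn][0]
        self.copy(source, dest_url)
        self.catalog[lfn].append(dest_url)  # only reached if the copy did not raise


storage: Storage = {"gsiftp://siteA.example.org/run1.dat": b"event data"}


def memory_copy(src: str, dst: str) -> None:
    storage[dst] = storage[src]


rm = ReplicaManager(storage, memory_copy)
rm.register("run1.dat", "gsiftp://siteA.example.org/run1.dat")
rm.publish("run1.dat", "gsiftp://siteB.example.org/run1.dat")
print(rm.catalog["run1.dat"])
```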

Replica management service – functions

Registration of files with the replica management service

Creation and deletion of replicas of previously registered files

Enquiries concerning the location and performance characteristics of replicas

Replica selection based on performance characteristics
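Selection based on performance characteristics can be illustrated with a simple cost model: predict each replica's transfer time as latency plus size divided by bandwidth, and pick the minimum. This Python sketch is an assumption for illustration; the attribute names and figures are made up, and a real selector may weigh other characteristics as well.

```python
from typing import Dict

Replica = Dict[str, float]  # hypothetical attributes: bandwidth_bps, latency_s


def predicted_seconds(size_bytes: int, replica: Replica) -> float:
    """Latency plus transmission time: a deliberately simple cost model."""
    return replica["latency_s"] + size_bytes * 8 / replica["bandwidth_bps"]


def select_replica(size_bytes: int, replicas: Dict[str, Replica]) -> str:
    """Return the URL of the replica with the smallest predicted transfer time."""
    return min(replicas, key=lambda url: predicted_seconds(size_bytes, replicas[url]))


replicas = {
    "gsiftp://far.example.org/f": {"bandwidth_bps": 1e9, "latency_s": 0.05},
    "gsiftp://near.example.org/f": {"bandwidth_bps": 1e8, "latency_s": 0.01},
}
print(select_replica(10**9, replicas))  # large file: the high-bandwidth replica wins
print(select_replica(1000, replicas))   # small file: the low-latency replica wins
```

For a 1 GB file the far replica is predicted at about 8 s versus about 80 s for the near one, while for a 1 kB file latency dominates and the near replica wins.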

Replica management

Replica management API – combines storage system operations with calls to low-level catalog API functions

Replica management system controls where and when copies are created and provides information about copies

However, it does not ensure consistency between file copies

RM API

Session management:
  - Session handles and attributes
  - Restart
  - Rollback

Catalog creation and file management:
  - Creating catalog entries
  - Registering files
  - Publishing files
  - Copying and deleting files

Future ideas:
  - Incorporating advance reservation
  - Automatic replica selection and creation

Data grid projects: http://www.globus.org/datagrid/projects.html
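Rollback in a session can be pictured as an undo log: every catalog change made in the session records how to reverse itself, and rolling back replays those reversals newest-first. The Python sketch below is a hypothetical illustration of that idea, not the RM API; `Session`, `add_replica`, `commit` and `rollback` are made-up names, and the catalog is a plain dictionary.

```python
from typing import Callable, Dict, List

Catalog = Dict[str, List[str]]  # logical file name -> physical URLs


class Session:
    def __init__(self, catalog: Catalog):
        self.catalog = catalog
        self._undo: List[Callable[[], None]] = []

    def add_replica(self, lfn: str, url: str) -> None:
        self.catalog.setdefault(lfn, []).append(url)
        self._undo.append(lambda: self._remove(lfn, url))

    def _remove(self, lfn: str, url: str) -> None:
        self.catalog[lfn].remove(url)
        if not self.catalog[lfn]:
            del self.catalog[lfn]  # drop the entry once it has no replicas left

    def commit(self) -> None:
        self._undo.clear()  # changes become permanent; nothing left to undo

    def rollback(self) -> None:
        while self._undo:
            self._undo.pop()()  # undo the newest change first


catalog: Catalog = {"a.dat": ["gsiftp://siteA.example.org/a.dat"]}
session = Session(catalog)
session.add_replica("a.dat", "gsiftp://siteB.example.org/a.dat")
session.add_replica("b.dat", "gsiftp://siteB.example.org/b.dat")
session.rollback()
print(catalog)  # prints {'a.dat': ['gsiftp://siteA.example.org/a.dat']}
```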

Replica Catalog Illustration

Replica Selection in Globus Data Grid (Vazhkudai et al.)

Replica selection uses MDS for information about the characteristics of storage systems

LDAP information is organized as a DIT (Directory Information Tree)

Each storage resource in the Data Grid incorporates a GRIS

LDAP can execute shell scripts in the background to obtain dynamic attributes such as availableSpace and mountPoint

Static attributes such as seek times can be entered by the system administrator

Attributes such as data transfer rates across networks to clients can be derived from past performance, i.e., historical data

ClassAds can also be used to express storage attributes
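Deriving transfer rates from past performance can be as simple as averaging recent observations for each storage-system/client pair. The Python sketch below assumes that predictor (a mean over a sliding window of the last few transfers); the class name, window size and host names are hypothetical, and more elaborate predictors are possible.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class TransferHistory:
    """Predict throughput from the last `window` transfers per (storage, client) pair."""

    def __init__(self, window: int = 5):
        self.window = window
        self._obs: Dict[Tuple[str, str], Deque[float]] = defaultdict(
            lambda: deque(maxlen=self.window)
        )

    def record(self, storage: str, client: str, nbytes: int, seconds: float) -> None:
        self._obs[(storage, client)].append(nbytes / seconds)  # bytes per second

    def predict(self, storage: str, client: str) -> Optional[float]:
        obs = self._obs.get((storage, client))
        if not obs:
            return None  # no history yet: fall back to static attributes
        return sum(obs) / len(obs)


history = TransferHistory(window=2)
for nbytes in (100, 200, 300):
    history.record("storage1.example.org", "client.example.org", nbytes, 1.0)
print(history.predict("storage1.example.org", "client.example.org"))  # prints 250.0
```

The window of 2 keeps only the two most recent rates (200 and 300 bytes/s), so the oldest observation no longer influences the prediction.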

Directory for Storage GRIS

Metadata Specification

Performance Data Specification

Steps in Replica Management

1. Application queries metadata, expressing the desired characteristics of logical files

2. A logical file is returned

3. Application queries the replica catalog for replica instances of the logical file

4. Storage broker helps to choose a particular replica
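The four steps can be strung together as a small pipeline. In this Python sketch the metadata catalog and replica catalog are plain dictionaries and the storage broker is any function that picks one URL; all names, attributes and URLs are hypothetical, and the toy broker simply prefers a given site.

```python
from typing import Callable, Dict, List

METADATA: Dict[str, Dict[str, object]] = {   # logical file -> descriptive attributes
    "run41.dat": {"experiment": "demo", "year": 1999},
    "run42.dat": {"experiment": "demo", "year": 2000},
}
REPLICAS: Dict[str, List[str]] = {            # logical file -> physical replicas
    "run42.dat": ["gsiftp://siteA.example.org/run42.dat", "gsiftp://siteB.example.org/run42.dat"],
}


def query_metadata(criteria: Dict[str, object]) -> List[str]:
    """Steps 1-2: return logical files whose attributes match every criterion."""
    return [lfn for lfn, attrs in METADATA.items()
            if all(attrs.get(k) == v for k, v in criteria.items())]


def locate(lfn: str, broker: Callable[[List[str]], str]) -> str:
    """Steps 3-4: look up replica instances, then let the broker choose one."""
    return broker(REPLICAS[lfn])


def prefer_site_b(urls: List[str]) -> str:
    """Toy broker: prefer siteB when it holds a copy, otherwise take the first."""
    return next((u for u in urls if "siteB" in u), urls[0])


lfn = query_metadata({"experiment": "demo", "year": 2000})[0]
print(lfn, locate(lfn, prefer_site_b))
# prints run42.dat gsiftp://siteB.example.org/run42.dat
```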

Replica Selection

Storage Architecture steps

1. Application presents ClassAds describing its replica requirements to the SB (storage broker)

2. SB performs the search:
   1. Queries replica catalogs for the list of all replicas
   2. Queries the individual GRIS of each replica about its characteristics
   3. Collects all information and proceeds to matching

3. Match:
   1. Converts replica capabilities to replica ClassAds
   2. Matches application ClassAds to replica ClassAds

4. Accesses the file using GridFTP
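The matching step can be sketched in Python with ads as dictionaries whose `requirements` and `rank` entries are functions of the other party's ad. This mimics the symmetric spirit of ClassAd matchmaking (both sides' requirements must hold, and the application's rank orders the survivors) but is not the ClassAd language; the attribute names `transferRate`, `availableSpace` and `group` and all values are illustrative assumptions.

```python
from typing import Any, Dict, List, Optional

Ad = Dict[str, Any]


def to_replica_ad(url: str, gris_attrs: Dict[str, Any]) -> Ad:
    """Convert replica capabilities (e.g. from its GRIS) into a replica ad."""
    ad: Ad = {"url": url, **gris_attrs}
    ad.setdefault("requirements", lambda other: True)  # replica accepts anyone by default
    return ad


def matches(app: Ad, replica: Ad) -> bool:
    """Both sides' requirements must be satisfied."""
    return app["requirements"](replica) and replica["requirements"](app)


def best_match(app: Ad, replicas: List[Ad]) -> Optional[Ad]:
    candidates = [r for r in replicas if matches(app, r)]
    return max(candidates, key=app["rank"], default=None)


app_ad: Ad = {
    "group": "climate",
    "requirements": lambda r: r["transferRate"] >= 50,
    "rank": lambda r: r["transferRate"],  # prefer the fastest acceptable replica
}
replicas = [
    to_replica_ad("gsiftp://a.example.org/f", {"transferRate": 40, "availableSpace": 900}),
    to_replica_ad("gsiftp://b.example.org/f", {"transferRate": 80, "availableSpace": 100,
                                               "requirements": lambda app: app["group"] == "physics"}),
    to_replica_ad("gsiftp://c.example.org/f", {"transferRate": 60, "availableSpace": 700}),
]
print(best_match(app_ad, replicas)["url"])  # prints gsiftp://c.example.org/f
```

Replica `a` fails the application's rate requirement and replica `b` rejects the application's group, so `c` is chosen even though `b` is faster.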

Globus References / sources / credits

Grid Information Services for Distributed Resource Sharing. K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman. Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, August 2001.

Usage of LDAP in Globus. I. Foster, G. von Laszewski.
This short note describes the use of LDAP in the Globus toolkit. It answers three questions: What is LDAP? Where is it used? Why is it used in Globus?

A Directory Service for Configuring High-Performance Distributed Computations. S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, S. Tuecke. Proceedings of the 6th IEEE Symposium on High-Performance Distributed Computing, pp. 365-375, 1997.
Describes the Metacomputing Directory Service used to maintain information about Globus components.

Globus References / sources / credits

The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets. A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke. Journal of Network and Computer Applications, 23:187-200, 2001 (based on a conference publication in the Proceedings of the NetStore Conference, 1999).

Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing. B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke. IEEE Mass Storage Conference, 2001.
Presents the design and performance characteristics of two fundamental technologies for data management.

Replica Selection in the Globus Data Grid. S. Vazhkudai, S. Tuecke, I. Foster. Proceedings of the First IEEE/ACM International Conference on Cluster Computing and the Grid (CCGRID 2001), pp. 106-113, IEEE Computer Society Press, May 2001.
Discusses a high-level replica selection service that uses information regarding replica location and user preferences to guide selection from among storage replica alternatives.

JUNK !!

RFT (Reliable File Transfer)

Treats the movement of multiple files as a single job

Accepts transfer requests and manages them reliably

OGSI compliant

Transfers data reliably between two GridFTP servers

Uses Grid Service Handles (GSH)

Acts as a proxy for the user: acts as the client on the user’s behalf for third-party transfers

RFT

Client submits a SOAP description of the data transfer job

Maintains checkpoints in databases

Supports both “push” and “pull” mechanisms
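The checkpointing behaviour can be sketched with SQLite standing in for RFT's database. Each completed file is recorded right after it finishes, so a restarted job skips work already done. The table layout, `run_job` and the URLs are hypothetical; this illustrates the idea rather than RFT's implementation, and the real service drives third-party GridFTP transfers instead of calling a local `copy` function.

```python
import sqlite3
from typing import Callable, List, Tuple

Transfer = Tuple[str, str]  # (source URL, destination URL)


def run_job(db: sqlite3.Connection, job_id: str, transfers: List[Transfer],
            copy: Callable[[str, str], None]) -> int:
    """Run every transfer not yet checkpointed; return how many ran this time."""
    db.execute("CREATE TABLE IF NOT EXISTS done (job TEXT, idx INTEGER, PRIMARY KEY (job, idx))")
    finished = {row[0] for row in db.execute("SELECT idx FROM done WHERE job = ?", (job_id,))}
    performed = 0
    for idx, (src, dst) in enumerate(transfers):
        if idx in finished:
            continue  # completed before an earlier failure: skip it
        copy(src, dst)  # an exception here leaves this and later files un-checkpointed
        with db:  # commit the checkpoint as soon as this file is done
            db.execute("INSERT INTO done (job, idx) VALUES (?, ?)", (job_id, idx))
        performed += 1
    return performed


db = sqlite3.connect(":memory:")
files = [(f"gsiftp://src.example.org/f{i}", f"gsiftp://dst.example.org/f{i}") for i in range(3)]
copied: List[str] = []


def flaky_copy(src: str, dst: str) -> None:
    if src.endswith("f2"):
        raise OSError("connection reset")  # simulate a failure on the third file
    copied.append(dst)


try:
    run_job(db, "job-1", files, flaky_copy)
except OSError:
    pass  # the job is interrupted after two files
resumed = run_job(db, "job-1", files, lambda src, dst: copied.append(dst))
print(resumed, len(copied))  # prints 1 3
```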

Data Grid Replica Services

Need for metadata services

Various kinds:
  - Application metadata
  - Replica metadata
  - System configuration metadata

Replica management:
  - For better access performance or availability
  - Mainly for access to “published” resources – read-only model

Replica Catalog

Provides mappings between logical names for files or collections and one or more copies of those objects on physical storage systems

Services provided by the replica catalog:
  - Registering a list of files as a logical collection
  - Registering the physical location of a complete or partial replica of a logical collection
  - Registering information about a particular logical file in a logical collection
  - Modifying the contents of registered entities of the catalog
  - Responding to queries of the catalog

The Globus Replica Catalog supports replica management by providing mappings between logical names for files and one or more copies of the files on physical storage systems