A Metadata Catalog Service for Data Intensive Applications
Presented by Chin-Yi Tsai
2
Outline
Introduction
The Role of Metadata Services in Grid Data Management
Requirements for the Metadata Service
Components of a Metadata Service
MCS: A Metadata Catalog Service for Grids
Application Experiences
Scalability of the MCS
Summary
3
Data-intensive application
Experimental analyses and simulations in scientific disciplines
Massive data sets are shared by a community of hundreds or thousands of researchers
Purpose: to manage these large data sets efficiently
Metadata, or descriptive information about the data, needs to be managed
4
High Level Diagram of the Metadata Catalog Architecture
[Figure: a client application reaches the Metadata Catalog Service through a standard interface; the diagram also shows a web server, database connectivity, and the metadata database (MySQL).]
5
Introduction
Metadata is information that describes data.
Design of a Metadata Catalog Service (MCS) that provides a mechanism for storing and accessing descriptive metadata and allows users to query for data items based on desired attributes.
Accurate identification of desired data items is essential for correct analysis of experimental and simulation results.
6
Introduction (cont’d)
There are various types of metadata:
Replication metadata
Metadata that describes the contents of data items
Metadata that relates to the physical characteristics of data objects, such as size and access permissions
We distinguish between logical file metadata and physical file metadata.
[Figure: logical file metadata, query, physical file]
7
A usage scenario of the Metadata Catalog Service
[Figure: client application, Metadata Catalog Service (MCS web server, MCS database), Replica Location Service (replica index node, local replica catalog), and physical storage system, with the interactions numbered 1 to 6.]
8
Metadata Types
Physical metadata: information about the characteristics of data on the physical storage system
Domain-independent metadata: applies regardless of the application domain or virtual organization in which the data sets are created and shared
Domain-specific metadata, virtual organization metadata, and user metadata: specific to an application domain, a virtual organization, or a particular user
9
The Role of Metadata Services in Grid Data Management
Metadata Services are services that maintain mappings between logical name attributes for data items and other descriptive metadata attributes, and that respond to queries about those mappings.
Metadata Services play a key role in the publication, discovery, and access of data sets.
10
Publication
Publication is the process by which data sets and their associated attributes are stored and made accessible to a user community.
Domain-independent, domain-dependent, and virtual organization metadata attributes are published so that data sets can be discovered and accessed according to those attributes.
Some members of the community may use the Metadata Service to annotate the data sets with their own observations using user attributes, and make these annotations available to a controlled subset of the community.
11
Discovery and Access
Discovery is the process of identifying data items of interest to the user.
[Figure: usage scenario diagram showing the client application, Metadata Catalog Service (MCS web server, MCS database), Replica Location Service (replica index node, local replica catalog), and physical storage system, with the interactions numbered 1 to 6.]
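The discovery and access path can be sketched in a few lines. This is a minimal illustration, assuming the pattern in which a client first asks the metadata catalog for the logical names that match some attributes, and then asks the replica service where those logical files physically live. The dictionaries, function names, attribute names, and URLs below are hypothetical stand-ins, not the MCS or Replica Location Service APIs.

```python
# Hypothetical in-memory stand-ins for the two services (not the real APIs).
METADATA_CATALOG = {
    "lfn-001": {"data_type": "temperature", "year": 2001},
    "lfn-002": {"data_type": "pressure", "year": 2001},
    "lfn-003": {"data_type": "temperature", "year": 2002},
}

REPLICA_LOCATIONS = {
    "lfn-001": [
        "gsiftp://site-a.example.org/data/f1",
        "gsiftp://site-b.example.org/data/f1",
    ],
    "lfn-003": ["gsiftp://site-a.example.org/data/f3"],
}


def query_metadata(**wanted):
    """Return the logical names whose attributes match every requested value."""
    return sorted(
        name
        for name, attrs in METADATA_CATALOG.items()
        if all(attrs.get(key) == value for key, value in wanted.items())
    )


def locate_replicas(logical_name):
    """Return the physical locations registered for a logical name (may be empty)."""
    return REPLICA_LOCATIONS.get(logical_name, [])


def discover(**wanted):
    """Attribute query first, then a replica lookup for each matching logical name."""
    return {name: locate_replicas(name) for name in query_metadata(**wanted)}
```

A call such as `discover(data_type="temperature")` returns a mapping from each matching logical name to its known physical locations; the client would then fetch the data from one of those locations.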
12
Requirements for the Metadata Service
The Metadata Service must provide a mechanism for associating logical name attributes with domain-independent metadata attributes.
The Metadata Service must support queries on its contents.
The Metadata Service must implement policies regarding the consistency guarantees, authentication, authorization, and auditing capabilities provided by the service.
13
Requirements for the Metadata Service (cont’d)
The Metadata Service may support the ability to aggregate metadata into collections or views by associating aggregation attributes with logical name attributes.
The Metadata Service should provide the ability to store attributes that record the transformations performed on a data set.
The Metadata Service should provide good performance and scalability.
14
Components of a Metadata Service
A data model that includes mechanisms for aggregation of metadata mappings
A standard schema for domain-independent metadata attributes with extensibility for additional user-defined attributes
A set of standard service behaviors
Query mechanisms for accessing the database
A set of standard interfaces and APIs for storing and accessing metadata
A set of policies for consistency, access control and authorization, and auditing
15
MCS: A Metadata Catalog Service for Grids (design and implementation)
The MCS data model
MCS schema
MCS service implementation
MCS query mechanisms and APIs
MCS policies
17
The MCS Data Model
Logical file
Logical collection
Logical view
18
MCS Schema
Logical file metadata: main attributes of a logical file
Logical collection metadata: user-defined associations of logical files
Logical view metadata: user-defined aggregation of logical files, logical collections, or other logical views
Authorization information is associated with both individual logical files and logical collections
User information
Audit metadata
User-defined metadata
Annotation attributes
Creation history
External catalog metadata
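The three aggregation levels can be illustrated with a small sketch. It assumes only what the slide states: a collection groups logical files, and a view may aggregate files, collections, or other views. The class and function names are hypothetical, and the sketch assumes views do not contain themselves (no cycle detection).

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class LogicalFile:
    name: str


@dataclass
class LogicalCollection:
    name: str
    files: List[LogicalFile] = field(default_factory=list)


@dataclass
class LogicalView:
    name: str
    # A view may aggregate files, collections, or other views.
    members: List[Union[LogicalFile, LogicalCollection, "LogicalView"]] = field(
        default_factory=list
    )


def files_in(item) -> Set[str]:
    """Collect the logical file names reachable from any logical object."""
    if isinstance(item, LogicalFile):
        return {item.name}
    if isinstance(item, LogicalCollection):
        return {f.name for f in item.files}
    names: Set[str] = set()
    for member in item.members:  # item is a LogicalView
        names |= files_in(member)
    return names
```

Resolving a view recursively like this is one way a client could answer "which logical files does this view contain"; the slides do not say how the MCS itself resolves nested views.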
Field name         Type          Remarks   Description
Data_id            Integer       Non null  The data identifier
Logical_name       Varchar(250)  Non null  The logical file name
Version            Integer                 The version of the data
Data_type          Varchar(250)            The type of data
Collection_id      Integer
Container_id       Integer
Container_Service  Varchar(250)
Is_valid           Integer       Non null
Creator_Dn         Varchar(250)  Non null
Last_Modifier_Dn   Varchar(250)
Create_Time        Date/Time     Non null
Last_Modify_Time   Date/Time
Master_Copy        Varchar(250)
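The field list above maps naturally onto a relational table. The sketch below is an illustration under stated assumptions: it uses Python's built-in `sqlite3` module rather than MySQL so that it runs without a server, the table name `logical_file` and the helper functions are hypothetical, and treating `data_id` as the primary key is a guess that the slide does not confirm.

```python
import sqlite3

# Column names and "non null" remarks follow the field table; the table name,
# the primary key, and the use of SQLite instead of MySQL are assumptions.
SCHEMA = """
CREATE TABLE logical_file (
    data_id           INTEGER      NOT NULL PRIMARY KEY,
    logical_name      VARCHAR(250) NOT NULL,
    version           INTEGER,
    data_type         VARCHAR(250),
    collection_id     INTEGER,
    container_id      INTEGER,
    container_service VARCHAR(250),
    is_valid          INTEGER      NOT NULL,
    creator_dn        VARCHAR(250) NOT NULL,
    last_modifier_dn  VARCHAR(250),
    create_time       DATETIME     NOT NULL,
    last_modify_time  DATETIME,
    master_copy       VARCHAR(250)
)
"""


def open_catalog():
    """Create an in-memory database holding an empty logical_file table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_file(conn, data_id, logical_name, creator_dn, create_time, data_type=None):
    """Insert one logical file, marking it valid; optional columns stay NULL."""
    conn.execute(
        "INSERT INTO logical_file "
        "(data_id, logical_name, data_type, is_valid, creator_dn, create_time) "
        "VALUES (?, ?, ?, 1, ?, ?)",
        (data_id, logical_name, data_type, creator_dn, create_time),
    )


def names_by_type(conn, data_type):
    """Return the logical names of all files with the given data type."""
    rows = conn.execute(
        "SELECT logical_name FROM logical_file WHERE data_type = ? "
        "ORDER BY logical_name",
        (data_type,),
    )
    return [row[0] for row in rows]
```

Parameterized queries (`?` placeholders) keep attribute values out of the SQL text, which matters once values come from users.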
Logical file metadata
Logical collection metadata
Logical view metadata
Authorization information
User information: about writers or modifiers of the logical files in the database
Audit metadata: records information about actions that can be performed on the Metadata Service
User-defined metadata: different application domains have their own metadata schemas
Annotation attributes: comments
Creation history: information about how data items are generated
External catalog metadata: used to further query the external catalog
24
MCS Service Implementation
[Figure: Overview of the Implementation. An application program, for example main() { mcsClient(); mcsCreate(x); }, calls the MCS client; the diagram also shows two SOAP engines, the MCS server, and the MySQL database.]
25
MCS Query Mechanisms and APIs
The client API provides the following operations:
Querying the catalog for logical objects based on object attributes
Querying the static attributes of a logical object
Querying the user-defined attributes of a logical object
Querying the contents of a logical view or a logical collection
Creating a logical file, collection, or view
Modifying the attributes of a logical object
Deleting a logical file, view, or collection
Annotating a logical object
Adding logical objects to a view
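To make the operation list concrete, here is a toy in-memory catalog that offers one method per kind of operation. It is a sketch only: the class, its method names, and its storage layout are hypothetical and do not reproduce the actual MCS client API, and it covers views but not collections to stay short.

```python
class InMemoryCatalog:
    """Toy stand-in for a metadata catalog; all names here are hypothetical."""

    def __init__(self):
        self.attributes = {}   # logical name -> {attribute: value}
        self.annotations = {}  # logical name -> list of comments
        self.views = {}        # view name -> set of member logical names

    def create(self, name, **attrs):
        """Create a logical object with its initial attributes."""
        if name in self.attributes:
            raise ValueError(f"{name} already exists")
        self.attributes[name] = dict(attrs)
        self.annotations[name] = []

    def modify(self, name, **attrs):
        """Add or overwrite attributes of an existing logical object."""
        self.attributes[name].update(attrs)

    def delete(self, name):
        """Remove a logical object and drop it from every view."""
        del self.attributes[name]
        del self.annotations[name]
        for members in self.views.values():
            members.discard(name)

    def annotate(self, name, comment):
        """Attach a free-text comment to a logical object."""
        self.annotations[name].append(comment)

    def add_to_view(self, view, name):
        """Add an existing logical object to a view, creating the view if needed."""
        if name not in self.attributes:
            raise KeyError(name)
        self.views.setdefault(view, set()).add(name)

    def query(self, **wanted):
        """Return names of objects whose attributes match every requested value."""
        return sorted(
            name
            for name, attrs in self.attributes.items()
            if all(attrs.get(key) == value for key, value in wanted.items())
        )

    def contents(self, view):
        """Return the members of a view in sorted order."""
        return sorted(self.views.get(view, ()))
```

For example, after `catalog.create("lfn-001", data_type="gwf")`, the call `catalog.query(data_type="gwf")` returns `["lfn-001"]`.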
26
MCS Policies
The MCS provides authentication and authorization capabilities on the logical file and logical collection attributes in the MCS
The MCS provides auditing metadata
Creation information log
To support other services, such as replica managers that maintain consistency among data items
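How authorization and auditing might fit together can be shown with a short sketch. It assumes a simple model in which permissions are granted per logical object and per user distinguished name, and every attempted action is logged whether or not it is allowed. The class, method names, and log fields are hypothetical; the slides do not describe the MCS policy implementation at this level.

```python
from datetime import datetime, timezone


class AuditedCatalog:
    """Toy authorization check plus audit trail; all names are hypothetical."""

    def __init__(self):
        self.permissions = {}  # logical name -> {user DN: set of allowed actions}
        self.audit_log = []    # one entry per attempted action

    def grant(self, name, user_dn, action):
        """Allow a user to perform an action on a logical object."""
        self.permissions.setdefault(name, {}).setdefault(user_dn, set()).add(action)

    def perform(self, name, user_dn, action):
        """Record the attempt, then allow it or raise PermissionError."""
        allowed = action in self.permissions.get(name, {}).get(user_dn, set())
        self.audit_log.append(
            {
                "time": datetime.now(timezone.utc),
                "user": user_dn,
                "object": name,
                "action": action,
                "allowed": allowed,
            }
        )
        if not allowed:
            raise PermissionError(f"{user_dn} may not {action} {name}")
        return True
```

Logging before the permission check raises means that denied attempts also leave a trace, which is usually what an audit trail is for.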
27
Application Experiences
To integrate MCS into the software used by these applications:
The Pegasus/LIGO Application
The Earth System Grid Application
28
The Pegasus/LIGO Application
Pegasus is used to map complex application workflows onto the available Grid resources
Pegasus uses MCS to discover existing application data products.
Pegasus uses the MCS together with the Replica Location Service; the MCS only stores logical file names.
Attributes that describe these data products, including the type of the data and the duration of data measurements, are stored in the MCS.
23 user-defined attributes
[Figure: usage scenario diagram showing the client application, Metadata Catalog Service (MCS web server, MCS database), Replica Location Service (replica index node, local replica catalog), and physical storage system, with the interactions numbered 1 to 6.]
29
The Earth System Grid Application
The MCS is one component in an ESG testbed
ESG scientists use the MCS to discover and query for ESG files based on metadata attributes
30
Scalability of the MCS
Database size (logical files)  Logical collections  Logical files per collection  User-defined attributes per file
100,000                        100                  1000                          10
1,000,000                      1000                 1000                          10
5,000,000                      5000                 1000                          10
Add and query operations
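The three database sizes in the table are consistent with multiplying the number of collections by the number of files per collection. The sketch below checks that arithmetic, assuming the columns mean collections, logical files per collection, and user-defined attributes per file; the total count of attribute values is derived here and does not appear in the slides.

```python
# (collections, files per collection, user-defined attributes per file),
# as read from the table; the column interpretation is an assumption.
CONFIGS = [(100, 1000, 10), (1000, 1000, 10), (5000, 1000, 10)]


def totals(collections, files_per_collection, attrs_per_file):
    """Return (total logical files, total user-defined attribute values)."""
    files = collections * files_per_collection
    return files, files * attrs_per_file


for config in CONFIGS:
    files, attr_values = totals(*config)
    print(f"{files:>9,} logical files, {attr_values:>10,} user-defined attribute values")
```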
31
Scalability of the MCS
With web interface
Web service overhead
38
Summary
The design and implementation of an MCS
Store, access, and query metadata
To make the service more extensible and to provide a more general query model
Use of other database backend technologies