http://www.ogsadai.org.uk
OGSA-DAI: Data Access and Integration for the Grid
Neil Chue Hong
Overview
– Motivation
– Goals
– Partners
– Features
– Projects
– Further information
– Overview and demo of FirstDIG/INWA
OGSA-DAI Motivation
Entering an age of data
– Data explosion
• CERN: the LHC will generate 1 GB/s (about 10 PB/year)
• VLBA (NRAO) generates 1 GB/s today
• Pixar generates 100 TB per movie
– Storage is getting cheaper
Data stored in many different ways
– Data resources
• Relational databases
• XML databases
• Flat files
Need ways to facilitate
– Data discovery
– Data access
– Data integration
Empower e-Business and e-Science
– The Grid is a vehicle for achieving this
Goals for OGSA-DAI
Aim to deliver application mechanisms that:
– Meet the data requirements of Grid applications
• Functionality, performance and reliability
– Reduce the development cost of data-centric Grid applications
– Provide consistent interfaces to data resources
– Are acceptable to and supportable by database providers
• Trustable, imposed demand is acceptable, etc.
– Provide a standard framework that satisfies standard requirements
A base for developing higher-level services
– Data federation
– Distributed query processing
– Data mining
– Data visualisation
Integration Scenario
A patient moves to a different hospital
Three data sources: DB2, Oracle and a CSV file
– A: (PID, name, address, DOB)
– B: (PID, first_contact)
– C: (PID, first_name, last_name, address, first_contact, DOB)
Data A, B and C are combined into an amalgamated patient record
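The amalgamation step can be sketched as a join on the shared PID key. This is a minimal in-memory sketch, not OGSA-DAI code: the record types `SourceA`, `SourceB`, `SourceC` and `PatientRecord` and the class `PatientMerge` are hypothetical names that mirror the schemas above. It assumes that A's `name` is already in "First Last" form while C's `first_name` and `last_name` must be combined, and that a field present in several sources is taken from the first source that supplies it (A, then B, then C). It needs Java 16+ for records.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: amalgamate patient data from three sources keyed by PID.
final class PatientMerge {
    // Schemas mirror the integration scenario (A, B and C).
    record SourceA(String pid, String name, String address, String dob) {}
    record SourceB(String pid, String firstContact) {}
    record SourceC(String pid, String firstName, String lastName,
                   String address, String firstContact, String dob) {}
    record PatientRecord(String pid, String name, String address,
                         String firstContact, String dob) {}

    // Mutable accumulator used while folding the three sources together.
    private static final class Builder {
        String pid, name, address, firstContact, dob;
        Builder(String pid) { this.pid = pid; }
        PatientRecord build() { return new PatientRecord(pid, name, address, firstContact, dob); }
    }

    // Keep a value that is already set; only fill a field that is still missing.
    private static String firstNonNull(String current, String candidate) {
        return current != null ? current : candidate;
    }

    static Map<String, PatientRecord> merge(List<SourceA> a, List<SourceB> b, List<SourceC> c) {
        Map<String, Builder> byPid = new LinkedHashMap<>();
        for (SourceA r : a) {
            Builder p = byPid.computeIfAbsent(r.pid(), Builder::new);
            p.name = firstNonNull(p.name, r.name());
            p.address = firstNonNull(p.address, r.address());
            p.dob = firstNonNull(p.dob, r.dob());
        }
        for (SourceB r : b) {
            Builder p = byPid.computeIfAbsent(r.pid(), Builder::new);
            p.firstContact = firstNonNull(p.firstContact, r.firstContact());
        }
        for (SourceC r : c) {
            Builder p = byPid.computeIfAbsent(r.pid(), Builder::new);
            // C splits the name; combine it into the "First Last" target format.
            p.name = firstNonNull(p.name, r.firstName() + " " + r.lastName());
            p.address = firstNonNull(p.address, r.address());
            p.firstContact = firstNonNull(p.firstContact, r.firstContact());
            p.dob = firstNonNull(p.dob, r.dob());
        }
        Map<String, PatientRecord> result = new LinkedHashMap<>();
        byPid.forEach((pid, p) -> result.put(pid, p.build()));
        return result;
    }

    public static void main(String[] args) {
        // Illustrative rows only.
        Map<String, PatientRecord> merged = merge(
            List.of(new SourceA("p1", "Ann Smith", "1 High St", "1970-01-01")),
            List.of(new SourceB("p1", "2003-05-02")),
            List.of(new SourceC("p2", "Bob", "Jones", "2 Low Rd", "2004-01-10", "1980-02-02")));
        merged.values().forEach(System.out::println);
    }
}
```

In a Grid setting each list would come from a separate data service rather than from memory; the merge policy (first source wins) is a design choice for the sketch, and a real integration would need explicit conflict-resolution rules.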
Why OGSA-DAI?
Why use OGSA-DAI over JDBC?
– Language independence at the client end
• No need to use Java
– Platform independence
• No need to worry about connection technology and drivers
– Can handle XML and file resources
– Can embed additional functionality at the service end
• Transformations, compression, third-party delivery
• Avoiding unnecessary data movement
– Provision of metadata is powerful
– The Registry is useful for service discovery
• Dynamic service binding process
– The quickest way to make data accessible on the Grid
• Installation and configuration of OGSA-DAI is fast and straightforward
Project Partners
Powered by ….
Funded by the Grid Core Programme
– OGSA-DAI: £3 million, 18 months, from Feb 2002
• Three major releases, three interim releases
– DAIT (DAI-Two): keep the OGSA-DAI brand name
• £1.5 million, 24 months, from Oct 2003
• Four major releases
GGF DAIS WG
– Strong involvement
– Standardise the interfaces
– OGSA-DAI to be a reference implementation
Core features
An extensible framework for building applications
– Supports relational, XML and some file resources
• MySQL, Oracle, DB2, SQL Server, Postgres, XIndice, CSV, EMBL
– Supports various delivery options
• SOAP, FTP, GridFTP, HTTP, files, email, inter-service
– Supports various transforms
• XSLT, ZIP, GZip
– Supports message-level security using X.509 certificates
– Client Toolkit library for application developers
– Comprehensive documentation and tutorials
Third production release is coming in November
– OGSI/GT3 based
– Also previews of WS-I and WS-RF/GT4 releases
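A compression transform such as GZip can be illustrated with the JDK's `java.util.zip` classes. This is a sketch of the idea only, assuming a result set has already been serialised to bytes before delivery. It is not the OGSA-DAI transform implementation, and the class name `GzipTransform` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of a service-side GZip transform: compress a result
// document before it is handed to a delivery mechanism, decompress at the client.
final class GzipTransform {
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer (CRC-32 and length).
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(data);
        }
        return bytes.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        // A repetitive XML result set compresses well.
        String rowset = "<row><PID>1</PID></row>".repeat(1000);
        byte[] raw = rowset.getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes");
        String roundTrip = new String(decompress(packed), StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + roundTrip.equals(rowset));
    }
}
```

Doing this at the service end means fewer bytes cross the network, which is the motivation given for embedding transformations where the data is.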
Activities are the drivers
Express a task to be performed by a GDS (Grid Data Service)
Three broad classes of activities:
– Statement
– Transformation
– Delivery
Extensible:
– Easy to add new functionality
– Does not require modification to the service interface
– Extensions operate within the OGSA-DAI framework
Functionality:
– Implemented at the service
– Works where the data is (no need to move data back to the client)
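The activity model can be sketched as a chain of small processing steps run where the data lives. In this hypothetical sketch (not the OGSA-DAI API), `Activity`, `Statement`, `WrapAsXml`, `DeliverTo` and `perform` are invented names, and the "data resource" is an in-memory list. The point it illustrates is that new activity types plug in behind a single, unchanged `perform` entry point. It needs Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the activity model; not the OGSA-DAI API.
final class ActivityPipeline {
    // An activity takes the previous activity's output and produces new output.
    interface Activity {
        List<String> process(List<String> input);
    }

    // Statement activity: selects matching rows from a data resource.
    record Statement(List<String> resource, Predicate<String> where) implements Activity {
        public List<String> process(List<String> ignored) {
            return resource.stream().filter(where).toList();
        }
    }

    // Transformation activity: rewrites each row, here wrapping it as an XML element.
    record WrapAsXml(String element) implements Activity {
        public List<String> process(List<String> input) {
            return input.stream().map(r -> "<" + element + ">" + r + "</" + element + ">").toList();
        }
    }

    // Delivery activity: pushes results to a destination and passes them through.
    record DeliverTo(List<String> destination) implements Activity {
        public List<String> process(List<String> input) {
            destination.addAll(input);
            return input;
        }
    }

    // Single service entry point: adding new Activity types never changes this signature.
    static List<String> perform(List<Activity> request) {
        List<String> data = List.of();
        for (Activity activity : request) {
            data = activity.process(data);
        }
        return data;
    }

    public static void main(String[] args) {
        List<String> table = List.of("S92:p1", "S17:p2", "S92:p3");
        List<String> outbox = new ArrayList<>();
        perform(List.of(
            new Statement(table, row -> row.startsWith("S92")),
            new WrapAsXml("row"),
            new DeliverTo(outbox)));
        outbox.forEach(System.out::println);
    }
}
```

Because every step runs inside `perform`, only the final, already-filtered and transformed rows leave the service.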
Client Toolkit
Why? Nobody wants to write XML!
A programming API which makes writing applications easier
– Now: Java
– Next: Perl, C, C#?, ML!?

// Create a query from the SQL text and add it to an activity request
SQLQuery query = new SQLQuery(SQLQueryString);
ActivityRequest request = new ActivityRequest();
request.addActivity(query);

// Perform the query (gds is a handle to the Grid Data Service, obtained earlier and not shown)
Response response = gds.perform(request);

// Display the result
ResultSet rs = query.getResultSet();
displayResultSet(rs, 1);
e-Digital MammOgraphy National Database
Built a prototype of a national database of mammographic images in support of the UK Breast Screening Programme
Employs Grid technologies to facilitate this process
[Figure: eDiaMoND architecture. Components: DB2 Content Manager (four instances), DB2 Federation, OGSA-DAI (several instances, including one over database files), Core Services (four instances), Training Services, and DataLoad and TrainingApp applications built on the Core & Training API (Core API, Training API, Training Application). Sites: UCL, KCL, UED, CHU.]
GeneGrid
Grid-based framework for bioinformatics
– Virtual Bioinformatics Laboratory
– Integration of existing technologies and data sets
– Gene study in silico
– Develop specialist data sets
– Grid services for commercial or third-party use
Data resources as XML collections (XIndice), flat files and relational databases (MySQL)
– OGSA-DAI plus custom extensions
– Beta testers for file-based activities
http://www.qub.ac.uk/escience/projects/genegrid/
Distributed Query Processing
Queries are mapped to algebraic expressions for evaluation
Parallelism is represented by partitioning queries
– Use exchange operators
Prototype available from:
– http://www.ogsadai.org.uk
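[Figure: example parallel query plan. Operators: table_scan(protein); table_scan(proteinTerm) with termID=S92; reduce (three instances); exchange (two instances); hash_join(proteinId); op_call(Blast). Plan fragments are labelled 1, 2 and 3,4.]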
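The exchange and hash_join(proteinId) operators in the plan can be illustrated with a small in-memory sketch. This is not OGSA-DQP code: the row types, the number of partitions and the hash-partitioning scheme are assumptions, and the reduce and op_call(Blast) operators are not modelled. Each exchange routes rows to a partition by hashing the join key, so matching protein and proteinTerm rows meet in the same partition. Each partition can then be joined independently (in a real system, on a different node). It needs Java 16+.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of exchange-based partitioned parallelism for a hash join.
final class PartitionedHashJoin {
    record Protein(String proteinId, String sequence) {}
    record ProteinTerm(String proteinId, String termId) {}
    record Joined(String proteinId, String sequence, String termId) {}

    // Exchange: route each row to a partition chosen by hashing the join key.
    static <T> List<List<T>> exchange(List<T> rows, Function<T, String> key, int partitions) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < partitions; i++) out.add(new ArrayList<>());
        for (T row : rows) out.get(Math.floorMod(key.apply(row).hashCode(), partitions)).add(row);
        return out;
    }

    // hash_join(proteinId) on one partition: build a hash table, then probe it.
    static List<Joined> hashJoin(List<Protein> build, List<ProteinTerm> probe) {
        Map<String, List<Protein>> table = new HashMap<>();
        for (Protein p : build) table.computeIfAbsent(p.proteinId(), k -> new ArrayList<>()).add(p);
        List<Joined> result = new ArrayList<>();
        for (ProteinTerm t : probe)
            for (Protein p : table.getOrDefault(t.proteinId(), List.of()))
                result.add(new Joined(p.proteinId(), p.sequence(), t.termId()));
        return result;
    }

    static List<Joined> parallelJoin(List<Protein> proteins, List<ProteinTerm> terms, int partitions) {
        // Selection from the plan: keep only terms with termID = S92.
        List<ProteinTerm> selected = terms.stream().filter(t -> t.termId().equals("S92")).toList();
        List<List<Protein>> left = exchange(proteins, Protein::proteinId, partitions);
        List<List<ProteinTerm>> right = exchange(selected, ProteinTerm::proteinId, partitions);
        List<Joined> all = new ArrayList<>();
        // Each partition could run on a different node; here they run one after another.
        for (int i = 0; i < partitions; i++) all.addAll(hashJoin(left.get(i), right.get(i)));
        return all;
    }

    public static void main(String[] args) {
        // Illustrative rows only.
        List<Protein> proteins = List.of(new Protein("p1", "MKV"), new Protein("p2", "GGA"), new Protein("p3", "TTC"));
        List<ProteinTerm> terms = List.of(new ProteinTerm("p1", "S92"), new ProteinTerm("p2", "S17"), new ProteinTerm("p3", "S92"));
        parallelJoin(proteins, terms, 4).forEach(System.out::println);
    }
}
```

Hashing both inputs on the same key is what makes the per-partition joins independent: the result is the same for any number of partitions, only the distribution of work changes.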
GridMiner
Test application area: medical
– Traumatic brain injury treatment
– Predicting the outcome of seriously ill patients
– Analytical part focuses on data mining and On-Line Analytical Processing (OLAP)
Target
– Provide tools to discover and access relevant knowledge and information from different distributed and heterogeneous data sources
– Building on and extending OGSA-DAI
http://www.gridminer.org/
GridMiner Scenario
Heterogeneities
– Name in A is "First Last" (the target format)
– Name in C has to be combined from first_name and last_name
Distribution
– 3 data sources
Future work
Architecture review
– Better concurrency model
– Better AAA framework
– Better definition of extensibility points
• Security, activities, dynamic configuration, mobile code, …
Improved support for
– WS-Security profiles
– Stored procedures
– Data transport
– XQuery
– Database-specific datatypes and SQL
Additionally
– JDBC and ODBC drivers for OGSA-DAI
– Contribution process
Further information
The OGSA-DAI project site
– http://www.ogsadai.org.uk
The DAIS-WG site
– http://forge.gridforum.org/projects/dais-wg/
OGSA-DAI users mailing list
– [email protected]
– General discussion on Grid DAI matters
Formal support for OGSA-DAI releases
– http://www.ogsadai.org.uk/support
– [email protected]
OGSA-DAI training courses
Project Membership
Roles: Principal Investigators; Project Manager; Programme Management Board Chair; Technical Review Board Chair; Research Team; IBM Dissemination Team; EPCC Team; IBM Development Team
Members: Charaka, Tom, Mike, Ally, Amy, Mario, Malcolm, Kostas, Norman, Paul, Neil, Andy, Simon, Dave, Patrick, Neil
The End
Questions?