Data R&D Issues for GTL. Data and Knowledge Systems, San Diego Supercomputer Center, University of California, San Diego. Bertram Ludäscher, [email protected]

Data R&D Issues for GTL




Page 1: Data R&D Issues for GTL

Data R&D Issues for GTL

Data and Knowledge Systems

San Diego Supercomputer Center

University of California, San Diego

Bertram Ludäscher

[email protected] (@sdsc.edu)

Page 2: Data R&D Issues for GTL

Data R&D Issues for GTL

• GTL data management infrastructure
  • Service-oriented Data Grids for seamless data sharing (volume, distribution, access restrictions, …)
  • Capabilities for data integration (mediators/warehouses), digital library functions, knowledge-based (“semantic”) extensions (e.g. ontologies), and archival capabilities

• Data analysis and knowledge-enabling infrastructure
  • Analytical Pipelines (“Scientific Workflows”)
    • Rapid design and prototyping, handling of complex data & task semantics, large volume, sci. workflow as a first-class product, validation, execution, monitoring, sharing, archiving
    • How to go from a scientist’s abstract (conceptual) workflow to a data grid execution plan?
  • New Model Management and Knowledge Representation Technologies:
    • Closing the gap between data management (DBMS’s, data grids) and knowledge-based systems (desktop-oriented, rule-based systems) and analysis and modeling systems
    • Mapping between numerous formalisms at the syntactic, structural, and semantic level (terminological, process-semantics, …)
    • “Gluing” together models and formalisms across different levels: from genes to proteins to molecular machines to microbial communities… (compare: pnp transistors, boolean circuits, assembly language, high-level PLs, declarative QLs, …); abstraction & elaboration mechanisms
    • Data exploration and hypothesis generation tools (KNOW-ME, SKIDL, SEEK AMS, …)

• Computational facilities
  • Use of high-end networked facilities a la TeraGrid

• Opportunities (and challenges!) in leveraging related efforts:
  • NIH BIRN, …, NSF Cyberinfrastructure (ITRs GEON, GriPhyN, SCEC, SEEK, …), UK e-Science, …
  • Standardization (OGSA, KR/Semantic Web technologies, e.g., ontology languages (OWL), inference mechanisms, …), scientific workflow standards, …; interoperable, open source tools
  • One size/standards fits all? Probably not: data-intensive vs computation-intensive vs “semantics-intensive” (capturing implicit domain knowledge, hidden assumptions, …)
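The slide asks how to go from a scientist's abstract (conceptual) workflow to a data grid execution plan. The sketch below is only one possible illustration of that step, not a method described in the slides: it assumes the abstract workflow is a dependency graph of named steps, orders the steps so that every dependency runs first, and attaches a resource to each. All step names, resource names, and the `make_execution_plan` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical abstract workflow: each step maps to the steps it depends on.
abstract_workflow = {
    "load_samples": [],
    "normalize": ["load_samples"],
    "cluster": ["normalize"],
    "annotate": ["normalize"],
    "report": ["cluster", "annotate"],
}

# Hypothetical binding of conceptual steps to grid resources.
resource_bindings = {
    "load_samples": "storage-node-1",
    "normalize": "compute-node-a",
    "cluster": "compute-node-b",
    "annotate": "compute-node-a",
}


def make_execution_plan(workflow, bindings):
    """Return (step, resource) pairs ordered so dependencies come first.

    Steps without a binding are marked "unassigned" rather than guessed.
    """
    order = TopologicalSorter(workflow).static_order()
    return [(step, bindings.get(step, "unassigned")) for step in order]


plan = make_execution_plan(abstract_workflow, resource_bindings)
for step, resource in plan:
    print(f"{step} -> {resource}")
```

A real planner would also have to consider data location, volume, and access restrictions, which the slide lists as data grid concerns; this sketch covers only the ordering and binding step.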

Page 3: Data R&D Issues for GTL

Bonus Material (beyond 1 slide limit ;-) starts here …

Page 4: Data R&D Issues for GTL

Up & Down: Abstraction & Elaboration Mechanisms

Knowledge Mgmt

Information Mgmt

Data Management

How to punch through the technology barriers?
• Data Grids
• vs Digital Libraries
• vs DBMS’s
• vs Knowledge-Based Analysis & Modeling Systems

Page 5: Data R&D Issues for GTL

Biomedical Informatics Research Network

Page 6: Data R&D Issues for GTL

Biomedical Informatics Research Network, http://nbirn.net

Getting Formal: Source Contextualization & Ontology Refinement in Logic

Page 7: Data R&D Issues for GTL

Scientific Data Integration ... Questions to Queries ...

What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry?

How does it relate to host rock structures?

[Figure: information integration for the questions above; recoverable labels listed below]

• Information Integration
• Sources: Geologic Map (Virginia); GeoChemical; GeoPhysical (gravity contours); GeoChronologic (Concordia); Foliation Map (structure DB)
• “Complex Multiple-Worlds” Mediation
• From domain knowledge down to raw data: Knowledge Representation (ontologies, concept spaces); Database mediation; Data modeling
• GeoSciences Network

Page 8: Data R&D Issues for GTL

Geologic Map Integration: Geo & IT/CS meet

domain knowledge

Knowledge representation

AGE ONTOLOGY

Nevada

GEON Metamorphism Equation:

Geoscientists + Computer Scientists (+/- Energy, +/- a few hundred million years) → Igneous Geoinformaticists

Page 9: Data R&D Issues for GTL

SEEK Project Overview

• Large collaborative NSF/ITR project: UNM, UCSB, UCSD (SDSC), UKansas, ..
• “Analysis & Modeling System” to design, execute, reproduce/refine scientific workflows in the ecology and biodiversity domains.

[Figure: SEEK architecture overview; recoverable labels listed below]

AM: Analysis and Modeling System
• Analytical Pipeline (AP): ASx, ASy, ASz, TS1, TS2
• Example of “AP0”: Invasive species over time
• Library of Analysis Steps, Pipelines & Results (ASr; WSDL)
• Execution Environment: SAS, MATLAB, FORTRAN, etc.

SMS: Semantic Mediation System
• Semantic Mediation Engine: Data Binding, Query Processing
• Logic Rules (ECO2-CL)
• Parameters w/ Semantics; Parameter Ontologies
• ECO2, TaxOn, EML, etc.

EcoGrid
• Provides unified access to Distributed Data Stores, Parameter Ontologies, & Stored Analyses, and runtime capabilities via the Execution Environment
• Sources (WSDL): SRB, KNB, MC, Species, WrpDar, ...
• Raw data sets wrapped for integration w/ EML, etc.

Semantic Mediation System & Analysis and Modeling System use EcoGrid web services, enabling analytically driven data discovery and integration.

SEEK is the combination of EcoGrid data resources and information services, coupled with advanced semantic and modeling capabilities.
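The slide mentions "Parameters w/ Semantics", "Parameter Ontologies", and "Data Binding" without showing how they interact. The following is a minimal sketch under stated assumptions, not the SEEK implementation: it assumes an analysis step declares the concept it needs, data sets are annotated with concepts, and a data set can be bound when its concept is the required one or a sub-concept of it. The ontology, the data set names, and the `is_subconcept` and `bind_data` helpers are all hypothetical.

```python
# Hypothetical is-a ontology: child concept -> parent concept.
IS_A = {
    "InvasiveSpeciesCount": "SpeciesCount",
    "SpeciesCount": "Abundance",
    "Abundance": "Measurement",
    "Rainfall": "Measurement",
}

# Hypothetical data sets, each annotated with one ontology concept.
DATASETS = {
    "plot_survey_2001": "InvasiveSpeciesCount",
    "weather_station_7": "Rainfall",
    "bird_census": "SpeciesCount",
}


def is_subconcept(concept, ancestor):
    """True if `concept` equals `ancestor` or reaches it via is-a links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)  # None once the root is passed
    return False


def bind_data(required_concept, datasets):
    """Names of data sets whose annotation satisfies the required concept."""
    return sorted(
        name
        for name, concept in datasets.items()
        if is_subconcept(concept, required_concept)
    )


# An analysis step that needs species counts can use both survey data sets.
print(bind_data("SpeciesCount", DATASETS))
```

A step asking for the broader concept "Measurement" would match all three data sets, which is the sense in which ontology annotations can drive data discovery.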