
Semantic Data Integration in myGrid and ourGrid (SEEK)

National e-Science Centre, e-Science Institute, Edinburgh

May 14th, 2004


… meets e-Science, Edinburgh, May 9-11, 2004

Plan of the Day

• 9:00–10:30
  – SEEK Data Integration & Semantic Extensions
• 10:30–11:00 BREAK
• 11:00–12:30
  – myGrid Data Integration & Semantic Extensions
• 12:30–13:45 LUNCH
• 13:45–15:45
  – Interoperable Semantic Registration, Mediation, Workflows
• 15:45–16:00 BREAK
• 16:00–17:00
  – Plenary Session

SEEK Data Integration & Semantic Extensions

Shawn Bowers (SDSC/UCSD)
Bertram Ludaescher (SDSC/UCSD)

& SEEK KR-SMS Team
& GEON KR Team

Sparrow

http://seek.ecoinformatics.org


Purpose / Goals

• Link-Up:
  – … [on] data / services with “semantics”
  – … to do semantic data & service integration
  – also: an e-Science “Sister Project” to facilitate knowledge exchange & collaboration between UK & US based projects (where is the web/wiki page?)
• Specifically:
  – What approaches to express semantics of data, services, and workflows do we all use?
  – How can we make them interoperable?
• … keeping in mind…
  – What problem is it that the XYZ solution solves?


Science Environment for Ecological Knowledge

• Domain Science Driver
  – Ecology (LTER), biodiversity, …
• Analysis & Modeling System
  – Design & execution of ecological models & analysis
  – End (& power) user focus
  – {application,upper}-ware Kepler
• Semantic Mediation System
  – Data integration of hard-to-relate sources and processes
  – Semantic Types and Ontologies
  – upper middleware Sparrow Toolkit
• EcoGrid
  – Access to ecology data and tools
  – {middle,under}-ware

[Slide callouts: architecture; our focus; one specific problem (DILS’04)]


Heterogeneous Data integration

• Requires advanced metadata and processing
  – Attributes must be semantically typed
  – Collection protocols must be known
  – Units and measurement scale must be known
  – Measurement relationships must be known
    • e.g., that ArealDensity=Count/Area
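The measurement relationship ArealDensity=Count/Area can be made concrete in a few lines. The following is a minimal sketch, not SEEK code: the `Measurement` record, the `AREA_TO_M2` unit table, and the function name are all assumptions introduced for illustration. It shows why semantic types and units both have to be known before two attributes can be combined.

```python
from dataclasses import dataclass

# Hypothetical conversion factors to square metres (illustrative assumption).
AREA_TO_M2 = {"m2": 1.0, "ha": 10_000.0, "km2": 1_000_000.0}


@dataclass(frozen=True)
class Measurement:
    value: float
    unit: str           # e.g. "count", "ha"
    semantic_type: str  # e.g. "Count", "Area"


def areal_density(count: Measurement, area: Measurement) -> Measurement:
    """Derive ArealDensity = Count / Area after normalising the area to m2."""
    if count.semantic_type != "Count" or area.semantic_type != "Area":
        raise TypeError("areal_density needs a Count and an Area")
    area_m2 = area.value * AREA_TO_M2[area.unit]
    return Measurement(count.value / area_m2, "count/m2", "ArealDensity")


density = areal_density(
    Measurement(50, "count", "Count"),
    Measurement(2, "ha", "Area"),
)
print(density)
```

Without the semantic type check, a column of areas could be divided by another column of areas without complaint; without the unit table, hectares and square kilometres would be silently mixed.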


A Neuroscientist’s Information Integration Problem

What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity?

How about other rodents?

? Information Integration

• protein localization (NCMIR)
• neurotransmission (SENSELAB)
• sequence info (CaPROT)
• morphometry (SYNAPSE)

“Complex Multiple-Worlds” Mediation

Biomedical Informatics Research Network, http://nbirn.net

A Home Buyer’s Information Integration Problem

What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population?

? Information Integration

• Realtor
• Demographics
• School Rankings
• Crime Stats

“Multiple-Worlds” Mediation

An Online Shopper’s Information Integration Problem

El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?”

? Information Integration

• addall.com
• amazon.com
• A1books.com
• half.com
• barnes&noble.com

“One-World” Mediation


Standard (XML-Based) Mediator Architecture

[Figure: mediator architecture, top to bottom]

• USER/Client: Query Q ( G (S1, ..., Sk) )
• MEDIATOR: (XML) Queries & Results
  – Integrated Global (XML) View G
  – Integrated View Definition: G(..) ← S1(..) … Sk(..)
• Sources S1, S2, …, Sk: each behind a Wrapper that exposes an (XML) View
  – wrappers implemented as web services
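The layering on this slide (sources, wrappers, one integrated global view, client query) can be sketched in plain Python. This is an illustration only: the `wrap` helper, the `Mediator` class, the sample rows, and the choice to define G as a simple union of the wrapped views are assumptions, not the SEEK or GEON implementation, and dictionaries stand in for XML.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Wrapper = Callable[[], Iterable[Row]]


def wrap(rows: List[Row], rename: Dict[str, str]) -> Wrapper:
    """Wrapper: expose a source's rows under the global view's attribute names."""
    def view() -> Iterable[Row]:
        for row in rows:
            yield {rename[k]: v for k, v in row.items() if k in rename}
    return view


class Mediator:
    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def global_view(self) -> Iterable[Row]:
        """G(..): here simply the union of the wrapped source views S1..Sk."""
        for view in self.wrappers:
            yield from view()

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        """Answer a client query Q against the global view."""
        return [row for row in self.global_view() if predicate(row)]


# Two hypothetical sources with different local attribute names.
s1 = wrap([{"ttl": "Tractatus", "usd": 12.5}], {"ttl": "title", "usd": "price"})
s2 = wrap([{"name": "Tractatus", "cost": 9.0}], {"name": "title", "cost": "price"})

mediator = Mediator([s1, s2])
offers = mediator.query(lambda r: r["title"] == "Tractatus")
cheapest = min(offers, key=lambda r: r["price"])
print(cheapest)
```

The client only ever sees `title` and `price`; the renaming in each wrapper is where syntactic and structural differences between sources are absorbed.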


Information Integration: Problems and “Solutions”

• System aspects: “Grid” Infrastructure
  – Authentication, single sign-on, …
  – distributed computation
  – web services, WSDL/SOAP, …
  – sources = functions, files, databases, …
• Syntax & Structure: (XML-Based) Database Mediators
  – wrapping, restructuring
  – distributed (XML) queries and views
  – sources = (XML) databases
• Semantics: Model-Based/Semantic Mediators
  – conceptual models, declarative views
  – ontologies, description logics (OWL, RDF, …)
  – sources = knowledge bases (DB+CMs+ICs)

[Diagram layers: Syntax; Structure; Semantics; System aspects]

• reconciling S4 heterogeneities
• “gluing” together multiple data sources
• bridging information and knowledge gaps computationally


Exercise: Classify (system, syntax, structure, semantics, something else …)

• “9:00” vs “9am” vs “21:00” vs “9 ct”

• “3 miles” (land|sea) (here UK|US|elsewhere) (now|elsewhen) …

• “picea rubens” (name vs concept … in biological taxonomies)

• …


Different Types of “Ontologies” and Representations

• Overloaded/sloppy for a…
  – “Napkin drawing”, “concept space” (e.g. in PPT)
  – Labeled graph, semantic network, concept map (e.g. in RDF)
  – Controlled vocabulary (structured or flat)
  – Database schema (relational, XML, …)
  – Conceptual schema (ER, UML, …)
  – Thesaurus (synonyms, broader term/narrower term)
  – Taxonomy
  – Formal ontology, e.g., in [Description] Logic (e.g. in OWL)
    • “formalization of a specification”
• An ontology may …
  – constrain possible interpretation of terms
  – specify a theory by defining and relating concepts of a domain of interest
    • theory = set of logic models (= “allowed/intended interpretations” of symbols)


Community-Based Ontology Development

• Draft of a geochemistry ontology developed by scientists

• Current concept maps and emerging ontologies in GEON:
  1. Igneous Rocks/Plutons
  2. Seismology
  3. Geochemistry
• … in SEEK:
  1. Taxon
  2. Units
  3. Measurements
  4. …?


Creating and Sharing Concept Maps (here: Seismology concept map, Cmap tool; Kai Lin, GEON)

• Lock up scientists for 2+ days
• Add CS/KRDB types
• Create concept maps
• Refine
• Iterate from napkin drawings, to concept maps, to ontologies


Graph (RDF) Queries on Ontologies

[Screenshot labels: visualization; RQL Query: Show all “products”; Query Results]

Prototype: Kai Lin, GEON


Ontologies: Cui bono?

• What are ontologies used for?
  – Conceptual models of a domain or application (communication means, system/database design, …)
  – Classification of …
    • concepts (taxonomy) and
    • data/object instances based on properties and concept definitions
  – Analysis of ontologies, e.g.
    • Graph queries (reachability, path queries, …)
    • Reasoning (concept subsumption, consistency checking, …)
  – Targets for semantic data registration
  – Conceptual indexes and views for “smart” operations
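A path query of the kind listed under "Analysis of ontologies" can be illustrated with a breadth-first search over triples. This is a plain-Python stand-in, not RQL and not the GEON prototype; the `TRIPLES` fragment and the `find_path` function are invented for the example.

```python
from collections import deque

# Hypothetical ontology fragment as (subject, predicate, object) triples.
TRIPLES = [
    ("Granite", "isa", "PlutonicRock"),
    ("PlutonicRock", "isa", "IgneousRock"),
    ("IgneousRock", "isa", "Rock"),
    ("Granite", "hasMineral", "Quartz"),
]


def find_path(triples, start, goal):
    """Breadth-first path query: the list of edges from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for s, p, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                queue.append((o, path + [(s, p, o)]))
    return None


for edge in find_path(TRIPLES, "Granite", "Rock"):
    print(edge)
```

Reachability is the special case where only the existence of a path matters, i.e. `find_path(...) is not None`.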


Using ontologies for …

• Smart data discovery
• Smart service discovery
• Smart (data) querying
• Smart data integration (declarative)
• Smart workflow planning (execution !?) (procedural)
• Here: def_macro “smart” := (ontology|semantics)-(based|enhanced|enabled)


Specifically in SMS ..

• “smart data discovery” – e.g., …
  – asking for A, retrieve B’s too, since B isa A
• “smart connections” – e.g., …
  – data/source binding to AMS (Kepler) services (actors)
  – service-to-service semantic (and structural?) type checking
  – service-to-service & data-to-service “gluing” (insert structural transformations, unit conversions, suggest services based on parameter chasing (parameter ontologies))
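The "asking for A, retrieve B's too" behaviour amounts to expanding a requested concept to all of its subconcepts before matching datasets. The sketch below assumes datasets are tagged with a single concept each; the `ISA` edges, the `DATASETS` catalogue, and both function names are hypothetical.

```python
def subconcepts(isa_edges, concept):
    """Return the concept plus everything that is (transitively) a kind of it."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in isa_edges:
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def discover(datasets, isa_edges, concept):
    """Names of datasets registered to the concept or to any of its subconcepts."""
    wanted = subconcepts(isa_edges, concept)
    return sorted(name for name, tag in datasets.items() if tag in wanted)


# Hypothetical (child, parent) isa edges and dataset registrations.
ISA = [("Sparrow", "Bird"), ("DuskySeasideSparrow", "Sparrow"), ("Bird", "Animal")]
DATASETS = {
    "plot17.csv": "Sparrow",
    "census.csv": "Bird",
    "trees.csv": "Plant",
    "dss.csv": "DuskySeasideSparrow",
}

print(discover(DATASETS, ISA, "Bird"))
```

A keyword search for "Bird" would find only `census.csv`; the isa expansion is what brings in the sparrow datasets.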


… specifically in SMS .. (Cont’d)

• “smart data integration” – e.g., …
  – concept-based instance classification and data enumeration (as part of integrated/mediated views)
  – discovery and use of new join relations across sources
  – rewriting queries (against which SEEK/EcoGrid/EML schemas??) using ontologies & integrity constraints
  – generation of feasible distributed query plans in the presence of access patterns (web services), views, integrity constraints (ICs)
• Need for “semantic registration/annotation”
  – Linking data structures/objects to conceptual structures
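Feasibility under access patterns can be illustrated for the simplest case: a conjunction of positive literals, where each web service call requires some arguments to be bound as inputs. The sketch below is a simplified illustration, not the algorithm of the cited PODS/EDBT papers (which also handle negation and unions); the pattern notation ("i" for input, "o" for output), the service names, and `feasible_plan` are assumptions.

```python
def feasible_plan(literals, patterns, bound=()):
    """Greedily order literals so every input ('i') argument is bound before its call.

    literals: list of (service_name, tuple_of_variable_names)
    patterns: service_name -> list of pattern strings such as "io"
    Returns an executable ordering, or None if no ordering exists.
    """
    bound = set(bound)
    remaining = list(literals)
    plan = []
    while remaining:
        for literal in remaining:
            name, args = literal
            callable_now = any(
                all(arg in bound for arg, mode in zip(args, pattern) if mode == "i")
                for pattern in patterns[name]
            )
            if callable_now:
                plan.append(literal)
                bound.update(args)  # outputs become available to later calls
                remaining.remove(literal)
                break
        else:
            return None  # nothing left can be called with the current bindings
    return plan


# Hypothetical services: catalog(author -> isbn) and price(isbn -> amount).
PATTERNS = {"catalog": ["io"], "price": ["io"]}
QUERY = [("price", ("isbn", "amount")), ("catalog", ("author", "isbn"))]

print(feasible_plan(QUERY, PATTERNS, bound={"author"}))
```

Because bindings only ever grow in the positive case, the greedy choice never blocks a later literal, so returning `None` really does mean that no ordering is executable.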


Things to “Register”

• Data files (individual files)
  – Shapefile as a blob (+ file type)
• Collections (of files; nested; e.g. satellite data)
• Databases (has schema and can be queried)
  – Shapefile with schema registered
• Ontologies
• Services (web + grid services)
• Other/external applications


Ontologies and Data Management (watch out for Semantic Data Registration later)

[Diagram labels: Data; Metadata; Schema (×4); Conceptual Model (×2); Ontology; Design Artifact; edge label: “use concepts from (explicitly or implicitly)”]


A Multi-Hierarchical Rock Classification “Ontology” (GSC)

[Figure: classification hierarchies: Composition; Genesis; Fabric; Texture]

Application Example: Geologic Map Integration

[Figure labels: domain knowledge; Knowledge representation; Ontologies!?; Nevada; +/- a few hundred million years]

“Semantic Registration” of shapefiles to a shared ontology: concept-based queries; also allows … viewing of British-registered USGS data through Canadian eyes
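Registering local codes to a shared ontology is what makes both the concept-based query and the "through someone else's eyes" view possible. The following is a minimal sketch under invented data: the agency names, the rock codes, the `REGISTRATION` table, and the function names are all hypothetical and do not reflect actual GSC or USGS vocabularies.

```python
# Hypothetical local vocabularies, each registered to shared ontology concepts.
REGISTRATION = {
    "agency_a": {"gr": "Granite", "bs": "Basalt"},
    "agency_b": {"GRNT": "Granite", "BSLT": "Basalt"},
}


def to_concept(source, code):
    """Look up the shared concept a local code was registered to."""
    return REGISTRATION[source][code]


def translate(code, source, target):
    """View a source's code through the target's vocabulary via the shared concept."""
    concept = to_concept(source, code)
    for target_code, target_concept in REGISTRATION[target].items():
        if target_concept == concept:
            return target_code
    return None


def concept_query(features, concept):
    """Ids of map features whose local code is registered to the given concept."""
    return [f["id"] for f in features if to_concept(f["source"], f["code"]) == concept]


FEATURES = [
    {"id": 1, "source": "agency_a", "code": "gr"},
    {"id": 2, "source": "agency_b", "code": "BSLT"},
    {"id": 3, "source": "agency_b", "code": "GRNT"},
]

print(concept_query(FEATURES, "Granite"))
print(translate("gr", "agency_a", "agency_b"))
```

Neither source needs to know about the other; each only maintains its own mapping to the shared concepts.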


Example: Smart Connections [DILS’04]

• Services can be semantically compatible, but structurally incompatible

[Figure: SourceService (port Ps), Desired Connection, TargetService (port Pt)]

• SemanticType Ps vs. SemanticType Pt: Compatible (⊑), with respect to Ontologies (OWL)
• StructuralType Ps vs. StructuralType Pt: Incompatible (⋠)
• Remaining label: (Ps) (≺)


Example: Smart Connections [DILS’04]

[Figure: SourceService (port Ps), Desired Connection, TargetService (port Pt)]

• SemanticType Ps vs. SemanticType Pt: Compatible (⊑), with respect to Ontologies (OWL)
• StructuralType Ps; StructuralType Pt
• Further labels: Registration Mapping (Output); Registration Mapping (Input); Correspondence; Generate (Ps); Transformation
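The idea on this slide, using the two registration mappings to find a correspondence and then generating a structural transformation, can be sketched as follows. This is an illustration under simplifying assumptions, not the DILS'04 algorithm: structural types are flat records, a registration mapping is a field-to-concept dictionary, and the `ISA` ontology and all field names are hypothetical.

```python
ISA = {"Species": "Taxon"}  # hypothetical ontology: child -> parent


def subsumed(child, parent):
    """True if child ⊑ parent, found by following isa links upward."""
    while child is not None:
        if child == parent:
            return True
        child = ISA.get(child)
    return False


def generate_transformation(source_reg, target_reg):
    """Build a record transformation from the correspondence between two
    registration mappings (field name -> ontology concept)."""
    correspondence = {}
    for t_field, t_concept in target_reg.items():
        matches = [s for s, c in source_reg.items() if subsumed(c, t_concept)]
        if len(matches) != 1:
            raise ValueError(f"no unique source field for {t_field!r}")
        correspondence[t_field] = matches[0]
    return lambda record: {t: record[s] for t, s in correspondence.items()}


# Output registration of the source port and input registration of the target port.
SOURCE_REG = {"spp": "Species", "cnt": "Count"}
TARGET_REG = {"taxon": "Taxon", "abundance": "Count"}

transform = generate_transformation(SOURCE_REG, TARGET_REG)
print(transform({"spp": "Picea rubens", "cnt": 4}))
```

The two ports disagree on field names, so a direct connection fails structurally, yet the concepts line up (Species ⊑ Taxon, Count ⊑ Count), which is enough to derive the renaming automatically.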


The Sparrow Toolkit (Origins)

• Annoyance with ugly, user-unfriendly XML syntaxes (e.g., OWL in XML, rules in XML, … anything in XML)
  – Note: others got annoyed too, but we didn’t know [OWL Concrete Abstract Syntax, Bechhofer et al.]
  – (well, we knew about Triple, but that’s only RDF…)
• Instead use a lean syntax (how XML should have been)
  – owl employee isa person and worksfor some employer.
  – owl mother eqv person and female and hasChild some person.
  – rdf john, worksfor, 'IBM'.
  – … are both human and machine readable
  – … in fact the language was invented around the corner…
  – … and this is the “parser”:
    • :- op(1100, fx, owl), op(1100, fx, rdf).
    • :- op(600, xfx, isa), op(600, xfx, eqv).
    • :- op(550, xfy, or), op(500, xfy, and), op(350, fx, not).
    • :- op(400, xfy, some), op(400, xfy, only).
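To show what the operator declarations buy, here is a small Python re-implementation of the same precedence rules for `owl` statements. It is only a sketch for readers without a Prolog system at hand: Sparrow itself relies on Prolog's built-in reader, and the function names `parse_expr` and `parse_owl` as well as the nested-tuple output format are assumptions made for this example. `rdf` statements are not handled.

```python
# Priorities and types copied from the op/3 declarations on the slide.
INFIX = {
    "isa": (600, "xfx"), "eqv": (600, "xfx"),
    "or": (550, "xfy"), "and": (500, "xfy"),
    "some": (400, "xfy"), "only": (400, "xfy"),
}
PREFIX = {"not": 350}  # declared as fx


def parse_expr(tokens, pos, max_prio):
    """Parse tokens[pos:] into nested tuples; lower priority binds tighter."""
    tok = tokens[pos]
    if tok in PREFIX and PREFIX[tok] <= max_prio:
        arg, pos = parse_expr(tokens, pos + 1, PREFIX[tok] - 1)
        left, left_prio = (tok, arg), PREFIX[tok]
    else:
        left, left_prio, pos = tok, 0, pos + 1
    while pos < len(tokens) and tokens[pos] in INFIX:
        op = tokens[pos]
        prio, kind = INFIX[op]
        if prio > max_prio or left_prio >= prio:
            break
        # xfy: the right argument may have the same priority (right-associative).
        right_max = prio if kind == "xfy" else prio - 1
        right, pos = parse_expr(tokens, pos + 1, right_max)
        left, left_prio = (op, left, right), prio
    return left, pos


def parse_owl(statement):
    """Parse one 'owl ... .' statement written in the lean syntax."""
    tokens = statement.strip().rstrip(".").split()
    if not tokens or tokens[0] != "owl":
        raise ValueError("expected a statement starting with 'owl'")
    term, pos = parse_expr(tokens, 1, 1099)  # owl is fx at 1100
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return term


print(parse_owl("owl employee isa person and worksfor some employer."))
print(parse_owl("owl mother eqv person and female and hasChild some person."))
```

The priorities alone determine the grouping: `some` binds tighter than `and`, which binds tighter than `isa` and `eqv`, so no parentheses are needed in the two example axioms.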


Sparrow (a poor man’s OWL tool …)

Simple ASCII-based RDF and OWL entry and manipulation


Sparrow Toolkit

• Much more than a lean syntax for OWL & RDF
  – Syntax transformation services:
    • RDF, OWL, … Sparrow RDF, OWL, LaTeX, FO/LeanTap, …
  – Semantic registration services
    • Semantic Annotation language
  – Reasoning services
    • Classification, Consistency checking, Conversion, Query rewriting, …
• Will be provided in Kepler
  – e.g., as actors, but also as type extensions


Sparrow: The Name

• “A poor man’s OWL”
  – or what XML really should look like
• “Lieber den Spatz in der Hand als die Taube auf dem Dach”
  – Better a sparrow in the hand than a pigeon/dove on the roof
• Also: In Memoriam:


In Memoriam: Dusky Seaside Sparrow


Some work in progress …

[short paper, SSDBM’04]


References

• SMS:
  – An Ontology Driven Framework for Data Transformation in Scientific Workflows. S. Bowers and B. Ludäscher. In International Workshop on Data Integration in the Life Sciences (DILS), LNCS, Leipzig, Germany, March 2004.
  – On Integrating Scientific Resources through Semantic Registration. S. Bowers, K. Lin, and B. Ludäscher. 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
  – Towards a Generic Framework for Semantic Registration of Scientific Data. S. Bowers and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003.
  – Processing First-Order Queries under Limited Access Patterns. A. Nash and B. Ludäscher. Proc. 23rd ACM Symposium on Principles of Database Systems (PODS'04), Paris, France, June 2004, to appear.
  – Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns. A. Nash and B. Ludäscher. 9th Intl. Conference on Extending Database Technology (EDBT'04), Heraklion, Crete, Greece, March 2004, LNCS.
  – Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries with Union and Negation. B. Ludäscher and A. Nash. Research abstract (poster), 20th Intl. Conference on Data Engineering (ICDE'04), Boston, IEEE Computer Society, April 2004.
  – Teaching: Graduate Class: CSE-291 – Ontologies in Data and Process Integration: http://www.sdsc.edu/~ludaesch/CSE-291-Spring-04/ (Bertram; guest lectures by Shawn)
  – …


References

• Kepler
  – Kepler: An Extensible System for Design and Execution of Scientific Workflows. I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock. 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.
  – Kepler: Towards a Grid-Enabled System for Scientific Workflows. I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, and S. Mock. In Workshop on Workflow in Grid Systems, Global-Grid Forum (GGF10), Berlin, Germany, March 2004.
  – A Web Service Composition and Deployment Framework for Scientific Workflows. I. Altintas, E. Jaeger, K. Lin, B. Ludäscher, A. Memon. In Intl. Conference on Web Services (ICWS), San Diego, California, July 2004.
  – Kepler: Towards a Grid-Enabled System for Scientific Workflows. Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher (presenter), Steve Mock. Workflow in Grid Systems (GGF10), Berlin, March 9th, 2004.
  – Kepler/GEON User Manual. Efrat Jaeger.
  – The Computational Chemistry Prototyping Environment. Kim Baldridge, Jerry Greenberg, Wibke Sudholt, Karan Bhatia, Stephen Mock, Ilkay Altintas, Celine Amoreira, Yohan Potier, Michela Taufer.
  – …