Upload
jane-sullivan
View
20
Download
4
Tags:
Embed Size (px)
DESCRIPTION
UC DAVIS Department of Computer Science. San Diego Supercomputer Center. Scientific Data & Workflow Engineering Preliminary Notes from the Cyberinfrastructure Trenches. Associate Professor Dept. of Computer Science & Genome Center University of California, Davis Fellow - PowerPoint PPT Presentation
Citation preview
Scientific Data & Workflow Scientific Data & Workflow
EngineeringEngineeringPreliminary Notes from the Cyberinfrastructure Preliminary Notes from the Cyberinfrastructure
TrenchesTrenches
Bertram Bertram LudLudääscherscher
San Diego Supercomputer Center
Associate ProfessorAssociate ProfessorDept. of Computer Science & Genome Dept. of Computer Science & Genome
CenterCenterUniversity of California, DavisUniversity of California, Davis
FellowFellow
San Diego Supercomputer CenterSan Diego Supercomputer CenterUniversity of California, San DiegoUniversity of California, San Diego
UC DAVISDepartment ofComputer Science
2 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
OutlineOutline
• Introduction: CI Sample ArchitecturesIntroduction: CI Sample Architectures
• Scientific Data IntegrationScientific Data Integration
• Scientific Workflow ManagementScientific Workflow Management
• Links & Crystallization PointsLinks & Crystallization Points
• Lessons learnt & SummaryLessons learnt & Summary
3 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Science Environment for Science Environment for Ecological Knowledge (SEEK) Ecological Knowledge (SEEK)
OverviewOverview• Domain Science DriverDomain Science Driver– Ecology (LTER), biodiversity, …
• Analysis & Modeling SystemAnalysis & Modeling System– Design & execution of ecological
models & analysis (“scientific workflows”)
– {application,upper}-ware Kepler system
• Semantic Mediation SystemSemantic Mediation System– Data Integration of hard-to-relate
sources and processes– Semantic Types and Ontologies– upper middleware Sparrow Toolkit
• EcoGridEcoGrid – Access to ecology data and tools– {middle,under}-ware unified API to SRB/MCAT,
MetaCat, DiGIR, … datasetssample CS problem [DILS’04]
4 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Common CI Infrastructure PiecesCommon CI Infrastructure Pieces
• Other CI-projects (e.g. GEON, … ) have Other CI-projects (e.g. GEON, … ) have similar service-oriented architectures: similar service-oriented architectures: – Seamless and uniform data access (“Data-Grid”)
• data & metadata registry
– distributed and high performance computing platform (“Compute-Grid”)
• service registry
– Federated, integrated, mediated databases• often use of semantic extensions (e.g. ontologies)
– User-friendly workbench / problem-solving environment scientific workflows
• add to this sensors, observing systems … add to this sensors, observing systems …
5 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
… … Example: Realtime Example: Realtime Environment for Analytical Environment for Analytical Processing Processing (REAP vision)(REAP vision)
6 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
The Great Unified SystemThe Great Unified System
• Many engineering and CS challenges!Many engineering and CS challenges!… we’ll see some …
• Our focus:Our focus:– Scientific data integration
• How to associate, mediate, integrate complex scientific data?
– Scientific workflows• How to devise larger scientific workflows for process
automation from individual components (e.g. web services)?
• Disclaimer:Disclaimer:… often scratching the surface; see references &
research literature for details …
7 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
OutlineOutline
• Introduction: CI Sample ArchitecturesIntroduction: CI Sample Architectures
• Scientific Data IntegrationScientific Data Integration
• Scientific Workflow ManagementScientific Workflow Management
• Links & Crystallization PointsLinks & Crystallization Points
• Lessons learnt & SummaryLessons learnt & Summary
8 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
An Online Shopper’s Information Integration An Online Shopper’s Information Integration ProblemProblem
El Cheapo: “Where can I get the cheapest copy (including shipping cost) of El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logicus-Philosophicus within a week?” Wittgenstein’s Tractatus Logicus-Philosophicus within a week?”
??Information Information IntegrationIntegration
addall.comaddall.com
“One-World”Mediation
amazon.comamazon.comamazon.comamazon.com A1books.comA1books.comA1books.comA1books.comhalf.comhalf.comhalf.comhalf.combarnes&noble.combarnes&noble.combarnes&noble.combarnes&noble.com
Mediator (virtual DB)Mediator (virtual DB)(vs. Datawarehouse)(vs. Datawarehouse)NOTE: non-trivial NOTE: non-trivial
data engineering challenges!data engineering challenges!
9 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
A Home Buyer’s Information Integration A Home Buyer’s Information Integration ProblemProblem
What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood
with below-average crime rate and diverse population?
?Information Integration
RealtorRealtor DemographicsDemographicsSchool RankingsSchool RankingsCrime StatsCrime Stats
“Multiple-Worlds”Mediation
10 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
A Neuroscientist’s A Neuroscientist’s Information Integration Information Integration
ProblemProblemWhat is the cerebellar distribution of rat proteins with more than
70% homology with human NCS-1? Any structure specificity?How about other rodents?
?Information Integration
protein localization(NCMIR)
neurotransmission(SENSELAB)
sequence info(CaPROT)
morphometry(SYNAPSE)
“Complex Multiple-Worlds”
Mediation
Biomedical InformaticsBiomedical InformaticsResearch NetworkResearch Networkhttp://nbirn.nethttp://nbirn.net
Inter-source links:• unclear for the non-scientists• hard for the scientist
11 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
12 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Interoperability & Integration Interoperability & Integration ChallengesChallenges
• System aspects: “Grid” Middleware• distributed data & computing, SOA• web services, WSDL/SOAP, WSRF, OGSA, …• sources = functions, files, data sets …
• Syntax & Structure: (XML-Based) Data Mediators
• wrapping, restructuring • (XML) queries and views• sources = (XML) databases
• Semantics: Model-Based/Semantic Mediators
• conceptual models and declarative views • Knowledge Representation: ontologies,
description logics (RDF(S),OWL ...)• sources = knowledge bases (DB+CMs+ICs)
• Synthesis: Scientific Workflow Design & Execution
• Composition of declarative and procedural components into larger workflows
• (re)sources = services, processes, actors, …
reconciling reconciling SS55 heterogeneitiesheterogeneities
““gluing” together gluing” together resources resources
bridging information and bridging information and knowledge gaps knowledge gaps computationallycomputationally
13 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Information Integration Challenges: Information Integration Challenges: SS44 Heterogeneities Heterogeneities• SSystemystem aspects aspects
– platforms, devices, data & service distribution, APIs, protocols, … Grid middleware technologies + e.g. single sign-on, platform independence, transparent use of remote
resources, …
• SSyntaxyntax & & SStructuretructure– heterogeneous data formats (one for each tool ...)– heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files, …) – heterogeneous schemas (one for each DB ...) Database mediation technologies+ XML-based data exchange, integrated views, transparent query rewriting,
…
• SSemanticsemantics– descriptive metadata, different terminologies, “hidden” semantics
(context), implicit assumptions, … Knowledge representation & semantic mediation technologies+ “smart” data discovery & integration+ e.g. ask about X (‘mafic’); find data about Y (‘diorite’); be happy
anyways!
14 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Information Integration Challenges: Information Integration Challenges: SS55 Heterogeneities Heterogeneities
• SSynthesisynthesis of applications, analysis tools, data & of applications, analysis tools, data & query components, … into “query components, … into “scientific workflowsscientific workflows” ” – How to make use of these wonderful things & put them
together to solve a scientist’s problem?
Scientific Problem Solving Environments Scientific Problem Solving Environments (PSEs)(PSEs) Portals,Workbench (“scientist’s view”)+ ontology-enhanced data registration, discovery,
manipulation+ creation and registration of new data products from
existing ones, … Scientific Workflow System (“engineer’s view”)+ for designing, re-engineering, deploying analysis pipelines
and scientific workflows; a tool to make new tools … + e.g., creation of new datasets from existing ones, dataset
registration, …
15 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Information Integration from a Information Integration from a Database Perspective Database Perspective
• Information Integration ProblemInformation Integration Problem– Given: data sources S1, ..., Sk (databases, web sites, ...)
and user questions Q1,..., Qn that can –in principle– be answered using the information in the Si
– Find: the answers to Q1, ..., Qn
• The Database Perspective: The Database Perspective: source = source = “database”“database” Si has a schema (relational, XML, OO, ...) Si can be queried define virtual (or materialized) integrated (or
global) view G over local sources S1 ,..., Sk using database query languages (SQL, XQuery,...)
questions become queries Qi against G(S1,..., Sk)
16 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
5. Post processing5. Post processing2. Query rewriting2. Query rewriting
Standard (XML-Based) Mediator Standard (XML-Based) Mediator ArchitectureArchitecture
MEDIATORMEDIATOR
Integrated Global(XML) View G
Integrated ViewDefinition
G(..) S1(..)…Sk(..)
USER/ClientUSER/Client
1. Query Q ( G (S1. Query Q ( G (S11,..., S,..., Skk) )) )
S1
Wrapper
(XML) View
S2
Wrapper
(XML) View
Sk
Wrapper
(XML) Viewweb services as wrapper APIs
3. Q1 Q2 Q33. Q1 Q2 Q3
4. {answers(Q1)} {answers(Q2)} {answers(Q3)}4. {answers(Q1)} {answers(Q2)} {answers(Q3)}
6. {answers(Q)}6. {answers(Q)}
17 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Query Planning in Data Query Planning in Data IntegrationIntegration
• GivenGiven: : – Declarative user query Q: answer(…) …G ...– … & { G … S … } global-as-view (GAV)– … & { S … G … } local-as-view (LAV)– … & { ic(…) … S … G… } integrity constraints (ICs)
• FindFind: : – equivalent (or minimal containing, maximal contained) query plan Q’: answer(…) … S …
query rewriting (logical/calculus, algebraic, physical levels)
• ResultsResults::– A variety of results/algorithms; depending on classes of
queries, views, and ICs: P, NP, … , undecidable– hot research area in core CS (database community)
18 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Scientific Data Integration Scientific Data Integration using using
Semantic ExtensionsSemantic Extensions
19 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
20 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Example: Geologic Map Example: Geologic Map IntegrationIntegration
• GivenGiven: : – Geologic maps from different state geological
surveys (shapefiles w/ different data schemas)– Different ontologies:
• Geologic age ontology (e.g. USGS)• Rock classification ontologies:
– Multiple hierarchies (chemical, fabric, texture, genesis) from Geological Survey of Canada (GSC)
– Single hierarchy from British Geological Survey (BGS)
• ProblemProblem::– Support uniform queries across all map – … using different ontologies– Support registration w/ ontology A, querying
w/ ontology B
21 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
IntegrationSchema
Schema Integration Schema Integration (“registering” (“registering” local schemas to the global schema)local schemas to the global schema)
Arizona
Colorado
Utah
Nevada
Wyoming
New Mexico
Montana E.
Idaho
Montana West
FormationFormation ……
AgeAge ……
FormationFormation ……
AgeAge ……
FormationFormation ……
AgeAge ……
FormationFormation ……
AgeAge ……
FormationFormation ……
AgeAge ……
FormationFormation ……
AgeAge ……
FormationFormation ……
AgeAge ……
…… FormationFormation
…… AgeAge
…… CompositionComposition
…… FabricFabric
…… TextureTexture
…… FormationFormation
…… AgeAge
…… CompositionComposition
…… FabricFabric
…… TextureTexture
ABBREV
PERIOD
PERIOD
NAME
PERIOD
TYPE
TIME_UNIT
FMATN
PERIOD
NAME
PERIOD
NAME
FORMATION
PERIOD
FORMATION
FORMATION
LITHOLOGY
LITHOLOGY
AGE
AGE
andesitic sandstone
Livingston formation
Tertiary-Cretaceous
Sources
Sources
22 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Multihierarchical Rock Classification Multihierarchical Rock Classification “Ontology” (Taxonomies) for “Thematic “Ontology” (Taxonomies) for “Thematic
Queries” (GSC)Queries” (GSC)
Composition
Genesis
Fabric
Texture
23 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Ontology-Enabled Application Example:Ontology-Enabled Application Example:Geologic Map IntegrationGeologic Map Integration
Show formations where AGE = ‘Paleozic’
(without age ontology)
Show formations where AGE = ‘Paleozic’
(without age ontology)
Show formations where AGE = ‘Paleozic’
(with age ontology)
Show formations where AGE = ‘Paleozic’
(with age ontology)
+/- a few hundred million years
domainknowledge
domainknowledge
Knowledge r
epresentatio
n
Geologic Age
ONTOLOGY
NevadaNevada
24 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Querying by Geologic Age … Querying by Geologic Age …
25 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Querying by Geologic Age: Querying by Geologic Age: ResultsResults
26 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Querying by Chemical Querying by Chemical Composition … (GSC) Composition … (GSC)
27 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Semantic Mediation Semantic Mediation (via “semantic (via “semantic registration” of schemas and ontology registration” of schemas and ontology
articulations)articulations)
• Schema elements and/or data values are Schema elements and/or data values are associated with concept expressions from associated with concept expressions from the target ontologythe target ontology conceptual queries “through” the ontology
• Articulation ontology Articulation ontology source registration to A, querying through B
• Semantic mediation: query rewriting w/ ontologiesSemantic mediation: query rewriting w/ ontologies
Database1
Database2
Ontology A
Ontology B
semanticregistration
ontology articulations
Concept-based(“semantic”)
queries
semanticregistration
28 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Different views on State Different views on State Geological MapsGeological Maps
29 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Sedimentary Rocks: BGS Sedimentary Rocks: BGS OntologyOntology
30 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Sedimentary Rocks: GSC Sedimentary Rocks: GSC OntologyOntology
31 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Implementation in OWL: Not only “for Implementation in OWL: Not only “for the machine” …the machine” …
32 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Source Contextualization Source Contextualization through Ontology Refinementthrough Ontology Refinement
In addition to registering (“hanging off”) data relative toexisting concepts, a source may also refine the mediator’s domain map...
sources can register new concepts at the mediator ...
33 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
OutlineOutline
• Introduction: CI Sample ArchitecturesIntroduction: CI Sample Architectures
• Scientific Data IntegrationScientific Data Integration
• Scientific Workflow ManagementScientific Workflow Management
• Links & Crystallization PointsLinks & Crystallization Points
• Lessons learnt & SummaryLessons learnt & Summary
34 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
What is a Scientific Workflow What is a Scientific Workflow (SWF)?(SWF)?
• GoalsGoals::– automate a scientist’s repetitive data management
and analysis tasks – typical phases:
• data access, scheduling, generation, transformation, aggregation, analysis, visualization
design, test, share, deploy, execute, reuse, … SWFs
35 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Promoter Identification Promoter Identification WorkflowWorkflow
Source: Matt Coleman (LLNL)Source: Matt Coleman (LLNL)
36 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Source: NIH BIRN (Jeffrey Grethe, UCSD)Source: NIH BIRN (Jeffrey Grethe, UCSD)
37 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Ecology: GARP Analysis Ecology: GARP Analysis Pipeline for Invasive Species Pipeline for Invasive Species
PredictionPrediction
Training sample
(d)
GARPrule set
(e)
Test sample (d)
Integrated layers
(native range) (c)
Speciespresence &
absence points(native range)
(a)EcoGridQuery
EcoGridQuery
LayerIntegration
LayerIntegration
SampleData
+A3+A2
+A1
DataCalculation
MapGeneration
Validation
User
Validation
MapGeneration
Integrated layers (invasion area) (c)
Species presence &absence points
(invasion area) (a)
Native range
predictionmap (f)
Model qualityparameter (g)
Environmental layers (native
range) (b)
GenerateMetadata
ArchiveTo Ecogrid
RegisteredEcogrid
Database
RegisteredEcogrid
Database
RegisteredEcogrid
Database
RegisteredEcogrid
Database
Environmental layers (invasion
area) (b)
Invasionarea prediction
map (f)
Model qualityparameter (g)
Selectedpredictionmaps (h)
Source: NSF SEEK (Deana Pennington et. al, UNM)Source: NSF SEEK (Deana Pennington et. al, UNM)
38 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
39 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Commercial & Open Source Commercial & Open Source Scientific “Workflow” (well Scientific “Workflow” (well DataflowDataflow) Systems) Systems
Kensington Discovery Edition from InforSense
Taverna
Triana
40 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
SCIRun: Problem Solving Environments SCIRun: Problem Solving Environments for Large-Scale Scientific Computingfor Large-Scale Scientific Computing
• SCIRun: PSE for interactive construction, debugging, and SCIRun: PSE for interactive construction, debugging, and steering of large-scale scientific computationssteering of large-scale scientific computations
• Component model, based on generalized dataflow Component model, based on generalized dataflow programmingprogramming
Steve Parker (cs.utah.edu)Steve Parker (cs.utah.edu)
41 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Ptolemy II
see!see!see!see!
try!try!try!try!
read!read!read!read!
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/
42 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Why Ptolemy II (and thus KEPLER)?Why Ptolemy II (and thus KEPLER)?• Ptolemy II Objective:Ptolemy II Objective:
– “The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation.”
• Dataflow Process Networks w/ natural support for Dataflow Process Networks w/ natural support for abstractionabstraction, , pipeliningpipelining (streaming) (streaming) actor-orientation, actor-orientation, actor actor reusereuse
• User-OrientationUser-Orientation– Workflow design & exec console (Vergil GUI)– “Application/Glue-Ware”
• excellent modeling and design support• run-time support, monitoring, …• not a middle-/underware (we use someone else’s, e.g. Globus, SRB, …)• but middle-/underware is conveniently accessible through actors!
• PRAGMATICSPRAGMATICS– Ptolemy II is mature, continuously extended & improved, well-documented
(500+pp) – open source system– Ptolemy II folks actively participate in KEPLER
43 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
KEPLER/KEPLER/CSPCSP: : CContributors, ontributors, SSponsors, ponsors, PProjectsrojects(or loosely coupled (or loosely coupled CCommunicating ommunicating SSequential equential PPersons ;-)ersons ;-)
Ilkay Altintas Ilkay Altintas SDM, ResurgenceSDM, ResurgenceKim Baldridge Kim Baldridge Resurgence, NMIResurgence, NMI Chad Berkley Chad Berkley SEEKSEEK Shawn Bowers Shawn Bowers SEEKSEEKTerence Critchlow Terence Critchlow SDMSDM Tobin Fricke Tobin Fricke ROADNetROADNetJeffrey Grethe Jeffrey Grethe BIRNBIRNChristopher H. Brooks Christopher H. Brooks Ptolemy IIPtolemy II Zhengang Cheng Zhengang Cheng SDMSDM Dan Higgins Dan Higgins SEEKSEEKEfrat Jaeger Efrat Jaeger GEONGEON Matt Jones Matt Jones SEEKSEEK Werner Krebs, Werner Krebs, EOLEOLEdward A. Lee Edward A. Lee Ptolemy IIPtolemy II Kai Lin Kai Lin GEONGEONBertram Ludaescher Bertram Ludaescher SEEKSEEK, , GEONGEON, , SDMSDM, , ROADNet, BIRNROADNet, BIRNMark Miller Mark Miller EOLEOLSteve Mock Steve Mock NMINMISteve Neuendorffer Steve Neuendorffer Ptolemy IIPtolemy II Jing Tao Jing Tao SEEKSEEK Mladen Vouk Mladen Vouk SDMSDM Xiaowen Xin Xiaowen Xin SDMSDM Yang Zhao Yang Zhao Ptolemy IIPtolemy IIBing Zhu Bing Zhu SEEKSEEK ••••••
Ptolemy IIPtolemy II
www.kepler-project.orgwww.kepler-project.org
44 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
KEPLER: An Open CollaborationKEPLER: An Open Collaboration• Initiated by members from NSF SEEK and DOE SDM/SPA; now Initiated by members from NSF SEEK and DOE SDM/SPA; now
several other projects (GEON, Ptolemy II, EOL, Resurgence/NMI, several other projects (GEON, Ptolemy II, EOL, Resurgence/NMI, …)…)
• Open Source (BSD-style license)Open Source (BSD-style license)• Intensive Communications: Intensive Communications:
– Web-archived mailing lists– IRC (!)
• Co-development: Co-development: – via shared CVS repository– joining as a new co-developer (currently):
• get a CVS account (read-only)• local development + contribution via existing KEPLER member• be voted “in” as a member/co-developer
• Software & social engineeringSoftware & social engineering– How to better accommodate new groups/communities?– How to better accommodate different usage/contribution models
(core dev … special purpose extender … user)?
45 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Ptolemy II/KEPLER GUI (Vergil)Ptolemy II/KEPLER GUI (Vergil)“Directors” define the component interaction & execution semantics
Large, polymorphic component (“Actors”) and Directors libraries (drag & drop)
46 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
KEPLER/Ptolemy II GUI refinedKEPLER/Ptolemy II GUI refinedOntology based actor (service) and dataset
search
Result Display
47 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Web Services Web Services Actors Actors (WS Harvester)(WS Harvester)
12
3
4
“Minute-made” (MM) WS-based application integration• Similarly: MM workflow design & sharing w/o implemented
components
48 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Some Recent Actor AdditionsSome Recent Actor Additions
49 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
An “early” example: An “early” example: Promoter Identification Promoter Identification
SSDBM, AD 2003SSDBM, AD 2003
• Scientist models application as a “workflow” of connected components (“actors”)
• If all components exist, the workflow can be automated/ executed
• Different directors can be used to pick appropriate execution model (often “pipelined” execution: PN director)
50 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Reengineering a Geoscientist’s Mineral Classification Workflow
51 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Job Management (here: Job Management (here: NIMROD)NIMROD)
• Job management infrastructure in place• Results database: under development• Goal: 1000’s of GAMESS jobs (quantum mechanics) – Fall/Winter’04
52 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
ORB
53 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Rapid Web Service-based Prototyping Rapid Web Service-based Prototyping
(Here: ROADNet Command & Control Services for LOOKING Kick-Off Mtg)(Here: ROADNet Command & Control Services for LOOKING Kick-Off Mtg)
Source: Ilkay Altintas, SDM, NLADRROADNet: Vernon, Orcutt et al
Web services: Tony Fountain et al
Source: Ilkay Altintas, SDM, NLADRROADNet: Vernon, Orcutt et al
Web services: Tony Fountain et al
54 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
in KEPLER in KEPLER (w/ editable (w/ editable script)script)
Source: Dan Higgins, Kepler/SEEKSource: Dan Higgins, Kepler/SEEK
55 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
in KEPLER in KEPLER (interactive session)(interactive session)
Source: Dan Higgins, Kepler/SEEKSource: Dan Higgins, Kepler/SEEK
56 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Blurring Blurring Design (ToDo)Design (ToDo) and Execution and Execution
57 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Scientific Workflow ChallengesScientific Workflow Challenges
• Typical FeaturesTypical Features– data-intensive and/or compute-intensive– plumbing-intensive (consecutive web services won’t
fit)– dataflow-oriented– distributed (remote data, remote processing)– user-interaction “in the middle”, …– … vs. (C-z; bg; fg)-ing (“detach” and reconnect)– advanced programming constructs (map(f), zip,
takewhile, …)– logging, provenance, “registering back”
(intermediate) products…
58 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
hand-crafted control solution; also: forces sequential execution!
designed to fit
designed to fit
hand-craftedWeb-service
actor
Complex backward control-flow
No data transformations
available
[Altintas-et-al-PIW-SSDBM’03][Altintas-et-al-PIW-SSDBM’03]
59 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
A Scientific Workflow Problem: Solved A Scientific Workflow Problem: Solved (Computer Scientist’s view)(Computer Scientist’s view)
• Solution based on Solution based on declarative, functional declarative, functional dataflow process networkdataflow process network(= also a data streaming
model!)
• Higher-order constructs: Higher-order constructs: mapmap((ff) ) no control-flow spaghetti data-intensive apps free concurrent execution free type checking automatic support to go
from piw(GeneId) to PIW :=map(piw) over [GeneId]
map(f)-style
iterators Powerful type
checking Generic,
declarative “programming”
constructs
Generic data transformation
actors
Forward-only, abstractable sub-workflow piw(GeneId)
60 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Promoter Identification Workflow Promoter Identification Workflow RedesignedRedesigned
map(GenbankWS) Input: {“NM_001924”, “NM020375”} Output: {“CAGT…AATATGAC",“GGGGA…CAAAGA“}
61 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
A Research Problem: A Research Problem: Optimization by RewritingOptimization by Rewriting
• Example: PIW as a declarative, Example: PIW as a declarative, referentially transparent functional referentially transparent functional processprocess optimization via functional rewriting
possiblee.g. map(f o g) = map(f) o map(g)
• Technical report & PIW specification in Technical report & PIW specification in HaskellHaskell
map(f o g) instead of map(f) o
map(g)
Combination of map and zip
http://kbis.sdsc.edu/SciDAC-SDM/scidac-tn-map-constructs.pdfhttp://kbis.sdsc.edu/SciDAC-SDM/scidac-tn-map-constructs.pdf
62 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
A KR+DI+Scientific Workflow A KR+DI+Scientific Workflow ProblemProblem
• Services can be Services can be semantically compatiblesemantically compatible, , but but structurally incompatiblestructurally incompatible
SourceServiceSourceService
TargetServiceTargetService
Ps Pt
SemanticType Ps
SemanticType Ps
SemanticType Pt
SemanticType Pt
StructuralType Pt
StructuralType Pt
StructuralType Ps
StructuralType Ps
Desired Connection
Incompatible
Compatible
(⋠)
(⊑)
(Ps)(Ps) (≺)
Ontologies (OWL)Ontologies (OWL)
Source: [Bowers-Ludaescher, DILS’04]Source: [Bowers-Ludaescher, DILS’04]
63 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Ontology-Informed Data Ontology-Informed Data Transformation Transformation (“Structure-Shim”)(“Structure-Shim”)
SourceServiceSourceService
TargetServiceTargetService
Ps Pt
SemanticType Ps
SemanticType Ps
SemanticType Pt
SemanticType Pt
StructuralType Pt
StructuralType Pt
StructuralType Ps
StructuralType Ps
Desired Connection
Compatible ( )⊑
RegistrationMapping (Output)
RegistrationMapping (Input)
CorrespondenceCorrespondence
Generate (Ps)(Ps)
Ontologies (OWL)Ontologies (OWL)
Transformation
Source: [Bowers-Ludaescher, DILS’04]Source: [Bowers-Ludaescher, DILS’04]
64 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
OutlineOutline
• Introduction: CI Sample ArchitecturesIntroduction: CI Sample Architectures
• Scientific Data IntegrationScientific Data Integration
• Scientific Workflow ManagementScientific Workflow Management
• Links & Crystallization PointsLinks & Crystallization Points
• Lessons learnt & SummaryLessons learnt & Summary
65 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Link-Up & Crystallization PointsLink-Up & Crystallization Points• Shared (Domain) Science Vision, GoalsShared (Domain) Science Vision, Goals
– NVO, SCEC, Human Genome Project, … • Technology WavesTechnology Waves
– XML, web services, WSRF, Semantic Web (OWL), Portlets, … • Standards for data exchange, metadata, data access Standards for data exchange, metadata, data access
protocols, … protocols, … – GML, EML, netCDF, HDF, …, ADN, …, DODS/OpenDAP, … – Organizations: W3C, GGF, …,
• Community ontologies Community ontologies – GO (Gene Ontology), ecoinformatics, seismology,
geochemistry, …– … from Saulus to Paulus …
• Shared Community Tools and Tool Co-DevelopmentShared Community Tools and Tool Co-Development– SRB, Globus, …, Kepler, …
66 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Shared Science Vision, Goals: SCEC/CMEShared Science Vision, Goals: SCEC/CME
Simulation of Seismic Wave Simulation of Seismic Wave Propagation of a Magnitude Propagation of a Magnitude 7.7 Earthquake on San 7.7 Earthquake on San Andreas FaultAndreas Fault– PIs: Thomas Jordan, Bernard
Minster, Reagan Moore, Carl Kesselman
– Simulation• 240 Processors for 5 days• 47 Terabytes of data
generated
– SDSC SAC project optimized code on DataStar parallel computer (both MPI I/O management and checkpointing)
– Future simulation – Increase resolution a factor of 2, implies 1 PB of simulation results, 1000 processors for 20 days
Southern California Earthquake Center / Community Modeling Southern California Earthquake Center / Community Modeling Environment ProjectEnvironment Project
Source: Reagan Moore, SDSCSource: Reagan Moore, SDSC
67 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Example: NVO Community Example: NVO Community ProcessesProcesses
- created standard data encoding format (FITS image - created standard data encoding format (FITS image format)format)
- made accessible common digital holdings (sky survey - made accessible common digital holdings (sky survey images)images)
- defined Uniform Content Descriptors (common metadata - defined Uniform Content Descriptors (common metadata attributes)attributes)
- created standard services (standard access mechanisms - created standard services (standard access mechanisms to catalogs to catalogs
and surveys)and surveys)- created digital library (manage derived data products)- created digital library (manage derived data products)- created portals (for combining services interactively)- created portals (for combining services interactively)- created processing pipelines (for automated processing)- created processing pipelines (for automated processing)- created preservation environment- created preservation environment
• Broader impact: found a new star!Broader impact: found a new star!Source: Reagan Moore, SDSCSource: Reagan Moore, SDSC
68 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Semantic Mediation “Waterfall” … Semantic Mediation “Waterfall” …
Ontologies
Semantic Data,Service Annotation
IterativeDevelopment
Resource Discovery
ResourceIntegration
WorkflowPlanning
WorkflowAnalysis
Source: Shawn Bowers, SEEK AHM’04Source: Shawn Bowers, SEEK AHM’04
69 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
GEON Dataset Generation & GEON Dataset Generation & RegistrationRegistration
(a co-development in KEPLER)(a co-development in KEPLER)
Xiaowen (SDM)
Edward et al.(Ptolemy)
Yang (Ptolemy)
Efrat(GEON)
Ilkay(SDM)
SQL database access (JDBC)Matt,Chad,
Dan et al. (SEEK)
% Makefile$> ant run
% Makefile$> ant run
70 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
KEPLER as a Melting Pot …KEPLER as a Melting Pot …
• A grass-roots projectA grass-roots project– Needed a coalition of the (really!) willing
• Inter-project linksInter-project links– SEEK ITR, GEON ITR, ROADNet ITRs, DOE SciDAC SDM,
Ptolemy II, NIH BIRN (coming …), UK eScience myGrid, …
• Intra-project linksIntra-project links– e.g. in SEEK: AMS SMS EcoGrid
• Inter-technology linksInter-technology links– Globus, SRB, JDBC, web services, soaplab services,
command line tools, R, GRASS, XSLT, …
• Interdisciplinary linksInterdisciplinary links– CS, IT, domain sciences, … (recently: usability
engineer)
71 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
OutlineOutline
• Introduction: CI Sample ArchitecturesIntroduction: CI Sample Architectures
• Scientific Data IntegrationScientific Data Integration
• Scientific Workflow ManagementScientific Workflow Management
• Links & Crystallization PointsLinks & Crystallization Points
• Lessons learnt & SummaryLessons learnt & Summary
72 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Some Lessons Learnt Some Lessons Learnt • Eat your own dog-food (or at least try…)Eat your own dog-food (or at least try…)
– start using your own (CI) tools early
• Collaboration toolsCollaboration tools– CVS repositories (+cvsview, webcvs)– Mailing lists (e.g. mailman googlified) – Bugzilla (detailed tracking of tech. issues & bugs)– Wiki (community authored web resource, e.g. high-level tech.
issues)
• Where is the XYZ repository/registry?Where is the XYZ repository/registry?– EcoGrid (SEEK) registry, GEON registry, KEPLER actor &
datasets repository, …– UDDI what?
• CI Melting Pots: SDSC, …CI Melting Pots: SDSC, …– NCEAS, LTER, NLADR (w/ NCSA), KU Specify, …
73 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Q & AQ & A
74 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Further ReadingFurther Reading
75 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Related PublicationsRelated Publications• Semantic Data Registration and IntegrationSemantic Data Registration and Integration• On Integrating Scientific Resources through Semantic RegistrationOn Integrating Scientific Resources through Semantic Registration, S. Bowers, K. Lin, , S. Bowers, K. Lin,
and B. Ludäscher, and B. Ludäscher, 16th International Conference on Scientific and Statistical Database 16th International Conference on Scientific and Statistical Database ManagementManagement ( (SSDBM'04SSDBM'04), 21-23 June 2004, Santorini Island, Greece. ), 21-23 June 2004, Santorini Island, Greece.
• A System for Semantic Integration of Geologic Maps via A System for Semantic Integration of Geologic Maps via OntologiesOntologies, K. Lin and B. , K. Lin and B. Ludäscher. In Ludäscher. In Semantic Web Technologies for Searching and Retrieving Scientific DataSemantic Web Technologies for Searching and Retrieving Scientific Data ( (SCISWSCISW), Sanibel Island, Florida, 2003. ), Sanibel Island, Florida, 2003.
• Towards a Generic Framework for Semantic Registration of Scientific DataTowards a Generic Framework for Semantic Registration of Scientific Data, S. Bowers , S. Bowers and B. Ludäscher. In and B. Ludäscher. In Semantic Web Technologies for Searching and Retrieving Semantic Web Technologies for Searching and Retrieving Scientific DataScientific Data ( (SCISWSCISW), Sanibel Island, Florida, 2003. ), Sanibel Island, Florida, 2003.
• The Role of XML in Mediated Data Integration Systems with Examples from Geological (MThe Role of XML in Mediated Data Integration Systems with Examples from Geological (Map) Data Interoperabilityap) Data Interoperability, B. Brodaric, B. Ludäscher, and K. Lin. In , B. Brodaric, B. Ludäscher, and K. Lin. In Geological Society of America (GSA) Geological Society of America (GSA) Annual MeetingAnnual Meeting, volume 35(6), November 2003. , volume 35(6), November 2003.
• Semantic Mediation Services in Geologic Data Integration: A Case Study from the GEON GSemantic Mediation Services in Geologic Data Integration: A Case Study from the GEON Gridrid, K. Lin, B. Ludäscher, B. Brodaric, D. Seber, C. Baru, and K. A. Sinha. In , K. Lin, B. Ludäscher, B. Brodaric, D. Seber, C. Baru, and K. A. Sinha. In Geological Society of America (GSA) Annual MeetingGeological Society of America (GSA) Annual Meeting, volume 35(6), November 2003. , volume 35(6), November 2003.
• Query Planning and RewritingQuery Planning and Rewriting• Processing First-Order Queries under Limited Access PatternsProcessing First-Order Queries under Limited Access Patterns, Alan Nash and B. , Alan Nash and B.
Ludäscher, Ludäscher, Proc. 23rd ACM Symposium on Principles of Database SystemsProc. 23rd ACM Symposium on Principles of Database Systems ( (PODS'04PODS'04) ) Paris, France, June 2004. Paris, France, June 2004.
• Processing Unions of Conjunctive Queries with Negation under Limited Access PatternsProcessing Unions of Conjunctive Queries with Negation under Limited Access Patterns, , Alan Nash and B. Ludäscher., Alan Nash and B. Ludäscher., 9th Intl. Conference on Extending Database Technology9th Intl. Conference on Extending Database Technology ((EDBT'04EDBT'04) Heraklion, Crete, Greece, March 2004, LNCS 2992. ) Heraklion, Crete, Greece, March 2004, LNCS 2992.
• Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries with Union and Negationwith Union and Negation, B. Ludäscher and Alan Nash. Research abstract (poster), , B. Ludäscher and Alan Nash. Research abstract (poster), 20th Intl. Conference on 20th Intl. Conference on Data EngineeringData Engineering ( (ICDE'04ICDE'04) Boston, IEEE Computer Society, April 2004.) Boston, IEEE Computer Society, April 2004.
76 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004 Scientific Data & WF Engineering, B.Ludäscher
Nov. 15th 2004
Related PublicationsRelated Publications• Scientific WorkflowsScientific Workflows• KeplerKepler: An Extensible System for Design and Execution of Scientific Workflows: An Extensible System for Design and Execution of Scientific Workflows, I. Altintas, C. , I. Altintas, C.
Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock, Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock, 16th International Conference on 16th International Conference on Scientific and Statistical Database ManagementScientific and Statistical Database Management ( (SSDBM'04SSDBM'04), 21-23 June 2004, Santorini ), 21-23 June 2004, Santorini Island, Greece. Island, Greece.
• KeplerKepler: Towards a Grid-Enabled System for Scientific Workflows: Towards a Grid-Enabled System for Scientific Workflows, Ilkay Altintas, Chad , Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher, Steve Mock, Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher, Steve Mock, Workflow in Grid Systems (GGF10)Workflow in Grid Systems (GGF10), Berlin, March 9th, 2004., Berlin, March 9th, 2004.
• An Ontology-Driven Framework for Data Transformation in Scientific WorkflowsAn Ontology-Driven Framework for Data Transformation in Scientific Workflows, S. Bowers , S. Bowers and B. Ludäscher, and B. Ludäscher, Intl. Workshop on Data Integration in the Life SciencesIntl. Workshop on Data Integration in the Life Sciences ( (DILS'04DILS'04), March ), March 25-26, 2004 Leipzig, Germany, LNCS 2994. 25-26, 2004 Leipzig, Germany, LNCS 2994.
• A Web Service Composition and Deployment Framework for Scientific Workflows, I. A Web Service Composition and Deployment Framework for Scientific Workflows, I. Altintas, E. Jaeger, K. Lin, B. Ludaescher, A. Memon, In the Altintas, E. Jaeger, K. Lin, B. Ludaescher, A. Memon, In the 2nd Intl. Conference on 2nd Intl. Conference on Web ServicesWeb Services ( (ICWSICWS), San Diego, California, July 2004.), San Diego, California, July 2004.