August 31 2006, Elsevier, Amsterdam
Scientific Workflows in e-Science
Dr Zhiming Zhao ([email protected])
System and Network Engineering, University of Amsterdam
Virtual Laboratory for e-Science
Outline
• Background
• Scientific workflow management system
• Virtual Laboratory for e-Science
• Our approach
• Challenges and research lines
• Activities
Problem solving: a typical scenario in scientific research
• Analysis
• Hypothesis
• Related work
• Propose experiments
• Define steps
• Prototype computing systems
• Perform experiments
• Data collection
• Visualization
• Validation
• Adjust experiment
• Refine hypothesis
• Presentation
• Dissemination
Phases: define problems, experiments, data analysis, discovery
These activities:
- Are iterative, dynamic, and human centered
- Require different levels of resources
Example scenarios
• In problem analysis: identify domains, search for key problems, find typical methods, and review related work
• In scientific experiments (scientific computing & data processing): define dependencies between computing and data processing tasks, and schedule their runtime behavior
• In data analysis: visualize and compare the results of different parameters, keep meaningful configurations, and continue experiments; search related work and compare results
• In dissemination: document experiments, present results, cite, and publish
Computer support for problem solving
Problem Solving Environment (E. Gallopoulos et al., IEEE CS Eng. 1994):
• Organizes different software components/tools
• Allows a user to assemble these tools at a high level of abstraction
• Controls the runtime behavior of experiments
• Examples: MATLAB, Ptolemy, etc.
Traditional PSE: organize and execute resources locally!
Distributed resources:
• Distributed and parallel computing
• Visualization, remote resource invocation
• Distributed data sharing & dissemination
Scientific workflow management systems: a new guise of PSE!
Inside a Scientific Workflow Management System
In our view, a SWMS at least implements:
A model for describing workflows;
An engine for executing/managing workflows;
Different levels of support for a user to compose, execute and control a workflow.
[Figure: A SWMS. Labels: Workflow (based on a certain model); Engine; Resources; User support: composition, engine level control, resource level control.]
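The three parts named on this slide can be sketched in a few lines of Python. This is only an illustrative toy, not the design of any system discussed in this talk: the class names `Workflow` and `Engine`, the task names, and the choice of a dependency graph as the workflow model are all assumptions made for the example.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


class Workflow:
    """Model (assumed here to be a task graph): named tasks plus their dependencies."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Callable[[dict], object]] = {}
        self.deps: Dict[str, List[str]] = {}

    def add_task(self, name: str, func: Callable[[dict], object], after=()) -> None:
        # Composition support: the user assembles the workflow task by task.
        self.tasks[name] = func
        self.deps[name] = list(after)


class Engine:
    """Engine: runs each task only after all of its dependencies have finished."""

    def run(self, wf: Workflow) -> dict:
        results: dict = {}
        # static_order() yields every task after the tasks it depends on.
        for name in TopologicalSorter(wf.deps).static_order():
            results[name] = wf.tasks[name](results)
        return results


# Hypothetical three-step experiment: load data, process it, report.
wf = Workflow()
wf.add_task("load", lambda r: [3, 1, 2])
wf.add_task("sort", lambda r: sorted(r["load"]), after=["load"])
wf.add_task("report", lambda r: f"max={r['sort'][-1]}", after=["sort"])

results = Engine().run(wf)
print(results["report"])
```

Here the engine is sequential and local; a real SWMS would also schedule tasks onto distributed resources and expose runtime control to the user.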
Scientific Workflows in e-Science
Workflows vary across different:
• Phases of experiments: design, runtime control, dissemination;
• Abstractions of resources: concrete and abstract;
• Levels of activity detail: computing, data access, search/matching, human activities;
…
[Figure: Experiment processes; Abstract workflows; Executable (concrete) workflows; Workflows for administration, e.g., AAA, and other issues.]
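One way to picture the step from an abstract workflow to an executable (concrete) one is as a binding of each task to a resource. The sketch below is a simplified assumption, not how any particular system performs this mapping: the catalog, the host names, the capability strings, and the "take the first candidate" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AbstractTask:
    name: str
    capability: str  # what the step needs, not where it runs


@dataclass(frozen=True)
class ConcreteTask:
    name: str
    resource: str  # the resource chosen for this particular run


def concretize(tasks: List[AbstractTask],
               catalog: Dict[str, List[str]]) -> List[ConcreteTask]:
    """Bind each abstract task to the first catalog resource offering its capability."""
    concrete = []
    for task in tasks:
        candidates = catalog.get(task.capability, [])
        if not candidates:
            raise LookupError(f"no resource offers {task.capability!r}")
        concrete.append(ConcreteTask(task.name, candidates[0]))
    return concrete


# Hypothetical abstract workflow and resource catalog.
abstract = [
    AbstractTask("align", "sequence-alignment"),
    AbstractTask("plot", "visualization"),
]
catalog = {
    "sequence-alignment": ["cluster-a.example.org"],
    "visualization": ["viz-node.example.org", "desktop"],
}

concrete = concretize(abstract, catalog)
for task in concrete:
    print(task.name, "->", task.resource)
```

The abstract form stays reusable across experiments, while only the binding changes when the underlying infrastructure does.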
Diversity in SWMS
Taverna:
- Web services based language: Scufl
- Engine: FreeFluo
- Graphical visualization of workflows
Kepler:
- Actor, director
- MoML
- Execution models
Triana:
- Components
- Task graph
- Data/control flow
DAGMan:
- Computing tasks
- DAG
Pegasus:
- Based on DAGMan
- VDL
- DAG
…
Virtual Laboratory for e-Science
[Figure: VL-e layered architecture]
Application layer:
• Dutch telescience
• Data intensive science
• Medical diagnosis
• Bioinformatics ASP
• Biodiversity
• Food Informatics
Generic e-science framework layer
Grid layer
Mission
Effectively reuse existing workflow management systems, and provide a generic e-Science framework for different application domains.
A generic framework can:
• Improve the reuse of workflow components and of workflows across different experiments
• Reduce the learning cost for different systems
• Allow application users to work in a consistent environment when the underlying infrastructure changes
Previous work: VLAM-G environment
VLAM-G:
• A Grid-enabled PSE
• Data intensive applications
• Visual interface
• Two levels of workflow support
• Human interaction support
Workflow in VLAM-G
VLAM-G PFT/Study
Experiment Topology: graphical representation of self-contained data processing modules attached to each other in a workflow.
Process-Flow Template (PFT): graphical representation of data elements and processing steps in an experimental procedure.
Study: descriptions of experimental steps represented as an instance of a PFT with references to experiment topologies.
[Figure: PFT diagram. Entities: PROJECT, EXPERIMENT, ARRAYMEASUREMENT, DATA ANALYSIS, COMMENT, PROPERTY, OWNER, CONTRIBUTOR, COMMENTATOR. Relations include hasExperiments, hasSteps, hasComments, hasProperties, hasContributors, isPartOfProject, isPartOfExperiment, isPerformedBy, isMadeBy, hasNextExperiment, hasPrevExperiment, hasNextStep, hasPrevStep. Each element is tagged LINK, COPY, or NOREUSE.]
Lessons learned
• How to introduce a new PSE to a domain scientist? Because it has a beautiful architecture? Or because it allows scientists to keep their current work style?
• How to use existing work?
• Do scientists need one system or more options?
• How to include the user in the computing loop?
Dynamic workflows and human-in-the-loop computing are important.
Z. Zhao et al., “Scientific workflow management: between generality and applicability”, QSIC 2005, Australia
Workflow support in VL-e
Recommend suitable workflow systems for different application domains:
• Analyze typical application use cases
• Define small projects with different application domains
• Review existing workflow systems
• Recommend four workflow systems: Triana, Taverna, Kepler, and VLAM-G
In the long term:
• Extend VLAM-G and develop our own generic workflow framework
A workflow bus paradigm
[Figure: a workflow made of sub workflows 1, 2, and 3, connected through the workflow bus to Taverna, Kepler, and Triana.]
A workflow bus is a special workflow system for executing meta workflows, in which the sub workflows are executed by different engines.
Z. Zhao et al., “Workflow bus for e-Science”, in IEEE Int’l Conf. e-Science 2006, Amsterdam
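The idea of a meta workflow whose sub workflows run on different engines can be illustrated with a small dispatcher. This is a sketch under stated assumptions and not the workflow bus described in the cited paper: the `WorkflowBus` class, the adapter signature, and the lambda adapters are hypothetical stand-ins, and no real Taverna, Kepler, or Triana API is called.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical adapter type: takes a sub workflow name and its input data,
# returns the output. A real adapter would invoke the actual engine.
EngineAdapter = Callable[[str, object], object]


class WorkflowBus:
    def __init__(self) -> None:
        self._engines: Dict[str, EngineAdapter] = {}

    def register(self, engine_name: str, adapter: EngineAdapter) -> None:
        self._engines[engine_name] = adapter

    def run(self, meta_workflow: List[Tuple[str, str]], data: object) -> object:
        """Run sub workflows in order, feeding each output into the next one."""
        for engine_name, sub_workflow in meta_workflow:
            if engine_name not in self._engines:
                raise KeyError(f"no adapter registered for {engine_name!r}")
            data = self._engines[engine_name](sub_workflow, data)
        return data


bus = WorkflowBus()
# Stub adapters that only record which engine handled which sub workflow.
bus.register("taverna", lambda wf, d: f"{d} -> taverna:{wf}")
bus.register("triana", lambda wf, d: f"{d} -> triana:{wf}")

# Meta workflow: (engine, sub workflow) pairs, assumed here to be a simple chain.
meta = [("taverna", "fetch"), ("triana", "filter")]
result = bus.run(meta, "input")
print(result)
```

A linear chain keeps the example short; a meta workflow could equally be a graph of sub workflows, with the bus handling data exchange between engines.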
Applications of workflow bus
Use case 1:
• A user has a workflow in Taverna
• Some functionality is missing in Taverna but can be provided by Triana
• The user can develop the workflow in the two systems and run it via the workflow bus
Use case 2:
• A user wants to execute a Taverna or Triana workflow in multiple instances with different input data
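Use case 2 amounts to running the same workflow several times over different inputs. A minimal sketch, assuming the workflow can be wrapped as a plain Python function: `run_workflow` is a hypothetical placeholder for a call into a Taverna or Triana workflow, and the sample data and thresholds are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(params: dict) -> dict:
    """Placeholder for one workflow instance; here it just counts samples
    at or above a threshold."""
    kept = sum(1 for x in params["samples"] if x >= params["threshold"])
    return {"threshold": params["threshold"], "kept": kept}


samples = [0.2, 0.5, 0.9]
# One parameter set per workflow instance.
param_sets = [{"threshold": t, "samples": samples} for t in (0.1, 0.5, 0.8)]

# Launch the instances concurrently; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    outcomes = list(pool.map(run_workflow, param_sets))

for outcome in outcomes:
    print(outcome)
```

Threads suffice when each instance mostly waits on a remote engine; the same pattern applies if the instances are submitted to Grid resources instead.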
Ongoing research
• Web services in data intensive applications
• Execution models for Grid workflows
• Including PSE in scientific workflows
• Industrial standards in scientific workflows
Relevance between our research and Elsevier’s work
• Same context: both fall within the entire lifecycle of e-Science experiments
• Different focuses:
- We focus on the runtime behavior of scientific experiments, e.g., Grid computing, data/computing intensive applications, and scheduling of computing tasks
- Elsevier highlights data search and integration on well-structured databases, research preparation, and literature search and management
Cont.
• Different characteristics in workflows:
- In our workflows, processing and managing dynamic runtime data is the key pattern
- In Elsevier workflows, storing, replicating, accessing, matching, and integrating static data might be more common
• Facing similar challenges:
- Semantics-based data search and integration
- Workflow provenance
- Collaborative interaction (workflow development, resource sharing, knowledge transfer)
- Modeling user profiles
Activities
• Int’l workshop on “Workflow systems in e-Science”, organized by Zhiming Zhao and Adam Belloum, in the context of ICCS06, Reading University, May 28, 2006. The proceedings are in LNCS, Springer Verlag. A special issue will be published in the Scientific Programming Journal. http://staff.science.uva.nl/~zhiming/iccs-wses
• Workshop on “Scientific workflows and industrial workflow standards in e-Science”, organized by Adam Belloum and Zhiming Zhao, in the context of the IEEE e-Science and Grid Computing conference in Amsterdam, December 2006.
- Pegasus, Dr. Ewa Deelman (Department of Computer Science, University of Southern California)
- BPEL, Dr. Dieter König (IBM Research Germany Development Laboratory)
- Kepler, Dr. Bertram Ludäscher (Department of Computer Science, University of California, Davis)
- Taverna, Prof. Peter Rice (European Bioinformatics Institute)
- WS and semantic issues, Dr. Steve Ross-Talbot (CEO and a co-founder of Pi4 Technologies)
- Triana, Dr. Ian J. Taylor (Department of Computer Science, Cardiff University)
http://staff.science.uva.nl/~adam/workshop/VL-e-workshop.htm
References
1. Virtual Laboratory for e-Science: www.vl-e.nl
2. Network and System Engineering, Faculty of Science, University of Amsterdam: http://www.science.uva.nl/research/sne/
3. Z. Zhao; A. Belloum; H. Yakali; P.M.A. Sloot and L.O. Hertzberger: Dynamic Workflow in a Grid Enabled Problem Solving Environment, in Proceedings of the 5th International Conference on Computer and Information Technology (CIT2005), pp. 339-345. IEEE Computer Society Press, Shanghai, China, September 2005.
4. Z. Zhao; A. Belloum; A. Wibisono; F. Terpstra; P.T. de Boer; P.M.A. Sloot and L.O. Hertzberger: Scientific workflow management: between generality and applicability, in Proceedings of the International Workshop on Grid and Peer-to-Peer based Workflows in conjunction with the 5th International Conference on Quality Software, pp. 357-364. IEEE Computer Society Press, Melbourne, Australia, September 19th-21st 2005.
5. Z. Zhao; A. Belloum; P.M.A. Sloot and L.O. Hertzberger: Agent technology and scientific workflow management in an e-Science environment, in Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI05), pp. 19-23. IEEE Computer Society Press, Hong Kong, China, November 14th-16th 2005.