Workflow Systems for Science: Concepts & Tools
Domenico Talia, ICAR-CNR and University of Calabria, 87036 Rende, Italy
by: Seyed Ziae Mousavi Mojab, Wayne State University - 2013
● Introduction
● Main programming issues in the area of scientific workflow
● Significant WMS
● Issues that are still open in the area of scientific workflows
● Conclusion
Agenda
"workflows provide a declarative way of specifying the high-level logic of an application, hiding the low-level details that
are not fundamental for application design."
Introduction
"A workflow is a well-defined, and possibly repeatable, pattern or systematic organization of activities designed to
achieve a certain transformation of data."
Introduction
● Scientific Workflow:
Introduction
Taverna, Pegasus, Triana, Askalon, Kepler, GWES, and Karajan, ...
● Scientific Workflows & open research issues
Workflow Programming
● Textual programming interface
● Visual programming interface
Workflow Programming
● Programming Structure:
- Directed Acyclic Graph (DAG)
- Directed Cyclic Graph (DCG)
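As a minimal sketch of the difference between the two structures, the snippet below stores a workflow as a mapping from each task to the tasks it depends on and asks Python's standard `graphlib` module (Python 3.9+) for an execution order. The task names and the helper `execution_order` are hypothetical and are not taken from any of the systems in this talk. A DAG yields a valid order; a graph with a cycle is rejected, which suggests why cyclic workflows need explicit loop constructs rather than plain dependency ordering.

```python
from graphlib import TopologicalSorter, CycleError


def execution_order(dependencies):
    """Return one valid execution order, or None if the graph has a cycle.

    `dependencies` maps each task name to the set of tasks it depends on.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


# Hypothetical DAG: fetch -> clean -> (align, stats) -> report
dag = {
    "clean": {"fetch"},
    "align": {"clean"},
    "stats": {"clean"},
    "report": {"align", "stats"},
}

# Making "fetch" depend on "report" closes a loop, so this graph is cyclic.
dcg = dict(dag, fetch={"report"})

order = execution_order(dag)
print(order)                 # "fetch" comes first, "report" comes last
print(execution_order(dcg))  # None
```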
Workflow Programming
● Workflow Enactment:
- Efficiency: bind tasks to appropriate computing resources
- Robustness: detect and recover from failures
-- monitoring
-- checkpoints
-- step-by-step execution
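The robustness mechanisms named here (monitoring, checkpoints, step-by-step execution) can be illustrated with a small sketch. It is not the enactment engine of any particular system: the function `run_with_checkpoints`, the JSON checkpoint file, and the retry limit are all assumptions made for the example. Tasks run one at a time, each failure is reported, failed tasks are retried, and results are saved after every successful step so that a later run can resume.

```python
import json
import os
import tempfile


def run_with_checkpoints(tasks, checkpoint_path, max_retries=2):
    """Run (name, function) pairs in order, skipping tasks already checkpointed.

    Each task function receives the dict of results so far and must return
    a JSON-serializable value.
    """
    results = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            results = json.load(f)

    for name, func in tasks:
        if name in results:
            continue  # completed in an earlier run
        for attempt in range(max_retries + 1):
            try:
                results[name] = func(results)
                break
            except Exception as exc:
                # Monitoring: report the failure before retrying.
                print(f"monitor: {name} failed on attempt {attempt + 1}: {exc}")
                if attempt == max_retries:
                    raise
        # Checkpoint after every successful task.
        with open(checkpoint_path, "w") as f:
            json.dump(results, f)
    return results


calls = {"flaky_sum": 0}


def load(_results):
    return [1, 2, 3]


def flaky_sum(results):
    calls["flaky_sum"] += 1
    if calls["flaky_sum"] == 1:
        raise RuntimeError("transient resource error")
    return sum(results["load"])


path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
steps = [("load", load), ("sum", flaky_sum)]
final = run_with_checkpoints(steps, path)
print(final)
```

Calling `run_with_checkpoints(steps, path)` a second time returns the stored results without re-running either task, because both names are already in the checkpoint file.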
Workflow Programming
● Workflow Design:
- Abstract level: what has to be done in each task, along with information about how the tasks are interconnected
- Concrete level: the implementation and/or resources to be used
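One way to picture the two design levels is as two record types and a mapping step between them. The sketch below is illustrative only: the class names, the executable paths, the node names, and the round-robin resource assignment are all hypothetical. The abstract tasks say what is done and how tasks connect; the concrete tasks add the implementation and the resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractTask:
    name: str
    operation: str          # what has to be done
    depends_on: tuple = ()  # how tasks are interconnected


@dataclass(frozen=True)
class ConcreteTask:
    name: str
    executable: str  # the implementation chosen for the operation
    resource: str    # where the task will run


def map_to_concrete(abstract_tasks, implementations, resources):
    """Bind each abstract task to an implementation and a resource.

    Resources are assigned round-robin purely for illustration.
    """
    concrete = []
    for i, task in enumerate(abstract_tasks):
        concrete.append(ConcreteTask(
            name=task.name,
            executable=implementations[task.operation],
            resource=resources[i % len(resources)],
        ))
    return concrete


abstract = [
    AbstractTask("t1", "filter"),
    AbstractTask("t2", "classify", depends_on=("t1",)),
]
# Hypothetical executables and compute nodes.
implementations = {
    "filter": "/opt/tools/filter.sh",
    "classify": "/opt/tools/classify.py",
}
resources = ["node-a", "node-b"]

concrete = map_to_concrete(abstract, implementations, resources)
for task in concrete:
    print(task)
```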
Scientific Workflow Management Systems
Workflow Management Systems (WMSs) are software environments that provide tools to define, compose, map, and execute workflows
- Programming languages: BPEL, UML, Petri nets, XML-based, …
Scientific Workflow Management Systems
● Workflow Design:
- Script-like systems: GridAnt, Karajan, ...
- Graphical-based systems
Scientific Workflow Management Systems
● Taverna:
- Java-based, open source, developed at the University of Manchester
- Supports the life sciences (design and execution of scientific workflows)
- Can invoke any web service through WSDL (code reusability)
- Can also invoke local Java services and APIs … and import data from CSV files or Excel spreadsheets
Scientific Workflow Management Systems
● Taverna Tools:
i. Taverna Engine
ii. Taverna Workbench
iii. Taverna Server
iv. A command-line tool
Scientific Workflow Management Systems
● Features of Taverna Workflow:
i. Pipelining
ii. Implicit iteration of service calls
iii. Conditional calling of services
iv. Customizable looping over a service
v. Failover and retry of service calling
vi. Parallel execution
vii. Managing previous runs and workflow results
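Two of these features, pipelining and implicit iteration, can be sketched in a few lines of plain Python. This is not Taverna's API: the helpers `implicit_iteration` and `pipeline` and the two toy services are hypothetical. The idea shown is that a service written for a single item is applied element by element when it receives a list, and that services are chained so each output feeds the next input.

```python
def implicit_iteration(service):
    """Wrap a single-item service so a list input is processed item by item."""
    def wrapper(value):
        if isinstance(value, list):
            return [wrapper(item) for item in value]
        return service(value)
    return wrapper


def pipeline(*services):
    """Chain services so that each output feeds the next input."""
    def run(value):
        for service in services:
            value = service(value)
        return value
    return run


@implicit_iteration
def to_upper(sequence):
    return sequence.upper()


@implicit_iteration
def gc_count(sequence):
    return sequence.count("G") + sequence.count("C")


workflow = pipeline(to_upper, gc_count)
print(workflow("gattaca"))            # 2
print(workflow(["gattaca", "ccgg"]))  # [2, 4]
```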
Scientific Workflow Management Systems
● Triana:
- Java-based, open source, developed at Cardiff University
- Modularized architecture
- Combines a visual interface with data analysis tools
- Can connect heterogeneous tools (Web services, Java units, ...)
- Uses its own custom workflow language (+BPEL)
- Uses several workflow patterns, including loops and branches
Scientific Workflow Management Systems
● Triana Tools:
- Signal analysis
- Image manipulation
- Desktop publishing
- Support for integrating your own tools
Scientific Workflow Management Systems
● Pegasus:
- Developed at the University of Southern California
- Runs on desktops, clusters, grids, and clouds
- Used in several scientific areas, including bioinformatics, astronomy, earthquake science, gravitational wave physics, and ocean science
- Executes the workflow tasks in the order of their dependencies
- Includes a sophisticated error recovery system
Scientific Workflow Management Systems
● Pegasus Components:
i. The Mapper:
- builds an executable workflow based on an abstract workflow provided by the user
- can also restructure the workflow for optimization purposes
ii. The Execution Engine:
- executes the tasks in the appropriate order
iii. The Task Manager:
- manages and supervises workflow tasks on local or remote resources
Scientific Workflow Management Systems
● Kepler:
- Java-based, open source, developed at the University of California
- Can execute workflows from a graphical interface or the command line
- Based on the concept of directors
- Runs locally and on Grids
- Supports foreign-language interfaces through JNI (Matlab actor, Python actor, ...)
- Supports distributed computational resources through Web and Grid service actors
- Used to design and execute various workflows in biology, ecology, geology, chemistry, and astrophysics
Scientific Workflow Management Systems
● Askalon:
- Developed at the University of Innsbruck
- Allows the execution of distributed workflow applications in service-oriented Grids
- Uses the Globus Toolkit as Grid middleware
- Uses a custom XML-based language (AGWL)
Scientific Workflow Management Systems
● Askalon Architecture:
i. Resource Broker
ii. Resource Monitoring
iii. Information Service
iv. Workflow Executor
v. Metascheduler
vi. Performance Prediction
vii. Performance Analysis
Scientific Workflow Management Systems
● Weka4WS:
- Java-based, open-source data mining system
- Offers an easy-to-use GUI
- Includes the Knowledge Flow tool
- Data mining algorithms are wrapped as web services
- Executes a whole workflow only on a single computer
- Can use Gridlab to exploit Grid resources
- Provides data & task parallelism
Scientific Workflow Management Systems
● GWES (Generic Workflow Execution Service):
- Multi-level abstraction
- Plugin concept (DB2 activity, Grid activity, ...)
- Can be used on clusters, grids, and clouds
- Uses GWorkflowDL, based on Petri nets
- Supports exception handling
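Since the workflow description language here is based on Petri nets, a minimal Petri net sketch may help show how such a model drives execution. It does not reproduce GWES or its description language: the `PetriNet` class and the place and transition names are hypothetical. Places hold tokens (for example, available data), and a transition (a workflow activity) may fire only when every one of its input places holds a token.

```python
class PetriNet:
    def __init__(self, marking):
        # marking: number of tokens currently held by each place
        self.marking = dict(marking)
        self.transitions = {}

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (tuple(inputs), tuple(outputs))

    def enabled(self):
        """Return the transitions whose input places all hold a token."""
        return [
            name for name, (inputs, _) in self.transitions.items()
            if all(self.marking.get(place, 0) > 0 for place in inputs)
        ]

    def fire(self, name):
        """Consume one token per input place and produce one per output place."""
        if name not in self.enabled():
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] = self.marking.get(place, 0) + 1


net = PetriNet({"raw_data": 1})
net.add_transition("preprocess", ["raw_data"], ["clean_data"])
net.add_transition("analyze", ["clean_data"], ["result"])

fired = []
while net.enabled():
    name = net.enabled()[0]
    net.fire(name)
    fired.append(name)

print(fired)        # ['preprocess', 'analyze']
print(net.marking)  # {'raw_data': 0, 'clean_data': 0, 'result': 1}
```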
Scientific Workflow Management Systems
● DVega:
- Utilizes reference nets for composing workflow tasks in a hierarchical way
- Has forwarder-receiver components
- Maps between tasks and resources
Scientific Workflow Management Systems
● Karajan:
- Java-based
- Allows users to compose workflows through XML scripting language & K
- Supports linear and parallel execution
- Supports hierarchical workflows
- Allows monitoring of the execution (checkpointing subsystem)
- Workflows can be modified at runtime
Scientific Workflow Management Systems
● DIS3GNO:
- Allows users to compose distributed data mining workflows
- Executes workflows on the Knowledge Grid
- Visualizes the results
Functionalities:
i. Metadata management
ii. Design and execution management
Discussion and Research issues
● Workflow formalisms:
- Abstractions for data representation
- Abstractions for concurrent processing orchestration
- Annotating, storing, and retrieving workflow results
Discussion and Research issues
● Workflow Lifecycle:
i. Textual or graphical composition
ii. Mapping of the abstract workflow description onto the available resources
iii. Scheduling, monitoring, and debugging of the subsequent execution
Discussion and Research issues
● Topics to investigate:
i. Adaptive workflow execution models
ii. High-level tools and languages for workflow composition
iii. Scientific workflow interoperability and openness
iv. Big Data management and knowledge discovery workflows
v. Internet-wide distributed workflow execution
vi. Service-oriented workflows on Cloud infrastructures
vii. Workflow composition and execution in Exascale computing systems
viii. Fault-tolerance and recovery strategies for scientific workflows
ix. Workflow provenance and annotation mechanisms and systems
Conclusion
● Workflow Systems:
- Support scientific processes
- Integrate programs, methods, agents, and services
- Help knowledge discovery from Big Data
- Need to deal with failures
Thank You!