Stian Soiland-Reyes
myGrid, School of Computer Science, University of Manchester, UK
UKOLN DevCSI: Workflow Tools, Bath, 2010-11-30
http://taverna.org.uk/

Taverna workflow management system (2010 11-30 Bath Workflow Tools)

DESCRIPTION

Presentation of Taverna from the UKOLN DevCSI "Workflow Tools" event in Bath, 2010-11-30.
PPTX version: http://www.slideshare.net/soilandreyes/taverna-workflow-management-system-2010-1130-bath-workflow-tools-pptx
http://taverna.org.uk/
http://www.ukoln.ac.uk/events/devcsi/workflow_tools/programme/index.html
http://devcsi.ukoln.ac.uk/

http://taverna.org.uk/ http://mygrid.org.uk/

What is myGrid?

• An e-Science collaboration
• Since 2001
• Not a grid!
• Numerous partners involved:
  • University of Manchester
  • University of Southampton
  • University of Oxford
  • EMBL-EBI
• Provides sustainable and production quality software
• Supported by OMII-UK, EPSRC and BBSRC

Mixture of developers, bioinformaticians and researchers

Software | Services | Content | Skills | Community

Motivation

Challenge: Bioinformatics

Large amounts of data

Many open questions

Numerous freely available public datasets and analysis tools

Huge amounts of data

• QTL regions: 100+ genes
• Microarray: 1000+ genes
• Next Gen Sequencing: 10,000+ genes

How do I look at all the genes systematically?

Manual approach

• Search using public web sites and databases: Pubmed, Uniprot, EBI BioMart
• Copy and paste to web tools for analysis: NCBI Blast, EBI InterPro
• Further processing locally: R, Perl, Python

Manual: disadvantages

• Scale of analysis task overwhelms researchers - lots of data
• User bias and premature filtering of datasets - cherry picking
• Hypothesis-driven approach to data analysis
• Constant changes in data - problems with re-analysis of data
• Implicit methodologies (hyper-linking through web pages)
• Error proliferation from any of the listed issues - notably human error

Web services and workflows

Web services
• Technology and standards for exposing code and data resources that can be programmatically consumed by a remote third party
• Description of how to interact with the service: parameters, documentation

Workflows
• General technique for describing and executing a process
• Describe what you want to do and which services to run

Taverna workflows

• A set of (local and remote) services to analyze or manage data
• Nested workflows are also services
• Data links connect services, i.e. output from service A is input to services B and C
• Describes the desired dataflow instead of process coordination
• Automatic iterations
• Can customize list handling and control links
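The dataflow idea on this slide can be illustrated outside Taverna. The following Python sketch is not Taverna code and does not use any Taverna API; every name in it (`run_with_implicit_iteration`, `service_a`, `service_b`, `service_c`, `run_workflow`) is hypothetical. It only shows, under those assumptions, a data link fanning the output of service A out to services B and C, and a single-value service being applied automatically to each item of a list.

```python
from typing import Any, Callable, Dict, List


def run_with_implicit_iteration(service: Callable[[str], Any], data: Any) -> Any:
    """Apply a single-value service to one value or, recursively, to every item of a list."""
    if isinstance(data, list):
        return [run_with_implicit_iteration(service, item) for item in data]
    return service(data)


# Hypothetical stand-ins for remote services; each one accepts a single string.
def service_a(region: str) -> List[str]:
    """Pretend lookup that returns three made-up gene ids for a region."""
    return [f"{region}:gene{i}" for i in range(1, 4)]


def service_b(gene_id: str) -> str:
    """Pretend normalisation step."""
    return gene_id.upper()


def service_c(gene_id: str) -> str:
    """Pretend annotation step."""
    return f"description of {gene_id}"


def run_workflow(region: str) -> Dict[str, Any]:
    """Declare the data links: A feeds both B and C; no explicit loop is written here."""
    genes = service_a(region)
    return {
        "b": run_with_implicit_iteration(service_b, genes),
        "c": run_with_implicit_iteration(service_c, genes),
    }


if __name__ == "__main__":
    print(run_workflow("chr1"))
```

The workflow function states only which output feeds which input; the iteration over the list returned by `service_a` happens in the helper, which is the part a workflow engine would take care of.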

[Figure: example Taverna workflow diagram with workflow inputs, workflow outputs and the services connecting them. Legible node labels include chromosome_name, start_position, end_position, genes_in_qtl, mmusculus_gene_ensembl, get_pathways_by_genes1, Get_pathways, Merge_pathways, gene_descriptions, pathway_descriptions, create_report and report.]

What types of services?

• Public/private/secured WSDL/SOAP web services
• RESTful web services
• Spreadsheet import
• Command line tools (local/ssh)
• Inline scripts (Beanshell, R)
• Java APIs
• Customizations:
  • BioMart, BioMoby / SADI
  • Soaplab
  • Grid services (Globus, EGEE gLite, caGrid)
  • … your tool (Plugin tutorial on wiki)

Which services?

Taverna is general, can connect to standard web services for any domain

Bioinformatics:

From professional third-party organisations providing robust & open data/analysis services

... to under-the-desk web services for one particular purpose, run by PhD students

http://biocatalogue.org/ - 1730 services from 130 providers – crowd sourced and quality monitored

Taverna workbench

Graphical desktop tool

No server installation required

Drag-and-drop services into diagram

Connect services, run, reconnect, rerun

Integrates diverse set of tools

Sharing workflows

myExperiment.org allows users to share, find, download and rate workflows

“Facebook for the scientist”

3000 members, 1100 workflows

Extensible UI and engine

Plugins can provide new “perspectives”

e.g. BioCatalogue, myExperiment

Provide service-specific customization

BioMart interface replicates web site

Adding new functionality

Looping, branching, dynamic service resolution

New service types

Design helpers, “Find matching service”

Taverna 3 “Next-gen”

Under development for 2011

Interactive, component-centric and data-centric workflow design

Pre-packaged workflow components

Searching for workflow components from BioCatalogue and myExperiment

New myGrid workflow components library

Taverna command line

Executes from a Windows, Linux or OS X shell

Takes a predefined workflow with files as inputs and outputs

Quick way to “productionize” a workflow
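As an illustration of running a predefined workflow with files as inputs and outputs, the sketch below builds the argument list for a command line run from Python. The launcher name `executeworkflow.sh` and the `-inputfile` and `-outputdir` options are assumptions that should be checked against the help output of the installed Taverna command line tool; the workflow file name, input port names and file paths are made up for the example.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def build_command(launcher: str, workflow: Path,
                  input_files: Dict[str, Path], output_dir: Path) -> List[str]:
    """Build the argument list for one workflow run.

    ASSUMPTION: the "-inputfile <port> <file>" and "-outputdir <dir>" options
    are not confirmed by the slides; verify them with the tool's own help text.
    """
    cmd = [launcher]
    for port, path in input_files.items():
        cmd += ["-inputfile", port, str(path)]
    cmd += ["-outputdir", str(output_dir), str(workflow)]
    return cmd


def run_workflow(launcher: str, workflow: Path,
                 input_files: Dict[str, Path], output_dir: Path) -> int:
    """Run the workflow as a child process and return its exit code."""
    cmd = build_command(launcher, workflow, input_files, output_dir)
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical file names; only the command is printed, nothing is executed.
    command = build_command(
        "executeworkflow.sh",
        Path("pathways.t2flow"),
        {"chromosome_name": Path("chromosome.txt")},
        Path("results"),
    )
    print(" ".join(command))
```

Keeping the command construction separate from the process launch makes it easy to log or schedule the same invocation, which is one way to read "productionize" here.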

Taverna Server

REST/SOAP interface to execute workflows

Client libraries for Ruby and Java

Two demonstration web interfaces
• Ruby
• Java Portlets

Future
• Detailed execution support and control
• Security delegation

Taverna portlet

Example portlet implementation

Executes workflows using Taverna Server

Ruby web interface

Example customized web interface

Uses Ruby gem t2-server

Taverna on the cloud

Use case: SNP analysis and annotation of genomes sequenced from breeds of cows in Africa - why are some of them resistant to X?

Amazon EC2 with Taverna Server and local services

Custom (built-in-a-week) Ruby on Rails web interface

Runs through 31 chromosomes in 6.5 hours using 10 instances - $26
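The figures on this slide imply some simple averages, worked through below in Python. The calculation assumes that all 10 instances ran for the full 6.5 hours and that the $26 covers instance time only; the slide states neither, so the derived rates are rough estimates and not published pricing.

```python
from typing import Dict


def summarise_run(chromosomes: int, wall_clock_hours: float,
                  instances: int, total_cost_usd: float) -> Dict[str, float]:
    """Derive average figures from the totals quoted on the slide."""
    # ASSUMPTION: every instance was busy for the whole wall-clock time.
    instance_hours = instances * wall_clock_hours
    return {
        "instance_hours": instance_hours,
        "usd_per_instance_hour": total_cost_usd / instance_hours,
        "usd_per_chromosome": total_cost_usd / chromosomes,
    }


if __name__ == "__main__":
    summary = summarise_run(chromosomes=31, wall_clock_hours=6.5,
                            instances=10, total_cost_usd=26.0)
    print(f"{summary['instance_hours']:.1f} instance-hours")           # 65.0 instance-hours
    print(f"${summary['usd_per_instance_hour']:.2f} per instance-hour")  # $0.40 per instance-hour
    print(f"${summary['usd_per_chromosome']:.2f} per chromosome")        # $0.84 per chromosome
```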

Open source, open development

The Taverna suite of tools is open source and free to use

Large user community, active mailing lists

Lead developers: myGrid in Manchester

Contributors from across the world

PAL programme

myGrid provides training, tutorials and documentation

Acknowledgements

More information

http://www.mygrid.org.uk/

http://www.taverna.org.uk/

http://www.myexperiment.org/

http://www.biocatalogue.org/