Upload
stian-soiland-reyes
View
1.401
Download
2
Tags:
Embed Size (px)
DESCRIPTION
Presentation of Taverna from UKOLN DevSci "Workflow Tools" event in Bath, 2010-11-30 PPTX version: http://www.slideshare.net/soilandreyes/taverna-workflow-management-system-2010-1130-bath-workflow-tools-pptx http://taverna.org.uk/ http://www.ukoln.ac.uk/events/devcsi/workflow_tools/programme/index.html http://devcsi.ukoln.ac.uk/
Citation preview
Stian Soiland-Reyes
myGrid, School of Computer Science
University of Manchester, UK
UKOLN DevSci: Workflow Tools Bath, 2010-11-30 Grid
my
http://taverna.org.uk/
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
What is myGrid?
An e-Science Collaboration Since 2001 Not a grid! Numerous partners involved:
University of Manchester University of Southampton University of Oxford EMBL-EBI
Provides sustainable and production quality software Supported by OMII-UK, EPSRC and BBSRC
Mixture of developers, bioinformaticians and researchers
Software | Services | Content | Skills | Community
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Motivation
Challenge: Bioinformatics
Large amounts of data
Many open questions
Numerous freely available public datasets and analysis tools
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Huge amounts of data
100+ Genes
QTL regions
Microarray
1000+ Genes
Next Gen Sequencing
10,000+ Genes
How do I look at all the genes systematically?
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Manual approach
Search using public web sites and databases Pubmed Uniprot EBI BioMart
Copy and paste to web tools for analysis NCBI Blast EBI InterPro
Further processing locally R Perl Python
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Manual: disadvantages
• Scale of analysis task overwhelms researchers – lots of data
• User bias and premature filtering of datasets – cherry picking
• Hypothesis-Driven approach to data analysis • Constant changes in data - problems with re-
analysis of data • Implicit methodologies (hyper-linking through
web pages) • Error proliferation from any of the listed issues
– notably human error
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Web services and workflows
Web services Technology and standards for exposing code and
data resources that can be programmatically consumed by a remote third party
Description on how to interact with the service, parameters, documentation
Workflows General technique for describing and executing a
process
Describe what you want to do running which services
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Taverna workflows
A set of (local and remote) services to analyze or manage data
Nested workflows are also services
Data-links connects services i.e. output from service A is input to
service B and C Describes the desired dataflow
instead of process coordination
Automatic iterations Can customize list handling and
control links
Get_pathways
Workflow Inputs
Workflow Outputs
Workflow Inputs
Workflow Outputs
remove_uniprot_duplicates
merge_uniprot_ids
species
getcurrentdatabase
kegg_pathway_release
binfo
regex_2
split_for_duplicates
split_for_duplicate_pathways
remove_duplicate_kegg_genes
merge_genes_and_pathways_3
flatten_pathway_files
merged_pathways
merge_genes_and_pathways
merge_genes_and_pathways_2
merge_kegg_references
kegg_external_gene_reference
remove_pathway_duplicates
merge_pathway_desc
merge_pathway_list_1
merge_pathway_list_2
remove_duplicate_ids
merge_patwhay_ids
pathway_descriptions
merge_reports
report
merge_gene_desc
remove_nulls_3
gene_descriptions
gene_ids
REMOVE_NULLS_2
remove_entrez_duplicates
merge_entrez_genes
remove_pathway_nulls
remove_Nulls
concat_kegg_genes
split_gene_ids
remove_pathway_nulls_2
add_uniprot_to_string
gene_descriptions pathway_descriptions
add_ncbi_to_string
Kegg_gene_ids_2
pathway_ids
Kegg_gene_ids
genes_in_qtl
mmusculus_gene_ensembl
create_report
ensembl_database_releasegenes_pathways kegg_pathway_release
Merge_pathways
concat_ids
pathway_ids
regex
split_by_regex
lister
Merge_gene_pathways
pathway_genes
concat_gene_pathway_ids
get_pathways_by_genes1
chromosome_namestart_position end_position
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
What types of services?
Public/private/secured WSDL/SOAP web services RESTful web services Spreadsheet import Command line tools (local/ssh) Inline scripts (Beanshell, R) Java APIs Customizations:
BioMart, BioMoby / SADI Soaplab Grid services (Globus, EGEE gLite, caGrid) … your tool (Plugin tutorial on wiki)
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Which services?
Taverna is general, can connect to standard web services for any domain
Bioinformatics:
From professional third-party organisations providing robust & open data/analysis services
..to under-the-desk web services for one particular purpose, ran by PhD students
http://biocatalogue.org/ - 1730 services from 130 providers – crowd sourced and quality monitored
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Taverna workbench
Graphical desktop tool
No server installation required
Drag-and-drop services into diagram
Connect services, run, reconnect, rerun
Integrates diverse set of tools
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Sharing workflows
myExperiment.org allows users to share, find, download and rate workflows
“Facebook for the scientist”
3000 members, 1100 workflows
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Extensible UI and engine
Plugins can provide new “perspectives”
i.e.: BioCatalogue, myExperiment
Provide service-specific customization
BioMart interface replicates web site
Adding new functionality
Looping, branching, dynamic service resolution
New service types
Design helpers, “Find matching service”
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Taverna 3 “Next-gen”
Under development for 2011
Interactive, component-centric and data-centric workflow design
Pre-packaged workflow components
Searching for workflow components from BioCatalogue and myExperiment
New myGrid workflow components library
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Taverna command line
Executes from a Windows/Linux/OSX shells
Takes a predefined workflow with files as inputs and outputs
Quick way to “productionize” a workflow
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Taverna Server
REST/SOAP interface to execute workflows
Client libraries for Ruby and Java
Two demonstration web interfaces Ruby
Java Portlets
Future Detailed execution support and control
Security delegation
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Taverna portlet
Example portlet implementation
Executes workflows using Taverna Server
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Ruby web interface
Example customized web interface
Uses Ruby gem t2-server
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Taverna on the cloud
Use-case: SNP analysis and annotation of
genome sequenced from breeds of cows in Africa – why are some of them resistent to X?
Amazon EC2 with Taverna Server and local services
Custom (built-in-a-week) Ruby on Rails web interface
Runs through 31 chromosomes in 6.5 hours using 10 instances - $26
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Open source, open development
Taverna suite of tools are all open source and free to use
Large user community, active mailing lists
Lead developers: myGrid in Manchester
Contributors from across the world
PAL programme
myGrid provides training, tutorials and documentation
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Acknowledgements
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
Gridmy
http://taverna.org.uk/ http://mygrid.org.uk/
More information
http://www.mygrid.org.uk/
http://www.taverna.org.uk/
http://www.myexperiment.org/
http://www.biocatalogue.org/