Social collaboration environments for sharing, curating and cataloguing personal, group and community contributed scientific assets. BSD
5000+ registered users, 56 countries
1600+ workflows, 1700+ services
Scientific workflow management system for accessing open, public data services, assembling data processing and analysis pipelines and recording provenance. LGPL
361 organisations, 48 countries
70,000+ binary downloads, ~4000 source
http://www.mygrid.org.uk
Handy tools for data management tasks in bioinformatics. BSD
Scientific workflows, scripts and pipelines
Now also neuroscience, music and numerical analysis
Developed with Oxford and Southampton
Web-based Software & Sharing Services
“Mobilising the long tail of scientists for all our benefit”
Common Ruby on Rails platform
Common and exchanged codebases
Systems Biology models, data and protocols
Adopted by 4 EU-wide consortia and 4 UK sites
Developed with HITS and Stellenbosch
Crowd-sourced curated Web services
Adopted by EdUnify and ELDA education projects
Developed with EBI and EMBRACE network
Find experts, advice, scripts, variable sets
Towards interface for UK Data Archives
Developed with NIBHI
SysMO-DB Project
A data access, model handling and data integration platform for Systems Biology:
• To support and manage the diversity of
– Data, models and experimental protocols (SOPs) from a consortium
• Web based
• Standards compliant
DB
• Pan-European collaboration
• 13 individual projects, >100 institutes
– Different research outcomes
– A cross-section of microorganisms, incl. bacteria, archaea and yeast
• Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
• Present these processes in the form of computerized mathematical models
• Pool research capacities and know-how
• Already running since April 2007
• Runs for 3-5 years
• This year, 2 new projects joined and 6 left
http://www.sysmo.net
Systems Biology of Microorganisms
Data Driven
• Multiple omics
– genomics, transcriptomics
– proteomics, metabolomics
– fluxomics, reactomics
• Images
• Molecular biology
• Reaction kinetics
• Models
– Metabolic, gene network, kinetic
• Relationships between data sets/experiments
– Procedures, experiments, data, results and models
• Analysis of data
SOP
A Tree View of Assets
Investigation, Studies, Assay
Construction, Validation
SOP
ISA infrastructure provides a directory structure for experiments
http://isatab.sourceforge.net/
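A minimal sketch of the Investigation / Study / Assay hierarchy as a tree of assets. This is not the ISA-TAB file format or the SEEK data model; the class names, fields and example titles below are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    # Assets attached to the assay, e.g. SOPs, data files, models (free text here).
    assets: List[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def tree_lines(inv: Investigation) -> List[str]:
    """Render the hierarchy as indented lines, two spaces per level."""
    lines = [inv.title]
    for study in inv.studies:
        lines.append("  " + study.title)
        for assay in study.assays:
            lines.append("    " + assay.title)
            for asset in assay.assets:
                lines.append("      " + asset)
    return lines


if __name__ == "__main__":
    # Hypothetical example content, not taken from any SysMO project.
    example = Investigation(
        "Example investigation",
        [Study("Model construction", [Assay("Example assay", ["SOP: example protocol"])])],
    )
    print("\n".join(tree_lines(example)))
```

Keeping the three levels as separate types mirrors the directory-like nesting described above: an asset is always reached through an assay, a study and an investigation.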
Access Permissions
Just Enough Sharing
...we don’t talk about security
Attribution.Trust.
Credit
Reward and Provenance
Reusing myExperiment
COSMIC
SysMOLab
MOSES
Alfresco
Wiki
Wiki
ANOTHER
A DATASTORE
Just Enough sharing
SOP
Fetch on Request
Direct Upload
RightField: Annotation by Stealth
http://rightfield.org.uk
SEEK, the e-Laboratory
A dynamic resource for analysis as well as browsing
• Automatic comparison of data from inside files
• Understanding where and how data and models are linked
• Running simulations with new experimental data
• Running analyses and workflows over the data and models
Open Integration: JWS Simulator
Web-based, easy-to-use interface: “runs in your browser”, integrated in SEEK
Models can be accessed via browser, SEEK and web services.
Data linked to models via file upload (e.g. Excel), or via database connection.
Standard simulation functionality
Data Fuse
Available services
http://www.taverna.org.uk
Workflow diagram
Workflow Explorer
Taverna Workbench
The Taverna Open Suite of Tools
Client User Interfaces
GUI Workbench
Workflow Repository
Service Catalogue
Third Party Tools
Programming and APIs
Web Portals
Activity and Service Plug-in Manager
Provenance Store
Workflow Server
Open Provenance Model
Secure Service Access
Workflow Engine
Virtual Machine
Taverna and the ‘Cloud’
Analysing Next Generation Sequencing Data
Analysing African Cattle with Taverna 2.2
10,000 years separation
African livestock adaptations:
• Hardier
• Better disease resistance
Potential outcomes:
• Food security
• Understanding resistance
• Understanding environmental conditions
• Drought
• Parasites
• Understanding diversity
http://www.bbc.co.uk/news/10403254
The Analysis Pipeline (in Perl)
MAP
FILTER
ANALYSIS
Input SNP data from sequencer
Map between Genome Builds (Liftover)
Filter for SNPs in Exons
SNP consequences
Identifying damaging SNPs (PolyPhen)
Harry Noyes, University of Liverpool
Workflow and phases
Input SNP file
Populate DB with start SNPs and resource version numbers
Lift-over: maps between UMD3 and BTA4 cow assemblies
Exon positions from Ensembl
Find SNPs in exon regions
PolyPhen to mark “damaging” SNPs
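The MAP, FILTER and ANALYSIS phases listed above can be sketched as follows. This is an illustrative Python outline, not the Perl pipeline or the Taverna workflow itself: `liftover` and `predict` are hypothetical callables standing in for the real lift-over and PolyPhen steps, and the exon list is assumed to be already fetched as (chromosome, start, end) tuples.

```python
from bisect import bisect_right
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Snp = Tuple[str, int]        # (chromosome, position)
Interval = Tuple[int, int]   # inclusive (start, end)


def build_exon_index(exons: Iterable[Tuple[str, int, int]]) -> Dict[str, List[Interval]]:
    """Group exons by chromosome and merge overlaps so lookups can use bisect."""
    by_chrom: Dict[str, List[Interval]] = {}
    for chrom, start, end in exons:
        by_chrom.setdefault(chrom, []).append((start, end))
    index: Dict[str, List[Interval]] = {}
    for chrom, intervals in by_chrom.items():
        merged: List[Interval] = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = merged
    return index


def in_exon(index: Dict[str, List[Interval]], snp: Snp) -> bool:
    """True if the SNP position falls inside any exon on its chromosome."""
    chrom, pos = snp
    intervals = index.get(chrom, [])
    # Last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos <= intervals[i][1]


def run_pipeline(
    snps: Iterable[Snp],
    liftover: Callable[[Snp], Optional[Snp]],  # hypothetical MAP step
    index: Dict[str, List[Interval]],
    predict: Callable[[Snp], str],             # hypothetical ANALYSIS step
) -> List[Tuple[Snp, str]]:
    results = []
    for snp in snps:
        mapped = liftover(snp)          # MAP: convert between genome builds
        if mapped is None:              # position has no equivalent in the target build
            continue
        if not in_exon(index, mapped):  # FILTER: keep SNPs in exons only
            continue
        results.append((mapped, predict(mapped)))  # ANALYSIS: label the SNP
    return results
```

Filtering before analysis keeps the expensive prediction step limited to the SNPs that can affect coding sequence.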
Accessing Taverna on the Cloud
Architecture overview
Jobs Status
Input Provenance
Experiment Metadata
Input data summary
Loading inputs
Summary of Workflow Output
Non-synonymous coding SNPs
PolyPhen predictions: probably damaging
11 million SNPs for N’Dama
The result can be downloaded as a MySQL database or as TSV/CSV
Why use the Cloud?
• This is a highly repetitive task
– And “embarrassingly parallel”
• But it also needs to be done on demand
• And within the financial reach of researchers
– Who do not always have access to their own compute
• We have very fast network access
– So we don’t need to do this in-house
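A minimal sketch of what “embarrassingly parallel” means for this kind of task: the input is split into independent chunks, each chunk is processed without reference to the others, and the partial results are concatenated. The thread pool here is only a local stand-in for cloud workers, and `worker` is a hypothetical per-chunk function; nothing below describes the actual Taverna cloud deployment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_parallel(
    items: Sequence[T],
    worker: Callable[[Sequence[T]], List[R]],  # hypothetical per-chunk job
    chunk_size: int,
    max_workers: int = 4,
) -> List[R]:
    """Process each chunk independently and concatenate results in input order."""
    chunks = chunked(items, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order the chunks were submitted.
        partials = list(pool.map(worker, chunks))
    return [result for part in partials for result in part]
```

Because no chunk depends on another, adding workers on demand shortens the run without changing the result.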
Timings
SEEK as a data analysis and meta analysis service
• SBML model construction and population
• Calibration workflow
• Data requirements
– Parameterised SBML model
– Experimental data
– Metabolite concentrations from key results database
• Calibration by COPASI web service
Peter Li
Search and Analysis across data sets, models and stuff
• Analysis pool
• Analysis as a Cloud Service
• Analysis using Cloud Computing Services
• Run analysis tools and knowledge bases
Li et al, BMC Bioinformatics 2010, 11:582, doi:10.1186/1471-2105-11-582 (highly accessed)
Hucka and Le Novère, BMC Biology 2010, 8:140, doi:10.1186/1741-7007-8-140
Automated Model Generation
MCISB Centre (Li)
Annotation pipeline
SUMO SysMO project (Maleki-Dizaji)
Workflow Management System
Next Gen Seq annotation pipelines using Amazon Cloud Services (Noyes, Li)
SysMO-DB Dev Team
University of Stellenbosch, South Africa
University of Manchester, UK
Jacky Snoep
Heidelberg Institute for Theoretical Studies, Germany
University of Manchester, UK
Olga Krebs
Wolfgang Müller
Sergejs Aleksejevs
Carole Goble
Stuart Owen
Katy Wolstencroft
Finn Bacall
Franco du Preez
Quyen Nguyen
Further Information
• myGrid
– http://www.mygrid.org.uk
• Taverna
– http://www.taverna.org.uk
• myExperiment
– http://www.myexperiment.org
• BioCatalogue
– http://www.biocatalogue.org
• SEEK
– http://www.sysmo-db.org
• RightField
– http://www.rightfield.org.uk
• MethodBox
– http://www.methodbox.org.uk