Data-driven research with e-Laboratories

Stuart Owen

University of Manchester

stuart.owen@manchester.ac.uk

Social collaboration environments for sharing, curating and cataloguing personal, group and community contributed scientific assets. BSD. 5000+ registered users, 56 countries. 1600+ workflows, 1700+ services.

Scientific workflow management system for accessing open, public data services, assembling data processing and analysis pipelines and recording provenance. LGPL. 361 organisations, 48 countries. 70,000+ binary downloads, ~4000 source.

http://www.mygrid.org.uk

Handy tools for data management tasks in bioinformatics. BSD

Scientific workflows, scripts and pipelines. Now also neuroscience, music and numerical analysis. Developed with Oxford and Southampton.

Web-based Software & Sharing Services: “Mobilising the long tail of scientists for all our benefit”

Common Ruby on Rails platform. Common and exchanged codebases.

Systems Biology models, data and protocols. Adopted by 4 EU-wide consortia and 4 UK sites. Developed with HITS and Stellenbosch.

Crowd-sourced, curated Web services. Adopted by EdUnify and ELDA education projects. Developed with EBI and EMBRACE network.

Find experts, advice, scripts, variable sets. Towards interface for UK Data Archives. Developed with NIBHI.

SysMO-DB Project

A data access, model handling and data integration platform for Systems Biology:

• To support and manage the diversity of data, models and experimental protocols (SOPs) from a consortium

• Web based

• Standards compliant


• Pan European collaboration
• 13 individual projects, >100 institutes
  – Different research outcomes
  – A cross-section of microorganisms, incl. bacteria, archaea and yeast

• Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way

• Present these processes in the form of computerized mathematical models

• Pool research capacities and know-how

• Running since April 2007
• Runs for 3-5 years
• This year, 2 new projects joined and 6 left

http://www.sysmo.net

Systems Biology of Microorganisms

Data Driven

• Multiple omics
  – genomics, transcriptomics
  – proteomics, metabolomics
  – fluxomics, reactomics
• Images
• Molecular biology
• Reaction kinetics
• Models
  – Metabolic, gene network, kinetic
• Relationships between data sets/experiments
  – Procedures, experiments, data, results and models
• Analysis of data


A Tree View of Assets

[Figure: tree view with the levels Investigation, Studies and Assay; node labels include Construction, Validation and SOP]

ISA infrastructure provides a directory structure for experiments
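The Investigation / Study / Assay hierarchy can be sketched as a small nested data structure. This is only an illustrative sketch, not SEEK's or ISA-TAB's actual data model: the class names, the `assets` field and the example titles are all assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    # Assets attached to the assay, e.g. SOPs, data files, models (hypothetical field)
    assets: List[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def tree_lines(inv: Investigation) -> List[str]:
    """Render the hierarchy as indented lines, one per node."""
    lines = [inv.title]
    for study in inv.studies:
        lines.append("  " + study.title)
        for assay in study.assays:
            lines.append("    " + assay.title)
            for asset in assay.assets:
                lines.append("      " + asset)
    return lines


# Hypothetical example content, not taken from any real project
example = Investigation(
    "Example investigation",
    [
        Study("Model construction", [Assay("Growth assay", ["SOP: sampling"])]),
        Study("Model validation", [Assay("Enzyme kinetics", ["SOP: kinetics", "data.xls"])]),
    ],
)

if __name__ == "__main__":
    print("\n".join(tree_lines(example)))
```

Keeping assets at the leaves mirrors the directory-like reading of the slide: an investigation groups studies, a study groups assays, and files hang off assays.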

http://isatab.sourceforge.net/

Access Permissions

Just Enough Sharing

...we don’t talk about security

Attribution. Trust. Credit.

Reward and Provenance

Reusing myExperiment

[Figure: project data stores (COSMIC, SysMOLab, MOSES, Alfresco, wikis, another datastore) linked under “Just Enough Sharing”; assets such as SOPs are made available by fetch on request or by direct upload]
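The two routes named on the slide, direct upload and fetch on request, can be sketched as a catalogue that either holds a copy of a file or only a pointer to where it lives. This is a minimal sketch under stated assumptions: `Catalogue`, `Asset`, `register_remote` and the injected `fetcher` function are hypothetical names, not part of SEEK's code or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Asset:
    title: str
    content: Optional[bytes] = None     # set when the file was uploaded directly
    remote_url: Optional[str] = None    # set when the file stays in the project's own store


class Catalogue:
    """Hypothetical catalogue: stores uploaded copies or remote pointers."""

    def __init__(self, fetcher: Callable[[str], bytes]):
        # The fetcher is injected so that the sketch needs no network access.
        self._fetcher = fetcher
        self._assets: Dict[str, Asset] = {}

    def upload(self, title: str, content: bytes) -> None:
        """Direct upload: the catalogue keeps its own copy."""
        self._assets[title] = Asset(title, content=content)

    def register_remote(self, title: str, url: str) -> None:
        """Fetch on request: only the location is recorded."""
        self._assets[title] = Asset(title, remote_url=url)

    def download(self, title: str) -> bytes:
        asset = self._assets[title]
        if asset.content is not None:
            return asset.content
        if asset.remote_url is None:
            raise ValueError("asset has neither content nor a remote URL")
        # Retrieved from the owning project's store only when someone asks for it
        return self._fetcher(asset.remote_url)


def fake_fetcher(url: str) -> bytes:
    """Stand-in for a real HTTP client, used only for illustration."""
    return ("contents of " + url).encode()
```

With fetch on request the owning project keeps control of the file, which is one way to read “just enough sharing”: the catalogue knows the asset exists without holding it.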

RightField: Annotation by Stealth

http://rightfield.org.uk

SEEK, the e-Laboratory

A dynamic resource for analysis as well as browsing

• Automatic comparison of data from inside files
• Understanding where and how data and models are linked
• Running simulations with new experimental data
• Running analyses and workflows over the data and models

Open Integration: JWS Simulator

Web-based, easy-to-use interface: “runs in your browser”, integrated in SEEK

Models can be accessed via browser, SEEK and web services.

Data linked to models via file upload (e.g. Excel), or via database connection.

Standard simulation functionality

Data Fuse

Available services

http://www.taverna.org.uk

Workflow diagram

Workflow Explorer

Taverna Workbench

The Taverna Open Suite of Tools

• Client User Interfaces
• GUI Workbench
• Workflow Repository
• Service Catalogue
• Third Party Tools
• Programming and APIs
• Web Portals
• Activity and Service Plug-in Manager
• Provenance Store
• Workflow Server
• Open Provenance Model
• Secure Service Access
• Workflow Engine
• Virtual Machine

Taverna and the ‘Cloud’

Analysing Next Generation Sequencing Data


Analysing African Cattle with Taverna 2.2

10,000 years separation

African livestock adaptations:
• Hardier
• Better disease resistance

Potential outcomes:
• Food security
• Understanding resistance
• Understanding environmental conditions
• Drought
• Parasites
• Understanding diversity

http://www.bbc.co.uk/news/10403254

The Analysis Pipeline (in Perl)

Input SNP data from sequencer

MAP: map between genome builds (Liftover)

FILTER: filter for SNPs in exons

ANALYSIS: SNP consequences; identifying damaging SNPs (PolyPhen)

Harry Noyes, University of Liverpool

Workflow and phases

Input SNP file

Populate DB with start SNPs and resource version numbers

Lift-over: maps between UMD3 and BTA4 cow assemblies

Exon positions from Ensembl

Find SNPs in exon regions

PolyPhen to mark “damaging” SNPs
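The “find SNPs in exon regions” phase amounts to an interval lookup. The sketch below shows one common way to do it, a sorted interval list searched with `bisect`. It is not the Perl pipeline or the Taverna workflow from the talk: the function names, the tuple layouts and the 1-based inclusive coordinate convention are assumptions made for the illustration.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Exon = Tuple[int, int]   # (start, end), assumed 1-based and inclusive
Snp = Tuple[str, int]    # (chromosome, position)


def index_exons(exons: Dict[str, List[Exon]]) -> Dict[str, List[Exon]]:
    """Sort exons per chromosome and merge the overlapping ones."""
    indexed: Dict[str, List[Exon]] = {}
    for chrom, intervals in exons.items():
        merged: List[Exon] = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlaps the previous interval: extend it
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        indexed[chrom] = merged
    return indexed


def in_exon(indexed: Dict[str, List[Exon]], chrom: str, pos: int) -> bool:
    """True if pos falls inside any exon on the given chromosome."""
    intervals = indexed.get(chrom, [])
    # Index of the last interval whose start is <= pos (or -1 if none)
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos <= intervals[i][1]


def filter_exonic(snps: List[Snp], exons: Dict[str, List[Exon]]) -> List[Snp]:
    """Keep only the SNPs that lie within an exon."""
    indexed = index_exons(exons)
    return [snp for snp in snps if in_exon(indexed, snp[0], snp[1])]
```

Merging overlapping exons first means each lookup only has to inspect one candidate interval, so the cost per SNP is logarithmic in the number of exons on that chromosome.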

Accessing Taverna on the Cloud

Architecture overview

Jobs Status

Input Provenance

Experiment Metadata

Input data summary

Loading inputs

Summary of Workflow Output

Non-synonymous coding SNPs

Polyphen predictions: probably damaging

11 million SNPs for N'Dama

The result can be downloaded as a MySQL database or as a TSV/CSV file
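A TSV/CSV download like the one mentioned here can be produced with Python's standard `csv` module. The sketch below is illustrative only: `export_rows` and the column names in the usage note are hypothetical, and nothing here reflects the actual export code or table layout used in the project.

```python
import csv
import io
from typing import Dict, List, Sequence


def export_rows(rows: List[Dict[str, object]],
                fieldnames: Sequence[str],
                delimiter: str = "\t") -> str:
    """Serialise result rows as delimited text (tab for TSV, comma for CSV)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `export_rows([{"chrom": "1", "pos": 100, "prediction": "probably damaging"}], ["chrom", "pos", "prediction"])` yields a header line followed by one tab-separated row; passing `delimiter=","` gives the CSV variant.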

Why use the Cloud?

• This is a highly repetitive task
  – And “embarrassingly parallel”
• But it also needs to be done on demand
• And within the financial reach of researchers
  – Who do not always have access to their own compute
• We have very fast network access
  – So we don’t need to do this in-house
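“Embarrassingly parallel” means the input can be cut into chunks that are processed independently and then concatenated. A minimal sketch of that pattern follows. It uses local threads as a stand-in for cloud worker nodes, and `chunked`, `run_in_parallel` and the `worker` callback are hypothetical names, not part of Taverna or of the deployment described in the talk.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_in_parallel(items: Sequence[T],
                    worker: Callable[[Sequence[T]], List[R]],
                    chunk_size: int,
                    max_workers: int = 4) -> List[R]:
    """Apply `worker` to each chunk independently and concatenate the results."""
    chunks = chunked(items, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input chunks
        results = pool.map(worker, chunks)
        return [r for chunk_result in results for r in chunk_result]
```

Because no chunk depends on another, adding workers shortens the run roughly in proportion until some shared resource becomes the bottleneck, which is why renting compute on demand suits this kind of job.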

Timings

SEEK as a data analysis and meta analysis service

• SBML model construction and population
• Calibration workflow
  – Data requirements: parameterised SBML model; experimental data (metabolite concentrations from key results database)
  – Calibration by COPASI web service

Peter Li

Search and Analysis across data sets, models and stuff

• Analysis pool
• Analysis as a Cloud Service
• Analysis using Cloud Computing Services
• Run analysis tools and knowledge bases

Li et al., BMC Bioinformatics 2010, 11:582, doi:10.1186/1471-2105-11-582 (highly accessed)

Hucka and Le Novère, BMC Biology 2010, 8:140, doi:10.1186/1741-7007-8-140

Automated Model Generation: MCISB Centre (Li)

Annotation pipeline: SUMO SysMO project (Maleki-Dizaji)

Workflow Management System

Next Gen Seq annotation pipelines using Amazon Cloud Services (Noyes, Li)

SysMO-DB Dev Team

University of Stellenbosch, South Africa

University of Manchester, UK

Jacky Snoep

Heidelberg Institute for Theoretical Studies Germany

University of Manchester, UK

Olga Krebs

Wolfgang Müller

Sergejs Aleksejevs

Carole Goble

Stuart Owen

Katy Wolstencroft

Finn Bacall

Franco du Preez

Quyen Nguyen

Further Information

• myGrid: http://www.mygrid.org.uk
• Taverna: http://www.taverna.org.uk
• myExperiment: http://www.myexperiment.org
• BioCatalogue: http://www.biocatalogue.org
• SEEK: http://www.sysmo-db.org
• RightField: http://www.rightfield.org.uk
• MethodBox: http://www.methodbox.org.uk
