Data-driven research with e-Laboratories

Stuart Owen

University of Manchester

stuart.owen@manchester.ac.uk


Social collaboration environments for sharing, curating and cataloguing personal, group and community-contributed scientific assets. BSD licence. 5000+ registered users in 56 countries; 1600+ workflows, 1700+ services.

Scientific workflow management system for accessing open, public data services, assembling data processing and analysis pipelines and recording provenance. LGPL licence. 361 organisations in 48 countries; 70,000+ binary downloads, ~4000 source downloads.

http://www.mygrid.org.uk

Handy tools for data management tasks in bioinformatics. BSD licence.


Scientific workflows, scripts and pipelines. Now also neuroscience, music and numerical analysis. Developed with Oxford and Southampton.

Web-based Software & Sharing Services: “Mobilising the long tail of scientists for all our benefit”

Common Ruby on Rails platform. Common and exchanged codebases.

Systems Biology models, data and protocols. Adopted by 4 EU-wide consortia and 4 UK sites. Developed with HITS and Stellenbosch.

Crowd-sourced, curated Web services. Adopted by the EdUnify and ELDA education projects. Developed with EBI and the EMBRACE network.

Find experts, advice, scripts and variable sets. Towards an interface for UK Data Archives. Developed with NIBHI.


SysMO-DB Project

A data access, model handling and data integration platform for Systems Biology:

• To support and manage the diversity of
  – data, models and experimental protocols (SOPs) from a consortium

• Web-based

• Standards compliant


• Pan-European collaboration

• 13 individual projects, >100 institutes
  – Different research outcomes
  – A cross-section of microorganisms, including bacteria, archaea and yeast

• Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way

• Present these processes in the form of computerized mathematical models

• Pool research capacities and know-how

• Running since April 2007

• Runs for 3-5 years

• This year, 2 new projects joined and 6 left

http://www.sysmo.net

Systems Biology of Microorganisms


Data Driven

• Multiple omics
  – genomics, transcriptomics
  – proteomics, metabolomics
  – fluxomics, reactomics

• Images

• Molecular biology

• Reaction Kinetics

• Models
  – Metabolic, gene network, kinetic

• Relationships between data sets/experiments
  – Procedures, experiments, data, results and models

• Analysis of data


A Tree View of Assets

[Figure: tree of assets with Investigation, Studies and Assay levels; labels include Construction, Validation and SOP]

The ISA (Investigation, Study, Assay) infrastructure provides a directory structure for experiments.

http://isatab.sourceforge.net/
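As a rough illustration of the Investigation, Study, Assay hierarchy, the nesting can be modelled as a tree in which SOPs hang off assays and a depth-first walk produces the tree view. This is a minimal sketch only: the class names, fields and the `walk` helper are hypothetical and are not the SEEK or ISA-Tab data model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Assay:
    title: str
    sops: List[str] = field(default_factory=list)  # names of attached SOPs


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)

    def walk(self) -> Iterator[Tuple[int, str]]:
        """Yield (depth, label) pairs in depth-first order, like a tree view."""
        yield 0, f"Investigation: {self.title}"
        for study in self.studies:
            yield 1, f"Study: {study.title}"
            for assay in study.assays:
                yield 2, f"Assay: {assay.title}"
                for sop in assay.sops:
                    yield 3, f"SOP: {sop}"


if __name__ == "__main__":
    # Hypothetical content, loosely following the labels on the slide.
    inv = Investigation("Example investigation", [
        Study("Example study", [
            Assay("Construction", sops=["Construction SOP"]),
            Assay("Validation", sops=["Validation SOP"]),
        ]),
    ])
    for depth, label in inv.walk():
        print("  " * depth + label)
```

Keeping SOPs as leaves of assays mirrors the slide's picture of protocols attached at the experimental level rather than to the whole investigation.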


Access Permissions

Just Enough Sharing

...we don’t talk about security


Attribution. Trust.

Credit

Reward and Provenance

Reusing myExperiment


[Figure: Just Enough Sharing between project datastores (COSMIC, SysMOLab, MOSES, Alfresco, wikis and another datastore); assets such as SOPs arrive by Fetch on Request or Direct Upload]
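The Fetch on Request and Direct Upload routes can be read as two kinds of catalogue entry: an uploaded asset whose bytes are stored centrally, and a registered remote asset whose content stays in the project's own datastore until someone asks for it. The sketch below assumes that reading; `AssetCatalogue`, its methods and the pluggable `fetcher` are hypothetical and do not describe SEEK's implementation.

```python
from typing import Callable, Dict


class AssetCatalogue:
    """Hypothetical catalogue supporting direct upload and fetch-on-request."""

    def __init__(self, fetcher: Callable[[str], bytes]) -> None:
        self._fetcher = fetcher                # retrieves bytes from a remote location
        self._content: Dict[str, bytes] = {}   # bytes held locally (uploaded or cached)
        self._remote: Dict[str, str] = {}      # asset id -> remote location

    def upload(self, asset_id: str, data: bytes) -> None:
        """Direct upload: the catalogue keeps its own copy."""
        self._content[asset_id] = data

    def register_remote(self, asset_id: str, url: str) -> None:
        """Fetch on request: only the location is recorded for now."""
        self._remote[asset_id] = url

    def get(self, asset_id: str) -> bytes:
        """Return the asset, fetching it from its datastore the first time."""
        if asset_id not in self._content:
            url = self._remote[asset_id]  # KeyError if the asset was never registered
            self._content[asset_id] = self._fetcher(url)  # fetch once, then cache
        return self._content[asset_id]
```

Caching after the first fetch trades freshness for fewer calls to the remote store; a real deployment might instead re-fetch on every request so that the owning project's current access rules always apply.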


RightField: Annotation by Stealth

http://rightfield.org.uk


SEEK, the e-Laboratory

A dynamic resource for analysis as well as browsing

• Automatic comparison of data from inside files
• Understanding where and how data and models are linked
• Running simulations with new experimental data
• Running analyses and workflows over the data and models


Open Integration: JWS Simulator

Web-based, easy-to-use interface: “runs in your browser”, integrated in SEEK

Models can be accessed via the browser, SEEK and web services. Data are linked to models via file upload (e.g. Excel) or via a database connection.

Standard simulation functionality
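JWS Simulator's internals are not described here. As a generic illustration of what standard time-course simulation of a kinetic model involves, the sketch integrates a single Michaelis–Menten reaction S → P with a fixed-step fourth-order Runge–Kutta method. The model, the parameter values and the function names are assumptions for illustration; a real simulator would read the model from SBML and would usually use an adaptive (often stiff) ODE solver.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Rhs = Callable[[State], State]


def michaelis_menten(vmax: float, km: float) -> Rhs:
    """Rate equations for a single reaction S -> P with v = vmax * S / (km + S)."""
    def rhs(y: State) -> State:
        v = vmax * y["S"] / (km + y["S"])
        return {"S": -v, "P": v}
    return rhs


def rk4(rhs: Rhs, y0: State, t_end: float, dt: float) -> List[Tuple[float, State]]:
    """Integrate dy/dt = rhs(y) from t = 0 to t_end with fixed-step classical RK4."""
    def shifted(y: State, k: State, h: float) -> State:
        return {name: y[name] + h * k[name] for name in y}

    y = dict(y0)
    trajectory = [(0.0, dict(y))]
    for step in range(1, int(round(t_end / dt)) + 1):
        k1 = rhs(y)
        k2 = rhs(shifted(y, k1, dt / 2))
        k3 = rhs(shifted(y, k2, dt / 2))
        k4 = rhs(shifted(y, k3, dt))
        y = {name: y[name] + dt / 6 * (k1[name] + 2 * k2[name] + 2 * k3[name] + k4[name])
             for name in y}
        trajectory.append((step * dt, dict(y)))
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters and initial concentrations.
    traj = rk4(michaelis_menten(vmax=1.0, km=0.5), {"S": 10.0, "P": 0.0}, t_end=5.0, dt=0.01)
    for t, y in traj[::100]:
        print(f"t={t:4.1f}  S={y['S']:7.4f}  P={y['P']:7.4f}")
```

Because the reaction only converts S into P, S + P stays constant along the trajectory, which is a quick sanity check on any such simulation.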


Data Fuse


[Screenshot: Taverna Workbench, showing the Available services, Workflow diagram and Workflow Explorer panels]

http://www.taverna.org.uk


The Taverna Open Suite of Tools

[Figure: components include Client User Interfaces, GUI Workbench, Workflow Repository, Service Catalogue, Third Party Tools, Programming and APIs, Web Portals, Activity and Service Plug-in Manager, Provenance Store, Workflow Server, Open Provenance Model, Secure Service Access, Workflow Engine and Virtual Machine]


Taverna and the ‘Cloud’

Analysing Next Generation Sequencing Data


Analysing African Cattle with Taverna 2.2

10,000 years of separation

African livestock adaptations:
• Hardier
• Better disease resistance

Potential outcomes:
• Food security
• Understanding resistance
• Understanding environmental conditions
  – Drought
  – Parasites
• Understanding diversity

http://www.bbc.co.uk/news/10403254


The Analysis Pipeline (in Perl)

Input SNP data from sequencer

• MAP: map between genome builds (Liftover)

• FILTER: filter for SNPs in exons

• ANALYSIS: SNP consequences; identifying damaging SNPs (PolyPhen)

Harry Noyes, University of Liverpool
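The original pipeline is in Perl and its code is not shown. The sketch below, in Python, only illustrates the idea behind the MAP and FILTER stages: a position is shifted through aligned blocks between two builds, then kept if it falls inside an exon, found by binary search over merged exon intervals. The block format, the `Snp` type and all function names are hypothetical; real liftover tools read chain files, and the ANALYSIS stage (consequences, PolyPhen) is left out.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Snp(NamedTuple):
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str


# Hypothetical aligned block: (chrom, old_start, old_end, new_start), inclusive.
Block = Tuple[str, int, int, int]
ExonIndex = Dict[str, Tuple[List[int], List[int]]]


def lift_over(snp: Snp, blocks: Iterable[Block]) -> Optional[Snp]:
    """MAP: move a SNP into the new build, or return None if it is unaligned."""
    for chrom, old_start, old_end, new_start in blocks:
        if chrom == snp.chrom and old_start <= snp.pos <= old_end:
            return snp._replace(pos=new_start + (snp.pos - old_start))
    return None


def build_exon_index(exons: Iterable[Tuple[str, int, int]]) -> ExonIndex:
    """Merge overlapping exons per chromosome and keep sorted starts and ends."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in exons:
        by_chrom.setdefault(chrom, []).append((start, end))
    index: ExonIndex = {}
    for chrom, spans in by_chrom.items():
        merged: List[List[int]] = []
        for start, end in sorted(spans):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)  # overlapping exon: extend
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_exon(snp: Snp, index: ExonIndex) -> bool:
    """FILTER: True if the SNP lies inside a merged exon (inclusive bounds)."""
    if snp.chrom not in index:
        return False
    starts, ends = index[snp.chrom]
    i = bisect_right(starts, snp.pos) - 1  # last exon starting at or before pos
    return i >= 0 and snp.pos <= ends[i]


def map_and_filter(snps: Iterable[Snp], blocks: List[Block],
                   exons: Iterable[Tuple[str, int, int]]) -> List[Snp]:
    """Chain the MAP and FILTER stages; ANALYSIS would consume the result."""
    index = build_exon_index(exons)
    lifted = (lift_over(s, blocks) for s in snps)
    return [s for s in lifted if s is not None and in_exon(s, index)]
```

Merging overlapping exons first matters: exon annotations from different transcripts overlap, and the single binary search in `in_exon` is only correct over non-overlapping intervals.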


Workflow and phases

Input SNP file

Populate DB with starting SNPs and resource version numbers

Lift-over: maps between UMD3 and BTA4 cow assemblies

Exon positions from Ensembl

Find SNPs in exon regions

PolyPhen to mark “damaging” SNPs
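Storing resource version numbers next to the input SNPs is what makes a later rerun comparable with the original. A minimal sketch of that first phase, using SQLite from the Python standard library, follows; the workflow's own database (its results are offered as MySQL) and its schema are not shown, so the table and column names here are hypothetical.

```python
import sqlite3
from typing import Dict, Iterable, Tuple

# Hypothetical schema: one row per external resource, one row per input SNP.
SCHEMA = """
CREATE TABLE IF NOT EXISTS resource_version (
    name    TEXT PRIMARY KEY,
    version TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS input_snp (
    chrom TEXT    NOT NULL,
    pos   INTEGER NOT NULL,
    ref   TEXT    NOT NULL,
    alt   TEXT    NOT NULL,
    PRIMARY KEY (chrom, pos, alt)
);
"""


def populate(conn: sqlite3.Connection,
             snps: Iterable[Tuple[str, int, str, str]],
             versions: Dict[str, str]) -> None:
    """Record the starting SNPs and the version of every external resource used."""
    conn.executescript(SCHEMA)
    with conn:  # commit both inserts together, or roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO resource_version (name, version) VALUES (?, ?)",
            versions.items(),
        )
        conn.executemany(
            "INSERT OR IGNORE INTO input_snp (chrom, pos, ref, alt) VALUES (?, ?, ?, ?)",
            snps,
        )
```

Usage: `populate(sqlite3.connect("run.db"), snps, {"Ensembl": "...", "PolyPhen": "..."})`, with the real version strings filled in at run time.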


Accessing Taverna on the Cloud


Architecture overview


[Screenshot: panels for Job Status, Input Provenance, Experiment Metadata, Input Data Summary and Loading Inputs]


Summary of Workflow Output

Non-synonymous coding SNPs

PolyPhen predictions: probably damaging

11 million SNPs for N’Dama

The results can be downloaded as a MySQL database or as a TSV/CSV file.
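Selecting the SNPs summarised on this slide amounts to two filters followed by a tab-separated export. The sketch below assumes each result row is a dictionary with hypothetical `consequence` and `polyphen` fields and a hypothetical consequence label `non_synonymous_coding`; the workflow's actual column names and labels are not given.

```python
import csv
import io
from typing import Dict, Iterable, List


def damaging_nonsynonymous(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep non-synonymous coding SNPs that PolyPhen calls 'probably damaging'."""
    return [
        r for r in rows
        if r["consequence"] == "non_synonymous_coding"   # hypothetical label
        and r["polyphen"] == "probably damaging"
    ]


def to_tsv(rows: List[Dict[str, str]], fields: List[str]) -> str:
    """Serialise the chosen fields as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t",
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Passing `delimiter=","` instead gives the CSV variant; `extrasaction="ignore"` lets the caller export a subset of columns.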


Why use the Cloud?

• This is a highly repetitive task
  – and “embarrassingly parallel”

• But it also needs to be done on demand

• And within the financial reach of researchers
  – who do not always have access to their own compute resources

• We have very fast network access
  – so we don’t need to do this in-house
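The “embarrassingly parallel” point can be made concrete: the SNP list can be cut into chunks that are processed independently, and the partial results are simply concatenated. The sketch uses a local thread pool only to show the pattern; in the cloud setting each chunk would go to its own machine, and CPU-bound Python code would normally use processes rather than threads. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split the input into independent, contiguous chunks."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_map_chunks(work: Callable[[Sequence[T]], List[R]],
                        items: Sequence[T], size: int, workers: int = 4) -> List[R]:
    """Run `work` on each chunk concurrently and concatenate results in input order.

    The chunks share no state, so they can run in any order on any worker.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(work, chunked(items, size))  # map preserves chunk order
        return [r for chunk_result in results for r in chunk_result]
```

Because no chunk depends on another, the number of workers (or cloud instances) can be scaled to the size of the input and released afterwards, which is what keeps the cost on demand.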


Timings


SEEK as a data analysis and meta-analysis service

• SBML model construction and population

[Figure: calibration workflow. Data requirements: a parameterised SBML model and experimental data (metabolite concentrations from the key results database). Calibration by the COPASI web service.]

Peter Li
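COPASI's own optimisers are not reproduced here. To show what a calibration step does in general terms, the sketch fits the single rate constant k of a hypothetical first-order decay model, S(t) = S0 * exp(-k t), to measured concentrations by minimising the sum of squared errors with a golden-section search. The model, the data and the function names are assumptions; calibrating a real SBML model fits many parameters of an ODE system at once.

```python
import math
from typing import Callable, Sequence


def sse(k: float, times: Sequence[float], observed: Sequence[float], s0: float) -> float:
    """Sum of squared errors between the data and S(t) = s0 * exp(-k * t)."""
    return sum((s0 * math.exp(-k * t) - y) ** 2 for t, y in zip(times, observed))


def golden_section(f: Callable[[float], float], lo: float, hi: float,
                   tol: float = 1e-8) -> float:
    """Return an approximate minimiser of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


def calibrate(times: Sequence[float], observed: Sequence[float], s0: float,
              k_lo: float = 0.0, k_hi: float = 5.0) -> float:
    """Fit the rate constant k by least squares within [k_lo, k_hi]."""
    return golden_section(lambda k: sse(k, times, observed, s0), k_lo, k_hi)
```

A one-dimensional search is enough here because the error is unimodal in k; with several parameters a multidimensional optimiser (as offered by COPASI) is needed instead.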


Search and Analysis across data sets, models and stuff

• Analysis pool

• Analysis As A Cloud Service

• Analysis using Cloud Computing Services

• Run analysis tools and knowledge bases

Li et al., BMC Bioinformatics 2010, 11:582, doi:10.1186/1471-2105-11-582 (highly accessed)

Hucka and Le Novère, BMC Biology 2010, 8:140, doi:10.1186/1741-7007-8-140

Automated model generation: MCISB Centre (Li)

Annotation pipeline: SUMO SysMO project (Maleki-Dizaji)

Workflow Management System

Next Gen Seq annotation pipelines using Amazon Cloud Services (Noyes, Li)


SysMO-DB Dev Team

University of Stellenbosch, South Africa

University of Manchester, UK

Jacky Snoep

Heidelberg Institute for Theoretical Studies Germany

University of Manchester, UK

Olga Krebs

Wolfgang Müller

Sergejs Aleksejevs

Carole Goble

Stuart Owen

Katy Wolstencroft

Finn Bacall

Franco du Preez

Quyen Ngyen


Further Information

• myGrid
  – http://www.mygrid.org.uk
• Taverna
  – http://www.taverna.org.uk
• myExperiment
  – http://www.myexperiment.org
• BioCatalogue
  – http://www.biocatalogue.org
• SEEK
  – http://www.sysmo-db.org
• RightField
  – http://www.rightfield.org.uk
• MethodBox
  – http://www.methodbox.org.uk