ODD-Genes: Accelerating data-driven scientific discovery
NeSC Review 2003
NeSC, 2003-09-30
Introduction
ODD-Genes Background
Science enabled by ODD-Genes
Automating routine statistical conditioning of highly variable microarray results
Discovering related data sources
Querying discovered data sources for relevant data
Identifying significant targets for focussed investigation
Caveats & further work
ODD-Genes Background
ODD-Genes is a demonstrator
Demonstrates how Grid technologies enable e-Science, accelerating scientific discovery
SunDCG's TOG software allows job submission on remote compute resources
OGSA-DAI provides access, control and discovery of data resources
ODD-Genes used to investigate Wilms Tumour
Routine statistical conditioning of microarray results
Data-driven discovery of novel targets for investigation and potential therapy
Collaborative project
NeSC/EPCC, Edinburgh, UK
Scottish Centre for Genomic Technology and Informatics, Edinburgh, UK (GTI)
Human Genetics Unit at MRC, Western General Hospital, Edinburgh, UK (HGU)
SunDCG – Enabling Routine Statistical Conditioning
Choose analysis to perform
Automates the analysis process
Provides a predetermined workflow
Can run more than one analysis at a time
Multiple reproducible avenues for investigation
Reduces cost (human, machine), increases availability
TOG enables this by allowing access to HPC resources
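A minimal sketch of what "predetermined workflow, more than one analysis at a time" can look like in code. The slides do not say which conditioning steps ODD-Genes runs, so log2 transformation and quantile normalisation are assumptions here (both are common microarray preprocessing steps), and a local thread pool stands in for TOG-submitted jobs on remote HPC resources. WORKFLOWS, run_workflow and run_all are hypothetical names, not part of SunDCG or TOG.

```python
# Sketch only: the conditioning steps and helper names are hypothetical,
# and local threads stand in for jobs submitted to remote HPC resources.
import math
from concurrent.futures import ThreadPoolExecutor


def log2_transform(matrix):
    """Log2-transform raw intensities (rows = genes, columns = arrays).
    Assumes every intensity is positive."""
    return [[math.log2(v) for v in row] for row in matrix]


def quantile_normalise(matrix):
    """Give every array (column) the same value distribution.
    Ties are ranked by row order rather than averaged, for brevity."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all arrays.
    rank_means = [sum(col[k] for col in sorted_cols) / n_cols for k in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for c, col in enumerate(columns):
        # Row indices of this column, ordered from smallest to largest value.
        order = sorted(range(n_rows), key=lambda r: col[r])
        for rank, r in enumerate(order):
            result[r][c] = rank_means[rank]
    return result


# Each named workflow is a fixed, ordered list of steps.
WORKFLOWS = {
    "log2": [log2_transform],
    "log2+quantile": [log2_transform, quantile_normalise],
}


def run_workflow(steps, matrix):
    """Apply the steps in order; each step returns a new matrix."""
    for step in steps:
        matrix = step(matrix)
    return matrix


def run_all(workflows, matrix, max_workers=4):
    """Run every workflow concurrently and collect the results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(run_workflow, steps, matrix)
                   for name, steps in workflows.items()}
        return {name: future.result() for name, future in futures.items()}
```

Because every step is deterministic, rerunning run_all on the same raw matrix gives the same results, which is what makes each avenue of investigation reproducible. Threads only keep the example self-contained; for CPU-bound work on a single machine a process pool would be the more usual choice.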
SunDCG - Conditioning Results
Results of conditioning can be analysed and investigated
The researcher potentially has several views of the data to explore, all presented in parallel (cf. the traditional serialised, manual process)
The researcher can reproduce this initial conditioning for repeated analyses
The researcher need not perform each step manually and serially, or ask a dedicated statistician to do so
OGSA-DAI - Results Investigation
Multiple views of data
Figure panels: Raw | Heat Map | Cluster Map
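The slide names three views of the same data but not how they are computed. As an illustration only, heat maps of expression data are commonly drawn from per-gene standardised values (z-scores), so each gene's pattern across samples stands out regardless of its absolute intensity. row_zscores and text_heatmap are hypothetical helpers; a real viewer would map the values to colours rather than characters.

```python
# Sketch only: assumes rows are genes and columns are arrays (samples).
import statistics


def row_zscores(matrix):
    """Standardise each gene (row) to mean 0 and population SD 1.
    Rows with no variation map to all zeros so they render as neutral."""
    scaled = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.pstdev(row)
        scaled.append([0.0 if sd == 0 else (v - mean) / sd for v in row])
    return scaled


def text_heatmap(scaled, low=-1.0, high=1.0):
    """Crude text rendering: '-' below low, '+' above high, '.' otherwise."""
    return ["".join("-" if v < low else "+" if v > high else "." for v in row)
            for row in scaled]
```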
Wilms Tumour study takes a new direction
Two genes appear significant in early development
Researchers would like more info on these genes…
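The slides do not say which statistic flagged the two genes. Purely as an illustration of "identifying significant targets", one common approach ranks genes by Welch's t-statistic between two groups of samples. The gene ids and group layout below are hypothetical, and no multiple-testing correction is applied, so this is a sketch of the idea rather than the method used in the study.

```python
# Sketch only: Welch's t-statistic as one illustrative ranking; the slides do
# not say which test ODD-Genes used. No multiple-testing correction is applied.
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (>= 2 values each)."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        # Both groups are constant: no evidence unless the means differ.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_genes(expression, group_a, group_b):
    """Rank genes by |t| between two sets of sample columns.

    expression maps a gene id to one value per sample; group_a and group_b
    are lists of column indices (e.g. tumour vs. comparison samples)."""
    scores = {
        gene: welch_t([values[i] for i in group_a], [values[i] for i in group_b])
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```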
OGSA-DAI - Data Resource Discovery
OGSA-DAI uses keywords to locate relevant data resources
May return data resources previously unknown to the researcher
Researcher selects the most interesting data resource to query for information about the gene
Researcher selects the Mouse Atlas: a narrow, deep database of spatial gene expression in mouse embryonic development
Contrast with the GTI database: broad, shallow, genome-wide gene expression across multiple organisms, stages and conditions
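The slides do not describe OGSA-DAI's actual discovery mechanism. As a minimal sketch of the idea, assuming each registered data resource publishes a set of descriptive keywords, discovery can be a ranked keyword match. This also shows why previously unknown resources can surface: anything registered with matching keywords is returned. discover and REGISTRY are hypothetical, and the registry entries are illustrative only.

```python
# Sketch only: a hypothetical in-memory registry, not the OGSA-DAI API.
def discover(registry, query_keywords):
    """Return resource names ranked by the number of query keywords matched.

    registry maps a resource name to its descriptive keywords. Matching is
    case-insensitive; resources matching nothing are omitted and ties are
    broken alphabetically."""
    query = {k.lower() for k in query_keywords}
    hits = []
    for name, keywords in registry.items():
        score = len(query & {k.lower() for k in keywords})
        if score:
            hits.append((score, name))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [name for _, name in hits]


# Hypothetical entries loosely based on the two databases named on the slide.
REGISTRY = {
    "Mouse Atlas": {"gene expression", "spatial", "mouse", "embryo", "development"},
    "GTI": {"gene expression", "microarray", "genome-wide", "multiple organisms"},
}
```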
OGSA-DAI - Data Resource Query
OGSA-DAI returns data from query
Data and annotation displayed
Data contains references to related images
Researcher rapidly moves from numeric and textual description to spatial representation of relevant gene expression
These show that the genes are stem cell markers
Targets for focussed investigation, potential therapy
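The query step returns records whose annotations point at images. A minimal sketch of that shape of result, assuming a relational resource with a hypothetical expression table (gene, stage, structure, image_ref); the gene ids, stages and file paths are placeholders, not Mouse Atlas data, and plain sqlite3 stands in for an OGSA-DAI request. The point is that the image reference travels with each row, so a client can move straight from the numeric and textual record to the spatial view.

```python
# Sketch only: hypothetical schema and placeholder data; this is plain sqlite3,
# not the Mouse Atlas schema or an OGSA-DAI request.
import sqlite3


def build_demo_db():
    """Create an in-memory table of gene expression annotations with image links."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE expression (gene TEXT, stage TEXT, structure TEXT, image_ref TEXT)"
    )
    conn.executemany(
        "INSERT INTO expression VALUES (?, ?, ?, ?)",
        [
            ("GENE_A", "stage_2", "structure_x", "images/gene_a_2.png"),
            ("GENE_A", "stage_1", "structure_y", "images/gene_a_1.png"),
            ("GENE_B", "stage_1", "structure_x", "images/gene_b_1.png"),
        ],
    )
    return conn


def query_gene(conn, gene):
    """Return (stage, structure, image_ref) rows for one gene, ordered by stage.
    The gene id is passed as a bound parameter, never spliced into the SQL."""
    cursor = conn.execute(
        "SELECT stage, structure, image_ref FROM expression WHERE gene = ? ORDER BY stage",
        (gene,),
    )
    return cursor.fetchall()
```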
ODD-Genes Caveats & Further Work
ODD-Genes is a demonstrator
Need to develop production applications for both routine statistical processing and data resource discovery and query
Need to parameterise routine conditioning appropriately to complete the automation
ODD-Genes requires Grid infrastructure
Participating researchers need to partner with centres that host the application front-ends (or host the infrastructure themselves)
However, the alternatives are often proprietary, expensive and less flexible
ODD-Genes requires data hosts to register their data resources
A critical mass of registered data sources is needed