Alejandra González-Beltrán, PhD, Senior Software Engineer, ISA Team, University of Oxford e-Research Centre, Oxford, UK. Drug Discovery 2012, Manchester, UK, September 6-7. Community-standards for reproducible and reusable research - fundamentals and challenges

Drug Discovery- ELRIG -2012


Page 1: Drug Discovery- ELRIG -2012

Alejandra González-Beltrán, PhD

Senior Software Engineer, ISA Team
University of Oxford e-Research Centre, Oxford, UK

Drug Discovery 2012, Manchester, UK, September 6-7

Community-standards for reproducible and reusable research - fundamentals and challenges

Page 2: Drug Discovery- ELRIG -2012

Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295


Page 4: Drug Discovery- ELRIG -2012

Roadmap

Reproducible & Reusable Bioscience Research

Principles & Challenges

Page 5: Drug Discovery- ELRIG -2012

Roadmap

Reproducible & Reusable Bioscience Research

Well-annotated & Structured Data

reasoning, analysis, exchange, integration, visualization, browsing, retrieval

Principles & Challenges

Page 6: Drug Discovery- ELRIG -2012

Roadmap

Reproducible & Reusable Bioscience Research

Well-annotated & Structured Data

reasoning, analysis, exchange, integration, visualization, browsing, retrieval

Principles & Challenges

Community Standards

Software Tools

Page 7: Drug Discovery- ELRIG -2012

Bioscience is multi-domain…

§ Interdisciplinary and integrative in character
• need to deal with new and existing datasets
• deal with a variety of data types

tox/pharma, env, health, agro

Source of the figure: EBI website

Page 8: Drug Discovery- ELRIG -2012

The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project

From reusable data to reproducible research

To make the datasets comprehensible and interoperable, underpinning future investigations, we need common ways to report and share the experimental details and the associated results.

Consistent reporting will have a positive and long-lasting impact on the value of collective scientific outputs.

Community Standards

Page 9: Drug Discovery- ELRIG -2012

Different communities, different norms and standards, e.g.:

• report the same core, essential information
• use the same term to refer to the same ‘thing’
• allow data to flow from one system to another

Page 10: Drug Discovery- ELRIG -2012

Different communities, different norms and standards, e.g.:

• report the same core, essential information
• use the same term to refer to the same ‘thing’
• allow data to flow from one system to another

Challenges: lack of interaction and coordination, duplication of effort, fragmentation and uneven coverage…hinders interoperability

Page 11: Drug Discovery- ELRIG -2012

GIATE Guidelines for Information About Therapy Experiments

Therapeutic Investigation

Molecular Model, Cellular Model, Animal Model, Clinical Model, Generic Model

Page 12: Drug Discovery- ELRIG -2012

Growing number of bioscience reporting standards

MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT

MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB

AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO

VO, GIATE

Page 13: Drug Discovery- ELRIG -2012

130+

Estimated 150+

Source: MIBBI, EQUATOR

303+

Source: BioPortal

Databases, annotation, curation tools

Growing number of bioscience reporting standards

Page 14: Drug Discovery- ELRIG -2012

But… what do we know about them and how they are related?

Page 15: Drug Discovery- ELRIG -2012

But… what do we know about them and how they are related?

• Which ones are mature enough for me to use or recommend?
• I work on plants, are these just for biomedical applications?
• What are the criteria to evaluate their status and value?
• How can I get involved to propose extensions or modifications?
• Which tools and databases implement which standards?
• I use high throughput sequencing technologies, which ones are relevant to me?
• Which formats support specific minimum information guidelines?

Page 16: Drug Discovery- ELRIG -2012

A coherent, curated and searchable catalogue of data sharing resources

• Bioscience standards and associated data-sharing policies, publications, tools and databases
• Assessment criteria for usability and popularity of standards
• Relationships among standards
• Encouragement for communication & interaction among groups
• Promoting interoperability & informed decisions about standards

Page 17: Drug Discovery- ELRIG -2012


Standards compliance is challenging…

Is it possible to achieve a common, structured representation of diverse bioscience experiments that:

• transcends individual bioscience domains, but also
• follows the appropriate community norms and standards?

Page 18: Drug Discovery- ELRIG -2012

Structured description of datasets

§ Capture all salient features of the experimental workflow

§ Make annotation explicit and discoverable

§ Structure the descriptions for consistency, tracking independent variables and dependent variables, and using resolvable identifiers and cross-references
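The structured description outlined on this slide can be sketched as a small data structure. This is only an illustration, not the ISA data model: the class names, field names, and the `EX:` identifiers are hypothetical placeholders chosen to mirror the points above (workflow steps, independent and dependent variables, resolvable identifiers, cross-references).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Annotation:
    """A human-readable label paired with an identifier (placeholder values)."""
    label: str
    identifier: str  # ideally resolvable, e.g. an ontology term IRI or CURIE


@dataclass
class DatasetDescription:
    """Hypothetical container for a structured dataset description."""
    title: str
    workflow_steps: List[str] = field(default_factory=list)
    independent_variables: List[Annotation] = field(default_factory=list)
    dependent_variables: List[Annotation] = field(default_factory=list)
    cross_references: List[str] = field(default_factory=list)

    def unresolved(self) -> List[str]:
        """Return the labels of variables that still lack an identifier."""
        variables = self.independent_variables + self.dependent_variables
        return [v.label for v in variables if not v.identifier]


# Illustrative values only; none of these come from a real study.
description = DatasetDescription(
    title="Example study",
    workflow_steps=["sample collection", "extraction", "assay", "data processing"],
    independent_variables=[Annotation("dose", "EX:0000001")],
    dependent_variables=[Annotation("expression level", "")],
    cross_references=["EX:study-001"],
)

print(description.unresolved())
```

A check such as `unresolved()` is one way to make the "explicit and discoverable" requirement testable: any variable without an identifier is flagged before the description is shared.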

Page 19: Drug Discovery- ELRIG -2012

Not too much, not too little, just ‘right’

§ We must strike a balance between sufficiency and practicability:
• depth and breadth of information
• burden to produce and maintain the information

Page 20: Drug Discovery- ELRIG -2012

Metadata tracking framework, designed to support the use of several standards checklists, terminologies and conversions to (a growing number of) other metadata formats used by public repositories, e.g. MAGE-Tab, Pride-xml, SRA-xml, SOFT
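As a rough illustration of what tabular metadata tracking can look like, the sketch below reads a small tab-delimited study table with the Python standard library. It does not use the ISA software suite. The column headers imitate ISA-Tab naming as an assumption, and all values and function names are hypothetical placeholders.

```python
import csv
import io

# Hypothetical ISA-Tab-style study table (tab-delimited); values are placeholders.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\tFactor Value[dose]\n"
    "source1\tHomo sapiens\tsample collection\tsample1\tlow\n"
    "source2\tHomo sapiens\tsample collection\tsample2\thigh\n"
)


def read_study_table(text):
    """Parse a tab-delimited table into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def factor_values(rows, factor):
    """Map each sample name to its value for the given factor column."""
    column = f"Factor Value[{factor}]"
    return {row["Sample Name"]: row[column] for row in rows}


rows = read_study_table(STUDY_TABLE)
print(factor_values(rows, "dose"))
```

Because the metadata sit in named columns, a conversion to another repository format would amount to mapping these columns onto the target format's fields.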

Page 21: Drug Discovery- ELRIG -2012


user community

Page 22: Drug Discovery- ELRIG -2012

ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level (Rocca-Serra et al., 2010)

Page 23: Drug Discovery- ELRIG -2012


empowering researchers to use standards

Page 24: Drug Discovery- ELRIG -2012

Ontology Search and Tagging in Google Spreadsheets

Page 25: Drug Discovery- ELRIG -2012

Ontology Search and Tagging in Google Spreadsheets

Page 26: Drug Discovery- ELRIG -2012

ISA infrastructure & linked data

• Work in progress to convert to RDF/OWL to connect to the growing Linked Data universe (RDF = Resource Description Framework, OWL = Web Ontology Language)

• Collaborations with Toxbank & W3C HCLSIG

<subject, predicate, object>
<lipoprotein> <participates_in> <inflammatory response>
<PRO:212342352> <BFO_0000056> <GO:0006954>
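The subject, predicate, object pattern above can be sketched in a few lines of plain Python. This is a minimal illustration, not the ISA-to-RDF conversion itself. The identifiers are copied from the slide as they appear, and expanding them against the OBO PURL base is an assumption made here for the example. The `Triple`, `to_iri`, and `to_ntriples` names are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Assumption: terms are expanded against the OBO PURL base.
OBO_BASE = "http://purl.obolibrary.org/obo/"


def to_iri(term):
    """Turn a term such as 'GO:0006954' or 'BFO_0000056' into an OBO-style IRI."""
    return OBO_BASE + term.replace(":", "_")


def to_ntriples(triple):
    """Serialize one triple as a single N-Triples-style line."""
    return "<{}> <{}> <{}> .".format(*(to_iri(term) for term in triple))


# Identifiers as shown on the slide: lipoprotein participates_in inflammatory response.
statement = Triple("PRO:212342352", "BFO_0000056", "GO:0006954")
print(to_ntriples(statement))
```

Replacing labels with identifiers is what lets a machine treat two statements about "inflammatory response" from different sources as statements about the same thing.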

Page 27: Drug Discovery- ELRIG -2012

Increasing levels of structure…

Notes in Lab Books (information for humans)

Spreadsheets and Tables (the compromise)

Facts as RDF statements (information for machines)
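The step from the middle level to the last one can be illustrated with a small sketch that turns one spreadsheet row into subject, predicate, object statements. The row contents, column names, and the `row_to_triples` helper are hypothetical. A real conversion would also map column names and values to ontology terms.

```python
def row_to_triples(row, subject_column):
    """Turn one table row (a dict) into (subject, predicate, object) tuples.

    The chosen column supplies the subject; every other non-empty cell
    becomes one statement, with the column name as the predicate.
    """
    subject = row[subject_column]
    return [
        (subject, column, value)
        for column, value in row.items()
        if column != subject_column and value
    ]


# Placeholder row, as it might appear in a spreadsheet.
row = {"Sample Name": "sample1", "organism": "Homo sapiens", "dose": "low", "notes": ""}

triples = row_to_triples(row, "Sample Name")
for triple in triples:
    print(triple)
```

Empty cells produce no statement, which reflects one practical difference between the levels: a table has a slot for every column, while a set of facts only records what is actually known.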

Page 28: Drug Discovery- ELRIG -2012

A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:

• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• also by communities working to build a library of cellular signatures

We aim to achieve a common representation of experimental content that transcends individual bioscience domains

Sansone et al., Towards interoperable bioscience data. Nature Genetics 44, 121-126 (2012) doi:10.1038/ng.1054

Page 29: Drug Discovery- ELRIG -2012

Nanotechnology Informatics Working Group

Some of the internal projects:

Some of the public groups/resources:

Stem Cell Commons


Page 30: Drug Discovery- ELRIG -2012

Implementation at Harvard

ISA

http://discovery.hsci.harvard.edu/

Page 31: Drug Discovery- ELRIG -2012


Implementation at the EBI

http://www.ebi.ac.uk/metabolights

Page 32: Drug Discovery- ELRIG -2012

lack of coordination, fragmentation and uneven coverage

Standards-compliant data sharing is demanding and time-consuming

GIATE Guidelines

Terminologies

Formats

Reproducible & Reusable Bioscience Research

Well-annotated & Structured Data

reasoning, analysis, exchange, integration, visualization, browsing, retrieval

Community Standards

Software Tools

Page 33: Drug Discovery- ELRIG -2012

@isatools  @biosharing  Isa-tools.org  isacommons.org  biosharing.org