Alejandra Gonzalez-Beltran, University of Oxford e-Research Centre, UK. The ISA Infrastructure for the biosciences: from data curation at source to the linked data cloud. [email protected] Conference on Semantics in Healthcare and Life Sciences (CSHALS), Boston, USA, Feb 27 - Mar 1 2013

CSHALS 2013

Page 1: CSHALS 2013

Alejandra Gonzalez-Beltran, University of Oxford e-Research Centre, UK

The ISA Infrastructure for the biosciences: from data curation at source to the linked data cloud

[email protected]  

Conference on Semantics in Healthcare and Life Sciences (CSHALS)

Boston, USA Feb 27- Mar 1 2013

Page 2: CSHALS 2013

Outline

• The ISA infrastructure: a metadata tracking framework in the biosciences, comprising the ISA-TAB format, a set of open source software tools and the user community
• The ISA-TAB syntax and its implicit semantics
• The […] component of the infrastructure
• […] for mapping the syntax to ontologies
• A couple of mappings, architecture, conversion

Page 3: CSHALS 2013
Page 4: CSHALS 2013

Contextual information (metadata):
• Sample characteristics
• Technology and measurement types
• Instrument parameters
• …

Page 5: CSHALS 2013

Need for a generic representation, applied to:
• microarray-based experiments (MAGE)
• sequencing-based experiments (SRA)
• flow cytometry-based experiments (FuGE-Flow Cyt)
• mass spectrometry and NMR spectroscopy experiments (MetaboLights and PRIDE)

Page 6: CSHALS 2013

ISA infrastructure

• Assist in the annotation and management of experimental metadata at source, supporting data provenance tracking
• Deal with high-throughput studies using one or a combination of omics and other technologies
• Empower users to take up community-defined checklists and ontologies
• Facilitate data sharing, re-use, comparison and reproducibility of experiments, and submission to international public repositories

ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Rocca-Serra et al., 2010, Bioinformatics

Page 7: CSHALS 2013

A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains.

Toward interoperable bioscience data. Sansone et al., 2012, Nature Genetics

Page 8: CSHALS 2013

ISA-TAB syntax (and its implicit semantics)

Page 9: CSHALS 2013
Page 10: CSHALS 2013

[Figure: an ISA-TAB table read as a graph. Material nodes and data file nodes are linked by protocol processes; material transformations lead from material to data.
Material node annotations: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
Protocol process annotations: Date (day effect), Performer (operator effect), Parameter Value[…]
Data file nodes: Raw Data File, Derived Data File]

Sample Name   Material Type   Hybridization Assay Name   Array Design REF   Array Data File   Protocol REF         Derived Array Data File
sample1       genomic DNA     assay1                     A-AFFY-107         assay1.cel        data normalization   assay1.txt
sample2       genomic DNA     assay2                     A-AFFY-107         assay2.cel        data normalization   assay2.txt
sample3       genomic DNA     assay3                     A-AFFY-107         assay3.cel        data normalization   assay3.txt

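The reading of a table row as a chain of nodes joined by processes can be sketched in a few lines of Python. This is a minimal illustration, not the ISA tools' parser: the choice of which columns open a new node (`NODE_COLUMNS`) and the function name `read_paths` are assumptions made for this example, and the rows are the ones shown on the slide.

```python
import csv
import io

# Rows from the slide, written as tab-delimited text.
ASSAY_TABLE = "\n".join([
    "\t".join(["Sample Name", "Material Type", "Hybridization Assay Name",
               "Array Design REF", "Array Data File", "Protocol REF",
               "Derived Array Data File"]),
    "\t".join(["sample1", "genomic DNA", "assay1", "A-AFFY-107",
               "assay1.cel", "data normalization", "assay1.txt"]),
    "\t".join(["sample2", "genomic DNA", "assay2", "A-AFFY-107",
               "assay2.cel", "data normalization", "assay2.txt"]),
    "\t".join(["sample3", "genomic DNA", "assay3", "A-AFFY-107",
               "assay3.cel", "data normalization", "assay3.txt"]),
])

# Assumption for this sketch: these columns start a new node in the graph;
# any other column (except Protocol REF) annotates the most recent node.
NODE_COLUMNS = {"Sample Name", "Hybridization Assay Name",
                "Array Data File", "Derived Array Data File"}


def read_paths(text):
    """Return one (nodes, edges) pair per table row.

    nodes: list of (column, value, attributes) tuples, left to right.
    edges: list of (from_value, protocol_or_None, to_value) tuples.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    paths = []
    for row in reader:
        nodes, edges = [], []
        pending_protocol = None  # protocol named since the last node
        for column, value in zip(header, row):
            if column in NODE_COLUMNS:
                if nodes:
                    edges.append((nodes[-1][1], pending_protocol, value))
                nodes.append((column, value, {}))
                pending_protocol = None
            elif column == "Protocol REF":
                pending_protocol = value
            elif nodes:
                nodes[-1][2][column] = value
        paths.append((nodes, edges))
    return paths


if __name__ == "__main__":
    for nodes, edges in read_paths(ASSAY_TABLE):
        print(edges)
```

Keeping the protocol name on the edge rather than on a node mirrors the slide's distinction between nodes (materials, data files) and the processes that connect them.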
Page 11: CSHALS 2013

Tagging: from free text to ontology-based

• single intervention representation, free text annotation

Source Name   Characteristics[organism]   Factor Value[perturbation agent]   Factor Value[dose]   Factor Value[duration]
individual1   human                       aspirin                            high dose            12 weeks

• single intervention, ontology-based annotation

Source Name   Characteristics[organism (obi:0100026)]   Term Source REF   Term Accession Number   Factor Value[chemical compound (CHEBI_37577)]   Term Source REF   Term Accession Number
individual1   Homo sapiens                              NCBITax           9606                    aspirin                                         CHEBI             1231354

(same row, continued)

Factor Value[dose (OBI_0000984)]   Term Source REF   Term Accession Number   Factor Value[time (PATO_0000165)]   Unit   Term Source REF   Term Accession Number
low dose                           LNC               LP30872-3               12                                  week   UO                "0000034"
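In the ontology-based row, each Term Source REF and Term Accession Number column qualifies the column to its left, and after a Unit column they qualify the unit. A minimal Python sketch of that grouping follows; the function name `group_annotations` and the output dictionary keys are assumptions made for this example, and the values are the ones shown on the slide.

```python
HEADER = [
    "Source Name",
    "Characteristics[organism (obi:0100026)]",
    "Term Source REF", "Term Accession Number",
    "Factor Value[chemical compound (CHEBI_37577)]",
    "Term Source REF", "Term Accession Number",
    "Factor Value[dose (OBI_0000984)]",
    "Term Source REF", "Term Accession Number",
    "Factor Value[time (PATO_0000165)]",
    "Unit", "Term Source REF", "Term Accession Number",
]
ROW = [
    "individual1",
    "Homo sapiens", "NCBITax", "9606",
    "aspirin", "CHEBI", "1231354",
    "low dose", "LNC", "LP30872-3",
    "12", "week", "UO", "0000034",
]


def group_annotations(header, row):
    """Fold qualifier columns into the field (or unit) they follow."""
    fields = []
    for column, value in zip(header, row):
        if column == "Unit" and fields:
            fields[-1]["unit"] = {"value": value}
        elif column in ("Term Source REF", "Term Accession Number") and fields:
            key = "source" if column == "Term Source REF" else "accession"
            # If the field already has a unit, the term qualifies the unit.
            target = fields[-1].get("unit", fields[-1])
            target[key] = value
        else:
            fields.append({"column": column, "value": value})
    return fields


if __name__ == "__main__":
    for field in group_annotations(HEADER, ROW):
        print(field)
```

Accession numbers are kept as strings so that leading zeros, as in the UO accession on the slide, are preserved.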

Page 12: CSHALS 2013

Kohonen et al. The ToxBank Data Warehouse: a research cluster of 7 EU FP7 Health systems toxicology and toxicogenomics projects.

Health Care & Life Sciences Interest Group

ToxBank effort developed by Nina Jeliazkova

Page 13: CSHALS 2013

• Make the semantics of ISA-TAB explicit, including materials, data entities, processes and their relationships
• Provide incentives for the provision of ontology-based annotations in ISA-TAB datasets; exploit those annotations
• Augment ISA syntax with new elements (e.g. groups), facilitating the understanding and querying of experimental design
• Facilitate data integration and knowledge discovery/reasoning

Page 14: CSHALS 2013

architecture

[Figure: architecture diagram. Components shown: ISA-TAB parser, isa2owl mapping parser, graph analysis, configuration file.]
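One way to picture the mapping step is a lookup from ISA-TAB column headers to ontology classes. The Python sketch below is illustrative only: the slide does not show the actual mapping or configuration file format, so the `MAPPING` table, the `ex:` class names and the function `map_header` are all hypothetical.

```python
import re

# Hypothetical mapping from ISA-TAB field names to placeholder classes.
# The "ex:" names are invented for this sketch and are not real ontology terms.
MAPPING = {
    "Source Name": "ex:Material",
    "Sample Name": "ex:Material",
    "Characteristics": "ex:MaterialAttribute",
    "Factor Value": "ex:FactorValue",
    "Protocol REF": "ex:Process",
    "Raw Data File": "ex:DataItem",
    "Derived Data File": "ex:DataItem",
}

# Splits "Field[qualifier]" into the field name and the optional qualifier.
HEADER_PATTERN = re.compile(r"^(?P<field>[^\[\]]+?)\s*(?:\[(?P<qualifier>.*)\])?$")


def map_header(column):
    """Return (mapped_class_or_None, qualifier_or_None) for one header."""
    match = HEADER_PATTERN.match(column.strip())
    if match is None:
        return None, None
    return MAPPING.get(match.group("field")), match.group("qualifier")


if __name__ == "__main__":
    for column in ["Source Name", "Characteristics[organism]",
                   "Factor Value[dose]", "Protocol REF", "Comment[note]"]:
        print(column, "->", map_header(column))
```

Headers with no entry in the table map to `None`, which is one simple way to surface columns that a given mapping does not yet cover.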

Page 15: CSHALS 2013

• Ontology search and automated tagging (relying on NCBO BioPortal services) in Google Spreadsheets
• Collaborative annotation; support for distributed users
• Version control and history

OntoMaton: a Bioportal powered ontology widget for Google Spreadsheets. Maguire et al., 2013, Bioinformatics

Page 16: CSHALS 2013
Page 17: CSHALS 2013

[Figure: vocabularies grouped by domain: experimental domain, biomolecular domain, chemical domain, information domain.]

Source Name   Characteristics[organism (obi:0100026)]   Term Source REF   Term Accession Number   Factor Value[chemical compound (CHEBI_37577)]   Term Source REF   Term Accession Number
individual1   Homo sapiens                              NCBITax           9606                    aspirin                                         CHEBI             1231354

Page 18: CSHALS 2013

Source Name   Characteristics[organism (obi:0100026)]   Term Source REF   Term Accession Number   Factor Value[chemical compound (CHEBI_37577)]   Term Source REF   Term Accession Number
individual1   Homo sapiens                              NCBITax           9606                    aspirin                                         CHEBI             1231354

[Figure: Open Biological and Biomedical Ontologies (OBO) Foundry ontologies shown: OBI, GO, ChEBI, IAO, BFO.]

Page 19: CSHALS 2013

ISA-OBI mapping

Page 20: CSHALS 2013

ISA-SIO mapping

Page 21: CSHALS 2013

Global metabolite profiling

Data subset: LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice

faahKO dataset, available in Bioconductor (with ISA-TAB metadata)

Page 22: CSHALS 2013
Page 23: CSHALS 2013

• support different conversion modes (different levels of granularity)
• querying of ISA-TAB datasets across multiple experiment types
• reasoning exploiting ontology annotations
• semantic validation of ISA-TAB datasets
• augmented annotation over native ISA syntax
• identification of gaps in ontological representations
• feedback of findings to community ontologies

 

Page 24: CSHALS 2013

Increasing level of structure for experimental metadata

Notes in lab books

Spreadsheets & tables (ISA-TAB metadata)

Facts as RDF statements
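As a small illustration of the last step, the annotated organism cell from the earlier example row (individual1, Homo sapiens, NCBITax 9606) can be written out as RDF statements in N-Triples syntax. This is a hedged sketch rather than the conversion used by the ISA tools: the `http://example.org/isa/` namespace, the `hasOrganism` property and the `Source` class are placeholders invented for this example, and the taxon IRI assumes the usual OBO PURL pattern.

```python
# Hypothetical namespace for subjects and properties in this sketch.
EX = "http://example.org/isa/"
# Assumption: OBO-style PURLs for ontology terms.
OBO = "http://purl.obolibrary.org/obo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def triple(subject, predicate, obj, literal=False):
    """Format one N-Triples statement; obj is an IRI unless literal=True."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        rendered = '"%s"' % escaped
    else:
        rendered = "<%s>" % obj
    return "<%s> <%s> %s ." % (subject, predicate, rendered)


def row_to_triples(source_name, organism_label, taxon_id):
    """Turn one annotated organism cell into three RDF statements."""
    subject = EX + "source/" + source_name
    organism = OBO + "NCBITaxon_" + taxon_id
    return [
        triple(subject, RDF_TYPE, EX + "Source"),
        triple(subject, EX + "hasOrganism", organism),
        triple(organism, RDFS_LABEL, organism_label, literal=True),
    ]


if __name__ == "__main__":
    for line in row_to_triples("individual1", "Homo sapiens", "9606"):
        print(line)
```

Because the organism is identified by an IRI rather than a free-text label, rows annotated as "human" and "Homo sapiens" in different spreadsheets would resolve to the same node once both carry the same accession.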

Page 25: CSHALS 2013

@isatools @biosharing

isa-tools.org isacommons.org biosharing.org