Collaborative ontology development by scientists Melissa Haendel

  • Slide 1
  • Collaborative ontology development by scientists Melissa Haendel
  • Slide 2
  • Setting the stage: 1. Who we are and what we need 2. What our bottlenecks are: getting info from the domain experts, ontology tools, synchronizing ontologies 3. Modularizing anatomy ontologies 4. Ideas for collaborative ontology editing
  • Slide 3
  • Who are we? Domain experts: anatomists, comparative morphologists, developmental biologists, immunologists, neuroscientists, etc. Ontologists: biologists-gone-informatics, computer scientists and logicians. Engineers: our tool builders. What do we want? Ontologies and tools to develop them. Domain experts want to query for gene expression and phenotypes across species. Ontologists have to be able to interpret and represent domain knowledge computationally. Engineers have to build tools that can consume ontologies and give the domain experts the right results.
  • Slide 4
  • Anatomy and phenotype ontologies have to work hard for us. Ontologies must be intelligible to humans and machines. Enable comparison of structures across different organisms. Standardization of vocabulary among communities. Integration across databases. Query across large amounts of data. Automatic reasoning to infer related classes. Error checking. Annotation consistency.
  • Slide 5
  • Term needed for annotation Ontology development workflow and bottlenecks reconcile
  • Slide 6
  • Term requested Ontology development workflow and bottlenecks reconcile
  • Slide 7
  • Term discussed by community Ontology development workflow and bottlenecks reconcile
  • Slide 8
  • Ontology development workflow and bottlenecks reconcile
  • Slide 9
  • GO CL CARO TAO AAO XAO ZFA MA MP UBERON Ontology development workflow and bottlenecks reconcile Synchronize?
  • Slide 10
  • Three bottlenecks: 1) Extracting domain knowledge into an ontology efficiently 2) Multiple ontology editing tools, each with pros and cons, none easily used by domain experts 3) Synchronization across interoperable ontologies
  • Slide 11
  • How can we increase the efficiency of extracting knowledge from domain experts? An example of what has worked well so far (1862, Christian Schussele). Familiar tooling: Google Docs, Phenote, Excel. Visualization: Cmap, Vue, GraphViz. Need to merge different sources of information. Need a way to get this information into a computable form.
  • Slide 12
  • Two ontology editors (and viewers) commonly used by the biomedical community. OBO-Edit: OBO ontology editor and viewer, http://oboedit.org/ Protégé: OWL ontology editor and viewer, http://protege.stanford.edu/ Both tools are non-trivial to learn to use. Neither has many bulk operations, imports or exports different formats easily, or deals with synchronization readily. There is a barrier for domain experts to contribute knowledge, and a bottleneck for editors to get this knowledge into ontologies efficiently. More biologist-friendly (thank you John!) Tool used by broader community
  • Slide 13
  • How to synchronize ontologies. Three approaches: mapping (BioPortal set, ..), direct reconciliation (TAO and ZFA), synchronization using imports.
  • Slide 14
  • Ontology mappings are often not useful. FMA (human) tibia <-> FBbt (fruit fly) tibia; FMA extensor retinaculum of wrist <-> MA retina; GAZ (geography) Colon <-> FMA (human) Colon; ZFA (zebrafish) aortic arch <-> MA (mouse) arch of aorta; GAZ (geography) Serpentine <-> CHEBI (chemistry) serpentine; Dictyostelium giant cell <-> FMA giant cell; ZFA (zebrafish) blastoderm <-> FBbt blastoderm stage; PATO (quality) male <-> CHEBI (chemical) maleate(2-). (For anatomy, you may want to remove the mappings that NCBO BioPortal creates for your ontology and/or ask not to allow mapping.)
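A minimal sketch of how such mappings can arise, assuming a naive matcher that pairs terms from different ontologies whenever their normalized labels are identical. The matcher and the four-term list are illustrative assumptions only; this is not a description of how NCBO BioPortal computes its mappings.

```python
from itertools import combinations

# Illustrative (ontology, label) pairs taken from the examples on this slide.
TERMS = [
    ("FMA", "tibia"),
    ("FBbt", "tibia"),
    ("GAZ", "Colon"),
    ("FMA", "Colon"),
]


def normalize(label):
    """Lowercase and collapse whitespace: a deliberately naive normalization."""
    return " ".join(label.lower().split())


def lexical_mappings(terms):
    """Pair terms from different ontologies whose normalized labels are equal."""
    pairs = []
    for (ont_a, lab_a), (ont_b, lab_b) in combinations(terms, 2):
        if ont_a != ont_b and normalize(lab_a) == normalize(lab_b):
            pairs.append(((ont_a, lab_a), (ont_b, lab_b)))
    return pairs


mappings = lexical_mappings(TERMS)
for left, right in mappings:
    print(left, "<->", right)
```

String identity is all the matcher sees, so it pairs a human bone with a fruit fly leg segment and a place name with an organ.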
  • Slide 15
  • Reconciliation and linking between TAO and ZFA. Zebrafish terms are is_a subtypes of teleost terms (Zebrafish Anatomy is_a Teleost Anatomy Ontology). Logic is implemented via xrefs, which are difficult to keep synchronized. Xref logic can be less clear and more difficult to use.
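One way the xref synchronization problem shows up can be sketched as a check for dangling references. This is a rough illustration, assuming each zebrafish term carries one xref to a teleost term; all identifiers below are hypothetical placeholders, not real ZFA or TAO IDs.

```python
# Hypothetical zebrafish-side terms, each with an xref to a teleost term.
ZFA_XREFS = {
    "ZFA:hypothetical_1": "TAO:hypothetical_1",
    "ZFA:hypothetical_2": "TAO:hypothetical_9",
}

# Hypothetical set of term IDs currently present in the teleost ontology.
TAO_TERMS = {"TAO:hypothetical_1", "TAO:hypothetical_2"}


def dangling_xrefs(xrefs, target_terms):
    """Return source IDs whose xref target is missing from the target ontology."""
    return sorted(src for src, tgt in xrefs.items() if tgt not in target_terms)


dangling = dangling_xrefs(ZFA_XREFS, TAO_TERMS)
print(dangling)
```

A check like this has to be rerun by hand after every release of either ontology, which is part of why xref-based linking is hard to keep in step.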
  • Slide 16
  • Synchronization by import across ontologies. One can import a whole ontology or just portions of another ontology. MIREOT: Minimum information to reference an external ontology term. This strategy requires better facilities while editing. CARO VAO Present TAO Modularized ontology
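A minimal sketch of the idea behind a MIREOT-style reference, assuming the minimum set is the source ontology URI, the source term URI, and the local parent under which the term is placed. The class name, the `importedFrom` annotation property, and every URI below are hypothetical placeholders under `example.org`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalTermReference:
    source_ontology: str  # URI of the ontology the term comes from
    source_term: str      # URI of the external term itself
    target_parent: str    # URI of the local class the term is placed under


def to_axioms(ref):
    """Render the reference as two OWL functional-syntax-style lines."""
    return [
        f"SubClassOf(<{ref.source_term}> <{ref.target_parent}>)",
        # Hypothetical annotation property recording where the term came from.
        f"AnnotationAssertion(<http://example.org/importedFrom> "
        f"<{ref.source_term}> <{ref.source_ontology}>)",
    ]


ref = ExternalTermReference(
    source_ontology="http://example.org/caro.owl",
    source_term="http://example.org/caro/term_1",
    target_parent="http://example.org/tao/term_2",
)
axioms = to_axioms(ref)
print("\n".join(axioms))
```

Only the referenced term and its placement are carried over, so the importing ontology stays small compared with importing the whole source ontology.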
  • Slide 17
  • OntoFox: a web server for MIREOTing (http://ontofox.hegroup.org). Good things: based on the MIREOT principle; web-based data input and output; output OWL file can be directly imported into your ontology; no programming needed; programmatically accessible. Improvements: integration into ontology editing tools; more customizable.
  • Slide 18
  • We need synchronization solutions that are integrated within ontology editing tools
  • Slide 19
  • What IS the anatomy ontology landscape? How can we efficiently build our anatomy ontologies to be most interoperable? We could have built a single ontology for ontology editors and consumers: different editors have editing rights to different ontology partitions (by taxon, or by domain, e.g. neuroscience, skeletal anatomy); no taxon-specific subtypes (use structure, function etc. as differentia); dynamic views according to user needs.
  • Slide 20
  • Ontology landscape model view cell tissue muscle tissue mesonephros limb antenna weberian ossicle mammary gland nervous system mollusc foot tentacle mantle pupal DN3 period neuron mushroom body brachial lobe pons vertebra vertebral column circulatory system appendage mesoderm gut tibia gland bone skeletal tissue parietal bone fin gonad trachea respiratory airway link (small sample) tibiafibula larva user/editor view metencephalon neuro view skeletal view mammalian view ventral nerve cord mollusc view neuro view skeletal view
  • Slide 21
  • Proposed model moving forward. Maintain a series of ontologies at different taxonomic levels (eukaryote, plant, metazoan, vertebrate, mollusc, arthropod, insect, mammal, human, Drosophila). Each ontology imports/MIREOTs the relevant subset of the ontology above it; this is recursive. Subtypes are only introduced as needed. Work together on commonalities at the appropriate level above your ontology.
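The recursive import can be sketched as follows. This is a toy model under stated assumptions: each ontology has exactly one ontology above it, terms are plain labels, and the level assignments and selected subsets are hypothetical choices made for illustration.

```python
# Hypothetical chain: each ontology names the single ontology above it.
IMPORTS = {
    "zebrafish": "teleost",
    "teleost": "vertebrate",
    "vertebrate": "metazoan",
    "metazoan": None,
}

# Hypothetical terms introduced at each level.
OWN_TERMS = {
    "metazoan": {"cell", "tissue", "gut"},
    "vertebrate": {"vertebra", "bone"},
    "teleost": {"weberian ossicle", "fin"},
    "zebrafish": set(),
}

# Hypothetical subset each ontology chooses to import from the level above.
SELECTED = {
    "metazoan": set(),
    "vertebrate": {"cell", "tissue", "gut"},
    "teleost": {"bone", "vertebra", "tissue"},
    "zebrafish": {"fin", "weberian ossicle", "bone", "tissue"},
}


def visible_terms(name):
    """Own terms plus the selected subset of what the level above exposes."""
    own = set(OWN_TERMS[name])
    parent = IMPORTS[name]
    if parent is None:
        return own
    # Recursion: the parent's visible terms already include its own imports.
    return own | (visible_terms(parent) & SELECTED[name])


zebrafish_terms = visible_terms("zebrafish")
print(sorted(zebrafish_terms))
```

In this toy chain "tissue" reaches the zebrafish ontology through three successive imports without being redefined at any level.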
  • Slide 22
  • zebrafish caro / uberon/all cell tissue metazoa muscle tissue vertebrata mesonephros limb arthropoda antenna teleost weberian ossicle mammalia mammary gland nervous system mollusca foot cephalopod tentacle mantle drosophila neuron types XYZ mushroom body brachial lobe NO pons vertebra vertebral column circulatory system appendage mesoderm gut tibia gland bone skeletal tissue parietal bone fin gonad trachea respiratory airway cross-ontology link (sample) amphibia tibiafibula larva shell cuticle skeleton import mouse human Model view
  • Slide 23
  • Idealized protocol for new AOs: 1. Collect draft list of terms 2. Subdivide roughly into applicability at taxonomic levels 3. Request new terms from existing AOs above you 4. Is a new mid-level AO required? If yes, collaborate and create, then go to 1. 5. Import pre-reasoned subset from next AO above 6. Build your ontology (David will take it from here in his talk later today)
  • Slide 24
  • Modularizing ontologies: positive reinforcement. Identify key points of integration between ontologies. Modularize based on domain or taxon. Import and reuse rather than cross-referencing or aligning. Let the reasoner help do the work. Work together to distribute work.
  • Slide 25
  • Modularizing ontologies. We need: to get the imports working well; to have distributed social responsibility assigned; design patterns to ensure we are all doing the same thing; to check for consistency and errors across multiple ontologies using reasoners, to get correct results for all users (these ontologies are supposed to be orthogonal but aren't always); visualization tools that can aid non-ontology experts in identifying errors across multiple ontologies.
  • Slide 26
  • Returning to the bottlenecks in our process: looking for solutions. Need easy-to-use tools for information capture: ideally based on existing familiar tools; auto-populated from/to ontologies; social management (who is responsible for what). Need better import/export functionality: into/out of ontology editors from simple collection tools; from a myriad of ontology sources. Need better interoperability between editors/formats. Need enhanced bulk operations. Need to know specific requirements for building tools and user feedback. Need money and opportunities to interact (like this one!).
  • Slide 27
  • Existing tools for collaborative ontology editing don't quite get us there. Google Refine has nice features for manipulating data, including RDF exports, but isn't collaborative. Mapping Master for Protégé enables generation of OWL from spreadsheets, but is not collaborative and requires ontology knowledge. Web Protégé isn't fully-fledged and is not useful for non-technical contribution.
  • Slide 28
  • Ideas for collaborative ontology editing. Example: file extracted from the ontology for this meeting. It was extracted from the ontology with a Perl script; it needs to be edited by domain experts and then converted back into OWL; it needs to be merged with the existing OWL file. There is a better way...
  • Slide 29
  • Ideas for using Google Docs. Enable creation of Google spreadsheets that curators and domain experts can edit, with the following features: tell the Google spreadsheet which columns are which from the ontology input file (labels, parents, URIs, xref, class, etc.); live-updated with latest external ontology versions using SPARQL; export OBO/RDF/OWL serialization; enable search on external ontologies via autocomplete; track changes. This will solve some of the sync problems because the queries are executed whenever the doc is opened or updated.
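The column-mapping and export idea can be sketched roughly as below, assuming the spreadsheet is available as CSV text. The column headers, the mapping, and the term IDs are hypothetical; a real export would also need an OBO header and validation.

```python
import csv
import io

# Hypothetical mapping from spreadsheet column headers to OBO tags.
COLUMN_TO_TAG = {"ID": "id", "Label": "name", "Parent": "is_a", "Xref": "xref"}

# Hypothetical spreadsheet content with placeholder identifiers.
SHEET = """ID,Label,Parent,Xref
EX:0000001,bone,,
EX:0000002,parietal bone,EX:0000001,OTHER:0000042
"""


def rows_to_obo(csv_text, column_to_tag):
    """Turn each spreadsheet row into an OBO [Term] stanza, skipping empty cells."""
    stanzas = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines = ["[Term]"]
        for column, tag in column_to_tag.items():
            value = (row.get(column) or "").strip()
            if value:
                lines.append(f"{tag}: {value}")
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas) + "\n"


obo_text = rows_to_obo(SHEET, COLUMN_TO_TAG)
print(obo_text)
```

Keeping the column-to-tag mapping as data rather than code is what would let curators declare "which columns are which" without touching the exporter.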
  • Slide 30
  • Ideas for using Google Docs. Enable creation of Google Drawings that curators and domain experts can edit, with the following features: import of external ontologies; relations and classes exported out from the Google Drawing; export OBO/RDF/OWL serialization; linked to Google Spreadsheet; track changes.
  • Slide 31
  • Ontology editor dreams. A truly collaborative web-based editing platform (a la Web Protégé) compatible with OWL and OBO, supporting: import and export of customizable spreadsheets from Google Docs; creation of live templates (spreadsheets in sync with SPARQL endpoints); MIREOT import; user roles and permissions; web-based versioning.