What ontologies exist, who builds them and what are they used for?
An overview of the ontology landscape
Robert Stevens, James Malone
Outline
• What do we need to describe?
• What exists to describe it?
• Are they any good….?
• Ontology organisations
Dimensions of description
• The entities themselves – genes, proteins, processes, cells, properties
• The investigations that produced the entities
• The informational origins and history of those entities and their descriptions (data and provenance)
What entities exist to be described
• The actual “concrete” biological entities themselves: Proteins, genes, small molecules, cells, gross anatomy, etc etc
• The devices used to produce and measure them
• Properties of those entities: Size, Shape, colour, function, role, etc etc.
• The biological processes in which those biological entities take part.
• The measuring and analytical processes used on those biological entities.
• Sites on those biological entities: Shoulder region, a bit of the environment, the dorsal region of a mouse, etc etc.
• Information artefacts about all of the above: sequences, database records, who, what, when, where and how… lab protocols, etc etc.
Dividing things up from the top
Dividing things up from the top - process
• Gene Ontology (GO) biological process
• Gene Ontology (GO) molecular process
Dividing things up from the top - information
• Information Artifact Ontology (IAO)
• Software Ontology (SWO)
• Unit Ontology (UO)
Dividing things up from the top - material
• ChEBI
• Protein Ontology (PrO)
• Sequence Ontology (SO)
• Cell Type Ontology (CL)
• Uberon
• Foundational Model of Anatomy (FMA)
• NCBI Taxonomy
Dividing things up from the top - property
• GO Molecular Function
• Phenotypic Quality (PATO)
• Human Disease Ontology (HDO)
Dividing things up from the top - site
Gazetteer Ontology (GAZ)
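The top-level split used on these slides can be written down as a simple lookup. This is a minimal Python sketch and not part of any ontology's tooling: the dictionary `TOP_LEVEL` and the helper `categories_for` are hypothetical names, and the entries only restate a few of the ontologies listed on the slides.

```python
# Top-level categories from the slides, each mapped to example ontologies.
# The structure and names are illustrative only.
TOP_LEVEL = {
    "process": ["GO biological process"],
    "information": ["IAO", "SWO", "UO"],
    "material": ["ChEBI", "PrO", "SO", "Uberon", "FMA", "NCBI Taxonomy"],
    "property": ["GO molecular function", "PATO", "Human Disease Ontology"],
    "site": ["GAZ"],
}


def categories_for(ontology):
    """Return every top-level category that lists the given ontology."""
    return [cat for cat, members in TOP_LEVEL.items() if ontology in members]


print(categories_for("PATO"))
```

A lookup like this is only a way of picturing the slides; the real commitment to a top-level category is made inside each ontology through its upper-level classes.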
We’ve covered most of what there is…
• We’ve chosen bits from a simple upper level ontology
• These are domain neutral descriptions of the entities in any domain of interest
• Top-level or upper ontologies give a common view on what discriminations to make…
• … and what relationships to use between them
• BFO, Simple top Bio
Ontologies in these dimensions
• Here we want a “space” covering these dimensions with ontologies splattered about
• Dimension 1: genotype to phenotype
• Dimension 2: investigations
• Dimension 3: information – IAO, prov, etc.
Reference vs Application Ontologies
• Ontologies developed for different uses
• Reference ontologies built with aim of becoming authority on given domain
• Application ontologies built towards specific application use cases, such as for tooling or database needs
• Application ontologies often consume reference ontologies
Things we describe in Biology - Genes
• Gene Ontology (GO) – biological processes, cellular components and molecular functions of gene products
• Seen as benchmark of success in bio-ontology
• Many ‘best practices’ have fallen out of the GO’s development, such as evidence codes, obsolescence policy and community development
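Two of the practices mentioned above, evidence codes and an obsolescence policy, can be illustrated with a small data-structure sketch. The `Term` and `Annotation` classes, the gene symbols, the `EX:0000001` term and the `usable` helper are all hypothetical. IDA (inferred from direct assay) and IEA (inferred from electronic annotation) are real GO evidence codes, and the two GO identifiers are used purely as examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    term_id: str
    label: str
    obsolete: bool = False  # obsoleted terms are flagged, not deleted


@dataclass(frozen=True)
class Annotation:
    gene: str      # hypothetical gene symbol
    term: Term
    evidence: str  # GO evidence code, e.g. "IDA" or "IEA"


apoptosis = Term("GO:0006915", "apoptotic process")
nucleus = Term("GO:0005634", "nucleus")
retired = Term("EX:0000001", "made-up retired term", obsolete=True)

annotations = [
    Annotation("geneA", apoptosis, "IDA"),
    Annotation("geneA", nucleus, "IEA"),
    Annotation("geneB", retired, "IDA"),
]


def usable(annots, allowed_evidence):
    """Keep annotations with an accepted evidence code and a non-obsolete term."""
    return [a for a in annots
            if a.evidence in allowed_evidence and not a.term.obsolete]


experimental = usable(annotations, {"IDA"})
print([(a.gene, a.term.label) for a in experimental])
```

Keeping the evidence code on each annotation lets a consumer decide how much to trust it, and keeping obsolete terms flagged rather than deleted means old annotations can still be resolved.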
Things we describe in Biology - Phenotypes
• PATO – ‘phenotypic qualities’, i.e. physical properties of organisms
• Extremely wide range of classes, examples include colour, size, shape, odour, behaviour
• Phenotypes are important in understanding how genes interact with the environment (in producing phenotypes)
Image: Matzke MA, Matzke AJM (2004) Planting the Seeds of a New Paradigm. PLoS Biol 2(5): e133
Things we describe in Biology - Disease
• Majority of biomedical studies consider disease in some way
• Multiple terminologies for disease in biology
• SNOMED CT – Medical (clinical) terminology
• ICD-10 – Classification of disease and health problems
• NCI Thesaurus (not an ontology) - large, lots of textual definitions but less axiomatisation, disease subpart
• UMLS – set of controlled vocabularies describing medical concepts; very large, at >1 million biomedical concepts
• Human Disease Ontology – based on subset of UMLS, enriched with relationships and new concepts
Things we describe in Biology - Anatomy
• Anatomy is important for many reasons including:
• Understanding how genes relate to anatomical regions
• Understanding how disease affects anatomical systems
• Comparative anatomy, i.e. comparing how structures in different species are related
• Model organism anatomies, e.g.
• Mouse adult gross anatomy
• Human anatomy – FMA
• Drosophila Anatomy
• Arabidopsis thaliana
• Zebrafish
• C. elegans
Genes at work in different species anatomy
• DII gene orthologs are implicated in the development of different anatomical parts in multiple species
Mungall, C. et al (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biology 2012, 13:R5
Things we describe in Biology – Chemical Entities
• ChEBI - molecular entities focused on ‘small’ chemical compounds
• Janna will talk about this tomorrow
Things we describe in Biology – Cells
• Cell Ontology is an ontology of cell types
• CL merges information contained in species-specific anatomical ontologies as well as referencing ontologies such as:
• the Protein Ontology (PR) for uniquely expressed biomarkers
• Gene Ontology (GO) for the biological processes a cell type participates in.
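The cross-referencing described above can be pictured as a cell type record that points into other ontologies rather than redefining their terms. This is a minimal sketch, assuming a hypothetical `CellType` record and a hypothetical `referenced_ontologies` helper. All identifiers below are placeholders, not real CL, PR or GO terms.

```python
from dataclasses import dataclass, field


@dataclass
class CellType:
    cell_id: str
    label: str
    expresses: list = field(default_factory=list)        # Protein Ontology IDs
    participates_in: list = field(default_factory=list)  # Gene Ontology IDs


def referenced_ontologies(cell):
    """Collect the ID prefixes of the external ontologies a cell type points to."""
    ids = cell.expresses + cell.participates_in
    return sorted({i.split(":", 1)[0] for i in ids})


example = CellType(
    cell_id="CL:placeholder-1",           # placeholder, not a real CL term
    label="example cell type",
    expresses=["PR:placeholder-marker"],  # placeholder biomarker
    participates_in=["GO:placeholder-process"],
)

print(referenced_ontologies(example))
```

Storing only identifiers keeps the protein and process definitions in the ontologies that own them, which is the reuse pattern the slide describes.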
Things we describe in Biology – Pathways
• Reactome is a database of pathways
• Exports to the BioPAX ontology to describe pathway elements
• Connects many biological concepts including nucleic acids, genes, disease and GO terms
OBO Foundry
• OBO = Open Biomedical Ontology
• The OBO Foundry seeks to organise expert, human-curated ontologies in biomedicine
• Provides a set of principles for best practice
• Six OBO Foundry ontologies
• OBO library much bigger and there are many Foundry candidate ontologies
• Intrinsically, biology is interconnected yet many ontologies are not formally linked
• Ontology development is expensive – reducing overlap and improving collaboration would decrease this
• Modularity of domains would increase reusability
Let 100 flowers bloom vs Centralised collaboration
• 100 flowers bloom:
• Competition driven
• Application and data driven (often to local use cases)
• Requires no commitment to upper ontology framework
• Mapping between efforts can be costly (potentially exponential)
• Duplication of effort
• Centralised collaboration:
• Encourages collaboration and openness
• Aim to produce consensus model of domain knowledge
• Reducing overlap reduces duplicated effort
• Interoperability part of methodology
• Requires upper ontology commitment
• Development by committee can be inhibiting
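The mapping cost in the first list can be made concrete for the simplest case. The sketch below counts only one mapping per pair of ontologies, which gives n(n-1)/2 mappings for n ontologies, and compares that with every ontology mapping once to a single shared reference such as an upper ontology. It is an illustration under those assumptions, not a model of real curation effort, and both function names are hypothetical.

```python
def pairwise_mappings(n):
    """Mappings needed if every ontology is mapped to every other one."""
    return n * (n - 1) // 2


def hub_mappings(n):
    """Mappings needed if each of n ontologies maps to one shared reference."""
    return n


for n in (5, 20, 100):
    print(n, pairwise_mappings(n), hub_mappings(n))
```

With 20 independent ontologies the pairwise approach already needs 190 mappings against 20 for a shared reference, which is one way of seeing why interoperability is built into the centralised methodology.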