Building and Using Ontologies
Dr. Robert Stevens
Department of Computer Science
University of Manchester
Introduction
• Knowledge & metadata
• The nature of bioinformatics resources
• A shared understanding
• Terminologies and ontologies
• Building an ontology
• Using an ontology
What is Knowledge?
• Knowledge – all information and an understanding to carry out tasks and to infer new information
• Information – data equipped with meaning
• Data – un-interpreted signals that reach our senses
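[Figure: the same record shown as data, as information, and as knowledge]
• Data: Michael Ashburner; Professor; University of Cambridge; UK; ISMB
• Information (labels that give the data meaning): Name; Job; Institution; Country; Conf
• Knowledge (what can be inferred): man; academic, senior; ancient university, 5 rated; European; important figure in biology
BIOLOGY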
What is Metadata?
• Metadata is data about data (information about information)
• A schema is a database's metadata, as are the administrator's name, the creator, the date of creation and the documentation
• The label on an Eppendorf tube in a freezer is metadata
• A database entry’s annotation is metadata on the sequence data
Syntax & Semantics
• Infix 2 + 3 = 5
• Prefix = + 2 3 5
• Postfix 2 3 + 5 =
• Binary 010 + 011 = 101
• Roman II + III = V
• 7 + 3 = 42
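The first five notations above differ only in syntax; they all denote the same sum. A minimal Python sketch of that point follows. The function names are illustrative and not part of the original slides, and only the `+` operator is handled.

```python
# Sketch: several syntaxes, one shared semantics (integer addition).
ROMAN = {"I": 1, "V": 5, "X": 10}

def roman_to_int(numeral):
    """Convert a small Roman numeral (I, V, X only) to an integer."""
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = ROMAN[ch]
        # Subtractive notation: a smaller numeral before a larger one counts negative.
        total += -value if nxt in ROMAN and ROMAN[nxt] > value else value
    return total

def eval_infix(text, parse=int):
    """Evaluate 'a + b'; `parse` decides how the operand tokens are read."""
    left, op, right = text.split()
    assert op == "+"
    return parse(left) + parse(right)

def eval_prefix(text):
    """Evaluate '+ a b'."""
    op, left, right = text.split()
    assert op == "+"
    return int(left) + int(right)

def eval_postfix(text):
    """Evaluate 'a b +' with an operand stack."""
    stack = []
    for token in text.split():
        if token == "+":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            stack.append(int(token))
    return stack.pop()

results = {
    "infix": eval_infix("2 + 3"),
    "prefix": eval_prefix("+ 2 3"),
    "postfix": eval_postfix("2 3 +"),
    "binary": eval_infix("010 + 011", parse=lambda s: int(s, 2)),
    "roman": eval_infix("II + III", parse=roman_to_int),
}
```

Every entry in `results` is the same integer, which is the sense in which the semantics is independent of the surface syntax.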
Types of Semantics
– An operational semantics for a language is defined by what a sentence in that language will do.
– Denotational semantics is a precise mathematical definition of the objects and relations of language in which each sentence of the language names, or denotes, a mathematical object, such as a function.
– Natural semantics are the loose ordinary language sense, in which the semantics of a statement is its "meaning".
– The term logistic semantics refers to formal models that attempt to represent the natural semantics of some external domain.
Knowledge in Bioinformatics
A Shared Understanding
• Synonyms and homonyms are rife
• Need to know that terms in one resource mean the same in another resource
• Means comparisons are much easier: Can ask questions over many resources
• Structure enables discovery and query abstractions
• Useful for both humans and computers
• The Gene Ontology allows queries outside one model organism
London Bills of Mortality
Aggregated Stats
What is an Ontology?
• A means of capturing knowledge in a computationally amenable form
• A shared understanding for humans and computers
• A set of vocabulary terms that represents a community’s understanding of a domain
• A set of definitions for those terms
• The relations between those terms
• A formal semantics
• A conceptual model whose labels provide a vocabulary
[Figure: concept hierarchy with the labels Nucleic acid; DNA, RNA; tRNA, rRNA; Ribosome]
The art of ranking things in genera and species is of no small importance and very much assists our judgment as well as our memory. You know how much it matters in botany, not to mention animals and other substances, or again moral and notional entities as some call them. Order largely depends on it, and many good authors write in such a way that their whole account could be divided and subdivided according to a procedure related to genera and species. This helps one not merely to retain things, but also to find them. And those who have laid out all sorts of notions under certain headings or categories have done something very useful.
Gottfried Wilhelm Leibniz, New Essays on Human Understanding
Components of an Ontology: Concepts
• Concepts: A unit of thought
– AKA: Class, Set, Type, Predicate
– Gene, Reaction, Macromolecule
• Terms are labels of concepts
• Taxonomy of concepts
– Generalization ordering among concepts
– Concept A is a parent of concept B iff every instance of B is also an instance of A
– Superset / subset
– “A kind of” vs. “a part of”
[Figure: concept hierarchy with the labels Nucleic acid; DNA, RNA; tRNA, rRNA; Ribosome]
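The distinction between “a kind of” and “a part of” can be sketched in Python using the labels from the figure. The edges below are an assumption: the extracted slide text keeps only the labels, not the arrows, so the is-a links and the single part-of link (rRNA in Ribosome) are filled in from general biology rather than from the slide itself.

```python
# Sketch: two separate relations over the same concepts.
# Edges are assumed; the slide figure's arrows are not recoverable from the text.
IS_A = {
    "DNA": {"Nucleic acid"},
    "RNA": {"Nucleic acid"},
    "tRNA": {"RNA"},
    "rRNA": {"RNA"},
}
PART_OF = {
    "rRNA": {"Ribosome"},
}

def ancestors(concept, relation=IS_A):
    """All concepts reachable by following `relation` upwards (transitive closure)."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in relation.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def is_kind_of(child, parent):
    """True if every instance of `child` is also an instance of `parent`."""
    return child == parent or parent in ancestors(child, IS_A)
```

With these edges, `is_kind_of("rRNA", "Nucleic acid")` holds while `is_kind_of("rRNA", "Ribosome")` does not; the Ribosome is reached only through `PART_OF`, which is why the two relations are kept in separate tables.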
Components of an Ontology: Relations
• Relations and Attributes
– AKA: Slots, properties, roles
– Product of Gene, Map-Position of Gene
– Reactants of Reaction, Keq of Reaction
• Meta information about relations
– Cardinality, optionality, type restrictions on filler
– Transitive, symmetric, functional role properties
– Role hierarchies
Slot: Expresses
Range: Polypeptide or RNA
Domain: Genes
Cardinality: At-least-1
• General Axioms (constraints)
– Nucleic acids < 20 residues are oligonucleotides
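A minimal sketch of how the Expresses slot and the oligonucleotide axiom might be checked in Python. The `Slot` class, the `violations` function and the filler names are hypothetical, introduced only for illustration; they do not correspond to any particular ontology tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    """Meta information about a relation: domain, range and minimum cardinality."""
    name: str
    domain: str
    range_types: tuple
    min_cardinality: int = 0

# The slot from the slide: Domain Gene, Range Polypeptide or RNA, at least one filler.
EXPRESSES = Slot("Expresses", "Gene", ("Polypeptide", "RNA"), min_cardinality=1)

def violations(slot, subject_type, fillers):
    """Return constraint violations for one subject.

    `fillers` is a list of (name, type) pairs; the names are illustrative.
    """
    problems = []
    if subject_type != slot.domain:
        problems.append(f"{slot.name}: domain is {slot.domain}, got {subject_type}")
    if len(fillers) < slot.min_cardinality:
        problems.append(f"{slot.name}: needs at least {slot.min_cardinality} filler(s)")
    for filler_name, filler_type in fillers:
        if filler_type not in slot.range_types:
            problems.append(f"{slot.name}: {filler_name} has type {filler_type}, outside the range")
    return problems

def is_oligonucleotide(residue_count):
    """General axiom from the slide: nucleic acids under 20 residues are oligonucleotides."""
    return residue_count < 20
```

A gene with one Polypeptide filler produces no violations; a gene with no fillers breaks the At-least-1 cardinality, and a filler of any other type breaks the range restriction.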
Gene Ontology http://www.geneontology.org
“a dynamic controlled vocabulary that can be applied to all eukaryotes”
Built by the community for the community.
Three organising principles: Molecular function, Biological process, Cellular component
Is-a and Part of taxonomy
~15,000 concepts
Components of an Ontology: Instances
• Instances
– AKA: objects, individuals, set members
– trpA Gene, Reaction 1.1.2.4, Death-receptor-3
– Strictly speaking, an ontology with instances is a knowledge base
– The distinction between an instance and a concept is difficult.
– Lard-binding-proteins are all those that bind Death-receptor-3.
Components of an Ontology: Properties
• Primitive: properties are necessary
– Globular protein must have hydrophobic core, but a protein with a hydrophobic core need not be a globular protein
• Defined: properties are necessary + sufficient
– Eukaryotic cells must have a nucleus. Every cell that contains a nucleus must be Eukaryotic.
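The practical difference between the two kinds of concept can be sketched as follows: necessary conditions can only rule membership out, while necessary and sufficient conditions also let membership be inferred. The feature strings and function names are hypothetical, chosen to mirror the two examples on the slide.

```python
# Sketch: primitive concepts carry necessary conditions only;
# defined concepts carry necessary and sufficient conditions.
PRIMITIVE = {"GlobularProtein": {"hydrophobic core"}}
DEFINED = {"EukaryoticCell": {"cell", "contains nucleus"}}

def inferred_classes(features):
    """Classes that can be inferred from features alone (defined concepts only)."""
    return {name for name, conditions in DEFINED.items() if conditions <= features}

def missing_necessary(asserted_class, features):
    """Necessary conditions that an asserted member of a class fails to meet."""
    conditions = PRIMITIVE.get(asserted_class) or DEFINED.get(asserted_class, set())
    return conditions - features
```

Something described only as having a hydrophobic core is never classified as a GlobularProtein, but something asserted to be a GlobularProtein without a hydrophobic core is flagged. A cell that contains a nucleus is classified as a EukaryoticCell without being told so.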
An Ontology Building Life-cycle
[Figure: life-cycle diagram with the following labels]
• Identify purpose and scope
• Knowledge acquisition
• Evaluation
• Language and representation
• Available development tools
• Conceptualisation
• Integrating existing ontologies
• Encoding
• Building
• Ontology Learning
• Consistency Checking
How to do it
• Collect terms: MacroMolecule, Protein, Enzyme, Holoprotein, Holoenzyme.
• Arrange into a Polyhierarchy (by hand)
• Write a definition for each term
• Encode in some representation
• Carry on
• Test against scope, requirements and competency questions
How to do it
• Enzyme: is-a MacroMolecule polymerOf AminoAcid catalyses Reaction
• HoloEnzyme: is-a MacroMolecule polymerOf AminoAcid binds ProstheticGroup catalyses Reaction
• HoloProtein: is-a MacroMolecule polymerOf AminoAcid binds ProstheticGroup
• Protein: is-a MacroMolecule polymerOf AminoAcid
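Once the terms are encoded as definitions, the polyhierarchy need not be arranged by hand. The sketch below treats each definition as a set of (relation, filler) pairs and places one term under another when its definition strictly contains the other's. This is a deliberate simplification of what a description logic reasoner does, and it assumes every definition is necessary and sufficient, which the slides do not state.

```python
# Sketch: derive the polyhierarchy from the slide's four definitions.
# Every term is-a MacroMolecule; only the differentiating restrictions are stored.
DEFINITIONS = {
    "Protein": {("polymerOf", "AminoAcid")},
    "Enzyme": {("polymerOf", "AminoAcid"), ("catalyses", "Reaction")},
    "HoloProtein": {("polymerOf", "AminoAcid"), ("binds", "ProstheticGroup")},
    "HoloEnzyme": {
        ("polymerOf", "AminoAcid"),
        ("binds", "ProstheticGroup"),
        ("catalyses", "Reaction"),
    },
}

def subsumers(term):
    """Terms whose restrictions are a strict subset of this term's restrictions."""
    own = DEFINITIONS[term]
    return {other for other, restrictions in DEFINITIONS.items()
            if other != term and restrictions < own}

def parents(term):
    """Direct parents: subsumers that are not implied by another subsumer."""
    above = subsumers(term)
    if not above:
        return {"MacroMolecule"}
    indirect = set().union(*(subsumers(s) for s in above))
    return above - indirect
```

Under these assumptions HoloEnzyme ends up with two parents, Enzyme and HoloProtein, which is the polyhierarchy that would otherwise be built manually.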
Tips for Building your Terminology
• Choose a narrow but useful area
• Build using domain experts
• Regard computer scientists as a service
• You’ll never be complete or correct: Publish early
• Be practical: Truth and beauty are a bonus
• Be open
• A large commitment and a never ending process
• Start simple and migrate to expressivity and “correctness” as you develop
• OWL supports this migration path