Building and Using Ontologies Dr. Robert Stevens Department of Computer Science University of Manchester [email protected]. uk


Page 1: Building and Using Ontologies

Building and Using Ontologies

Dr. Robert Stevens

Department of Computer Science

University of Manchester

[email protected]

Page 2: Building and Using Ontologies

Introduction

• Knowledge & metadata

• The nature of bioinformatics resources

• A shared understanding

• Terminologies and ontologies

• Building an ontology

• Using an ontology

Page 3: Building and Using Ontologies

What is Knowledge?

• Knowledge -- all information and an understanding to carry out tasks and to infer new information

• Information -- data equipped with meaning

• Data -- un-interpreted signals that reach our senses

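[Figure: one record viewed at three levels]

• Data: Michael Ashburner; Professor; University of Cambridge; UK; ISMB

• Information (field labels): Name; Job; Institution; Country; Conf

• Knowledge: man; academic, senior; ancient university, 5 rated; European; important figure in biology

• Figure label: BIOLOGY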

Page 4: Building and Using Ontologies

What is Metadata?

• Metadata is data about data (information about information)

• A schema is a database's metadata, as are the administrator's name, the creator, the date of creation and the documentation

• The label on an Eppendorf tube in a freezer is metadata

• A database entry's annotation is metadata on the sequence data

Page 5: Building and Using Ontologies

Syntax & Semantics

• Infix 2 + 3 = 5

• Prefix = + 2 3 5

• Postfix 2 3 + 5 =

• Binary 010 + 011 = 101

• Roman II + III = V

• 7 + 3 = 42
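
The contrast on this slide can be sketched in Python: five notations that differ in syntax all denote the same sum, while a perfectly well-formed statement can still be false. This is a minimal sketch; the function names and the `results` dictionary are illustrative and not part of the original slides.

```python
import operator

OPS = {"+": operator.add}
ROMAN = {"I": 1, "V": 5, "X": 10}


def eval_infix(tokens):
    # "2 + 3": operand, operator, operand
    left, op, right = tokens
    return OPS[op](int(left), int(right))


def eval_prefix(tokens):
    # "+ 2 3": operator first
    op, left, right = tokens
    return OPS[op](int(left), int(right))


def eval_postfix(tokens):
    # "2 3 +": operator last, evaluated with a stack
    stack = []
    for tok in tokens:
        if tok in OPS:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(int(tok))
    return stack.pop()


def roman_to_int(numeral):
    # Subtractive notation: a smaller value before a larger one is subtracted.
    values = [ROMAN[ch] for ch in numeral]
    total = 0
    for i, value in enumerate(values):
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


# Different syntax, same semantics: every entry denotes the same number.
results = {
    "infix": eval_infix("2 + 3".split()),
    "prefix": eval_prefix("+ 2 3".split()),
    "postfix": eval_postfix("2 3 +".split()),
    "binary": int("010", 2) + int("011", 2),
    "roman": roman_to_int("II") + roman_to_int("III"),
}

# Well-formed infix syntax, but the claimed result is semantically wrong.
claim_is_true = eval_infix("7 + 3".split()) == 42
```

Each parser handles only its own surface form; the shared meaning lives in the value they agree on, which is the sense in which semantics is independent of syntax.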

Page 6: Building and Using Ontologies

Types of Semantics

– An operational semantics for a language is defined by what a sentence in that language will do.

– Denotational semantics is a precise mathematical definition of the objects and relations of a language, in which each sentence of the language names, or denotes, a mathematical object, such as a function.

– Natural semantics is the loose, ordinary-language sense in which the semantics of a statement is its "meaning".

– The term logistic semantics refers to formal models that attempt to represent the natural semantics of some external domain.

Page 7: Building and Using Ontologies

Knowledge in Bioinformatics

Page 8: Building and Using Ontologies

A Shared Understanding

• Synonyms and homonyms are rife

• Need to know that terms in one resource mean the same in another resource

• Means comparisons are much easier: can ask questions over many resources

• Structure enables discovery and query abstractions

• Useful for both humans and computers

• The Gene Ontology allows queries outside one model organism

Page 9: Building and Using Ontologies

London Bills of Mortality

Page 10: Building and Using Ontologies

Aggregated Stats

Page 11: Building and Using Ontologies

What is an Ontology?

• A means of capturing knowledge in a computationally amenable form

• A shared understanding for humans and computers

• A set of vocabulary terms that represents a community’s understanding of a domain

• A set of definitions for those terms

• The relations between those terms

• A formal semantics

• A conceptual model whose labels provide a vocabulary

[Figure: example hierarchy with nodes Nucleic acid, DNA, RNA, tRNA, rRNA and Ribosome]

Page 12: Building and Using Ontologies

The art of ranking things in genera and species is of no small importance and very much assists our judgment as well as our memory. You know how much it matters in botany, not to mention animals and other substances, or again moral and notional entities as some call them. Order largely depends on it, and many good authors write in such a way that their whole account could be divided and subdivided according to a procedure related to genera and species. This helps one not merely to retain things, but also to find them. And those who have laid out all sorts of notions under certain headings or categories have done something very useful.

Gottfried Wilhelm Leibniz, New Essays on Human Understanding

Page 13: Building and Using Ontologies

Components of an Ontology: Concepts

• Concepts: a unit of thought

– AKA: Class, Set, Type, Predicate

– Gene, Reaction, Macromolecule

• Terms are labels of concepts

• Taxonomy of concepts

– Generalization ordering among concepts

– Concept A is a parent of concept B iff every instance of B is also an instance of A

– Superset / subset

– “A kind of” vs. “a part of”

[Figure: example hierarchy with nodes Nucleic acid, DNA, RNA, tRNA, rRNA and Ribosome]
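
The parent rule on this slide (A is a parent of B iff every instance of B is also an instance of A) can be sketched in Python using the concept names from the figure. This is a minimal sketch under stated assumptions: the instance names (`molecule_1` and so on) are hypothetical, and the part-of link from rRNA to Ribosome is an assumed reading of the figure.

```python
# is-a ("a kind of") edges between the concepts named in the figure.
IS_A = {
    "DNA": {"Nucleic acid"},
    "RNA": {"Nucleic acid"},
    "tRNA": {"RNA"},
    "rRNA": {"RNA"},
}

# part-of is a different relation and is stored separately.
# Assumption: the figure links rRNA to Ribosome as a part.
PART_OF = {"rRNA": {"Ribosome"}}


def ancestors(concept, edges):
    """Return every concept reachable by following edges upward."""
    seen = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for parent in edges.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


def is_kind_of(child, parent):
    """True if parent generalizes child through one or more is-a edges."""
    return parent in ancestors(child, IS_A)


# Hypothetical instances, each asserted under its most specific concept.
ASSERTED = {"molecule_1": "tRNA", "molecule_2": "rRNA", "molecule_3": "DNA"}


def instances_of(concept):
    """Instances of a concept include the instances of all its descendants."""
    return {
        name
        for name, asserted in ASSERTED.items()
        if asserted == concept or is_kind_of(asserted, concept)
    }
```

With this data, `instances_of("RNA")` is a subset of `instances_of("Nucleic acid")`, which is the superset / subset reading of the taxonomy. `is_kind_of("rRNA", "Ribosome")` is false, because part-of edges are never followed when computing is-a ancestors.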

Page 14: Building and Using Ontologies

Components of an Ontology: Relations

• Relations and Attributes

– AKA: Slots, properties, roles

– Product of Gene, Map-Position of Gene

– Reactants of Reaction, Keq of Reaction

• Meta information about relations

– Cardinality, optionality, type restrictions on filler

– Transitive, symmetric, functional role properties

– Role hierarchies

Example slot:

Slot: Expresses
Range: Polypeptide or RNA
Domain: Genes
Cardinality: At-least-1

• General Axioms (constraints)

– Nucleic acids < 20 residues are oligonucleotides
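
The slot metadata and the general axiom on this slide can be sketched as data plus a small checker. This is a minimal sketch, not a real ontology API: the `Slot` class, `check_slot` and `is_oligonucleotide` are hypothetical names introduced for illustration, and the filler type `Lipid` in the usage note is an invented counter-example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    name: str
    domain: str              # concept the slot applies to
    range_types: frozenset   # concepts allowed as fillers
    min_cardinality: int = 0


# The example slot from the slide: Expresses, Gene -> Polypeptide or RNA, at least 1.
EXPRESSES = Slot(
    name="Expresses",
    domain="Gene",
    range_types=frozenset({"Polypeptide", "RNA"}),
    min_cardinality=1,
)


def check_slot(slot, subject_type, filler_types):
    """Return a list of constraint violations (empty if the fillers are valid)."""
    problems = []
    if subject_type != slot.domain:
        problems.append(f"{slot.name} applies to {slot.domain}, not {subject_type}")
    if len(filler_types) < slot.min_cardinality:
        problems.append(f"{slot.name} needs at least {slot.min_cardinality} filler(s)")
    for filler in filler_types:
        if filler not in slot.range_types:
            problems.append(f"{filler} is outside the range of {slot.name}")
    return problems


def is_oligonucleotide(residue_count):
    # General axiom from the slide: nucleic acids with fewer than 20 residues.
    return residue_count < 20
```

For example, `check_slot(EXPRESSES, "Gene", ["Polypeptide"])` returns an empty list, while passing no fillers or a filler such as `"Lipid"` reports one violation each.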

Page 15: Building and Using Ontologies

Gene Ontology http://www.geneontology.org

“a dynamic controlled vocabulary that can be applied to all eukaryotes”

Built by the community for the community.

• Three organising principles: Molecular function, Biological process, Cellular component

• Is-a and Part-of taxonomy

• ~15,000 concepts

Page 16: Building and Using Ontologies

Components of an Ontology: Instances

• Instances

– AKA: objects, individuals, set members

– trpA Gene, Reaction 1.1.2.4, Death-receptor-3

– Strictly speaking, an ontology with instances is a knowledge base

– The distinction between an instance and a concept is difficult

– Lard-binding-proteins are all those that bind Death-receptor-3

Page 17: Building and Using Ontologies

Components of an Ontology: Properties

• Primitive: properties are necessary

– A globular protein must have a hydrophobic core, but a protein with a hydrophobic core need not be a globular protein

• Defined: properties are necessary + sufficient

– Eukaryotic cells must have a nucleus. Every cell that contains a nucleus must be eukaryotic.
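
The difference between primitive and defined concepts shows up in what can be inferred: necessary conditions only let us check an asserted membership, while necessary and sufficient conditions also let us recognise members. The sketch below illustrates this with the slide's two examples; the property names (`hasHydrophobicCore`, `isCell`, `hasNucleus`) and function names are hypothetical.

```python
# Primitive concept: its conditions are necessary only.
PRIMITIVE = {"GlobularProtein": {"hasHydrophobicCore"}}

# Defined concept: its conditions are necessary and sufficient.
DEFINED = {"EukaryoticCell": {"isCell", "hasNucleus"}}


def is_consistent(asserted_concept, properties):
    """An asserted member must satisfy every necessary condition."""
    required = PRIMITIVE.get(asserted_concept, set()) | DEFINED.get(asserted_concept, set())
    return required <= properties


def inferred_concepts(properties):
    """Only defined concepts can be recognised from properties alone."""
    return {
        concept
        for concept, conditions in DEFINED.items()
        if conditions <= properties
    }
```

A cell with a nucleus is classified as `EukaryoticCell`, but something with a hydrophobic core is not classified as `GlobularProtein`, because that condition is necessary without being sufficient.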

Page 18: Building and Using Ontologies

An Ontology Building Life-cycle

[Figure: life-cycle diagram with boxes labelled Identify purpose and scope; Knowledge acquisition; Evaluation; Language and representation; Available development tools; Conceptualisation; Integrating existing ontologies; Encoding; Building; Ontology Learning; Consistency Checking]

Page 19: Building and Using Ontologies

How to do it

• Collect terms: MacroMolecule, Protein, Enzyme, Holoprotein, Holoenzyme.

• Arrange into a Polyhierarchy (by hand)

• Write a definition for each term

• Encode in some representation

• Carry on

• Test against scope, requirements and competency questions

Page 20: Building and Using Ontologies

How to do it

• Enzyme: is-a MacroMolecule, polymerOf AminoAcid, catalyses Reaction

• HoloEnzyme: is-a MacroMolecule, polymerOf AminoAcid, binds ProstheticGroup, catalyses Reaction

• HoloProtein: is-a MacroMolecule, polymerOf AminoAcid, binds ProstheticGroup

• Protein: is-a MacroMolecule, polymerOf AminoAcid
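
Once each term is written as a set of properties, the polyhierarchy no longer has to be arranged by hand: a term subsumes another when its definition is a subset of the other's. The sketch below is a deliberately simplified stand-in for what a description logic reasoner does. It assumes every definition on the slide is necessary and sufficient, and the function names are hypothetical.

```python
BASE = {("is-a", "MacroMolecule"), ("polymerOf", "AminoAcid")}

# Definitions transcribed from the slide as sets of (relation, filler) pairs.
DEFINITIONS = {
    "Protein": BASE,
    "HoloProtein": BASE | {("binds", "ProstheticGroup")},
    "Enzyme": BASE | {("catalyses", "Reaction")},
    "HoloEnzyme": BASE | {("binds", "ProstheticGroup"), ("catalyses", "Reaction")},
}


def subsumes(general, specific):
    """general subsumes specific if all of its conditions also hold for specific."""
    return general != specific and DEFINITIONS[general] <= DEFINITIONS[specific]


def direct_parents(term):
    """Most specific subsumers of term: its parents in the polyhierarchy."""
    candidates = {other for other in DEFINITIONS if subsumes(other, term)}
    # Drop any candidate that is itself a subsumer of another candidate.
    return {
        c for c in candidates
        if not any(subsumes(c, other) for other in candidates)
    }
```

Under these assumptions HoloEnzyme gets two parents, HoloProtein and Enzyme, and both of those sit under Protein. That is the multiple inheritance a polyhierarchy allows, derived from the definitions instead of being asserted.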

Page 21: Building and Using Ontologies

Tips for Building your Terminology

• Choose a narrow, but useful area

• Build using domain experts

• Regard computer scientists as a service

• You’ll never be complete or correct: Publish early

• Be practical: Truth and beauty are a bonus

• Be open

• A large commitment and a never ending process

• Start simple and migrate to expressivity and “correctness” as you develop

• OWL supports this migration path