
Semantics for Biodiversity

Barry Smith

http://ontology.buffalo.edu/smith

A brief history of the Semantic Web

• HTML demonstrated the power of the Web to allow sharing of information

• can we use semantic technology to create a Web 2.0 which would allow algorithmic reasoning with online information based on XML, RDF and above all OWL (Web Ontology Language)?

• can we use RDF and OWL to break down silos, and create useful integration of online data and information?

People tried, but the more they succeeded, the more they failed

OWL breaks down data silos via controlled vocabularies for the description of data dictionaries

Unfortunately, the very success of this approach led to the creation of multiple new semantic silos, because multiple ontologies are being created in ad hoc ways

Ontology success stories, and some reasons for failure

A fragment of the “Linked Open Data” in the biomedical domain


What you get with ‘mappings’

Terms that a naive, label-based mapping on the string ‘ALL’ would bring together:

• HPO: all phenotypes (excess hair loss, duck feet ...)

• NCIT: all organisms

• allose (a form of sugar)

• Acute Lymphoblastic Leukemia (A.L.L.)
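The collision on the ‘mappings’ slide takes only a few lines of Python to reproduce. This is a minimal sketch. It assumes a naive matcher that lowercases labels and strips punctuation. The term records and labels are illustrative stand-ins chosen to mirror the slide. They are not verified entries from HPO, NCIT or any other resource.

```python
import re
from collections import defaultdict

# Illustrative term records: (source, label, what the term means).
# Stand-ins mirroring the slide, not real ontology identifiers or verified labels.
TERMS = [
    ("HPO", "All", "all phenotypes"),
    ("NCIT", "All", "all organisms"),
    ("sugar nomenclature", "All", "allose (a form of sugar)"),
    ("disease terminology", "A.L.L.", "acute lymphoblastic leukemia"),
]


def normalize(label: str) -> str:
    """Naive normalization: lowercase, then drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def naive_mappings(terms):
    """Group terms whose normalized labels collide; a string matcher would 'map' each group."""
    groups = defaultdict(list)
    for _source, label, meaning in terms:
        groups[normalize(label)].append(meaning)
    return {key: meanings for key, meanings in groups.items() if len(meanings) > 1}


for key, meanings in naive_mappings(TERMS).items():
    print(f"label '{key}' conflates {len(meanings)} unrelated terms: {meanings}")
```

Any matcher keyed on surface strings behaves this way. The slides propose reducing the need for such mappings in the first place.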

Mappings are hard

• They are fragile, and expensive to maintain

• They need a new authority to maintain them, yielding new risk of forking

• The goal should be to minimize the need for mappings

• Invest resources in disjoint ontology modules which work well together, reducing the need for mappings to the minimum possible

Why should you care?

• you need to create systems for data mining and text processing which will yield useful digitally coded output

• if the codes you use are constantly in need of ad hoc repair, huge resources will be wasted

• relevant data will not be found

• serious reasoning will be defeated from the start

How to do it right?

• how to create an incremental, evolutionary process, where what is good survives and what is bad fails

• where the number of ontologies needing to be linked is small

• where links are stable

• create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested

Uses of ‘ontology’ in PubMed abstracts


By far the most successful: GO (Gene Ontology)


GO provides a controlled system of terms for use in annotating (describing, tagging) data

• multi-species, multi-disciplinary, open source

• contributing to the cumulativity of scientific results obtained by distinct research communities

• compare the use of kilograms, meters and seconds in formulating experimental results

Hierarchical view representing relations between represented types

(Figure: is_a and part_of hierarchy for the pleura, relating Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Tissue, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Sac, Pleural Cavity, Interlobar recess, Pleura (Wall of Sac), Visceral Pleura, Parietal Pleura, Mediastinal Pleura and Mesothelium of Pleura)

Reasons why GO has been successful

It is a system for prospective standardization built with coherent top level but with content contributed and monitored by domain specialists

Based on community consensus

Updated every night

Clear versioning principles ensure backwards compatibility; prior annotations do not lose their value

Initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL – though GO community still recommending caution)


GO has learned the lessons of successful cooperation

• Clear documentation

• Fully open source (allows thorough testing in manifold combinations with other ontologies)

• Subjected to considerable third-party critique

• Rapid turnaround tracker and help desk

• Usable also for education

• The terms chosen are already familiar

natural language labels to make the data cognitively accessible to human beings

GO has been amazingly successful in overcoming the data balkanization problem, but it covers only generic biological entities of three sorts:

– cellular components

– molecular functions

– biological processes

and it does not provide representations of diseases, symptoms, …

Original OBO Foundry ontologies (Gene Ontology in yellow)

Granularity | Continuant: independent | Continuant: dependent | Occurrent

Organ and organism | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)

Cell and cellular component | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Biological Process (GO)

Molecule | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

(The same matrix, with a callout on the Environment Ontology: “environments are here”)

Granularity | Continuant: independent | Continuant: dependent | Occurrent

Complex of organisms | Population and Community Ontology (PCO) | Population Phenotype | Population Process

(other rows as in the matrix of original OBO Foundry ontologies)

http://obofoundry.org

Granularity | Continuant: independent | Continuant: dependent | Occurrent

Complex of organisms | Family, Community, Deme, Population | Population Phenotype | Population Process

(other rows as in the matrix of original OBO Foundry ontologies)

http://obofoundry.org

Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology, and agree in advance to collaborate with developers of ontologies in adjacent domains.

The OBO Foundry: a step-by-step, evidence-based approach to expand the GO

OBO Foundry Principles

Common governance (coordinating editors)

Common training

Common architecture

• simple shared top level ontology (Basic Formal Ontology)

• shared Relation Ontology: www.obofoundry.org/ro


Open Biomedical Ontologies Foundry

Seeks to create high quality, validated terminology modules across all of the life sciences which will be

• close to the language use of experts

• evidence-based

• incorporating a strategy for motivating potential developers and users

• revisable as science advances

• modular: one ontology for each domain

Modularity

ensures

• annotations can be additive

• no need for mappings

• division of labor amongst domain experts

• high value of training in any given module

• lessons learned in one module can benefit work on other modules

• incentivization of those responsible for individual modules

The Modular Approach

• Create a small set of plug-and-play ontologies as stable monohierarchies with a high likelihood of being reused

• Create ontologies incrementally

• Reuse existing ontology resources

• Use these ontologies incrementally in annotating heterogeneous data

• Annotating = arm's-length approach; the data and data-models themselves remain as they are

Logical standards can be only part of the solution

OWL … bring benefits primarily on the side of syntax (language)

What we need are standards on the semantics (content) side (via top-level ontologies), including standards for

• top-level ontologies

• common relations (part_of …)

• relation of lower-level ontologies to each other and to the higher levels

120+ ontology projects using BFO

http://www.ifomis.org/bfo/

• Open Biomedical Ontologies Foundry

• Ontology for General Medical Science

• eagle-I, VIVO, CTSAconnect

• AstraZeneca

• Elsevier

How a common upper level ontology can help resist ontology chaos

• something to teach

• training (expertise) is portable

• each new ontology you confront will be more easily understood at the level of content, and more easily criticized and error-checked

• provides a starting point for domain-ontology development

• provides a platform for tool-building and innovations

• lessons learned in building and using one ontology can potentially benefit other ontologies

• promotes shareability of data across disciplinary and other boundaries

OBO Foundry Modular Organization

top level

• Basic Formal Ontology (BFO)

mid-level

• Information Artifact Ontology (IAO)

• Ontology for Biomedical Investigations (OBI)

• Ontology of General Medical Science (OGMS)

domain level

• Anatomy Ontology (FMA*, CARO)

• Environment Ontology (EnvO)

• Infectious Disease Ontology (IDO*)

• Biological Process Ontology (GO*)

• Cell Ontology (CL)

• Cellular Component Ontology (FMA*, GO*)

• Phenotypic Quality Ontology (PaTO)

• Subcellular Anatomy Ontology (SAO)

• Sequence Ontology (SO*)

• Molecular Function (GO*)

• Protein Ontology (PRO*)

BFO

A simple top-level ontology to support information integration in scientific research

•No overlap with domain ontologies (organism, person, society, information, …)

•Based on realism

•No abstracta

•Tested in many natural science domains


Basic Formal Ontology

(Figure: Continuant vs. Occurrent (process, event); Continuant divides into Independent Continuant (entity) and Dependent Continuant (property); a property depends_on its bearer)

depends_on

(Figure: Continuant vs. Occurrent (process, event); Independent Continuant (thing) and Dependent Continuant (property); an event depends_on its participant)

Basic Formal Ontology

(Figure: the GO within BFO: biological processes fall under occurrent, cellular components under independent continuant, and molecular functions, alongside roles and qualities, under dependent continuant)

(Figure: Dependent Continuant subdivides into Quality and Disposition)

(Figure: the same top level, with types above and their instances below, linked by instance_of)

rationale of OBO Foundry coverage

Granularity | Continuant: independent | Continuant: dependent | Occurrent (relation to time)

Organ and organism | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Organism-Level Process (GO)

Cell and cellular component | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | Cellular Process (GO)

Molecule | Molecule (ChEBI, SO, RNAO, PRO) | Molecular Function (GO) | Molecular Process (GO)

Example: The Cell Ontology

Example: Ontology for General Medical Science

http://code.google.com/p/ogms/

coronary heart disease

John’s coronary heart disease

(Figure: John’s coronary heart disease instantiates different phase types at times t1 to t5: CHD in phase of asymptomatic (‘silent’) infarction; CHD in phase of early lesions and small fibrous plaques; stable angina; CHD in phase of surface disruption of plaque; unstable angina. In nature, there are no sharp boundaries here.)

(Figure: John, an instance of human, instantiates embryo, fetus, neonate, infant, child and adult at successive times t1 to t6. In nature, there are no sharp boundaries here.)
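Time-indexed instantiation can be sketched in Python. This sketch assumes a hypothetical `Continuant` class that records which developmental type a particular instantiates from each time point onward. The integer times 1 to 6 stand in for t1 to t6. The sharp cut-offs are a modelling convenience, since in nature there are no sharp boundaries.

```python
from dataclasses import dataclass, field


@dataclass
class Continuant:
    """A particular that stays the same particular while instantiating different types over time."""

    name: str
    # Hypothetical encoding: (start_time, type_name) pairs in chronological order.
    # The sharp cut-offs are a modelling convenience; in nature there are no sharp boundaries.
    phases: list = field(default_factory=list)

    def instance_of_at(self, t: int) -> str:
        """Return the phase type instantiated at time t (the last phase started at or before t)."""
        current = None
        for start, type_name in self.phases:
            if start > t:
                break
            current = type_name
        if current is None:
            raise ValueError(f"{self.name} does not yet exist at t={t}")
        return current


john = Continuant("John", [(1, "embryo"), (2, "fetus"), (3, "neonate"),
                          (4, "infant"), (5, "child"), (6, "adult")])
print([john.instance_of_at(t) for t in range(1, 7)])
```

Throughout, the object `john` remains one and the same particular. Only the type it instantiates at a given time changes.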

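A disease is a disposition

• etiological process produces disorder

• disorder bears disposition

• disposition realized_in pathological process

• pathological process produces abnormal bodily features

• abnormal bodily features recognized_as signs & symptoms

• signs & symptoms used_in interpretive process

• interpretive process produces diagnosis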
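The disease chain can be encoded as subject/relation/object triples and walked from cause to diagnosis. This is a minimal sketch. The relation names come from the slide. The `follow_chain` helper and the assumption of one successor per node are illustrative simplifications, not part of OGMS.

```python
# The chain from the slide as (subject, relation, object) triples.
DISEASE_MODEL = [
    ("etiological process", "produces", "disorder"),
    ("disorder", "bears", "disposition"),
    ("disposition", "realized_in", "pathological process"),
    ("pathological process", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "signs & symptoms"),
    ("signs & symptoms", "used_in", "interpretive process"),
    ("interpretive process", "produces", "diagnosis"),
]


def follow_chain(triples, start):
    """Follow outgoing edges from start, assuming at most one outgoing edge per node."""
    outgoing = {subject: (relation, obj) for subject, relation, obj in triples}
    steps, seen, node = [], {start}, start
    while node in outgoing:
        relation, node = outgoing[node]
        if node in seen:  # stop rather than loop forever on a cyclic model
            break
        seen.add(node)
        steps.append((relation, node))
    return steps


print("etiological process")
for relation, node in follow_chain(DISEASE_MODEL, "etiological process"):
    print(f"  --{relation}--> {node}")
```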

Cirrhosis - environmental exposure

Etiological process - phenobarbital-induced hepatic cell death produces

Disorder - necrotic liver bears

Disposition (disease) - cirrhosis realized_in

Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death produces

Abnormal bodily features recognized_as

Symptoms - fatigue, anorexia; Signs - jaundice, splenomegaly

Symptoms & Signs used_in

Interpretive process produces

Hypothesis - rule out cirrhosis suggests

Laboratory tests produces

Test results - elevated liver enzymes in serum used_in

Interpretive process produces

Result - diagnosis that patient X has a disorder that bears the disease cirrhosis

Dispositions and Predispositions

Some dispositions are predispositions to other dispositions.


HNPCC - genetic predisposition

Etiological process - inheritance of a mutant mismatch repair gene produces

Disorder - chromosome 3 with abnormal hMLH1 bears

Disposition (disease) - Lynch syndrome realized_in

Pathological process - abnormal repair of DNA mismatches produces

Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g. TGF-beta R2) bears

Disposition (disease) - non-polyposis colon cancer realized_in

Symptoms (including pain)

Ontology modules extending OGMS

Sleep Domain Ontology (SDO)

Ontology of Medically Relevant Social Entities (OMRSE)

Vital Sign Ontology (VSO)

Mental Disease Ontology (MD)

Neurological Disease Ontology (ND)

Infectious Disease Ontology (IDO)


Infectious Disease Ontology (IDO)

– IDO Core:

• General terms in the ID domain.

• A hub for all IDO extensions.

– IDO Extensions:

• Disease specific.

• Developed by subject matter experts.

• Provides:

– Clear, precise, and consistent natural language definitions

– Computable logical representations (OWL, OBO)

How IDO evolves

(Figure: IDO Core at the hub, with domain ontologies such as IDOSa, IDOHumanSa, IDORatSa, IDOMRSa, IDOStrep, IDOHumanStrep, IDORatStrep, IDOHumanBacterial, IDOAntibioticResistant, IDOMAL, IDOHIV and IDOFLU)

CORE and SPOKES: domain ontologies

SEMI-LATTICE: by subject matter experts in different communities of interest

IDO Core

• Contains general terms in the ID domain:

– e.g., ‘colonization’, ‘pathogen’, ‘infection’

• A contract between IDO extension ontologies and the datasets that use them.

• Intended to represent information along several dimensions:

– biological scale (gene, cell, organ, organism, population)

– discipline (clinical, immunological, microbiological)

– organisms involved (host, pathogen, and vector types)

Sample IDO Definitions

• Host of Infectious Agent (BFO Role): A role borne by an organism in virtue of the fact that its extended organism contains an infectious agent.

• Extended Organism (OGMS): An object aggregate consisting of an organism and all material entities located within the organism, overlapping the organism, or occupying sites formed in part by the organism.

• Infectious Agent: A pathogen whose pathogenic disposition is an infectious disposition.

IDO and IDOSa

• Scale of the infection (disorder)

from Shetty, Tang, and Andrews, 2009

12/10/2010

• Staphylococcus aureus (Sa)

– by antibiotic resistance: MSSa, MRSa

– by pathogenesis location type: HA-MRSa, CA-MRSa

– by geographic region: UK CA-MRSa, Australian CA-MRSa

– by various differentia: specific strains

Sample Application: A lattice of infectious disease application ontologies from NARSA isolate data

Network on Antimicrobial Resistance in Staphylococcus aureus: http://www.narsa.net/content/staphLinks.jsp

True personalized medicine – YourDiseaseOntology

Ways of differentiating Staphylococcus aureus infectious diseases

• Infectious Disease

– By host type

– By (sub-)species of pathogen

– By antibiotic resistance

– By anatomical site of infection

• Bacterial Infectious Disease

– By PFGE (Strain)

– By MLST (Sequence Type)

– By BURST (Clonal Complex)

• Sa Infectious Disease

– By SCCmec type (by ccr type; by mec class)

– By spa type

International Working Group on the Staphylococcal Cassette Chromosome elements

(Figure: ido.owl, narsa.owl, narsa-isolates.owl and NDF-RT used together to represent NRS701’s resistance to clindamycin)

Plant IDO

• Virulence

• Resistance

• Symbiont

• …

BFO: The Very Top

• continuant

– independent continuant

– dependent continuant: quality, function, role, disposition

• occurrent

occurrent

Basic Formal Ontology

Continuant Occurrent

process, eventIndependentContinuant

thing

DependentContinuant

quality

.... ..... .......

types

instances

Basis of BFO in GO

(Figure: biological process under Occurrent; cellular component under Independent Continuant; molecular function under Dependent Continuant)


Entity =def. anything which exists, including things and processes, functions and qualities, beliefs and actions, documents and software

(entities on levels 1, 2 and 3)

First basic distinction among entities

type vs. instance

(science text vs. diary)

(human being vs. Tom Cruise)


For ontologies

it is generalizations that are important = types, kinds, species

A 515287 DC3300 Dust Collector Fan

B 521683 Gilmer Belt

C 521682 Motor Drive Belt

Catalog vs. inventory

Catalog of types

types vs. instances

names of instances

names of types

An ontology is a representation of types

We learn about types in reality from looking at the results of scientific experiments in the form of scientific theories

experiments relate to what is particular; science describes what is general

(Figure: a hierarchy of types (object, organism, animal, mammal, cat, siamese, frog) above their instances)


3 kinds of (binary) relations

Between types

• human is_a mammal

• human heart part_of human

Between an instance and a type

• this human instance_of the type human

• this human allergic_to the type tamiflu

Between instances

• Mary’s heart part_of Mary

• Mary’s aorta connected_to Mary’s heart

Type-level relations presuppose the underlying instance-level relations

A is_a B =def. A and B are types and all instances of A are instances of B

A part_of B =def. All instances of A are instance-level-parts-of some instance of B
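Both definitions can be checked mechanically over a small model. This sketch assumes hypothetical dictionaries `INSTANCE_OF` and `PART_OF` that hold instance-level facts. A finite model can refute a universal type-level assertion, but it cannot establish that assertion for all instances in reality.

```python
# Hypothetical instance-level facts (a toy model, not real data).
INSTANCE_OF = {
    "Mary": {"human", "mammal"},
    "Tom": {"human", "mammal"},
    "Rex": {"dog", "mammal"},
    "Mary's heart": {"human heart"},
    "Tom's heart": {"human heart"},
}
PART_OF = {("Mary's heart", "Mary"), ("Tom's heart", "Tom")}  # instance-level part_of


def instances(type_name):
    """All particulars in the model that instantiate the given type."""
    return {x for x, types in INSTANCE_OF.items() if type_name in types}


def is_a(a, b):
    """A is_a B =def. all instances of A are instances of B."""
    return instances(a) <= instances(b)


def type_part_of(a, b):
    """A part_of B =def. every instance of A is part_of some instance of B (All-Some)."""
    return all(any((x, y) in PART_OF for y in instances(b)) for x in instances(a))


print(is_a("human", "mammal"), is_a("mammal", "human"))
print(type_part_of("human heart", "human"), type_part_of("human", "human heart"))
```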

The assertions linking terms in ontologies must hold universally

Hence all type-level relations in RO are provided with All-Some definitions

If you know A part_of B, and B part_of C, then whichever A you choose, the instance of B of which it is a part will be included in some C, which will include as part also the A with which you began
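Each type-level part_of assertion has an All-Some reading, so chains of such assertions can be composed. This composition is what licenses transitive inference. The sketch computes the transitive closure of a small set of type-level part_of assertions. The anatomical terms are illustrative, and the function name is hypothetical.

```python
def transitive_closure(pairs):
    """Return every (A, C) obtainable by chaining (A, B) and (B, C) any number of times."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        new = derived - closure
        if not new:
            return closure
        closure |= new


# Illustrative type-level assertions, each intended to hold in the All-Some sense.
TYPE_PART_OF = {
    ("bronchopulmonary segment", "lobe of lung"),
    ("lobe of lung", "lung"),
    ("lung", "respiratory system"),
}

for part, whole in sorted(transitive_closure(TYPE_PART_OF) - TYPE_PART_OF):
    print("inferred:", part, "part_of", whole)
```

The same composition would be unsafe for relations that hold only "sometimes". This is why the slides insist that type-level assertions hold universally.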

part_of for continuant classes is time-indexed

A part_of B =def. given any particular a and any time t, if a is an instance of A at t, then there is some instance b of B such that a is an instance-level part_of b at t
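The time-indexed definition adds a quantifier over times. The sketch below uses hypothetical time-stamped facts and checks the condition at each time separately. The second heart exists at t=2 but is part of no human, so it serves as a counterexample. One way to picture it is a heart in transit between donor and recipient.

```python
# Hypothetical time-stamped facts: (particular, type, time) and (part, whole, time).
INSTANCE_OF_AT = {
    ("h1", "heart", 1), ("h1", "heart", 2),
    ("p1", "human", 1), ("p1", "human", 2),
    ("h2", "heart", 2),  # e.g. a heart in transit between donor and recipient
}
PART_OF_AT = {("h1", "p1", 1), ("h1", "p1", 2)}


def instances_at(type_name, t):
    """Particulars that instantiate type_name at time t."""
    return {x for (x, ty, time) in INSTANCE_OF_AT if ty == type_name and time == t}


def part_of_types(a, b, times):
    """For each t in times: every instance of A at t is part_of some instance of B at t."""
    return all(
        any((x, y, t) in PART_OF_AT for y in instances_at(b, t))
        for t in times
        for x in instances_at(a, t)
    )


print(part_of_types("heart", "human", [1]))     # only h1 exists, and it is part of p1
print(part_of_types("heart", "human", [1, 2]))  # h2 at t=2 is part of no human
```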

(Figure: derives_from: instances c of C and c' of C' at time t, and an instance c1 of C1 at time t1 which derives from them, e.g. ovum, sperm → zygote)

transformation_of

(Figure: one and the same instance c is an instance of C at t and of C1 at t1; e.g. pre-RNA → mature RNA, child → adult)

transformation_of

C2 transformation_of C1 =def. any instance of C2 was at some earlier time an instance of C1

(Figure: the same instance c, an instance of C at t and of C1 at t1: embryological development)

(Figure: the same pattern: tumor development)

The Granularity Gulf

most existing data-sources are of fixed, single granularity

many (all?) clinical phenomena cross granularities


universality

Often, order will matter:

We can assert

adult transformation_of child

but not

child transforms_into adult
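Order matters because the two readings quantify differently. This sketch uses hypothetical life histories with integer times. The `transformation_of` function follows the definition of transformation_of given earlier. The `transforms_into` function is the reverse All-Some reading. A single child who does not survive to adulthood is enough to falsify the reverse reading.

```python
# Hypothetical life histories: particular -> list of (time, type) pairs.
HISTORY = {
    "ann": [(1, "child"), (2, "adult")],
    "bob": [(1, "child"), (2, "adult")],
    "cy": [(1, "child")],  # ceases to exist before reaching adulthood
}


def transformation_of(c2, c1, history=HISTORY):
    """C2 transformation_of C1 =def. any instance of C2 was at some earlier time an instance of C1."""
    return all(
        any(earlier_type == c1 and earlier < t for earlier, earlier_type in phases)
        for phases in history.values()
        for t, type_name in phases
        if type_name == c2
    )


def transforms_into(c1, c2, history=HISTORY):
    """The reverse reading: any instance of C1 is at some later time an instance of C2."""
    return all(
        any(later_type == c2 and later > t for later, later_type in phases)
        for phases in history.values()
        for t, type_name in phases
        if type_name == c1
    )


print(transformation_of("adult", "child"))  # holds in this toy model
print(transforms_into("child", "adult"))    # fails: cy never becomes an adult
```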


Representation =def

an image, idea, map, picture, name or description ... of some entity or entities.

Ontologies are structured representations of the types in a certain domain of reality


Ontologies are here


or here


Ontologies represent general structures in reality (leg)


Ontologies do not represent concepts in people’s heads


They represent types in reality


Inventory vs. Catalog: Two kinds of representational artifact

Databases represent instances

Ontologies represent types

How do we know which general terms designate types?

Types are repeatables:

cell, electron, weapon, mouse ...

Instances are one-off: Bill Clinton, this mouse …


Problem

The same general term can be used to refer both to types and to collections of particulars. Consider:

HIV is an infectious retrovirus

HIV is spreading very rapidly through Asia


Class =def

a maximal collection of particulars determined by a general term (‘cell’, ‘electron’, but also ‘restaurant in Palo Alto’, ‘Italian’)

the class A = the collection of all particulars x for which ‘x is A’ is true

types vs. their extensions

types / {a, b, c, ...}: collections of particulars

Extension =def. the extension of a type is the class of its instances

types vs. classes

types / {c, d, e, ...}: classes

classes include the extensions of types, but also other sorts of classes: populations, ...

e.g. the class of all diabetic patients in Leipzig on 4 June 1952
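The definition of a class as the collection of all particulars satisfying a general term translates directly into a set comprehension. In this sketch the `Particular` records and their names are hypothetical. It computes the extension of the type restaurant and the class of restaurants in Palo Alto. Both results are plain sets, which shows why the type/class distinction cannot be read off the collections themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Particular:
    name: str
    kind: str  # the type this particular instantiates (toy encoding)
    city: str


# Hypothetical domain of particulars.
DOMAIN = [
    Particular("r1", "restaurant", "Palo Alto"),
    Particular("r2", "restaurant", "Leipzig"),
    Particular("s1", "shop", "Palo Alto"),
]


def class_of(predicate, domain):
    """The class A = the collection of all particulars x for which 'x is A' is true."""
    return frozenset(x for x in domain if predicate(x))


# The extension of the type 'restaurant'.
restaurants = class_of(lambda x: x.kind == "restaurant", DOMAIN)
# A class that is not the extension of any type: 'restaurant in Palo Alto'.
palo_alto_restaurants = class_of(
    lambda x: x.kind == "restaurant" and x.city == "Palo Alto", DOMAIN
)

print(sorted(p.name for p in restaurants), sorted(p.name for p in palo_alto_restaurants))
```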


OWL is a good representation of classes

• F16s

• sibling of Finnish spy

• member of Abba aged > 50 years


types, classes, concepts

types < classes < ‘concepts’

Cases of ‘concepts’ which do not correspond to classes:

‘Cancelled manoeuvre’, ‘Planned manoeuvre’, ‘Fake terrorist’

Such terms do not represent anything. See Information Artifact Ontology (IAO)

Ontology =def.

a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent

1. types in reality

2. those relations between these types which obtain universally (= for all instances)

lung is_a anatomical structure

lobe of lung part_of lung


BFO Top-Level Ontology

• Continuant

– Independent Continuant

– Dependent Continuant

• Occurrent (always dependent on one or more independent continuants)

Two kinds of entities

occurrents (processes, events, happenings)

continuants (objects, qualities, states...)


Continuants (aka endurants)

• have continuous existence in time

• preserve their identity through change

• exist in toto whenever they exist at all

Occurrents (aka processes)

• have temporal parts

• unfold themselves in successive phases

• exist only in their phases

You are a continuant

Your life is an occurrent

You are 3-dimensional

Your life is 4-dimensional


Dependent entities

require independent continuants as their bearers

There is no run without a runner

There is no grin without a cat


Dependent vs. independent continuants

Independent continuants (organisms, buildings, environments)

Dependent continuants (quality, shape, role, propensity, function, status, power, right)


All occurrents are dependent entities

They are dependent on those independent continuants which are their participants (agents, patients, media ...)



OBO Foundry organized in terms of Basic Formal Ontology

Each Foundry ontology can be seen as an extension of a single upper level ontology (BFO)

either post hoc, as in the case of the GO

or in virtue of creation ab initio via downward population from BFO


How to build an ontology

1. import BFO into an ontology editor

2. work with domain experts to create an initial mid-level classification

3. find the ~50 most commonly used terms corresponding to types in reality

4. arrange these terms into an informal is_a hierarchy according to this universality principle:

A is_a B =def. every instance of A is an instance of B

5. fill in missing terms to give a complete hierarchy

6. (leave it to domain experts to populate the lower levels of the hierarchy)
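Arranging terms into an is_a hierarchy and filling in missing terms both lend themselves to simple automated checks. This sketch assumes a hypothetical draft hierarchy stored as a child-to-parent dictionary, with simplified BFO-style upper terms. Storing one parent per key enforces a single is_a parent per term, in line with the monohierarchy recommendation. The helper names are illustrative.

```python
# Hypothetical draft hierarchy: child term -> its single is_a parent.
IS_A = {
    "continuant": "entity",
    "independent continuant": "continuant",
    "material entity": "independent continuant",
    "cell": "material entity",
    "neuron": "cell",
    "motor neuron": "neuron",
    "glial cell": "cell",
}


def path_to_root(term, is_a, root="entity"):
    """Follow is_a links upward; fail on cycles or on terms that never reach the root."""
    path, seen = [term], {term}
    while term != root:
        if term not in is_a:
            raise ValueError(f"'{term}' has no is_a parent and is not the root")
        term = is_a[term]
        if term in seen:
            raise ValueError(f"cycle detected at '{term}'")
        seen.add(term)
        path.append(term)
    return path


def missing_terms(is_a, root="entity"):
    """Parents that are used but never placed in the hierarchy themselves."""
    return {parent for parent in is_a.values() if parent != root and parent not in is_a}


print(" is_a ".join(path_to_root("motor neuron", IS_A)))
print(missing_terms({"neuron": "cell"}))  # 'cell' still needs a parent
```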


Numerical Value Example

Quality of portion of blood

(Figure: quality: John’s blood glucose level inheres_in a portion of blood, which is derived_from John; the portion of blood and a device (with part screen) participate_in an OBI process: this specific assay; the assay has_specified_output a quality: a ‘120 mg/dL’-shaped pattern on the screen; an IAO: measurement datum, concretized_by this pattern, is_about John’s blood glucose level)

elements of an ontological analysis:

1. the portion of blood (material entity)

2. the blood sugar level (quality), referred to by means of

3. an expression (information artifact, thus a BFO: generically dependent continuant) ‘100 mg/dL’
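The figure's relations can be recorded as triples and queried, for example to trace a measurement datum back to the quality it is about and to that quality's bearer. This sketch encodes one reading of the figure. The relation names are the ones shown, but their exact arrangement is reconstructed from extracted text, and the helpers `the` and `trace_datum` are hypothetical.

```python
# One reading of the figure as (subject, relation, object) triples.
TRIPLES = {
    ("John's blood glucose level", "inheres_in", "portion of blood"),
    ("portion of blood", "derived_from", "John"),
    ("portion of blood", "participates_in", "this specific assay"),
    ("device", "participates_in", "this specific assay"),
    ("screen", "part_of", "device"),
    ("this specific assay", "has_specified_output", "'120 mg/dL'-shaped pattern"),
    ("'120 mg/dL'-shaped pattern", "inheres_in", "screen"),
    ("measurement datum", "concretized_by", "'120 mg/dL'-shaped pattern"),
    ("measurement datum", "is_about", "John's blood glucose level"),
}


def the(subject, relation):
    """Return the single object related to subject by relation; fail if there is not exactly one."""
    values = {o for s, r, o in TRIPLES if s == subject and r == relation}
    if len(values) != 1:
        raise ValueError(f"expected one {relation} for {subject!r}, found {len(values)}")
    return values.pop()


def trace_datum(datum):
    """From a measurement datum, recover the quality it is about, its bearer, and that bearer's source."""
    quality = the(datum, "is_about")
    bearer = the(quality, "inheres_in")
    source = the(bearer, "derived_from")
    return quality, bearer, source


print(trace_datum("measurement datum"))
```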

BFO 2.0

Process measurement

(Figure: process: John’s heart beating has_participant John; an OBI process: this specific assay has_participant John and a device, which has_part a screen; the assay has_specified_output a quality: a ‘120 bpm’-shaped pattern, which inheres_in the screen; an IAO: measurement datum, concretized_by this pattern, is_about the heart beating)

Beat Measurement

heart beating at constant rate, elements of an ontological analysis:

1. the heart (object)

2. the process of beating

3. the temporal region occupied by this process

4. the spatiotemporal region occupied by this process (trajectory of the beating process)

5. the rate, referred to by means of

6. an expression (information artifact, thus a BFO: generically dependent continuant) such as ‘63 beats/minute’


The Information Artifact Ontology

credit card numbers are not integers

names are not strings

serial numbers are not strings

Rather, they are artifacts, human creations.

If my Social Security Number is the same integer as your Credit Card Number, they are still different Numbers

If my name is the same string as your name, they are still different names
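In software, one way to respect this is to give each kind of identifier its own type instead of storing bare integers or strings. In this sketch the class names and the `holder`/`bearer` fields are hypothetical. Python dataclasses compare equal only when both the class and the field values match. As a result, identical digits in two different kinds of number do not compare equal, and neither does the same string naming two different people.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocialSecurityNumber:
    digits: str
    holder: str  # the person to whom this number was issued


@dataclass(frozen=True)
class CreditCardNumber:
    digits: str
    holder: str


@dataclass(frozen=True)
class PersonName:
    text: str
    bearer: str  # the person who bears this name


ssn = SocialSecurityNumber("123456789", holder="me")
card = CreditCardNumber("123456789", holder="you")
print(ssn.digits == card.digits, ssn == card)  # same digit string, different Numbers

mine = PersonName("Ann Lee", bearer="person 1")
yours = PersonName("Ann Lee", bearer="person 2")
print(mine.text == yours.text, mine == yours)  # same string, different names
```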


Information Artifacts in Science

protocol, database, theory, ontology, gene list, publication, result, ...

Information Entity (labeling)

serial number, batch number, grant number, person number, name, address, email address, URL, ...

http://code.google.com/p/information-artifact-ontology/

What is a datum?

(Figure: Continuant vs. Occurrent (process); Independent Continuant (laptop, book) and Dependent Continuant (quality))

datum: a pattern in some medium with a certain kind of provenance

type or instance

(Figure: Continuant vs. Occurrent (Process); Independent Continuant: human being, protocol document; Dependent Continuant: pattern of ink marks; Occurrent: applying the protocol, side-effect …)

(Figure: Continuant vs. Occurrent; Independent and Dependent Continuant; Information Entity; Action: creating a datum)

Type: human being; Instance: Leon Tolstoy

Type: novel; Instance: War and Peace

Type: book; Instance: this copy of War and Peace

types and instances

Is War and Peace a type or an instance?

If War and Peace were a type, and the copies of War and Peace in my library and in your library were instances, then

• there would be many War(s) and Peaces.

Hence War and Peace is an instance.

What is a work of literature?

There can be two copies of the Declaration of Independence

There cannot be two Declarations of Independence

There are not two Declarations of Independence


Syntactic rule of thumb for types

Their names are pluralizable

There can be three people

There cannot be three Condoleezza Rices

Information Entities = entities which can exist in many perfect copies

Your genome is an information entity, but not an information artifact

Specific dependence

(Figure: Continuant vs. Occurrent (process); Independent Continuant (thing) and Dependent Continuant (quality); headache depends_on human being)

Generically Dependent Continuants

GenericallyDependentContinuant

Information Entity

Sequence

if one bearer ceases to exist, then the entity can survive, because there are other bearers (copyability)

the pdf file on my laptop

the DNA (sequence) in this chromosome
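The contrast between generic and specific dependence can be sketched as two small classes. The class and method names are hypothetical. The generically dependent continuant keeps a set of bearers and survives as long as one copy remains. The specifically dependent continuant (here, a headache) ceases to exist together with its single bearer.

```python
class GenericallyDependentContinuant:
    """Survives as long as at least one bearer still carries a copy (copyability)."""

    def __init__(self, name, bearers):
        self.name = name
        self.bearers = set(bearers)

    def bearer_destroyed(self, bearer):
        self.bearers.discard(bearer)

    @property
    def exists(self):
        return bool(self.bearers)


class SpecificallyDependentContinuant:
    """Has exactly one bearer and ceases to exist when that bearer does."""

    def __init__(self, name, bearer):
        self.name = name
        self.bearer = bearer
        self.exists = True

    def bearer_destroyed(self, bearer):
        if bearer == self.bearer:
            self.exists = False


pdf = GenericallyDependentContinuant("the pdf file", {"my laptop", "your laptop"})
pdf.bearer_destroyed("my laptop")
print(pdf.exists)  # another copy remains

headache = SpecificallyDependentContinuant("my headache", "me")
headache.bearer_destroyed("me")
print(headache.exists)
```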


Generically dependent continuants

are realized through being concretized in specifically dependent continuants (the plan in your head, the protocol being realized by your research team)

Types vs. generically dependent continuants

types have subtypes (kinds): if you can have a kind of something, then it’s a type

you can’t have a kind of Bill Clinton

you can’t have a kind of The Constitution of the United States


Generically Dependent Continuants

(Figure: Generically Dependent Continuant, with subtypes Information Entity and Sequence; instances: .pdf file, .doc file)

Generically dependent continuants

are concretized in specifically dependent continuants

Beethoven’s 9th Symphony is concretized in the pattern of ink marks which make up this score in my hand

Generically dependent continuants

do not require specific media (paper, silicon, neuron …)

Realizable Dependent Continuants

(Figure: Specifically Dependent Continuant, divided into Quality, Pattern and Realizable Dependent Continuant; realizable dependent continuants are realized in occurrents)

Examples: performance of a symphony, projection of a film, utterance of a sentence, application of a therapy, course of a disease, increase of temperature

(Figure: an Occurrent realizing a Realizable Dependent Continuant)

(Figure: ontology elements Entity, Property, Physical Property, Information Content Entity, Designative Information Content Entity, Geospatial Entity, Geospatial Reference Point, Physical Location and Road Intersection, linked by is_a, has_role and has_property; data model elements that denote them: String: Amazai and Nawagai Sura Road Intersection; TRP: AB 001; WPT: EZ497; Lat: 34.40393540678018; Long: 72.50272750854492; MGRS: TF 4679 5792)

BFO: role

• a realizable dependent continuant that is not the consequence of the nature of the independent continuant entity which bears the role (contrast: disposition)

• the role is optional (someone else assigns it, or the entity acquires it by moving into a specific context)

• roles often come in pairs (husband/wife)

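[Figure: BFO upper-level types. Continuant and Occurrent; Independent Continuant, Specifically Dependent Continuant and Generically Dependent Continuant under Continuant; Quality and Realizable Dependent Continuant under Specifically Dependent Continuant; Disposition and Role under Realizable Dependent Continuant; Realization relating realizables to occurrents.]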

Roles

• standard examples: nurse, student, patient

• in each case something holds (that a person plays a role) because of some socially mediated decision. Functions, by contrast, never exist purely because people decide that they exist: a function rests in each case on some underlying physical structure with relevant causal powers.

Principle of Low-Hanging Fruit

Include even absolutely trivial assertions (assertions you know to be universally true)

pneumococcal bacterium is_a bacterium

Computers need to be led by the hand
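Why trivial assertions matter can be seen in a few lines: a program computing the transitive closure of is_a only knows what it has been told. This is a minimal sketch; the extra assertion `bacterium is_a organism` is added purely for illustration.

```python
def ancestors(is_a, term):
    """Return all types reachable from term via is_a (its transitive closure)."""
    found = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Without the trivial assertion the machine cannot connect the two terms.
without_trivial = {"pneumococcal bacterium": []}
with_trivial = {"pneumococcal bacterium": ["bacterium"], "bacterium": ["organism"]}

if __name__ == "__main__":
    print(ancestors(without_trivial, "pneumococcal bacterium"))  # set()
    print(sorted(ancestors(with_trivial, "pneumococcal bacterium")))  # ['bacterium', 'organism']
```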


Principle of singular nouns

Terms in ontologies represent types

Goal: Each term in an ontology should represent exactly one type

Thus every term should be a singular noun

Count vs. mass nouns

Count: suitcase, cow, datum
Mass: luggage, beef, information

Principle: Avoid mass nouns

BRENDA Tissue Ontology

blood is_a hematopoietic system

hematopoietic system is_a whole body

whole_body is_a animal
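Because is_a is transitive, these three assertions chain into the conclusion that blood is_a animal. A short sketch that finds the chain by breadth-first search over the assertions above (spelling of "whole body" normalized):

```python
from collections import deque

# The is_a assertions listed above.
BTO_FRAGMENT = {
    "blood": ["hematopoietic system"],
    "hematopoietic system": ["whole body"],
    "whole body": ["animal"],
}


def is_a_path(is_a, start, goal):
    """Breadth-first search for a chain of is_a links from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for parent in is_a.get(path[-1], []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None  # no chain of is_a links exists


if __name__ == "__main__":
    print(" is_a ".join(is_a_path(BTO_FRAGMENT, "blood", "animal")))
```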


Principle of definitions

Supply definitions for every term

1. a human-understandable natural-language definition

2. an equivalent formal definition

Principle: definitions must be unique

Each term should have exactly one definition

it may have both natural-language and formal versions

(this is an issue for ontologies which exist in versions with different levels of expressivity)

The Problem of Circularity

A Person =def. A person with an identity document

Hemolysis =def. The causes of hemolysis


Principle of non-circularity

The term defined should not appear in its own definition
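A naive check for the principle can be sketched with a whole-word search. It only catches the surface case shown in the examples above; circularity through synonyms or through chains of definitions would need more than this.

```python
import re


def is_circular(term, definition):
    """Naive check: does the defined term occur as a whole word or phrase in its definition?"""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


if __name__ == "__main__":
    print(is_circular("person", "A person with an identity document"))  # True
    print(is_circular("hemolysis", "The causes of hemolysis"))  # True
```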


Principle of increase in understandability

A definition should use only terms which are easier to understand than the term defined

Definitions should not make simple things more difficult than they are


Principle of acknowledging primitives

In every ontology some terms and some relations are primitive, that is, they cannot be defined (on pain of infinite regress)

Examples of primitive relations:

identity

instance_of


Principle of Aristotelian definitions

Use two-part definitions

An A is a B which C’s.

A human being is an animal which is rational

Here A is the child term, B is its immediate parent in the ontology is_a hierarchy, and C is the differentia which distinguishes A’s from other B’s
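The two-part template can be generated mechanically once each term has a single immediate is_a parent. A minimal sketch, assuming single inheritance (the function raises otherwise) and a naive vowel-based choice of article; the function names are hypothetical.

```python
def article(noun):
    """Naive English article based on the first letter (an assumption; fails on e.g. 'hour')."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def aristotelian_definition(term, is_a, differentia):
    """Build 'An A is a B which C', taking B from the single is_a parent of A."""
    parents = is_a.get(term, [])
    if len(parents) != 1:
        raise ValueError(f"{term!r} should have exactly one is_a parent, found {len(parents)}")
    genus = parents[0]
    return f"{article(term).capitalize()} {term} is {article(genus)} {genus} which {differentia}"


if __name__ == "__main__":
    print(aristotelian_definition("human being", {"human being": ["animal"]}, "is rational"))
```

Taking B from the hierarchy, rather than typing it by hand, keeps the definitions and the is_a structure from drifting apart.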


Rules for formulating terms

Avoid abbreviations even when it is clear in context what they mean (‘breast’ for ‘breast tumor’)

Avoid acronyms

Avoid mass terms (‘tissue’, ‘brain mapping’, ‘clinical research’ ...)

Treat each term ‘A’ in an ontology as shorthand for a term of the form ‘the type A’
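Some of these rules can be checked automatically. A rough sketch: acronyms are detected as runs of two or more capital letters, and mass terms are looked up in a hypothetical stop list which a real project would have to curate itself. Abbreviations such as ‘breast’ for ‘breast tumor’ cannot be caught this way.

```python
import re

# Hypothetical stop list of mass terms; a real project would curate its own.
MASS_TERMS = {"tissue", "brain mapping", "clinical research", "information", "luggage"}


def term_warnings(term):
    """Return rule violations detectable from the term string alone."""
    warnings = []
    if re.search(r"\b[A-Z]{2,}\b", term):
        warnings.append("contains an acronym")
    if term.lower() in MASS_TERMS:
        warnings.append("is a mass term")
    return warnings


if __name__ == "__main__":
    for t in ["ALL", "tissue", "breast tumor"]:
        print(t, term_warnings(t))
```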

Universality

Often, order will matter:

We can assert

adult transformation_of child

but not

child transforms_into adult
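On the all-some reading of relations used in OBO-style ontologies (assumed here), a type-level assertion A R B is true only if every instance of A stands in R to some instance of B. A minimal sketch over hypothetical instance data in which one child dies before reaching adulthood:

```python
def holds_universally(instances_of_a, instances_of_b, relation):
    """A R B holds at the type level iff every instance of A is R-related to some instance of B."""
    return all(any((a, b) in relation for b in instances_of_b) for a in instances_of_a)


# Hypothetical instances: child_1 grows into adult_1; child_2 never becomes an adult.
children = {"child_1", "child_2"}
adults = {"adult_1"}
transformation_of = {("adult_1", "child_1")}
transforms_into = {(b, a) for (a, b) in transformation_of}

if __name__ == "__main__":
    print(holds_universally(adults, children, transformation_of))  # True
    print(holds_universally(children, adults, transforms_into))  # False
```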


Universality

viral pneumonia caused_by virus

but not

virus causes pneumonia

pneumococcal bacterium causes pneumonia


Principle of Universality

results analysis later_than protocol-design

but not

protocol-design earlier_than results analysis


Principle of positivity

Complements of types are not themselves types.

Terms such as

non-mammal, non-membrane, other metalworker in New Zealand

do not designate types in reality


Generalized Anti-Boolean Principle

There are no conjunctive or disjunctive types:

anatomic structure, system, or substance

musculoskeletal and connective tissue disorder


Objectivity

Which types exist in reality is not a function of our knowledge.

Terms such as

unknown

unclassified

unlocalized

arthropathies not otherwise specified

do not designate types in reality.
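The principles of positivity, anti-Booleanism and objectivity all suggest warning signs that can be spotted in term strings. The patterns below are a heuristic sketch, not a complete test: a legitimate name containing ‘and’ (such as a road intersection name) would be flagged as a false positive, so flagged terms still need human review.

```python
import re

# Heuristic patterns (illustrative only), one per principle.
RULES = [
    (re.compile(r"^(non-|other\b)", re.I), "positivity: complement of a type"),
    (re.compile(r"\b(and|or)\b", re.I), "anti-Boolean: conjunctive or disjunctive term"),
    (re.compile(r"\b(unknown|unclassified|unlocalized|not otherwise specified)\b", re.I),
     "objectivity: epistemic qualifier"),
]


def flag_term(term):
    """Return the message of every rule whose pattern occurs in the term."""
    return [message for pattern, message in RULES if pattern.search(term)]


if __name__ == "__main__":
    for t in ["non-mammal", "anatomic structure, system, or substance",
              "arthropathies not otherwise specified", "pneumococcal bacterium"]:
        print(t, flag_term(t))
```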


Keep Epistemology Separate from Ontology

If you want to say that

We do not know where A’s are located

do not invent a new class of

A’s with unknown locations

(A well-constructed ontology should grow linearly; it should not need to delete classes or relations because of increases in knowledge)


Keep Sentences Separate from Terms

If you want to say

I surmise that this is a case of pneumonia

do not invent a new class of surmised pneumonias

Confusion of ‘findings’ in medical terminologies
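One way to keep epistemology out of the ontology is to record what we know, or do not know, on the data record about an instance, while the record's type stays an ordinary ontology type. A minimal sketch; the field names and instance identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """A data record about one instance; our state of knowledge lives here."""
    instance_id: str
    type_name: str                  # always a type from the ontology
    location: Optional[str] = None  # None = location not known to us
    certainty: str = "asserted"     # e.g. "asserted" or "surmised"


records = [
    Record("a_1", "A"),                                    # an A whose location is unknown
    Record("case_17", "pneumonia", certainty="surmised"),  # a surmised case of pneumonia
]

if __name__ == "__main__":
    # No new classes are needed: only the two ordinary types are used.
    print(sorted({r.type_name for r in records}))
```

When the location is later found or the diagnosis confirmed, only the record changes, so the ontology can grow linearly as knowledge increases.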

Principle: do not commit the use-mention confusion

mouse =def. common name for the species Mus musculus


Principle: do not commit the use-mention confusion

Avoid confusing words with things

Avoid confusing concepts in our minds with entities in reality

Recommendation: avoid the word ‘concept’ entirely
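A definition that talks about words (‘name’, ‘term’) or concepts, rather than about the things the term refers to, is a likely sign of use-mention confusion. A rough heuristic sketch; the marker list is a hypothetical choice and will miss many cases.

```python
import re

# Hypothetical markers suggesting that a definition is about words or concepts
# (mention) rather than about the entities the term refers to (use).
MENTION_MARKERS = re.compile(r"\b(name|term|word|label|concept)s?\b", re.I)


def mentions_words(definition):
    """Return True if the definition contains a word-or-concept marker."""
    return bool(MENTION_MARKERS.search(definition))


if __name__ == "__main__":
    print(mentions_words("common name for the species Mus musculus"))  # True
    print(mentions_words("a rodent which is a member of the species Mus musculus"))  # False
```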

Species

species = reproductively isolated units that persist as continuants over time.

(One problem area: bacteria, where there is no clear "reproductive isolation" and where horizontal gene transfer occurs.)


Core and Extensions

IDO

GBIF: Germ Plasm Repository Extension of Darwin Core