OpenGALEN
Scale and Context: Issues in Ontologies to link Health- and Bio-Informatics
Alan Rector, Jeremy Rogers, Angus Roberts, Chris Wroe
Bio and Health Informatics Forum / Medical Informatics Group
Department of Computer Science, University of Manchester
[email protected]@cs.man.ac.uk
www.cs.man.ac.uk/mig  img.man.ac.uk  www.clinical-escience.org
www.opengalen.org

Organisation of Talk
• Informal presentation, motivation & examples
• Intro to logic based ontologies
• How to use logic based ontologies to represent scales and context
  – Making context modular – normalisation
  – Recurrent distinctions
    • and tests for those distinctions
• Making logic based ontologies usable
  – Views and Intermediate Representations
• Summary

Example Problems of Context
• Classification by multiple axes
  – e.g. Molecular action, physiologic, and pathological effects
    • Chloride transport & Cystic fibrosis
• Biological Scope
  – e.g. Normal/Abnormal, Human/Mouse
• Conceptual view
  – e.g. the Digital Anatomist Foundational Model of organs vs Clinical convention
  – Is the pericardium a part of the heart?

Basic Approach
• Separate information into independent modules
  – Normalise the ontology
    • “The truth, the whole truth, and nothing but the truth”
• Add explicit contextual information
  – Don’t distort the structure
    • Add context to it explicitly

Why use Logic-based Ontologies?
because
Knowledge is Fractal!
& Requirements are Diverse
Coherence without Uniformity!

Logic-based Ontologies: Conceptual Lego
hand
extremity
body
acute
chronic
abnormal
normal
ischaemic
deletion
bacterial
polymorphism
cell
protein
gene
infection
inflammation
Lung
expression

Logic-based Ontologies: Conceptual Lego
“SNPolymorphism of CFTRGene causing Defect in MembraneTransport of ChlorideIon causing Increase in Viscosity of Mucus in CysticFibrosis…”
“Hand which is anatomically normal”

Logic based ontologies
• A formalisation of semantic nets, frame systems, and object hierarchies via KL-ONE and KRL
• “is-kind-of” = “implies” (“logical subsumption”)
  – “Dog is a kind of wolf” means “All dogs are wolves”
• Modern examples: DAML+OIL / “OWL”?
• Older variants: LOOM, CLASSIC, BACK, GRAIL, K-REP, …
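
The reading of “is-kind-of” as “implies” can be sketched extensionally: one class is a kind of another if every instance of the first is also an instance of the second. This is a minimal Python sketch with invented individuals and class extensions (none come from the talk). A real description logic reasoner decides subsumption from definitions rather than by enumerating instances.

```python
# Toy extensional reading of "is-kind-of" = "implies".
# The individuals and memberships below are invented for illustration.
EXTENSIONS = {
    "Wolf": {"rex", "fido", "akela"},
    "Dog": {"rex", "fido"},
    "Cat": {"tom"},
}


def is_kind_of(sub: str, sup: str) -> bool:
    """True if every instance of `sub` is also an instance of `sup`."""
    return EXTENSIONS[sub] <= EXTENSIONS[sup]


print(is_kind_of("Dog", "Wolf"))  # all dogs are wolves in this toy world
print(is_kind_of("Wolf", "Dog"))  # but not all wolves are dogs
```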

Logic Based Ontologies: The basics
Primitives, Descriptions, Definitions, Reasoning, Validating (constraining cross products)

[Figure, recoverable content]
Primitive hierarchy:
Thing
  Structure
    Heart
    MitralValve
    Encrustation
  Feature
    pathological
    red
Descriptions:
MitralValve ALWAYS partOf: Heart
Encrustation ALWAYS feature: pathological
Composed expressions shown:
Encrustation + involves: MitralValve
Thing + feature: pathological
Structure + feature: pathological + involves: Heart
red + partOf: Heart
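
How primitives, descriptions and definitions combine under a reasoner can be sketched with a toy structural classifier. The concept names follow the slide; the algorithm is an assumption and is not GRAIL or any real reasoner. In particular, the rule that “involves” a part implies “involves” the whole is assumed here for illustration.

```python
# Toy structural classifier. Names follow the slide; the algorithm is assumed.
PARENT = {  # primitive tree: child -> parent
    "Structure": "Thing", "Feature": "Thing",
    "Heart": "Structure", "MitralValve": "Structure", "Encrustation": "Structure",
    "pathological": "Feature", "red": "Feature",
}

# Descriptions: what is ALWAYS true of a primitive.
ALWAYS = {
    "MitralValve": {("partOf", "Heart")},
    "Encrustation": {("feature", "pathological")},
}


def is_a(child, ancestor):
    """Walk up the primitive tree looking for `ancestor`."""
    while child is not None:
        if child == ancestor:
            return True
        child = PARENT.get(child)
    return False


def part_of(part, whole):
    """True if `part` is ALWAYS partOf `whole`, directly or transitively."""
    for prop, value in ALWAYS.get(part, ()):
        if prop == "partOf" and (is_a(value, whole) or part_of(value, whole)):
            return True
    return False


def expand(base, criteria):
    """Add the ALWAYS descriptions inherited from the base and its ancestors."""
    out = set(criteria)
    node = base
    while node is not None:
        out |= ALWAYS.get(node, set())
        node = PARENT.get(node)
    return out


def satisfies(have, want):
    (p1, v1), (p2, v2) = have, want
    if p1 != p2:
        return False
    # Assumption: "involves" a part implies "involves" the whole.
    return is_a(v1, v2) or (p1 == "involves" and part_of(v1, v2))


def subsumes(general, specific):
    """Each concept is (base primitive, set of (property, value) criteria)."""
    g_base, g_criteria = general
    s_base, s_criteria = specific
    s_all = expand(s_base, s_criteria)
    return is_a(s_base, g_base) and all(
        any(satisfies(have, want) for have in s_all) for want in g_criteria
    )


mitral_encrustation = ("Encrustation", {("involves", "MitralValve")})
pathological_heart_structure = (
    "Structure", {("feature", "pathological"), ("involves", "Heart")}
)
print(subsumes(pathological_heart_structure, mitral_encrustation))
```

Nothing asserts that the encrustation is a heart disorder. The classification follows from the two ALWAYS descriptions plus the definition.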

Bridging Bio and Health Informatics
• Define concepts with ‘pieces’ from different scales and disciplines and then combine them
  – “Polymorphism which causes defect which causes disease”
• Use concepts which make context explicit
  – “ ‘Hand which is anatomically normal’ has five fingers”
  – “ ‘Normal human prostate’ has three lobes”
• Use different subproperties for different contexts
  – “Abnormalities of clinical parts of the heart”

Bridging Scales with Ontologies
Genes, Species, Protein, Function, Disease
Protein coded by (CFTRgene & in humans)
Membrane transport mediated by (Protein coded by (CFTRgene in humans))
Disease caused by (abnormality in (Membrane transport mediated by (Protein coded by (CFTRgene & in humans))))
CFTRGene in humans
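
The nesting in these expressions can be sketched as a small recursive data structure, where each concept wraps the one at the scale below it. This is a minimal Python sketch; the `Concept` class and its `render` method are hypothetical helpers, not part of any GALEN tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A head term, optionally linked to a nested concept (hypothetical helper)."""
    head: str
    link: Optional[str] = None
    arg: Optional["Concept"] = None

    def render(self) -> str:
        if self.arg is None:
            return self.head
        return f"{self.head} {self.link} ({self.arg.render()})"


# Build upward from gene to disease, one scale at a time.
gene = Concept("CFTRgene in humans")
protein = Concept("Protein", "coded by", gene)
transport = Concept("Membrane transport", "mediated by", protein)
abnormality = Concept("abnormality", "in", transport)
disease = Concept("Disease", "caused by", abnormality)

print(disease.render())
```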

Use composition to express context
• Normal and abnormal
  Hand isSubdivisionOf some UpperExtremity
  Hand & AnatomicallyNormal hasSubdivision exactly-5 fingers
• Homologies and Orthologies
  Thumb of Hand of Human hasFeature Opposable
  Thumb of Hand of NonHumanPrimate ¬hasFeature Opposable
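
Attaching a constraint to a concept in a context, rather than to the bare concept, can be sketched as a lookup keyed on the concept plus its qualifiers. This is a minimal Python sketch. The representation as (base, set of qualifiers) and the `Left` qualifier are assumptions made for illustration.

```python
# Constraints attach to a concept *in a context*, not to the bare concept.
# Assumption: a concept-in-context is modelled as (base, frozenset of qualifiers).
CONSTRAINTS = {
    ("Hand", frozenset({"AnatomicallyNormal"})): {"hasSubdivision Finger": 5},
}


def constraints_for(base, qualifiers=()):
    """Collect constraints whose context is covered by the given qualifiers."""
    have = frozenset(qualifiers)
    found = {}
    for (c_base, c_context), rules in CONSTRAINTS.items():
        if c_base == base and c_context <= have:
            found.update(rules)
    return found


print(constraints_for("Hand"))                                  # no finger count implied
print(constraints_for("Hand", ["AnatomicallyNormal"]))          # exactly 5 fingers
print(constraints_for("Hand", ["AnatomicallyNormal", "Left"]))  # still applies
```

A bare `Hand` carries no finger count, so an abnormal hand with four or six fingers does not contradict the ontology.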

More detailed example
[Figure: specialisation of Body into Body (mammal), Body (mammal, male), Body (human, male) and Body (mouse, male), each with some Prostate. Cardinality labels in the figure: =1; Prostate =3 Lobe (L1, L2, L3); =5 Prostate (P1, P2, P3, P4, P5).]

Represent context and views by variant properties
[Figure, recoverable content]
Organ: Heart, Pericardium
OrganPart: CardiacValve
Disease of part_of Heart
Disease of Pericardium
is_part_of
  is_structurally_part_of
  is_clinically_part_of
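
Variant properties can be sketched as a small property hierarchy in which a query on a parent property also matches its subproperties. This is a minimal Python sketch. The three assertions are assumptions made for illustration: the talk poses the pericardium question but does not list the facts.

```python
# Property hierarchy: child property -> parent property.
SUBPROPERTY_OF = {
    "is_structurally_part_of": "is_part_of",
    "is_clinically_part_of": "is_part_of",
}

# Assumed assertions, for illustration only.
FACTS = [
    ("CardiacValve", "is_structurally_part_of", "Heart"),
    ("CardiacValve", "is_clinically_part_of", "Heart"),
    ("Pericardium", "is_clinically_part_of", "Heart"),
]


def implies(prop, query):
    """True if `prop` is `query` or one of its subproperties."""
    while prop is not None:
        if prop == query:
            return True
        prop = SUBPROPERTY_OF.get(prop)
    return False


def parts(whole, query_prop):
    """Subjects related to `whole` by `query_prop` or any of its subproperties."""
    return sorted({s for s, p, o in FACTS if o == whole and implies(p, query_prop)})


print(parts("Heart", "is_structurally_part_of"))
print(parts("Heart", "is_clinically_part_of"))
print(parts("Heart", "is_part_of"))
```

Under these assumed facts, a query for diseases of clinical parts of the heart would include pericardial disease, while the structural view would not.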

What we want to avoid: combinatorial explosions
• The “Exploding Bicycle”: from “phrase book” to “dictionary + grammar”
  – 1980 - ICD-9 (E826): 8
  – 1990 - READ-2 (T30..): 81
  – 1995 - READ-3: 87
  – 1996 - ICD-10 (V10-19 Australian): 587
    • V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
  – and meanwhile elsewhere in ICD-10
    • W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
    • X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
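
The difference between a “phrase book” and a “dictionary + grammar” is simple arithmetic: enumerating every phrase grows with the product of the axis sizes, while a compositional scheme grows with their sum. This is a minimal Python sketch. The axes and their values are hypothetical, loosely modelled on the wording above, and are not taken from ICD or the Read Codes.

```python
from itertools import product

# Hypothetical axes; values are illustrative only.
AXES = {
    "victim": ["pedal cyclist", "occupant of three-wheeled motor vehicle", "pedestrian"],
    "counterpart": ["pedal cycle", "car", "bus", "fixed object"],
    "setting": ["traffic accident", "nontraffic accident"],
    "activity": ["while working for income", "while engaged in sports activity", "while resting"],
}

# Phrase book: one pre-coordinated entry per combination.
phrase_book = [", ".join(combo) for combo in product(*AXES.values())]

# Dictionary + grammar: one entry per term, combined on demand.
dictionary = [term for values in AXES.values() for term in values]

print(len(phrase_book))  # 3 * 4 * 2 * 3 = 72
print(len(dictionary))   # 3 + 4 + 2 + 3 = 12
```

Adding one more value to any axis adds one dictionary entry but multiplies the phrase book.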

The Cost 1: Normalising (untangling) Ontologies
[Figure: Structure, Function and Part-whole shown twice, as in a tangled and an untangled arrangement]

The Cost 1: Normalising (untangling) Ontologies
Making each meaning explicit and separate

PhysSubstance
  Protein
    ProteinHormone
      Insulin
    Enzyme
  Steroid
    SteroidHormone
  Hormone
    ProteinHormone^
      Insulin^
    SteroidHormone^
  Catalyst
    Enzyme^

Hormone = Substance & playsRole-HormoneRole
ProteinHormone = Protein & playsRole-HormoneRole
SteroidHormone = Steroid & playsRole-HormoneRole
Catalyst = Substance & playsRole CatalystRole
Insulin playsRole HormoneRole

…build it all by combining simple trees

Enzyme ?=? Protein & playsRole-CatalystRole

PhysSubstance
  Protein
    ‘ProteinHormone’
      Insulin
    ‘Enzyme’
  Steroid
    ‘SteroidHormone’
  ‘Hormone’
    ‘ProteinHormone’
      Insulin^
    ‘SteroidHormone’
  ‘Catalyst’
    ‘Enzyme’

… ActionRole PhysiologicRole HormoneRole CatalystRole …
… Substance BodySubstance Protein Insulin Steroid …
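
“Build it all by combining simple trees” can be sketched as follows: keep one simple substance tree and a table of role assertions, state the definitions, and let a small classifier infer the multiple parents. This is a minimal Python sketch. The classifier is a toy written for this illustration, and `Enzyme` is left out because the slide marks its definition as an open question.

```python
# Normalised input: one simple tree (child -> parent) plus role assertions.
SUBSTANCE_TREE = {"Protein": "Substance", "Steroid": "Substance", "Insulin": "Protein"}
PLAYS_ROLE = {"Insulin": {"HormoneRole"}}

# Definitions from the slide: name -> (substance base, role played).
DEFINED = {
    "Hormone": ("Substance", "HormoneRole"),
    "ProteinHormone": ("Protein", "HormoneRole"),
    "SteroidHormone": ("Steroid", "HormoneRole"),
    "Catalyst": ("Substance", "CatalystRole"),
}


def ancestors(node):
    """The node itself plus everything above it in the substance tree."""
    chain = []
    while node is not None:
        chain.append(node)
        node = SUBSTANCE_TREE.get(node)
    return chain


def describe(name):
    """Return (substance base, roles) for a defined or primitive concept."""
    if name in DEFINED:
        base, role = DEFINED[name]
        return base, {role}
    roles = set()
    for node in ancestors(name):
        roles |= PLAYS_ROLE.get(node, set())
    return name, roles


def inferred_kinds(name):
    """Defined concepts that subsume `name`, worked out rather than asserted."""
    base, roles = describe(name)
    return sorted(
        d for d, (d_base, d_role) in DEFINED.items()
        if d != name and d_base in ancestors(base) and d_role in roles
    )


for concept in ["Insulin", "ProteinHormone", "SteroidHormone", "Steroid"]:
    print(concept, "->", inferred_kinds(concept))
```

Insulin ends up under both `ProteinHormone` and `Hormone` without either link being written by hand, which is the point of keeping the asserted trees simple.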

Normalisation
Building ontologies from orthogonal trees
• Each tree is homogeneous and based on subsumption
  – One principle – one of function, structure, cause, …
• Every primitive has exactly 1 primitive parent
  – All multiple classification done by the logic
• All self-standing primitives disjoint
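
The “exactly 1 primitive parent” rule lends itself to a mechanical check. This is a minimal Python sketch. The function name and the two example hierarchies are hypothetical, and the check covers only this one rule, not homogeneity or disjointness.

```python
def multi_parent_primitives(primitive_parents):
    """Return primitives that have more than one asserted primitive parent.

    `primitive_parents` maps each primitive to the list of its asserted
    primitive parents; a root has an empty list.
    """
    return sorted(
        name for name, parents in primitive_parents.items() if len(parents) > 1
    )


# Tangled: ProteinHormone is asserted under two primitives.
tangled = {
    "Substance": [],
    "Protein": ["Substance"],
    "Hormone": ["Substance"],
    "ProteinHormone": ["Protein", "Hormone"],
    "Insulin": ["ProteinHormone"],
}

# Normalised: a simple tree; hormone-ness is left to definitions and the logic.
normalised = {
    "Substance": [],
    "Protein": ["Substance"],
    "Insulin": ["Protein"],
}

print(multi_parent_primitives(tangled))
print(multi_parent_primitives(normalised))
```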

The Cost: 2 – Clean Distinctions & Tests
• Repeating patterns within levels
  – Structures vs Substances
  – Flavours of part-whole
  – Part-whole vs containment, connection, branching
  – Process/Event vs Thing (“Perdurant” vs “Endurant”)
  – …
• Repeating patterns across levels
  – Multiples at one level act as substances at the next
  – Substances span levels; structures are specific to a level

Repeating Patterns within each level
• Structures vs Substances (Discrete vs Mass)
  – Structures are made of substances
    • Organs are made of tissue
  – Parts & portions
    • Structures have parts & subdivisions, …
    • Substances have portions
  – Portions can have proportions & concentrations

Tests
• Structures (Discrete)
  – Can you count it? Is one part different from another? Is it made of something(s)?
    • Books, organs, ideas, individual cells, organisations, …
• Substance (Mass)
  – Are all bits the same? Can something be made of it? Can you talk about “A piece of it”? “A lump of it”? “A stream of it”? …
    • Water, sodium, tissue, blood, …

Repeating Patterns within each level
• Part-whole vs containment
  – Parthood is organisational
    • The wall is part of the cell
    • The cornea is part of the eye
  – Containment is physical
    • The inclusion is contained in the cell
    • The marrow is contained in the bone
  – Often occur together
    • Nucleus is a part of and contained in the cell
    • The retina is part of and contained in the eye

Tests
• Parts
  – If I take the part away, is the whole incomplete?
  – If the part is damaged is the whole damaged?
  – If I do something to the part do I do something to the whole?
• Containment
  – Is the contained thing inside the container?
  – Is the relationship spatial/physical? (or temporal?)

Repeating Patterns bridging levels
• Multiples of structures at one level behave as substances at the next
  – “Blood is made of in part a multiple of red cells”
  – “Tissue is made of in part a multiple of cells”
  – “A rash is a multiple of spots”
  – “Polyposis is a multiple of polyps”
  – “A flock is a multiple of birds”
• Multiples are not Sets
  – Not defined by members
    • Membership can change (intensional rather than extensional)
  – Action on the singleton is not action on the multiple; Action on the whole is (usually) action on the singletons
    • If I treat a spot, I do not treat the rash
    • If I treat the rash, I treat the spots
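
The asymmetry between acting on a singleton and acting on the multiple can be sketched as one-way propagation. This is a minimal Python sketch. The `Singleton` and `Multiple` classes and the `treat` function are hypothetical, and the “usually” in the bullet above is simplified to “always”.

```python
class Singleton:
    def __init__(self, name):
        self.name = name
        self.treated = False


class Multiple:
    """A collective (rash, flock); its identity does not depend on its members."""

    def __init__(self, name, singletons):
        self.name = name
        self.singletons = list(singletons)
        self.treated = False


def treat(thing):
    """Acting on a multiple reaches its singletons; the reverse does not hold."""
    thing.treated = True
    if isinstance(thing, Multiple):
        for single in thing.singletons:
            single.treated = True


spots = [Singleton(f"spot{i}") for i in range(3)]
rash = Multiple("rash", spots)

treat(spots[0])
rash_treated_after_one_spot = rash.treated
print(rash_treated_after_one_spot)          # treating a spot does not treat the rash

rash.singletons.append(Singleton("spot3"))  # membership changes, same rash
treat(rash)
print(all(s.treated for s in rash.singletons))
```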

Tests
• Multiples
  – Name for the singleton – “grain”, “cell”, “bird”?
  – Singletons are countable?
  – Multiple is measurable rather than countable?
  – Odd to say part-of “This cell is part of the Arm”?

But make it simple
• Intermediate representations and views
  – OWL + Detailed Schema is the Assembler Language
    • FaCT/SHIQ/… is the machine code
• Almost no one writes in assembler
  – let alone machine code
• Separate “terms” and “concepts”
  – Language/labels from concepts

Layered Architecture
[Figure: layers labelled Tools; Versioning; Language; Metadata; Provenance; Intermed Rep; Links to Resources; Indexed KB (Frame Like); DL. Tooling: Protégé + “OilEd-II” + …?]

Example: An Intermediate Representation for Surgery
"Open fixation of a fracture of the neck of the left femur"

MAIN fixing
  ACTS_ON fracture
    HAS_LOCATION neck of long bone
      IS_PART_OF femur
        HAS_LATERALITY left
  HAS_APPROACH open
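
An intermediate representation of this kind is essentially a small tree of link/value pairs, which can be sketched as nested dictionaries. This is a minimal Python sketch. Reusing `MAIN` as the key for each nested node's head term and the `render` helper are modelling assumptions, not the format used by the GALEN tools.

```python
# Nested-dictionary form of the dissection. Using "MAIN" as the head key of
# every nested node is an assumption made for this sketch.
dissection = {
    "MAIN": "fixing",
    "ACTS_ON": {
        "MAIN": "fracture",
        "HAS_LOCATION": {
            "MAIN": "neck of long bone",
            "IS_PART_OF": {"MAIN": "femur", "HAS_LATERALITY": "left"},
        },
    },
    "HAS_APPROACH": "open",
}


def render(node, link="MAIN", depth=0):
    """Flatten the nested structure into indented 'LINK value' lines."""
    pad = "  " * depth
    if isinstance(node, str):
        return [f"{pad}{link} {node}"]
    lines = [f"{pad}{link} {node['MAIN']}"]
    for key, child in node.items():
        if key != "MAIN":
            lines.extend(render(child, key, depth + 1))
    return lines


print("\n".join(render(dissection)))
```

Authors work with a handful of simple links such as these, while the formal expression is generated by a separate step.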

The formal “assembler” version

(‘SurgicalProcess’ which
  isMainlyCharacterisedBy (performance which
    isEnactmentOf (‘SurgicalFixing’ which
      hasSpecificSubprocess (‘SurgicalAccessing’
        hasSurgicalOpenClosedness (SurgicalOpenClosedness which
          hasAbsoluteState surgicallyOpen))
      actsSpecificallyOn (PathologicalBodyStructure which <
        involves Bone
        hasUniqueAssociatedProcess FracturingProcess
        hasSpecificLocation (Collum which
          isSpecificSolidDivisionOf (Femur which
            hasLeftRightSelector leftSelection))>))))

Result
• Training time: 3 months → 3 days + 3 days
• Productivity: 25/day → 100/day
• Central reconciliation: 50%+ → 10%
• Local cycle time: 3 months → <1 week
• “Dependencies”: High → Low
• Author satisfaction: Low → High
• Disputes: Frequent → Rare
• Repeatability: Low → High
Even Pre Web!

Navigation vs Retrieval/Reference
“Access terminology” & “Reference terminology”
• Access follows model of use
  – e.g. MeSH, MEDCin
    • Hierarchy is what is needed next “to hand”
  – People find easy; Software hard
• Retrieval follows model of meaning
  – Logic based ontologies
    • Hierarchy means “is-kind-of” / subsumption
  – People may find odd; Software is easy
• Need Both - & visualisations of both
  – The logic based structure isn’t enough
    • Views and intermediate representations

What’s in a View / Intermediate Representation?
[Figure: three layers (Explicit Context in Ontology “Assembler”; User Oriented Structures; Language), with the labels “semantic transformations & Filters” and “linguistic generation & search”]

Summary
Let the logic engine do the work
• Logic based ontologies can bridge granularities & represent context explicitly
  – And manage the potential combinatorial explosions
• To do so
  – Views and Interface – usable, flexible & easy to learn
    • Entry, Navigation, & Use are different
  – Structure – explicit & modular – “Normalised”
  – Conception – clean testable distinctions
  – Tools & Architecture - layered & comprehensive
    • The logic is the assembly language

Some Healthcare Terminologies
• ICD 9/10
  • Traditional paper thesauri
  • -CM versions essential for billing (and –AM)
• CPT – Current Procedural Terminology
  • “Simple” list
• Clinical Terms (Read Codes) V2
  • Simple hierarchy
  • Still dominant in UK general practice
• SNOMED-CT
  • At least “logic assisted”
  • Political questions…
• NCI Cancer Ontology
  • “Logic based in parts” – work in progress

Others
• Standards Related
  – LOINC – laboratory data
    – Increasingly structured – “logic assisted” aspirations
  – HL7 Vocabulary TC
    – Specialised vocabularies
    – Inspiration for OHT
    – Links to RxNorm
  – SNOMED DICOM Microglossary (SDM)
    – Image related information – not related to SNOMED
• Open Source
  – OpenGALEN Common Reference Model
    • Logic based – multilingual – a resource rather than a terminology
    – Basis of UK Drug Ontology
  – Open Health Terminology
    • Watch this space
    – Focusing on UMLS
    – Likely to be at least “logic assisted”

Special Purpose
• Anatomy
  – Digital Anatomist Foundational Model of Anatomy (FMA)
    • Principled frame based representation
    – Superb reference point for structural anatomy
      » Needs functional and clinical supplements
    – http://sig.biostr.washington.edu/projects/da/
• Drugs
  – RxNorm and VA projects
    – See Steve Brown & Stuart Nelson
  – UK Primary Care Drug Dictionary
  – UKCPRS (Secondary Care)
  – Drug Ontology (OpenGALEN based)
  – MEDDRA, FDA, Proprietary, …, …, …

Unified Medical Language System (UMLS)
• Common reference point and link to MeSH Terms and literature
  – De facto standard for universal identifiers
    • Concept Unique Identifiers (CUIs)
    • Lexical Unique Identifiers (LUIs)
    • String Unique Identifiers (SUIs)
  – Valuable in itself: Huge resource for mining and restructuring
    • Udo Hahn and Stefan Schulz: “CoMMeT – Conceptual Model of Medical Terminology”
    – http://www.coling.uni-freiburg.de/pub/schulz/commet/
• Alexa McCray is speaking next