Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland, USA
Biomedical Knowledge Visualization
Bethesda, MD, July 6, 2004
7th International Protégé Conference
2nd Workshop on Visualizing Information in Knowledge Engineering (VIKE’04)
UMLS Semantic Navigator (SemNav)
http://umlsks.nlm.nih.gov*
SN Resources > Semantic Navigator
(* free UMLS registration required)
Lister Hill National Center for Biomedical Communications
Unified Medical Language System®
◆ Developed at NLM since 1990
◆ 15th edition in 2004
◆ Integrates some 60 terminological resources
● Clinical vocabularies (including specialties)
● Core terminologies (anatomy, drugs, med. devices)
● Administrative terminologies, standards
◆ Integration
● Synonymous terms are clustered in a concept
● Hierarchies (trees) are combined in a graph structure
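A minimal sketch of combining hierarchies from several sources into one graph, assuming each source's terms have already been mapped to shared concept identifiers. The identifiers C1 to C4 and the source names SourceA and SourceB are hypothetical. Each link keeps the set of sources that assert it, so provenance survives the merge.

```python
from collections import defaultdict


def merge_hierarchies(sources):
    """Combine per-source hierarchies into a single graph.

    `sources` maps a source name to its (child, parent) links, with terms
    already mapped to shared concept identifiers. The result maps each
    child to its parents, and each link to the set of sources asserting it.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for source, links in sources.items():
        for child, parent in links:
            graph[child][parent].add(source)
    return graph


# Hypothetical sources that place C3 under different parents.
sources = {
    "SourceA": [("C2", "C1"), ("C3", "C2")],
    "SourceB": [("C4", "C1"), ("C3", "C4")],
}
graph = merge_hierarchies(sources)
# C3 now has two parents, so the result is a graph rather than a tree.
print({parent: sorted(srcs) for parent, srcs in graph["C3"].items()})
# {'C2': ['SourceA'], 'C4': ['SourceB']}
```

Each source is still a tree; only the union becomes a polyhierarchy.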
Terminology integration: Terms
Duchenne muscular dystrophy (MeSH, SNOMED, CTV3, Jablonski, CRISP, DxPlain, MedDRA, LOINC)
pseudohypertrophic muscular dystrophy (MeSH, CTV3, SNOMED)
X-linked recessive muscular dystrophy (Jablonski)
Duchenne de Boulogne muscular dystrophy (Jablonski)
Duchenne’s muscular dystrophy (COSTAR)
severe generalized familial muscular dystrophy (SNOMED)
Duchenne type progressive muscular dystrophy (SNOMED)
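The clustering of synonyms can be illustrated with a small index built from the terms above: every synonym, whatever its source, resolves to the same concept. This is a sketch only; "DMD" is a hypothetical placeholder rather than a UMLS concept identifier, and case-folded exact matching is a simplification, not a description of how the UMLS establishes synonymy.

```python
# Terms and contributing sources as listed on the slide. "DMD" is a
# hypothetical concept identifier, not an actual UMLS identifier.
TERMS = [
    ("Duchenne muscular dystrophy",
     ["MeSH", "SNOMED", "CTV3", "Jablonski", "CRISP", "DxPlain", "MedDRA", "LOINC"]),
    ("pseudohypertrophic muscular dystrophy", ["MeSH", "CTV3", "SNOMED"]),
    ("X-linked recessive muscular dystrophy", ["Jablonski"]),
    ("Duchenne de Boulogne muscular dystrophy", ["Jablonski"]),
    ("Duchenne's muscular dystrophy", ["COSTAR"]),
    ("severe generalized familial muscular dystrophy", ["SNOMED"]),
    ("Duchenne type progressive muscular dystrophy", ["SNOMED"]),
]


def build_index(concept_id, terms):
    """Map every synonym (case-folded) to the same concept identifier."""
    return {term.casefold(): concept_id for term, _ in terms}


def contributing_sources(terms):
    """Sorted union of the sources that contribute at least one synonym."""
    return sorted({source for _, sources in terms for source in sources})


index = build_index("DMD", TERMS)
print(index["x-linked recessive muscular dystrophy"])  # DMD
print(contributing_sources(TERMS))
```

A lookup on any of the seven strings reaches one concept, which in turn gathers the contributions of all nine sources.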
Terminology integration: Relationships
[Figure: UMLS combines the hierarchical relationships asserted by SNOMED, MeSH, AOD and Read Codes among Adrenal Cortex Diseases, Hypoadrenalism, Adrenal Gland Hypofunction, Adrenal cortical hypofunction, Adrenal Gland Diseases and Addison’s Disease]
UMLS
◆ Two-level structure
● Semantic Network
■ 135 Semantic Types (STs)
■ 54 types of relationships among STs
● Metathesaurus
■ >1M concepts
■ ~12M inter-concept relationships
● Link = categorization
[Figure: concepts in the Metathesaurus are linked by categorization to Semantic Types in the Semantic Network]
[Figure: the concept Heart and related Metathesaurus concepts (Esophagus; Left Phrenic Nerve; Heart Valves; Fetal Heart; Mediastinum; Saccular Viscus; Angina Pectoris; Cardiotonic Agents; Tissue Donors), with Semantic Types from the Semantic Network (Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Pharmacologic Substance; Disease or Syndrome; Population Group)]
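The two-level structure can be sketched as two lookups: categorization links a Metathesaurus concept to its Semantic Types, and the Semantic Network's "isa" links supply the broader types. The tables below are illustrative assumptions covering only concepts and types named in the figure; the parent table is deliberately partial, so types without an entry simply have no ancestors here.

```python
# Semantic Network: "isa" parent of each semantic type (partial; only types
# named in the figure are included).
ST_PARENT = {
    "Fully Formed Anatomical Structure": "Anatomical Structure",
    "Embryonic Structure": "Anatomical Structure",
    "Body Part, Organ or Organ Component": "Fully Formed Anatomical Structure",
}

# Metathesaurus: categorization links from concepts to semantic types
# (illustrative assignments for concepts named in the figure).
CATEGORIZATION = {
    "Heart": ["Body Part, Organ or Organ Component"],
    "Fetal Heart": ["Embryonic Structure"],
    "Angina Pectoris": ["Disease or Syndrome"],
    "Cardiotonic Agents": ["Pharmacologic Substance"],
    "Tissue Donors": ["Population Group"],
}


def semantic_types(concept):
    """Semantic types of a concept, each followed by its Semantic Network ancestors."""
    result = []
    for st in CATEGORIZATION.get(concept, []):
        while st is not None:
            result.append(st)
            st = ST_PARENT.get(st)  # None once the top of the chain is reached
    return result


print(semantic_types("Heart"))
# ['Body Part, Organ or Organ Component', 'Fully Formed Anatomical Structure', 'Anatomical Structure']
```

Keeping the levels separate means a concept stores only its direct Semantic Types; the broader categories come from the small Semantic Network.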
MeSH Browser
SemNav: Visualization options
SemNav: Relationships
[Figure: SemNav display of relationships for Dystrophin. Concepts: Dystrophin; Muscular Dystrophy, Duchenne. Semantic Types: Amino Acid, Peptide or Protein; Disease or Syndrome; Biologically Active Substance]
Gene Ontology browser
http://mor.nlm.nih.gov/perl/gennav.pl
Gene Ontology™
◆ Developed by the GO Consortium
◆ Several components (GO database)
● Ontology (~17,000 concepts)
■ Molecular functions
■ Cellular components
■ Biological processes
● Gene products (~1.6M)
● Associations between gene products and GO concepts (~6.8M)
Material and Methods
Technical details
Technical details
◆ Simple web/CGI technology (Apache, Perl)
◆ dot (Graphviz)
● PNG file (-Tpng)
● Client-side map (-Tcmap)
◆ Precompute the transitive closure of hierarchical relations so that the transitive reduction can be computed fast
◆ Remove cycles (UMLS)
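The preprocessing and rendering steps listed above can be sketched as three small functions. This is a minimal sketch under stated assumptions, not SemNav's code: the hierarchy is a dict from child to list of parents, cycles are broken by dropping depth-first-search back edges (the slide does not say how cycles are removed), the URL template `browse.cgi?concept={}` and the file name `graph.dot` are hypothetical, and node names are assumed not to contain double quotes. `-Tpng` and `-Tcmap` are the flags named on the slide; newer Graphviz releases also offer `-Tcmapx`.

```python
from urllib.parse import quote


def remove_cycles(parents):
    """Return (acyclic_parents, removed_edges).

    Edges pointing back to a node still on the depth-first search path
    (back edges) are dropped; removing every back edge leaves an acyclic
    graph. Which edge of a cycle is dropped depends on traversal order.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state, removed = {}, []
    kept = {node: [] for node in parents}

    def visit(node):
        state[node] = GREY
        for parent in parents.get(node, ()):
            s = state.get(parent, WHITE)
            if s == GREY:  # back edge: it would close a cycle
                removed.append((node, parent))
                continue
            kept[node].append(parent)
            if s == WHITE:
                visit(parent)
        state[node] = BLACK

    for node in parents:
        if state.get(node, WHITE) == WHITE:
            visit(node)
    return kept, removed


def transitive_closure(parents):
    """Map every node to the set of all its ancestors (graph must be acyclic)."""
    closure = {}

    def ancestors(node):
        if node not in closure:
            result = set()
            for p in parents.get(node, ()):
                result.add(p)
                result |= ancestors(p)  # memoized, so each node is expanded once
            closure[node] = result
        return closure[node]

    for node in parents:
        ancestors(node)
    return closure


def to_dot(parents, url_template):
    """DOT source for a child -> parent graph. Each node carries a URL
    attribute, which dot uses when writing a client-side image map."""
    nodes = sorted(set(parents) | {p for ps in parents.values() for p in ps})
    lines = ["digraph G {"]
    for n in nodes:
        lines.append(f'  "{n}" [URL="{url_template.format(quote(n))}"];')
    for child, ps in parents.items():
        for p in ps:
            lines.append(f'  "{child}" -> "{p}";')
    lines.append("}")
    return "\n".join(lines)


# Command lines for the two outputs (could be run with subprocess.run).
DOT_COMMANDS = [
    ["dot", "-Tpng", "graph.dot", "-o", "graph.png"],
    ["dot", "-Tcmap", "graph.dot", "-o", "graph.map"],
]

# Hypothetical hierarchy containing the cycle A -> B -> C -> A.
raw = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["C", "A"]}
dag, dropped = remove_cycles(raw)
closure = transitive_closure(dag)
dot_source = to_dot(dag, "browse.cgi?concept={}")
print(dropped)  # [('C', 'A')]
```

With the closure stored, an "is X an ancestor of Y" test becomes a set lookup instead of a graph traversal; for a hierarchy the size of the UMLS, the same table would more plausibly live in a database than in memory.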
Discussion
Issues and Challenges
Issues
◆ Size
● Large number of concepts (>1 million)
◆ Complexity
● Polyhierarchical structures
● Multiple information sources
● Multiple properties
◆ Lack of formality
● Redundant relations
● Hierarchies vs. hierarchical relations
Challenges
◆ Restrict information space
● To selected information sources (SemNav)
● To selected organisms (GenNav)
◆ Reduce complexity (SemNav)
● Group concepts by semantic groups
● Transitive reduction on hierarchical relations
● Select co-occurring concepts
◆ Reduce the cognitive burden on the user
● Use graph-based rather than tree-based representations
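Two of the steps listed above, restricting relations to selected sources and keeping only co-occurring concepts, amount to simple filters. The sketch below assumes relations carry the set of sources asserting them and that co-occurrence counts with the focus concept are available as input; the identifiers and counts are hypothetical. The same filter would apply to organisms in place of sources.

```python
def restrict_to_sources(relations, selected_sources):
    """Keep relations asserted by at least one selected source.

    `relations` holds (child, parent, sources) triples; each kept triple's
    source set is narrowed to the selected sources.
    """
    selected = set(selected_sources)
    kept = []
    for child, parent, sources in relations:
        common = set(sources) & selected
        if common:
            kept.append((child, parent, common))
    return kept


def select_cooccurring(neighbors, cooccurrence, min_count=1):
    """Keep related concepts whose co-occurrence count with the focus
    concept reaches `min_count`. `cooccurrence` is hypothetical input data."""
    return [n for n in neighbors if cooccurrence.get(n, 0) >= min_count]


relations = [("C3", "C2", {"SourceA"}), ("C3", "C4", {"SourceA", "SourceB"})]
print(restrict_to_sources(relations, ["SourceB"]))  # [('C3', 'C4', {'SourceB'})]
print(select_cooccurring(["C2", "C4", "C5"], {"C2": 3, "C5": 0}))  # ['C2']
```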
SemNav: Semantic groups
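Grouping by semantic groups collapses a long list of related concepts into one count per coarse category. A sketch, assuming a partial type-to-group table (the abbreviations ANAT, DISO, CHEM and LIVB follow the published UMLS semantic groups; only a few types are listed) and a hypothetical neighbor list. A concept whose types fall in several groups is counted once per group, and unmapped types go to "OTHER".

```python
from collections import Counter

# Partial mapping from semantic types to semantic groups.
SEMANTIC_GROUP = {
    "Body Part, Organ or Organ Component": "ANAT",
    "Embryonic Structure": "ANAT",
    "Disease or Syndrome": "DISO",
    "Pharmacologic Substance": "CHEM",
    "Amino Acid, Peptide or Protein": "CHEM",
    "Biologically Active Substance": "CHEM",
    "Population Group": "LIVB",
}


def group_counts(neighbor_types):
    """Count related concepts per semantic group.

    `neighbor_types` maps each related concept to its semantic types; the
    set comprehension ensures a concept counts at most once per group.
    """
    counts = Counter()
    for types in neighbor_types.values():
        counts.update({SEMANTIC_GROUP.get(t, "OTHER") for t in types})
    return counts


neighbors = {
    "Heart Valves": ["Body Part, Organ or Organ Component"],
    "Fetal Heart": ["Embryonic Structure"],
    "Angina Pectoris": ["Disease or Syndrome"],
    "Cardiotonic Agents": ["Pharmacologic Substance"],
}
print(group_counts(neighbors))
```

The display can then show one node per group with its count and expand a group on demand.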
SemNav: Transitive reduction
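Transitive reduction removes hierarchical links that are implied by longer paths: if D isa C, C isa B and B isa A, a direct link D isa A is redundant. A minimal sketch, assuming an acyclic hierarchy and ancestor sets already available (the Technical details slide mentions precomputing the transitive closure); the node names are hypothetical.

```python
def transitive_reduction(parents, closure):
    """Remove hierarchical links that are implied by longer paths.

    `parents` maps each child to its direct parents; `closure` maps each
    node to the set of all its ancestors. The link child -> p is redundant
    when p is an ancestor of another direct parent q of the same child.
    Assumes the graph is acyclic.
    """
    return {
        child: [p for p in ps
                if not any(p in closure.get(q, set()) for q in ps if q != p)]
        for child, ps in parents.items()
    }


parents = {"D": ["A", "C"], "C": ["B"], "B": ["A"]}
closure = {"D": {"A", "B", "C"}, "C": {"A", "B"}, "B": {"A"}, "A": set()}
print(transitive_reduction(parents, closure))
# {'D': ['C'], 'C': ['B'], 'B': ['A']}
```

Only direct parents need checking, since any longer path from a child must leave through one of them; with the closure precomputed, each check is a set lookup.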
Medical Ontology Research
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland, USA
Contact: [email protected]
Web: mor.nlm.nih.gov