Summary of the UMLS (Unified Medical Language System) and presentation material on related research (presented in a Medical Informatics Standards class)
Medical Informatics Laboratory, Department of Biomedical Engineering, College of Medicine, Seoul National Univ.
Eunsil Yoon
U.S. National Library of Medicine, National Institutes of Health
UMLS(The Unified Medical Language System)
2012.11.29 Reviewed by Eunsil Yoon
Contents
• Introduction
– What is the UMLS?
– UMLS in Use
– www.nlm.nih.gov/research/umls
• The Three UMLS Tools (Knowledge Sources)
– Metathesaurus
– Semantic network
– SPECIALIST Lexicon
• UMLS in JAMIA papers
What is the UMLS?
• Started in 1986 (NLM; National Library of Medicine)
• NLM is a member of the IHTSDO (owner of SNOMED CT)
• Unified Medical Language System® (UMLS®)
• A set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems.
• You can use the UMLS to enhance or develop applications, such as electronic health records, classification tools, dictionaries and language translators.
The UMLS is not an end-user application
NLM Mainpage
NLM > UMLS
NLM > UMLS > UTS
NLM > UMLS > UTS > Metathesaurus browser
Metathesaurus Browser > Synonyms
Synonyms (246)
(Acute nasopharyngitis or rhinitis) or (common cold); (Acute nasopharyngitis or rhinitis) or (common cold) (disorder); ARNAS IBILBIDE GARAIETAKO ZOLDURA/ HOTZALDI ARRUNTA; Acut nasopharyngitis (meghűlés); Acut rhinitis; Acute Nasopharyngitis; Acute coryza; Acute infectie bovenste luchtwegen; Acute infective rhinitis; Acute nasal catarrh; Acute nasofaryngitis [verkoudheid]; Acute nasopharyngitis; Acute nasopharyngitis (common cold); Acute nasopharyngitis [common cold]; Acute nasopharyngitis, NOS; Acute rhinitis; Acute rhinitis (disorder); Akute Rhinopharyngitis [Erkaeltungsschnupfen]; Akutní nazofaryngitida; Akutní rinitida; Akutní zánět nosohltanu (prosté nachlazení); COLD; COMMON COLD; CORIZA; CORYZA
ПРОСТУДА; かぜ; かぜひき; かぜ症候群; コリーザ - 急性; 急性コリーザ; 急性鼻咽頭炎; 急性鼻咽頭炎(感冒); 急性鼻炎; 感冒; 感冒 - 普通; 感冒症候群; 感染性鼻炎; 普通感冒; 頭部感冒; 風邪; 鼻感冒; 鼻炎(感染性); 급성 코인두염 [ 감기 ]; カンセンセイビエン; カンボウ; カンボウショウコウグン; キュウセイハナイントウエンカンボウ; キュウセイビイントウエン
Metathesaurus Browser > Relations
NLM > UMLS > UTS > Semantic Network Browser
The Three UMLS Tools (Knowledge Sources)
• Metathesaurus
• Semantic Network
• SPECIALIST Lexicon
Metathesaurus
• The Metathesaurus is a large, multi-purpose, and multi-lingual vocabulary
database that contains information about biomedical and health related
concepts, their various names, and the relationships among them.
• Over 100 vocabularies, code sets, and thesauri, or "source
vocabularies" are brought together to create the Metathesaurus.
• Terms are organized by meaning, and each meaning (concept) is assigned a concept unique identifier (CUI).
• 62% of the Metathesaurus source vocabularies are English
• Also contains terms from 17 other languages
Ex. "Atrial Fibrillation"
Term                              Source vocabulary
Atrial fibrillation               ICD-9-CM
AF                                NCI Thesaurus
AFib                              MedDRA
Atrial fibrillation (disorder)    SNOMED Clinical Terms
atrium; fibrillation              ICPC2-ICD10 Thesaurus
Metathesaurus Basic organization
• Concepts
– Synonymous terms are clustered into a concept
– Properties are attached to concepts, e.g.,
• Unique identifier
• Definition
• Relations
– Concepts are related to other concepts
– Properties are attached to relations, e.g.,
• Type of relationship
• Source
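The organization above (synonymous terms clustered into a concept, with properties attached to both concepts and relations) can be sketched as a small data structure. This is only an illustration: the class names, the field names, and the identifiers `C0000001` / `C0000002` are made-up placeholders, not real UMLS structures or CUIs. The terms are the "Atrial Fibrillation" names from the Metathesaurus slide.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """A link from one concept to another, with its own properties."""
    target_cui: str  # identifier of the related concept
    rel_type: str    # type of relationship, e.g. "isa"
    source: str      # source vocabulary that asserts the relation


@dataclass
class Concept:
    """A meaning, with all of its names and its properties."""
    cui: str                                        # unique identifier
    names: list = field(default_factory=list)       # (term, source) pairs
    definition: str = ""
    relations: list = field(default_factory=list)


# Placeholder identifier; not the real CUI for this concept.
afib = Concept(cui="C0000001")

# Synonymous terms from different source vocabularies join one concept.
for term, source in [
    ("Atrial fibrillation", "ICD-9-CM"),
    ("AF", "NCI Thesaurus"),
    ("AFib", "MedDRA"),
    ("Atrial fibrillation (disorder)", "SNOMED Clinical Terms"),
]:
    afib.names.append((term, source))

# A relation to another (placeholder) concept, with type and source attached.
afib.relations.append(Relation("C0000002", "isa", "SNOMED Clinical Terms"))

synonyms = sorted({term for term, _ in afib.names})
print(afib.cui, synonyms)
```

Keeping the source next to each name and each relation mirrors the point that the Metathesaurus preserves where every piece of information came from.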
Metathesaurus - subsets
• Users create a useful subset, or smaller grouping of concepts, by choosing source vocabularies
• Examples of subsets include
– Source vocabularies in a language (all Spanish vocabularies)
– All terms that are free for use within the United States
– CPT codes to be used for billing purposes
– Terms with the semantic type 'Clinical Drug'
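As a rough illustration of subsetting, the sketch below filters a handful of hand-written rows by language, source vocabulary, or semantic type. The row layout, the `make_subset` helper, and the sample values are all assumptions made for this example; they do not reproduce the actual UMLS subsetting tooling.

```python
def make_subset(rows, languages=None, sources=None, semantic_types=None):
    """Keep rows matching every criterion that was given (None = no filter)."""
    def keep(row):
        return (
            (languages is None or row["lang"] in languages)
            and (sources is None or row["source"] in sources)
            and (semantic_types is None or row["sty"] in semantic_types)
        )
    return [row for row in rows if keep(row)]


# Toy rows written by hand for illustration only.
rows = [
    {"term": "Atrial fibrillation", "lang": "ENG",
     "source": "SRC_A", "sty": "Disease or Syndrome"},
    {"term": "Fibrilación auricular", "lang": "SPA",
     "source": "SRC_B", "sty": "Disease or Syndrome"},
    {"term": "Aspirin 325 MG Oral Tablet", "lang": "ENG",
     "source": "SRC_C", "sty": "Clinical Drug"},
]

spanish_only = make_subset(rows, languages={"SPA"})
clinical_drugs = make_subset(rows, semantic_types={"Clinical Drug"})
print(len(spanish_only), len(clinical_drugs))
```

Criteria combine with a logical AND, so passing both `languages` and `semantic_types` narrows the subset further.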
Metathesaurus – Unique Identifiers
• Concept Unique Identifiers (CUI)
– A concept is a meaning. A meaning can have many different names. A key
goal of Metathesaurus construction is to understand the intended meaning of
each name in each source vocabulary and to link all the names from all of
the source vocabularies that mean the same thing (the synonyms).
• Lexical (term) Unique Identifiers (LUI)
– LUIs link strings that are lexical variants. Lexical variants are detected using the Lexical Variant Generator (LVG) program, one of the UMLS lexical tools.
• String Unique Identifiers (SUI)
– Each unique concept name or string in each language in the Metathesaurus has a unique and permanent string identifier (SUI). Any variation in character set, upper-lower case, or punctuation is a separate string, with a separate SUI. An SUI consists of the letter S followed by seven digits. In the example on the right there are four strings with four different SUIs.
• Atom Unique Identifiers (AUI)
– The basic building blocks or "atoms" from which the Metathesaurus is constructed are the concept names or strings from each of the source vocabularies. Every occurrence of a string in each source vocabulary is assigned a unique atom identifier (AUI).
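The way the identifier levels nest can be illustrated with a toy assignment routine. Everything here is an assumption made for the sketch: the identifiers are generated locally (they are not real UMLS identifiers), the "L" and "A" prefixes with seven digits simply imitate the SUI format described above, and `crude_norm` is a very rough stand-in for lexical-variant detection, not the LVG algorithm.

```python
import re
from itertools import count


def crude_norm(string):
    """Rough stand-in for lexical normalization: lowercase, drop
    punctuation, sort the words. NOT the LVG algorithm."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", string.lower())))


def assign_ids(occurrences):
    """occurrences: list of (string, source) pairs.
    Returns (LUI, SUI, AUI, string, source) tuples with toy identifiers."""
    lui_of, sui_of = {}, {}
    lui_n, sui_n, aui_n = count(1), count(1), count(1)
    atoms = []
    for string, source in occurrences:
        key = crude_norm(string)
        if key not in lui_of:        # one LUI per group of lexical variants
            lui_of[key] = f"L{next(lui_n):07d}"
        if string not in sui_of:     # one SUI per exact string
            sui_of[string] = f"S{next(sui_n):07d}"
        aui = f"A{next(aui_n):07d}"  # one AUI per occurrence in a source
        atoms.append((lui_of[key], sui_of[string], aui, string, source))
    return atoms


atoms = assign_ids([
    ("Atrial Fibrillation", "SRC_A"),
    ("Atrial Fibrillation", "SRC_B"),   # same string, second source
    ("atrial fibrillation", "SRC_A"),   # case variant
    ("Fibrillation, Atrial", "SRC_A"),  # word-order and punctuation variant
])
for atom in atoms:
    print(atom)
```

In this toy run the four occurrences share one LUI, use three distinct SUIs (the first two strings are identical), and receive four distinct AUIs.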
Metathesaurus – Unique Identifiers > Atom
[Figure: atom status labels: obsolete, suppressible]
Metathesaurus – Data Files
• The Metathesaurus consists of forty data, metadata, and index files.
• The data files listed below contain information obtained from the source vocabularies.
• The table below illustrates what information populates each data file.
Data File Name    Contents
MRCONSO.RRF       Names, Synonyms, Terms, Term Types, Codes
MRREL.RRF         Relationships
MRHIER.RRF        Hierarchies
MRSAT.RRF         Attributes
MRDEF.RRF         Definitions
MRMAP.RRF         Mappings
MRSMAP.RRF        Simplified Mappings
MRSTY.RRF         Semantic Types
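RRF files are pipe-delimited text, one record per line, with a pipe after the last field. The sketch below reads MRCONSO-style lines into dictionaries. The column order is given as recalled from the UMLS reference documentation and should be checked against the column metadata shipped with the release in use; the two sample lines and every identifier in them are invented for illustration.

```python
import io

# Assumed MRCONSO.RRF column order; verify against your release's metadata.
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_rrf(handle, columns):
    """Yield one dict per non-empty line of a pipe-delimited RRF stream."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("|")
        if fields[-1] == "":
            fields.pop()  # drop the empty piece after the trailing pipe
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
        yield dict(zip(columns, fields))


# Invented sample rows; a real file would be opened with open(path, encoding="utf-8").
SAMPLE = (
    "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||||SRC_A|PT|X01|Atrial Fibrillation|0|N||\n"
    "C0000001|ENG|S|L0000002|PF|S0000002|Y|A0000002||||SRC_B|SY|Y02|AFib|0|N||\n"
)

rows = list(parse_rrf(io.StringIO(SAMPLE), MRCONSO_COLUMNS))

# Group the names (STR) by concept identifier (CUI).
names_by_cui = {}
for row in rows:
    names_by_cui.setdefault(row["CUI"], []).append(row["STR"])
print(names_by_cui)
```

Streaming line by line with a generator matters in practice because the real MRCONSO.RRF is far too large to handle comfortably as a single string.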
Metathesaurus – Data Files > RRF
The Semantic Network
• The Semantic Network
– Semantic types (high level categories)
– Semantic relationships (relationships between semantic types)
• The Semantic Network can be used to categorize any medical vocabulary.
• 133 semantic types in the Semantic Network
• Every Metathesaurus concept is assigned at least one semantic
type; very few terms are assigned as many as five semantic types.
The Semantic Network - Types
• Entity
– A broad type for grouping physical and conceptual entities.
– Examples of Entity semantic types are: Amphibian, Gene or Genome, Carbohydrate
• Event
– A broad type for grouping activities, processes and states.
– Examples of Event semantic types are: Social Behavior, Laboratory Procedure, Mental Process
Semantic Network - Physical Object
[Figure: hierarchy of semantic types under Entity > Physical Object, with top-level branches Organism, Anatomical Structure, Manufactured Object, and Substance]
Semantic Network - Conceptual Entity
[Figure: hierarchy of semantic types under Entity > Conceptual Entity, including Finding, Idea or Concept, Occupation or Discipline, Language, Intellectual Product, Organism Attribute, Group, Group Attribute, and Organization]
Semantic Network - Event
[Figure: hierarchy of semantic types under Event, with top-level branches Activity and Phenomenon or Process]
The Semantic Network - Relationships
• 54 Semantic Relationships
• The primary link between most semantic
types is the ‘isa’ relationship.
• Animal isa Entity
• Carbohydrate isa Chemical
• Human isa Mammal
[ Relation Label ]
isa
part_of
result_of
co-occurs_with
evaluation_of
location_of
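Because 'isa' links chain together, a statement such as "Animal isa Entity" can be derived by walking up the hierarchy. The sketch below does this over a small hand-entered fragment. The slides state only the end results (for example, Human isa Mammal and Animal isa Entity); the intermediate links listed here are an assumed fragment for illustration and should be checked against the Semantic Network release in use.

```python
# Child -> parent 'isa' links (assumed fragment, entered by hand).
ISA = {
    "Human": "Mammal",
    "Mammal": "Vertebrate",
    "Vertebrate": "Animal",
    "Animal": "Organism",
    "Organism": "Physical Object",
    "Physical Object": "Entity",
}


def ancestors(semantic_type):
    """Return every type reachable by following 'isa' links upward."""
    chain = []
    while semantic_type in ISA:
        semantic_type = ISA[semantic_type]
        chain.append(semantic_type)
    return chain


def isa(child, parent):
    """True if `parent` is reachable from `child` through 'isa' links."""
    return parent in ancestors(child)


print(ancestors("Human"))
print(isa("Animal", "Entity"), isa("Entity", "Animal"))
```

A single parent per type is enough for this fragment; a structure with multiple parents would need a mapping from each type to a list and a visited set.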
SPECIALIST Lexicon
• A lexicon is necessarily a core component of any natural language processing system
• Coverage includes both commonly occurring English words and biomedical vocabulary discovered in the NLM Test Collection and the UMLS Metathesaurus.
• The lexicon entry for each word or term records the syntactic, morphological, and graphemic information.
– Syntactic information includes syntactic category (part of speech), and complementation patterns for verbs, adjectives and nouns, as well as positional and modification types for adjectives and adverbs.
– Inflectional morphology is indicated for those syntactic categories which inflect, and spelling variation is recorded for each lexical item known to exhibit such variation.
SPECIALIST NLP Tools
Related Research
[1] Wu S.T., Liu H., et al. (2012). Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis. Journal of the American Medical Informatics Association (JAMIA), 19(e1), e149–e156.
[1] UMLS term occurrences in clinical notes
• Objective
– To characterise empirical instances of Unified Medical Language System (UMLS) Metathesaurus term strings in a large clinical corpus, and to illustrate what types of term characteristics are generalisable across data sources.
• Data Sources
– The data source for the corpus analysis of clinical text was Mayo Clinic clinical notes between 1 January 2001 and 31 December 2010, retrieved from Mayo's Enterprise Data Trust (EDT).
– 51,945,627 documents
– 296,167 unique terms
– 2,319,010,575 case-insensitive exact term matches
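To make "case-insensitive exact term match" concrete, the sketch below counts how often each term from a small list occurs in a few invented sentences. It is not the pipeline used in the paper; the function name, the longest-match-first rule, and the sample notes are assumptions made for illustration.

```python
import re
from collections import Counter


def count_term_matches(terms, documents):
    """Count case-insensitive, whole-word, non-overlapping term matches.

    Longer terms are tried first, so "atrial fibrillation" is counted
    once rather than also as "fibrillation" at the same position.
    """
    ordered = sorted({t.lower() for t in terms}, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    )
    counts = Counter()
    for doc in documents:
        counts.update(pattern.findall(doc.lower()))
    return counts


terms = ["atrial fibrillation", "fibrillation", "common cold"]
notes = [  # invented sentences, not real clinical text
    "Patient has Atrial Fibrillation.",
    "No fibrillation noted; common cold symptoms.",
    "ATRIAL FIBRILLATION, chronic.",
]
counts = count_term_matches(terms, notes)
print(counts)
```

A single compiled alternation is fine for a handful of terms; for a term list of the size reported above, a trie or Aho-Corasick style matcher would be the more realistic choice.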
• Corpus Analysis – Word Statistics
– Figure 1 shows histograms of the number of words per term in the UMLS and in the subset that is empirically found in Mayo Clinic data.
• Corpus Analysis - Term Frequency
• Corpus Analysis – Source Terminology
• Corpus Analysis – Syntactic Categories
• Cross-Institutional analysis
① Special characters
② Maximum number of words
③ Maximum number of characters
④ Language
⑤ Source terminology
⑥ Semantic group
⑦ Empirical occurrence filter
⑧ Term frequency
Source terminologies:
• SNOMED-CT
• Consumer Health Vocabulary
• National Cancer Institute (NCI) Thesaurus
• Medical Subject Headings (MSH)
• Read Codes
• Medical Dictionary for Regulatory Activities Terminology (MedDRA)
• SNOMED International
• MEDCIN
• UMLS Metathesaurus
• National Drug File - Reference Terminology (NDF-RT)
• The original SNOMED
• Online Mendelian Inheritance in Man (OMIM)
• Logical Observation Identifiers Names and Codes (LOINC)
• Computer Retrieval of Information on Scientific Projects (CRISP)
Semantic groups:
• Anatomy
• Chemicals & drugs
• Concepts & ideas
• Disorders
• Living beings
• Physiology
• Procedures
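Several of the term characteristics listed above can be expressed as simple per-term filters. The sketch below combines four of them (special characters, maximum number of words, maximum number of characters, term frequency). The function name and every threshold are hypothetical values chosen for the example; the slides do not give the cut-offs used in the paper.

```python
import re


def passes_filters(term, frequency, max_words=6, max_chars=50, min_frequency=1):
    """Return True if a term survives all four illustrative filters.

    All thresholds are placeholder values, not those of the cited study.
    """
    if re.search(r"[^A-Za-z0-9 \-]", term):  # special characters
        return False
    if len(term.split()) > max_words:        # maximum number of words
        return False
    if len(term) > max_chars:                # maximum number of characters
        return False
    return frequency >= min_frequency        # term frequency


candidates = {  # invented terms and counts
    "atrial fibrillation": 120,
    "atrium; fibrillation": 3,
    "common cold": 0,
}
kept = [t for t, freq in candidates.items() if passes_filters(t, freq)]
print(kept)
```

Filters for language, source terminology, and semantic group would need the corresponding metadata per term and are left out of this sketch.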
Reference
• UMLS; http://www.nlm.nih.gov/research/umls
• UMLS Basics Tutorial; http://www.nlm.nih.gov/research/umls/new_users/online_learning/index.htm
• UTS; https://uts.nlm.nih.gov/
• Wu S.T., Liu H., et al. (2012). Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis. Journal of the American Medical Informatics Association (JAMIA), 19(e1), e149–e156.
• Han Seungbin, Kim Seunghee, Choi Jinwook. 'The new file structure of the UMLS Metathesaurus 2004: an introduction to the Rich Release Format (RRF)'