Controlled vocabularies and VIVO
Paul Albert, Weill Cornell Medical College
We've seen 959 ways to refer to Proceedings of the National Academy of Sciences.
Google Scholar Development Team, http://bit.ly/K6xRf0
The problem
Oh, my stomach!
The main intent of the Semantic Web is to give machines much better access to information resources so they can be information intermediaries in support of humans.
Michael Uschold, http://bit.ly/JuWSUg
Let’s Define Our Terms
Term                   Explicit List  Hierarchy  Associative Relationships  Grammar
controlled vocabulary  ✓
taxonomy               ✓              ✓
thesaurus              ✓              ✓          ✓
ontology               ✓              ✓          ✓                          ✓
Warning: Pursuit of controlled vocabulary tends to expose source systems for the quagmires they are.
Which controlled vocabulary should I use?
Selecting controlled vocabularies: when snobbery is a virtue
“Desiderata” for Controlled Medical Vocabularies
http://bit.ly/desider
Methods of Information in Medicine, © F. K. Schattauer Verlagsgesellschaft mbH (1998)
J. J. Cimino
Department of Medical Informatics, Columbia University, New York, USA
Desiderata for Controlled Medical Vocabularies in the Twenty-First Century
Abstract: Builders of medical informatics applications need controlled medical vocabularies to support their applications and it is to their advantage to use available standards. In order to do so, however, these standards need to address the requirements of their intended users. Over the past decade, medical informatics researchers have begun to articulate some of these requirements. This paper brings together some of the common themes which have been described, including: vocabulary content, concept orientation, concept permanence, nonsemantic concept identifiers, polyhierarchy, formal definitions, rejection of "not elsewhere classified" terms, multiple granularities, multiple consistent views, context representation, graceful evolution, and recognized redundancy. Standards developers are beginning to recognize and address these desiderata and adapt their offerings to meet them.
Keywords: Controlled Medical Terminology, Vocabulary, Standards, Review
1. Introduction
The need for controlled vocabularies in medical computing systems is widely recognized. Even systems which deal with narrative text and images provide enhanced capabilities through coding of their data with controlled vocabularies. Over the past four decades, system developers have dealt with this need by creating ad hoc sets of controlled terms for use in their applications. When the sets were small, their creation was a simple matter, but as applications have grown in function and complexity, the effort needed to create and maintain the controlled vocabularies became substantial. With each new system, new efforts were required, because previous vocabularies were deemed unsuitable for adoption in or adaptation to new applications. Furthermore, information in one system could not be recognized by other systems, hindering the ability to integrate component applications into larger systems.
Consider, for example, how a computer-based medical record system might work with a diagnostic expert system to improve patient care. In order to achieve optimal integration of the two, transfer of patient information from the record to the expert would need to be automated. In one attempt to do so, the differences between the controlled vocabularies of the two systems was found to be the major obstacle - even when both systems were created by the same developers [1].
The solution seems obvious: standards [2]. In fact, many standards have been proposed, but their adoption has been slow. Why? System developers generally indicate that, while they would like to make use of standards, they can't find one that meets their needs. What are those needs? The answers to this question are less clear. The simple answer is, "It doesn't have what I want to say." Standards developers have taken this to mean that the solution is equally simple: keep adding terms to the vocabulary until it does say what's needed. However, systems developers, as users of controlled vocabularies, are like users everywhere: they may not always articulate their true needs. Vocabulary developers have labored to increase their offerings, but have continued to be confronted with ambivalence. A number of vocabularies have been put forth as standards [3] but they have been found wanting in some recent evaluations [4-6].
Over the past ten years or so, medical informatics researchers have been studying controlled vocabulary issues directly. They have examined the structure and content of existing vocabularies to determine why they seem unsuitable for particular needs, and they have proposed solutions. In some cases, proposed solutions have been carried forward into practice and new experience has been gained. As we prepare to enter the twenty-first century, it seems appropriate to pause to reflect on this additional experience, to rethink the directions we should pursue, and to identify the next set of goals for the development of standard, reusable, multipurpose controlled medical vocabularies.
2. Desiderata
The task of enumeration of general desiderata for controlled vocabularies is hampered in two ways. First, the
Meth Inform Med 1998; 37: 394-403
“Desiderata” for Controlled Medical Vocabularies
http://bit.ly/desider
1. Content – formal editorial policy and methodology; provide breadth and depth; don’t just add terms
2. Concept orientation – exactly one meaning per concept and exactly one concept per meaning
3. Concept permanence – old concepts can't be deleted; names can be changed as long as meaning doesn't change
4. Nonsemantic identifiers – use a meaningless integer
5. Polyhierarchy – employ multiple hierarchies to support need for tree walking and inferencing
6. Formal Definitions – structured descriptions that invoke relationships within the terminology
7. Reject “not elsewhere classified” – terminology changes induce semantic drift
8. Graceful evolution – fix mistakes; account for changes in medical knowledge
9. Recognize redundancy – redundant expressions are inevitable, but redundant concepts are bad
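Several of the desiderata listed above translate fairly directly into data-structure decisions. The following is a minimal sketch, not a reference implementation: the `Vocabulary` and `Concept` classes and all of their method names are hypothetical, and the medical terms in the usage note are illustrative only. It shows meaningless integer identifiers, concepts that can be retired or renamed but never deleted, multiple parents per concept, and synonyms that resolve to a single concept.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Concept:
    concept_id: int                               # nonsemantic identifier: a meaningless integer
    preferred_name: str
    synonyms: set = field(default_factory=set)    # redundant expressions for the same meaning
    parents: set = field(default_factory=set)     # polyhierarchy: any number of parent ids
    retired: bool = False                         # concept permanence: flag, never delete


class Vocabulary:
    """Hypothetical in-memory vocabulary; there is deliberately no delete method."""

    def __init__(self):
        self._ids = count(1)
        self.concepts = {}      # concept_id -> Concept
        self._by_name = {}      # case-folded name -> concept_id

    def add(self, name, parents=()):
        key = name.casefold()
        if key in self._by_name:
            raise ValueError(f"{name!r} already names concept {self._by_name[key]}")
        missing = [p for p in parents if p not in self.concepts]
        if missing:
            raise KeyError(f"unknown parent ids: {missing}")
        concept_id = next(self._ids)
        self.concepts[concept_id] = Concept(concept_id, name, parents=set(parents))
        self._by_name[key] = concept_id
        return concept_id

    def add_synonym(self, concept_id, name):
        concept = self.concepts[concept_id]
        # One concept per meaning: a name may point at only one concept.
        if self._by_name.setdefault(name.casefold(), concept_id) != concept_id:
            raise ValueError(f"{name!r} already names a different concept")
        concept.synonyms.add(name)

    def rename(self, concept_id, new_name):
        # The name changes, the identifier and meaning do not; the old name stays findable.
        key = new_name.casefold()
        if self._by_name.get(key, concept_id) != concept_id:
            raise ValueError(f"{new_name!r} already names a different concept")
        concept = self.concepts[concept_id]
        concept.synonyms.add(concept.preferred_name)
        concept.synonyms.discard(new_name)
        concept.preferred_name = new_name
        self._by_name[key] = concept_id

    def retire(self, concept_id):
        self.concepts[concept_id].retired = True

    def lookup(self, name):
        return self._by_name.get(name.casefold())

    def ancestors(self, concept_id):
        # Tree walking across every hierarchy the concept belongs to.
        seen, stack = set(), list(self.concepts[concept_id].parents)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(self.concepts[parent].parents)
        return seen
```

For example, a concept added with `parents=[heart_id, infection_id]` is reachable from both hierarchies through `ancestors`, and after `retire` it still resolves through `lookup`, so older data that references it keeps its meaning.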
Is Roz Chast’s ice cream ontology desiderata compliant?
Compliant
• Reject "Not Elsewhere Classified"
• Recognize Redundancy
Unclear or Non-Compliant
• Content
• Concept Permanence
• Graceful Evolution
• Concept Orientation
• Nonsemantic Concept Identifiers
• Polyhierarchy
• Formal Definitions
What is the license of the controlled vocabulary?
• Are the ontology codes copyrighted and can they be used in an open source application?
• Need to account for the possibility that the data is reused for a commercial interest
Who will maintain and host the vocabulary?
Externally maintained vocabularies are more sustainable
Controlled vocabularies used in VIVO
“The VIVO community might be able to build services to serve controlled vocabularies for organizations and journals.”
http://bit.ly/J7Vd8w
The Ontology Team is considering serving vocabularies for select domains
Food and Agriculture Organization (FAO) geopolitical ontology
• master reference for geopolitical information in multiple languages
• provides relations among territories (land borders, group membership, etc)
• tracks historical changes
Ships with VIVO application
Academic Degrees
Ships with VIVO application
As of version 1.4, VIVO allows users to look up terms from UMLS and GEMET
administration; agriculture; air; animal husbandry; biology; building; chemistry; climate; disasters, accidents, risk; economics; energy; environmental policy; fishery; food, drinking water; forestry; general; geography; human health; industry; information; legislation; materials; military aspects; natural areas, landscape, ecosystems; natural dynamics; noise, vibrations; physics; pollution; radiations; research; resources; social aspects, population; soil; space; tourism; trade, services; transport; urban environment, urban stress; waste; water
GEMET: controlled vocabulary for environmental topics
Vocabularies actively being considered for VIVO
• colleges and universities
• journals
- open source status (VIVOONT-433)
• languages (VIVOONT-250)
- model write, speak, proficiency
• others?
Viaf.org – one promising option for organizations
Modeling medical terms in VIVO
Types of Specialty
All Specialties
Board-Certified Specialties
Board-Certified Subspecialties
Types of Medical Expertise
Board-certified in Cardiology
Performed 100+ ECGs
Invented a better ECG
GLG-20s masquerading as doctors for comic effect
Research, Clinical, Feigned
< <
We use Intelligent Medical Objects (IMO)’s interface terminology
• Maps medical expertise terms to SNOMED CT
• Useful for returning relevant results to patients searching for a doctor
• Enables the physician to enter more arcane areas of expertise (e.g., Asian American Community Health)
• A commercial application
Physician Admin View: Search for “chemotherapy” in IMO
Physician Admin View: Search for “that” yields many terms not in SNOMED CT.
Expertise exists in POPS. Board certification data exists in POPS, Intellicred.
Export from Physicians Profile System contains specialty and expertise
Board Certifications
Problem #1: No indication of certifying board.
At least 13 certifications, including geriatric medicine, pain medicine, and urology, are given by at least one ABMS board.
Problem #2: Names of certifications are ambiguous.
Colon and rectal surgery is listed in the following alternate ways:
• Surgery, Colon and Rectal
• Colon-Rectal Surgery
• Colorectal Surgery
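One way to collapse name variants like these is to fold each name to a comparison key and keep a small hand-curated synonym table for the cases folding cannot catch. The sketch below is only an illustration under stated assumptions: the preferred label "Colon and Rectal Surgery", the `EXTRA_SYNONYMS` table, and every function name are hypothetical, and the un-inversion step is a heuristic that would mishandle names whose commas separate list items rather than mark an inverted heading.

```python
import re

# Assumed preferred label; the slides do not say which variant is authoritative.
PREFERRED = ["Colon and Rectal Surgery"]

# Hand-curated variants that simple folding cannot reconcile (hypothetical table).
EXTRA_SYNONYMS = {"Colorectal Surgery": "Colon and Rectal Surgery"}


def uninvert(name):
    """Turn an inverted heading such as 'Surgery, Colon and Rectal' into natural order."""
    head, sep, tail = name.rpartition(", ")
    return f"{tail} {head}" if sep else name


def fold(name):
    """Comparison key: natural word order, lowercase, no punctuation, no 'and'."""
    words = re.findall(r"[a-z0-9]+", uninvert(name).lower())
    return " ".join(word for word in words if word != "and")


def build_index(preferred, extra_synonyms):
    index = {fold(label): label for label in preferred}
    index.update({fold(variant): label for variant, label in extra_synonyms.items()})
    return index


def canonical(name, index):
    """Return the preferred label for a variant, or None if it is unknown."""
    return index.get(fold(name))
```

With `index = build_index(PREFERRED, EXTRA_SYNONYMS)`, all three spellings on the slide resolve to the same label, while an unrelated name such as "Hand Surgery" returns `None` and can be queued for manual review.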
Problem #3: No given date of certification.
Problem #4: Which source vocabulary?
The National Uniform Claim Committee (NUCC) maintains a list of health care provider taxonomy codes, but this list seems to be exclusively for non-MDs.
Change in number of ABMS Subspecialties/Specialties
Pre-1970: 10
1970-1979: 20
1980-1992: 66
By 1996: 74
By 1999: 84
2012: 145
The following 135 board certifications in our system are not recognized by ABMS.
Cosmetic Dentistry; Cosmetic Dermatology; Cosmetic Surgery; Critical Care Neurology; Dermatology, General; Ear, Nose, and Throat, Pediatric; Echocardiography; Electrodiagnostic Medicine; Emergency Neurology; Endocrinology; Facial Plastic and Reconstructive Surgery; Facial Plastic Surgery; Family Psychology; Fetal Cardiology; Foot and Ankle Surgery; Foot Surgery; Gastroenterology Pathology; Gastrointestinal Pathology; Gastrointestinal Surgery; General Anesthesiology; General Cardiology; General Dentistry; General Dermatology; General Internal Medicine; General Neurology; General Neurosurgery; General Obstetrics and Gynecology; General Ophthalmology; General Pediatrics; General Psychiatry; General Surgery; General Urology; Genetics, Medical; Geriatric Cardiology; Geriatric Dermatology; Geriatric Emergency Medicine;
Geriatric Psychotherapy; Gynecologic Endocrinology; Gynecologic Pathology; Gynecology; Hand Surgery; Heart Surgery; Hematology/Oncology; Hepatobiliary Surgery; Hepatology; High Risk Obstetrics; Hospitalist; Immunopathology; Infant Psychiatry; Intensive Care; Internal Medicine, General; International Medicine; International Travel Medicine; Interventional Neuroradiology; Interventional Oncology; Interventional Pain Management; Interventional Radiology; Invasive Cardiology; Laboratory Medicine; Laryngology; Liver Pathology; Maternal-Fetal Medicine; Medical Genetics; Molecular Genetics; Molecular Hematopathology; Molecular Infectious Disease; Molecular Pathology; Musculoskeletal Oncology; Musculoskeletal Radiology; Neonatal Neurology; Neonatal Surgery; Neonatal Thoracic Surgery; Neonatology; Neonatology, Pediatric;
Neuro Critical Care; Neuro Radiology; Neuro-Ophthalmology; Neuro-Pathology; Nutrition; Oral and Maxillofacial Pathology; Oral and Maxillofacial Surgery; Orthodontics; Orthopedic Surgery; Orthopedics; Pain Medicine/Pain Management; Pathology; Pediatric Allergy and Immunology; Pediatric Behavior and Development; Pediatric Dentistry; Pediatric Neurological Surgery; Pediatric Neurology; Pediatric Neurosurgery; Pediatric Orthopedic Surgery; Pediatric Orthopedics; Periodontics; Plastic and Reconstructive Surgery; Psychology; Pulmonary Disease Medicine; Radiology; Radiology, Vascular/Interventional; Reproductive Endocrinology; Surgery, Critical Care; Surgery, Hand; Surgery, Oral and Maxillofacial; Thoracic Surgery; Vascular and Interventional Radiology
Weill Game Plan for Board Certifications
• Explore ingest from Intellicred (fewer certifications, less variability, may include certifying agency?)
• Explore external vocabularies
• Failing that, create our own
Medical Expertise and Non-Certified Specialties
How does a term of local clinical expertise map to UMLS using Stony Brook's API?
Expertise terms from the Weill Cornell Physician Profile System (n = 2578)
Identical
53% of terms from the source system correspond exactly to some representation in UMLS
– Polycystic Ovary Syndrome
– Anaphylaxis
– Aortic Dissection
– Chemoembolization
– Dental Implant
– Echocardiogram
Equivalent preserving original meaning
5% of terms from the source system have some equivalent in UMLS that is lexically different but semantically identical
Weill → UMLS
– Biopsy of Skin → Skin biopsy
– Aneurysm of Popliteal Artery → Aneurysm Popliteal
– Charcot-Marie-Tooth Disease → Charcot-Marie-Tooth
– Cirrhosis of Liver → Cirrhosis
– Coarctation of the Aorta → Coarctation
Compound term
34% of terms from the source system can only be represented as a combination of terms from UMLS
Weill → UMLS
– Asian American Community Health → Asian American | Community Health
– Endoscopic Ultrasound of Esophagus → Endoscopic Ultrasound | Esophagus
– Chronic Pelvic Pain In Female → Chronic Pelvic Pain | Female
– Bronchoscopy With Biopsy → Bronchoscopy | Biopsy
– Green Light Laser Procedure → Green Light | Laser Procedure
Union of two concepts
3% of terms from the source system can be represented by the joining (not intersection) of two concepts in UMLS
Weill → UMLS
– Billing and Coding → Billing | Coding
– Bone and Mineral Metabolism → Bone Metabolism | Mineral Metabolism
– Bladder and Prostate Cancer → Bladder Cancer | Prostate Cancer
Subtype
2% of terms from the source system can only be represented as a subtype of a concept in UMLS
Weill → UMLS
– Bipolar 1 Disorder → Bipolar Disorder
– FAA Medical Exam → Medical Exam
Unclear
3% of terms from the source system lack or have an unclear equivalent in UMLS
Weill → UMLS
– In Vitro Fertilization Counseling → Vitro | Fertilization | Counseling
– Adjustable Band → Band
– Bowel-Sparing Strictureplasty → Nothing
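A first-pass triage of local terms into some of these categories can be approximated with plain string handling. The sketch below is an assumption-laden illustration: it uses a tiny in-memory list of labels as a stand-in for UMLS rather than Stony Brook's API, the `normalize` and `classify` functions and the stop-word list are hypothetical, and it only distinguishes identical, lexically equivalent, compound, and unclear terms. Telling a union from a compound term, or recognizing a subtype, would still take human review.

```python
import re

# Small function words ignored when comparing terms (an assumed list).
STOP_WORDS = {"of", "the", "in", "with", "and"}


def normalize(term):
    """Order-insensitive key: lowercase word tokens minus stop words, sorted."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return tuple(sorted(t for t in tokens if t not in STOP_WORDS))


def classify(term, vocabulary):
    """Return (category, matched_labels) for a local term.

    `vocabulary` is any iterable of concept labels standing in for UMLS.
    """
    labels = {label.lower(): label for label in vocabulary}
    if term.lower() in labels:
        return ("identical", [labels[term.lower()]])

    by_key = {normalize(label): label for label in vocabulary}
    key = normalize(term)
    if key in by_key:
        return ("equivalent", [by_key[key]])

    # Greedy cover: try longer labels first and consume the words they explain.
    remaining, parts = set(key), []
    for label_key, label in sorted(by_key.items(), key=lambda item: -len(item[0])):
        if label_key and set(label_key) <= remaining:
            parts.append(label)
            remaining -= set(label_key)
    if parts and not remaining:
        return ("compound", parts)
    return ("unclear", parts)
```

The greedy cover is a deliberate simplification: it accepts the first combination that explains every word, which is enough to flag candidates for an administrator but not to decide what the combination means.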
Pre-coordination
Definition: Terms combined by a developer to denote a specific concept and its attributes more precisely.
Benefits: Users who are not totally familiar with a controlled vocabulary and its structure.
Examples:
– avian hypersensitivity pneumonitis
– carrier sense multiple access
Post-coordination
Definition: Terms combined at the time of search and retrieval using Boolean or other operators.
Benefits: Lazy or “busy” developers
Examples:
– avian AND hypersensitivity AND pneumonitis
– carrier sense AND multiple access
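Combining terms at search time with a Boolean AND, as in the "avian AND hypersensitivity AND pneumonitis" example, amounts to intersecting the sets of records indexed under each term. The sketch below assumes a toy in-memory index; the record identifiers, the `build_index` and `search_all` names, and the sample data are all hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Map each case-folded index term to the set of record ids that carry it."""
    index = defaultdict(set)
    for record_id, terms in records.items():
        for term in terms:
            index[term.casefold()].add(record_id)
    return index


def search_all(index, *terms):
    """Boolean AND: ids of records indexed under every one of the given terms."""
    postings = [index.get(term.casefold(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical records, each tagged with single-word terms.
records = {
    "rec1": ["avian", "hypersensitivity", "pneumonitis"],
    "rec2": ["avian", "influenza"],
    "rec3": ["hypersensitivity", "pneumonitis"],
}
index = build_index(records)
matches = search_all(index, "avian", "hypersensitivity", "pneumonitis")
```

By contrast, a vocabulary that already contains the single combined term "avian hypersensitivity pneumonitis" needs only one lookup, at the cost of someone having built and maintained that term in advance.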
How do we semantically model post-coordinated terms?
1. Do not mess with post-coordination. User adds term from lookup service. That's it. (Existing method.)
2. User adds term from lookup service. Machine makes basic inferences based on similarity. (Everything is "related term.")
3. User adds term from lookup service. Administrator models terms.
4. User adds term from lookup service. User interface enables and guides end user.
Option #3: User adds term from lookup service. Administrator models terms.
Can we build on others' work?
• The International Health Terminology Standards Development Organisation (IHTSDO) in Denmark is working to develop and promote SNOMED to support sharing of modelling.
• IMO, our terminology service, may help model coordinated terms.
Need for post-coordination is widespread
For example, many global health terms require coordination.
UMLS vs. SNOMED CT
UMLS’s rapid growth is somewhat at odds with desiderata compliance
[Chart: number of UMLS strings and concepts by year, 1999-2011; vertical axis from 0 to 12,000,000]
Cimino’s Critique of Terminologies Desiderata Adherence
Terminology  Cov  Conc  Perm  ID   Hier  Def  NEC  Evol  Redun
ICD          +    -     -     -    +/-   -    -    -     -
CPT          -    +     +     +    -     -    -    +     -
DRG          -    +     +     +    -     -    -    +     -
NDC          +    +     -     -    -     -    +    -     -
RxNorm       +    +     +     +    +     +    +    +     +
LOINC        +    +     +     +    +/-   +    +    +     +
Nursing      +    +     +/-   +    +/-   -    -    +/-   +/-
SNOMED       +    +     +     +    +     +/-  +    +     +
MeSH         +    +/-   +     +    +/-   -    -    +     -
UMLS         +    +     +     +    +/-   -    n/a  +     -
Cov: Content coverage
Conc: Concept oriented
Perm: Concept permanence
ID: Meaningless identifiers
Hier: Multiple hierarchy
Def: Formal definitions
NEC: Rejected “Not Elsewhere Classified”
Evol: Graceful evolution
Redun: Detect redundancy
Why SNOMED CT may be better than UMLS at representing medical terms
UMLS has:
• No formal conceptual model (near-synonymy)
• No hierarchy
• Lots of redundancy
• Lots of ambiguity
UMLS is good for helping you find terms in a specific terminology because all many-to-one term-to-concept mappings expand the synonyms you can match against. I recommend you use UMLS to find terms from a very limited set of terminologies - maybe SNOMED plus LOINC plus RxNorm, for example.
Jim Cimino
Classes
skos:Concept
    snomedct:Procedure
    snomedct:Disorder
    rxnorm:Drug
    ...
Properties
skos:related
    snomedct:equivalentTo
    ...
skos:broader
skos:narrower
Proposed Role of SKOS
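To make the SKOS properties concrete, the sketch below records statements as plain (subject, predicate, object) tuples using prefixed names. It is a hypothetical illustration, not VIVO's data model: the `local:` identifiers and all function names are invented for the example, and no namespace IRIs are assumed. It relies only on two facts from the SKOS specification: skos:broader and skos:narrower are inverses of each other, and skos:related is symmetric.

```python
RDF_TYPE = "rdf:type"
SKOS_CONCEPT = "skos:Concept"
SKOS_BROADER = "skos:broader"
SKOS_NARROWER = "skos:narrower"
SKOS_RELATED = "skos:related"


def add_concept(triples, term):
    """Declare a term as a skos:Concept."""
    triples.add((term, RDF_TYPE, SKOS_CONCEPT))


def add_broader(triples, narrower_term, broader_term):
    """State that narrower_term has broader_term as a broader concept, plus the inverse."""
    add_concept(triples, narrower_term)
    add_concept(triples, broader_term)
    triples.add((narrower_term, SKOS_BROADER, broader_term))
    triples.add((broader_term, SKOS_NARROWER, narrower_term))


def add_related(triples, term_a, term_b):
    """State an associative link in both directions, since skos:related is symmetric."""
    add_concept(triples, term_a)
    add_concept(triples, term_b)
    triples.add((term_a, SKOS_RELATED, term_b))
    triples.add((term_b, SKOS_RELATED, term_a))


def to_lines(triples):
    """Render statements one per line, sorted for stable output."""
    return sorted(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical local identifiers for a local term and the broader concept it falls under.
triples = set()
add_broader(triples, "local:Bipolar1Disorder", "local:BipolarDisorder")
```

A local term that only has a broader match in an external vocabulary, such as "Bipolar 1 Disorder" against "Bipolar Disorder", could be linked this way without asserting that the two are equivalent.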
Read More
Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, http://bit.ly/niso-standard
Desiderata for Controlled Medical Vocabularies, http://bit.ly/desider
Practice Robot Courtesy with Local Extensions
Use classes/properties that are subclasses/subproperties of existing
classes/properties in VIVO’s core ontology.