Controlled vocabularies and VIVO Paul Albert [email protected] Weill Cornell Medical College

Page 1: Controlled vocabularies and VIVO

Controlled vocabularies and VIVO

Paul Albert, Weill Cornell Medical College

Page 2: Controlled vocabularies and VIVO

We've seen 959 ways to refer to Proceedings of the National Academy of Sciences.

Google Scholar Development Team
http://bit.ly/K6xRf0

The problem

Page 3: Controlled vocabularies and VIVO

Oh, my stomach!

Page 4: Controlled vocabularies and VIVO

The main intent of the Semantic Web is to give machines much better access to information resources so they can be information intermediaries in support of humans.

Michael Uschold
http://bit.ly/JuWSUg

Page 5: Controlled vocabularies and VIVO

Let’s Define Our Terms

controlled vocabulary

taxonomy

thesaurus

ontology

Explicit List

Hierarchy

Associative Relationships

Grammar

✓ ✓

✓ ✓

Page 6: Controlled vocabularies and VIVO

Warning

Pursuit of controlled vocabulary tends to expose source systems for the quagmires they are.

Page 7: Controlled vocabularies and VIVO

Which controlled vocabulary should I use?

Page 8: Controlled vocabularies and VIVO

Selecting controlled vocabularies: when snobbery is a virtue

Page 9: Controlled vocabularies and VIVO

“Desiderata” for Controlled Medical Vocabularies

http://bit.ly/desider

Methods of Information in Medicine, © F. K. Schattauer Verlagsgesellschaft mbH (1998)
Meth Inform Med 1998; 37: 394-403

Desiderata for Controlled Medical Vocabularies in the Twenty-First Century

J. J. Cimino

Department of Medical Informatics, Columbia University, New York, USA

Abstract: Builders of medical informatics applications need controlled medical vocabularies to support their applications and it is to their advantage to use available standards. In order to do so, however, these standards need to address the requirements of their intended users. Over the past decade, medical informatics researchers have begun to articulate some of these requirements. This paper brings together some of the common themes which have been described, including: vocabulary content, concept orientation, concept permanence, nonsemantic concept identifiers, polyhierarchy, formal definitions, rejection of "not elsewhere classified" terms, multiple granularities, multiple consistent views, context representation, graceful evolution, and recognized redundancy. Standards developers are beginning to recognize and address these desiderata and adapt their offerings to meet them.

Keywords: Controlled Medical Terminology, Vocabulary, Standards, Review

1. Introduction

The need for controlled vocabularies in medical computing systems is widely recognized. Even systems which deal with narrative text and images provide enhanced capabilities through coding of their data with controlled vocabularies. Over the past four decades, system developers have dealt with this need by creating ad hoc sets of controlled terms for use in their applications. When the sets were small, their creation was a simple matter, but as applications have grown in function and complexity, the effort needed to create and maintain the controlled vocabularies became substantial. With each new system, new efforts were required, because previous vocabularies were deemed unsuitable for adoption in or adaptation to new applications. Furthermore, information in one system could not be recognized by other systems, hindering the ability to integrate component applications into larger systems.

Consider, for example, how a computer-based medical record system might work with a diagnostic expert system to improve patient care. In order to achieve optimal integration of the two, transfer of patient information from the record to the expert would need to be automated. In one attempt to do so, the differences between the controlled vocabularies of the two systems was found to be the major obstacle - even when both systems were created by the same developers [1].

The solution seems obvious: standards [2]. In fact, many standards have been proposed, but their adoption has been slow. Why? System developers generally indicate that, while they would like to make use of standards, they can't find one that meets their needs. What are those needs? The answers to this question are less clear. The simple answer is, "It doesn't have what I want to say." Standards developers have taken this to mean that the solution is equally simple: keep adding terms to the vocabulary until it does say what's needed. However, systems developers, as users of controlled vocabularies, are like users everywhere: they may not always articulate their true needs. Vocabulary developers have labored to increase their offerings, but have continued to be confronted with ambivalence. A number of vocabularies have been put forth as standards [3] but they have been found wanting in some recent evaluations [4-6].

Over the past ten years or so, medical informatics researchers have been studying controlled vocabulary issues directly. They have examined the structure and content of existing vocabularies to determine why they seem unsuitable for particular needs, and they have proposed solutions. In some cases, proposed solutions have been carried forward into practice and new experience has been gained. As we prepare to enter the twenty-first century, it seems appropriate to pause to reflect on this additional experience, to rethink the directions we should pursue, and to identify the next set of goals for the development of standard, reusable, multipurpose controlled medical vocabularies.

2. Desiderata

The task of enumeration of general desiderata for controlled vocabularies is hampered in two ways. First, the


Page 10: Controlled vocabularies and VIVO

“Desiderata” for Controlled Medical Vocabularies

http://bit.ly/desider

1. Content – formal editorial policy and methodology; provide breadth and depth; don’t just add terms

2. Concept orientation – exactly one meaning per concept and exactly one concept per meaning

3. Concept permanence – old concepts can't be deleted; names can be changed as long as meaning doesn't change

Page 11: Controlled vocabularies and VIVO

“Desiderata” for Controlled Medical Vocabularies

http://bit.ly/desider

4. Nonsemantic identifiers – use a meaningless integer

5. Polyhierarchy – employ multiple hierarchies to support need for tree walking and inferencing

6. Formal Definitions – structured descriptions that invoke relationships within the terminology
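As an illustration of why polyhierarchy supports tree walking and inferencing, here is a minimal sketch in Python. The concept identifiers, labels, and parent links are invented for the example and do not come from any real terminology; the function names `ancestors` and `is_a` are likewise hypothetical.

```python
from collections import deque

# Hypothetical concepts with meaningless integer identifiers.
LABELS = {
    1001: "Disease",
    1002: "Infectious disease",
    1003: "Lung disease",
    1004: "Pneumonia",
    1005: "Bacterial pneumonia",
}

# Polyhierarchy: a concept may have more than one parent.
PARENTS = {
    1002: [1001],
    1003: [1001],
    1004: [1003],
    1005: [1002, 1004],  # both an infectious disease and a pneumonia
}


def ancestors(concept_id):
    """Walk parent links upward and return every concept reached."""
    seen = set()
    queue = deque(PARENTS.get(concept_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(PARENTS.get(current, []))
    return seen


def is_a(concept_id, candidate_ancestor):
    """Infer whether concept_id is a kind of candidate_ancestor."""
    return (
        concept_id == candidate_ancestor
        or candidate_ancestor in ancestors(concept_id)
    )
```

With a single hierarchy, "Bacterial pneumonia" could sit under only one of its two parents, and a query for either infectious diseases or lung diseases would miss it.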

Page 12: Controlled vocabularies and VIVO

“Desiderata” for Controlled Medical Vocabularies

http://bit.ly/desider

7. Reject “not elsewhere classified” – terminology changes induce semantic drift

8. Graceful evolution – fix mistakes; account for changes in medical knowledge

9. Recognize redundancy – redundant expressions are inevitable, but redundant concepts are bad
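Several of these desiderata (concept orientation, concept permanence, nonsemantic identifiers, recognized redundancy) can be read as constraints on a data structure. The sketch below is one possible reading, not Cimino's method: `ConceptRegistry` and all of its methods are hypothetical names chosen for illustration.

```python
import itertools


class ConceptRegistry:
    """Toy vocabulary store; every name here is a hypothetical illustration."""

    def __init__(self):
        self._ids = itertools.count(1)  # nonsemantic identifiers: plain integers
        self._concepts = {}             # concept id -> {"name", "active"}
        self._terms = {}                # case-folded term -> concept id

    def add_concept(self, name, synonyms=()):
        concept_id = next(self._ids)
        self._concepts[concept_id] = {"name": name, "active": True}
        for term in (name, *synonyms):
            self.add_term(term, concept_id)
        return concept_id

    def add_term(self, term, concept_id):
        # Concept orientation: a term may point to one concept only.
        key = term.casefold()
        existing = self._terms.get(key)
        if existing is not None and existing != concept_id:
            raise ValueError(f"{term!r} already names concept {existing}")
        self._terms[key] = concept_id

    def rename(self, concept_id, new_name):
        # The name may change; the identifier and the old terms stay.
        self.add_term(new_name, concept_id)
        self._concepts[concept_id]["name"] = new_name

    def retire(self, concept_id):
        # Concept permanence: mark inactive instead of deleting.
        self._concepts[concept_id]["active"] = False

    def lookup(self, term):
        return self._terms.get(term.casefold())

    def name_of(self, concept_id):
        return self._concepts[concept_id]["name"]

    def is_active(self, concept_id):
        return self._concepts[concept_id]["active"]
```

Redundant expressions are accepted as synonyms of one concept, while an attempt to attach an existing expression to a second concept is rejected.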

Page 13: Controlled vocabularies and VIVO

Is Roz Chast’s ice cream ontology desiderata compliant?

Page 14: Controlled vocabularies and VIVO

Is Roz Chast’s ice cream ontology desiderata compliant?

Compliant
• Reject "Not Elsewhere Classified"
• Recognize Redundancy

Unclear or Non-Compliant
• Content
• Concept Permanence
• Graceful Evolution
• Concept Orientation
• Nonsemantic Concept Identifiers
• Polyhierarchy
• Formal Definitions

Page 15: Controlled vocabularies and VIVO

What is the license of the controlled vocabulary?

• Are the ontology codes copyrighted and can they be used in an open source application?

• Need to account for the possibility that the data is reused for a commercial interest

Page 16: Controlled vocabularies and VIVO

Who will maintain and host the vocabulary?

Externally maintained vocabularies are more sustainable

Page 17: Controlled vocabularies and VIVO

Controlled vocabularies used in VIVO

Page 18: Controlled vocabularies and VIVO

“The VIVO community might be able to build services to serve controlled vocabularies for organizations and journals.”
http://bit.ly/J7Vd8w

The Ontology Team is considering serving vocabularies for select domains

Page 19: Controlled vocabularies and VIVO

Food and Agriculture Organization (FAO) geopolitical ontology

• master reference for geopolitical information in multiple languages

• provides relations among territories (land borders, group membership, etc.)

• tracks historical changes

Ships with VIVO application

Page 20: Controlled vocabularies and VIVO

Academic Degrees

Ships with VIVO application

Page 21: Controlled vocabularies and VIVO

As of version 1.4, VIVO allows users to look up terms from UMLS and GEMET

Page 25: Controlled vocabularies and VIVO

GEMET: controlled vocabulary for environmental topics

administration; agriculture; air; animal husbandry; biology; building; chemistry; climate; disasters, accidents, risk; economics; energy; environmental policy; fishery; food, drinking water

forestry; general; geography; human health; industry; information; legislation; materials; military aspects; natural areas, landscape, ecosystems; natural dynamics; noise, vibrations; physics; pollution

radiations; research; resources; social aspects, population; soil; space; tourism; trade, services; transport; urban environment, urban stress; waste; water

Page 26: Controlled vocabularies and VIVO

Vocabularies actively being considered for VIVO

• colleges and universities

• journals

- open source status (VIVOONT-433)

• languages (VIVOONT-250)

- model write, speak, proficiency

• others?

Page 27: Controlled vocabularies and VIVO

Viaf.org – one promising option for organizations

Page 28: Controlled vocabularies and VIVO

Modeling medical terms in VIVO

Page 29: Controlled vocabularies and VIVO

Types of Specialty

All Specialties

Board-Certified Specialties

Board-Certified Subspecialties

Page 30: Controlled vocabularies and VIVO

Types of Medical Expertise

Board-certified in Cardiology

Performed 100+ ECGs

Invented a better ECG

GLG-20s masquerading as doctors for comic effect

Research

Clinical

Feigned

< <

Page 31: Controlled vocabularies and VIVO

We use Intelligent Medical Objects (IMO)’s interface terminology

• Maps medical expertise terms to SNOMED CT

• Useful for returning relevant results to patients searching for a doctor

• Enables the physician to enter more arcane areas of expertise (e.g., Asian American Community Health)

• A commercial application

Page 32: Controlled vocabularies and VIVO

Physician Admin View: Search for “chemotherapy” in IMO

Page 33: Controlled vocabularies and VIVO

Physician Admin View: Search for “that” yields many terms not in SNOMED CT.

Page 34: Controlled vocabularies and VIVO

Expertise exists in POPS. Board certification data exists in POPS, Intellicred.

Page 35: Controlled vocabularies and VIVO

Export from Physicians Profile System contains specialty and expertise

Page 36: Controlled vocabularies and VIVO

Board Certifications

Problem #1: No indication of certifying board.

At least 13 certifications, including geriatric medicine, pain medicine, and urology, are given by at least one ABMS board.

Page 37: Controlled vocabularies and VIVO

Board Certifications

Problem #2: Names of certifications are ambiguous.

Colon and rectal surgery is listed in the following alternate ways:

• Surgery, Colon and Rectal
• Colon-Rectal Surgery
• Colorectal Surgery
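One way to tame variants like these is a lookup table from every observed spelling to a single preferred label. The sketch below assumes "Colon and Rectal Surgery" as the preferred label purely for illustration; the function names are hypothetical, and a real cleanup would take the preferred label from whichever external vocabulary is adopted.

```python
import re

# Preferred label chosen arbitrarily for this sketch.
PREFERRED = "Colon and Rectal Surgery"

# Variants taken from the slide, each mapped to the preferred label.
VARIANTS = {
    "Surgery, Colon and Rectal": PREFERRED,
    "Colon-Rectal Surgery": PREFERRED,
    "Colorectal Surgery": PREFERRED,
}


def normalize_key(label):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


LOOKUP = {normalize_key(variant): label for variant, label in VARIANTS.items()}
LOOKUP[normalize_key(PREFERRED)] = PREFERRED


def canonical_certification(label):
    """Return the preferred label, or None when the variant is unknown."""
    return LOOKUP.get(normalize_key(label))
```

Returning None for unknown variants, rather than guessing, keeps unmapped names visible so that someone can review them.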

Page 38: Controlled vocabularies and VIVO

Board Certifications

Problem #3: No given date of certification.

Page 39: Controlled vocabularies and VIVO

Board Certifications

Problem #4: Which source vocabulary?

Prior to 1970 | 1970-1979 | 1970-1979

Page 40: Controlled vocabularies and VIVO

The National Uniform Claim Committee (NUCC) maintains a list of health care provider taxonomy codes, but this list seems to be exclusively for non-MDs.

Page 41: Controlled vocabularies and VIVO

Change in number of ABMS Subspecialties/Specialties

Pre-1970: 10
1970-1979: 20
1980-1992: 66
By 1996: 74
By 1999: 84
2012: 145

Page 42: Controlled vocabularies and VIVO

The following 135 board certifications in our system are not recognized by ABMS.

Cosmetic Dentistry; Cosmetic Dermatology; Cosmetic Surgery; Critical Care Neurology; Dermatology, General; Ear, Nose, and Throat, Pediatric; Echocardiography; Electrodiagnostic Medicine; Emergency Neurology; Endocrinology; Facial Plastic and Reconstructive Surgery; Facial Plastic Surgery; Family Psychology; Fetal Cardiology; Foot and Ankle Surgery; Foot Surgery; Gastroenterology Pathology; Gastrointestinal Pathology; Gastrointestinal Surgery; General Anesthesiology; General Cardiology; General Dentistry; General Dermatology; General Internal Medicine; General Neurology; General Neurosurgery; General Obstetrics and Gynecology; General Ophthalmology; General Pediatrics; General Psychiatry; General Surgery; General Urology; Genetics, Medical; Geriatric Cardiology; Geriatric Dermatology; Geriatric Emergency Medicine

Geriatric Psychotherapy; Gynecologic Endocrinology; Gynecologic Pathology; Gynecology; Hand Surgery; Heart Surgery; Hematology/Oncology; Hepatobiliary Surgery; Hepatology; High Risk Obstetrics; Hospitalist; Immunopathology; Infant Psychiatry; Intensive Care; Internal Medicine, General; International Medicine; International Travel Medicine; Interventional Neuroradiology; Interventional Oncology; Interventional Pain Management; Interventional Radiology; Invasive Cardiology; Laboratory Medicine; Laryngology; Liver Pathology; Maternal-Fetal Medicine; Medical Genetics; Molecular Genetics; Molecular Hematopathology; Molecular Infectious Disease; Molecular Pathology; Musculoskeletal Oncology; Musculoskeletal Radiology; Neonatal Neurology; Neonatal Surgery; Neonatal Thoracic Surgery; Neonatology; Neonatology, Pediatric

Neuro Critical Care; Neuro Radiology; Neuro-Ophthalmology; Neuro-Pathology; Nutrition; Oral and Maxillofacial Pathology; Oral and Maxillofacial Surgery; Orthodontics; Orthopedic Surgery; Orthopedics; Pain Medicine/Pain Management; Pathology; Pediatric Allergy and Immunology; Pediatric Behavior and Development; Pediatric Dentistry; Pediatric Neurological Surgery; Pediatric Neurology; Pediatric Neurosurgery; Pediatric Orthopedic Surgery; Pediatric Orthopedics; Periodontics; Plastic and Reconstructive Surgery; Psychology; Pulmonary Disease Medicine; Radiology; Radiology, Vascular/Interventional; Reproductive Endocrinology; Surgery, Critical Care; Surgery, Hand; Surgery, Oral and Maxillofacial; Thoracic Surgery; Vascular and Interventional Radiology

Page 43: Controlled vocabularies and VIVO

Weill Game Plan for Board Certifications

• Explore ingest from Intellicred (fewer certifications, less variability, may include certifying agency?)

• Explore external vocabularies

• Failing that, create our own

Page 44: Controlled vocabularies and VIVO

Medical Expertise and Non-Certified Specialties

Page 45: Controlled vocabularies and VIVO

How does a term of local clinical expertise map to UMLS using Stony Brook's API?

Expertise term from Weill Cornell Physician Profile System (n = 2578)

Identical: 53% of terms from the source system correspond exactly to some representation in UMLS
– Polycystic Ovary Syndrome
– Anaphylaxis
– Aortic Dissection
– Chemoembolization
– Dental Implant
– Echocardiogram

Equivalent preserving original meaning: 5% of terms from the source system have some equivalent in UMLS that is lexically different but semantically identical
Weill → UMLS
– Biopsy of Skin → Skin biopsy
– Aneurysm of Popliteal Artery → Aneurysm Popliteal
– Charcot-Marie-Tooth Disease → Charcot-Marie-Tooth
– Cirrhosis of Liver → Cirrhosis
– Coarctation of the Aorta → Coarctation

Compound term: 34% of terms from the source system can only be represented as a combination of terms from UMLS
Weill → UMLS
– Asian American Community Health → Asian American | Community Health
– Endoscopic Ultrasound of Esophagus → Endoscopic Ultrasound | Esophagus
– Chronic Pelvic Pain In Female → Chronic Pelvic Pain | Female
– Bronchoscopy With Biopsy → Bronchoscopy | Biopsy
– Green Light Laser Procedure → Green Light | Laser Procedure

Union of two concepts: 3% of terms from the source system can be represented by the joining (not intersection) of two concepts in UMLS
Weill → UMLS
– Billing and Coding → Billing | Coding
– Bone and Mineral Metabolism → Bone Metabolism | Mineral Metabolism
– Bladder and Prostate Cancer → Bladder Cancer | Prostate Cancer

Subtype: 2% of terms from the source system can only be represented as a subtype of a concept in UMLS
Weill → UMLS
– Bipolar 1 Disorder → Bipolar Disorder
– FAA Medical Exam → Medical Exam

Unclear: 3% of terms from the source system lack or have an unclear equivalent in UMLS
Weill → UMLS
– In Vitro Fertilization Counseling → Vitro | Fertilization | Counseling
– Adjustable Band → Band
– Bowel-Sparing Strictureplasty → Nothing
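To make the "identical" and "compound term" categories concrete, the following sketch classifies a local term against a small in-memory vocabulary. It is an assumption-laden stand-in: the deck's actual lookup went through Stony Brook's API, whose interface is not shown, so `VOCABULARY`, `STOPWORDS`, `segment`, and `classify` are all hypothetical. The equivalent, union, and subtype categories need human judgment and are not attempted.

```python
# Hypothetical stand-in for a UMLS lookup (a handful of lowercase terms).
VOCABULARY = {
    "anaphylaxis",
    "aortic dissection",
    "asian american",
    "community health",
    "bronchoscopy",
    "biopsy",
}

# Connecting words ignored when splitting a compound term (an assumption).
STOPWORDS = {"with", "of", "in", "the"}


def segment(tokens):
    """Greedily split tokens into the longest vocabulary terms, else None."""
    parts = []
    i = 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):
            candidate = " ".join(tokens[i:j])
            if candidate in VOCABULARY:
                parts.append(candidate)
                i = j
                break
        else:
            return None  # no vocabulary term starts at position i
    return parts


def classify(term):
    """Return (category, matched vocabulary terms) for a local term."""
    normalized = term.lower()
    if normalized in VOCABULARY:
        return ("identical", [normalized])
    tokens = [t for t in normalized.split() if t not in STOPWORDS]
    parts = segment(tokens)
    if parts and len(parts) > 1:
        return ("compound", parts)
    return ("unclear", [])
```

Greedy longest-match segmentation can miss a valid split that a backtracking search would find, which is acceptable for a sketch but worth knowing before trusting the percentages it produces.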

Page 46: Controlled vocabularies and VIVO

Pre-coordination

Definition: Terms combined by a developer to denote a specific concept and its attributes more precisely.

Benefits: Users who are not totally familiar with a controlled vocabulary and its structure.

Examples:
avian hypersensitivity pneumonitis
carrier sense multiple access

Post-coordination

Definition: Terms combined at the time of search and retrieval using Boolean or other operators.

Benefits: Lazy or “busy” developers

Examples:
avian AND hypersensitivity AND pneumonitis
carrier sense AND multiple access
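The two retrieval styles contrasted on this slide can be sketched side by side: one looks up a compound term that was assigned as a single unit, the other intersects the hits for separate terms with Boolean AND at search time. The record identifiers, index contents, and function names below are invented for illustration.

```python
# Hypothetical records indexed by single terms.
RECORDS = {
    "rec1": {"avian", "hypersensitivity", "pneumonitis"},
    "rec2": {"avian", "influenza"},
    "rec3": {"hypersensitivity", "pneumonitis"},
}

# Hypothetical index of compound terms assigned as whole units.
COMPOUND_INDEX = {
    "avian hypersensitivity pneumonitis": {"rec1"},
}


def search_with_boolean_and(*terms):
    """Combine single terms at search time by intersecting their hits."""
    matches = None
    for term in terms:
        hits = {rid for rid, words in RECORDS.items() if term in words}
        matches = hits if matches is None else matches & hits
    return matches or set()


def lookup_compound_term(term):
    """Retrieve records that carry the compound term as a single unit."""
    return COMPOUND_INDEX.get(term, set())
```

The Boolean search needs no compound terms to exist in advance, while the compound lookup only succeeds for combinations that someone has already modeled.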

Page 47: Controlled vocabularies and VIVO

How do we semantically model post-coordinated terms?

1. Do not mess with post-coordination. User adds term from lookup service. That's it. (Existing method.)

2. User adds term from lookup service. Machine makes basic inferences based on similarity. (Everything is "related term.")

3. User adds term from lookup service. Administrator models terms.

4. User adds term from lookup service. User interface enables and guides end user.
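Option 2 leaves open what "similarity" means. As one possible reading, the sketch below uses token overlap (Jaccard similarity) and treats every term above a threshold as a "related term". The threshold of 0.3 and the function names are assumptions, not part of any VIVO feature.

```python
def tokens(term):
    """Split a term into a set of lowercase words."""
    return set(term.lower().split())


def jaccard(a, b):
    """Share of distinct words that two terms have in common."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def related_terms(new_term, existing_terms, threshold=0.3):
    """Return existing terms whose word overlap meets the threshold."""
    scored = [
        (jaccard(new_term, term), term)
        for term in existing_terms
        if term != new_term
    ]
    return [term for score, term in sorted(scored, reverse=True) if score >= threshold]
```

Word overlap would link "Pediatric Neurology" to both "Pediatric Neurosurgery" and "Neonatal Neurology" without saying how they are related, which is exactly the "everything is related term" limitation the option describes.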

Page 48: Controlled vocabularies and VIVO

Option #3: User adds term from lookup service. Administrator models terms.

Can we build on others' work?

• The International Health Terminology Standards Development Organisation (IHTSDO) in Denmark is working to develop and promote SNOMED to support sharing of modelling.

• IMO, our terminology service, may help model coordinated terms.

Page 49: Controlled vocabularies and VIVO

Need for post-coordination is widespread

For example, many global health terms require coordination.

Page 50: Controlled vocabularies and VIVO

UMLS vs. SNOMED CT

Page 51: Controlled vocabularies and VIVO

UMLS’s rapid growth is somewhat at odds with desiderata compliance

[Chart: number of Strings and Concepts in UMLS by year, 1999 to 2011; vertical axis from 0 to 12,000,000]

Page 52: Controlled vocabularies and VIVO

Cimino’s Critique of Terminologies Desiderata Adherence

         Cov   Conc  Perm  ID    Hier  Def   NEC   Evol  Redun
ICD      +     -     -     -     +/-   -     -     -     -
CPT      -     +     +     +     -     -     -     +     -
DRG      -     +     +     +     -     -     -     +     -
NDC      +     +     -     -     -     -     +     -     -
RxNorm   +     +     +     +     +     +     +     +     +
LOINC    +     +     +     +     +/-   +     +     +     +
Nursing  +     +     +/-   +     +/-   -     -     +/-   +/-
SNOMED   +     +     +     +     +     +/-   +     +     +
MeSH     +     +/-   +     +     +/-   -     -     +     -
UMLS     +     +     +     +     +/-   -     n/a   +     -

Cov: Content coverage
Conc: Concept oriented
Perm: Concept permanence
ID: Meaningless identifiers
Hier: Multiple hierarchy
Def: Formal definitions
NEC: Rejected “Not Elsewhere Classified”
Evol: Graceful evolution
Redun: Detect redundancy

Page 53: Controlled vocabularies and VIVO

Why SNOMED CT may be better than UMLS at representing medical terms

UMLS has:

• No formal conceptual model (near-synonymy)
• No hierarchy
• Lots of redundancy
• Lots of ambiguity

Page 54: Controlled vocabularies and VIVO

UMLS is good for helping you find terms in a specific terminology because all many-to-one term-to-concept mappings expand the synonyms you can match against. I recommend you use UMLS to find terms from a very limited set of terminologies - maybe SNOMED plus LOINC plus RxNorm, for example.

Jim Cimino

Page 55: Controlled vocabularies and VIVO

Proposed Role of SKOS

Classes
skos:Concept
    snomedct:Procedure
    snomedct:Disorder
    rxnorm:Drug
    ...

Properties
skos:related
    snomedct:equivalentTo
    ...
skos:broader
skos:narrower
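One reading of this slide is that terminology-specific classes sit under skos:Concept and terminology-specific properties sit under the SKOS properties. The sketch below models that reading with plain tuples, without assuming any RDF library. The `ex:` resources are invented instance data, and the rdfs:subClassOf and rdfs:subPropertyOf links are an interpretation of the slide's layout, not something it states.

```python
# Triples as (subject, predicate, object) tuples.
TRIPLES = {
    # Assumed reading of the slide: terminology classes under skos:Concept.
    ("snomedct:Procedure", "rdfs:subClassOf", "skos:Concept"),
    ("snomedct:Disorder", "rdfs:subClassOf", "skos:Concept"),
    ("rxnorm:Drug", "rdfs:subClassOf", "skos:Concept"),
    ("snomedct:equivalentTo", "rdfs:subPropertyOf", "skos:related"),
    # Hypothetical instance data for illustration.
    ("ex:bacterialPneumonia", "rdf:type", "snomedct:Disorder"),
    ("ex:bacterialPneumonia", "skos:broader", "ex:pneumonia"),
    ("ex:pneumonia", "skos:broader", "ex:lungDisease"),
}


def objects(subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def broader_transitive(concept):
    """Follow skos:broader links upward and collect every concept reached."""
    found = set()
    stack = [concept]
    while stack:
        for parent in objects(stack.pop(), "skos:broader"):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_skos_concept(resource):
    """True if the resource is typed as skos:Concept or a direct subclass."""
    return any(
        cls == "skos:Concept" or "skos:Concept" in objects(cls, "rdfs:subClassOf")
        for cls in objects(resource, "rdf:type")
    )
```

Because the terminology classes specialize skos:Concept, a generic SKOS-aware tool could browse the terms even if it knows nothing about SNOMED CT or RxNorm.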

Page 56: Controlled vocabularies and VIVO

Read More

Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies
http://bit.ly/niso-standard

Desiderata for Controlled Medical Vocabularies
http://bit.ly/desider

Page 57: Controlled vocabularies and VIVO

Practice Robot Courtesy with Local Extensions

Use classes/properties that are subclasses/subproperties of existing classes/properties in VIVO’s core ontology.
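A simple check can enforce this rule before local extensions are loaded. In the sketch below, the `core:` and `local:` class names are hypothetical stand-ins rather than verified VIVO terms, and `anchored_in_core` is an illustrative helper, not part of VIVO.

```python
# Hypothetical stand-ins for classes in the core ontology.
CORE_CLASSES = {"core:Role", "core:Position"}

# Hypothetical local classes, each mapped to its declared parent (or None).
SUBCLASS_OF = {
    "local:ClinicalRole": "core:Role",
    "local:AttendingPhysicianRole": "local:ClinicalRole",
    "local:Widget": None,
}


def anchored_in_core(cls):
    """Walk up the subclass chain and report whether it reaches a core class."""
    seen = set()
    while cls is not None and cls not in seen:
        if cls in CORE_CLASSES:
            return True
        seen.add(cls)
        cls = SUBCLASS_OF.get(cls)
    return False


def unanchored(classes):
    """List local classes that never reach the core ontology."""
    return sorted(c for c in classes if not anchored_in_core(c))
```

A machine that understands only the core ontology can still treat instances of an anchored local class as instances of the core class it specializes, which is the courtesy the slide asks for.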