Cornerstone I: Representing Knowledge
From Data to Knowledge Through Concept-Oriented Terminologies
James J. Cimino
The first step on the path to knowledge is getting things by their right names.
-Chinese saying
Overview
• What is “data to knowledge”?
• Knowledge representation choices
• Knowledge-based terminology efforts
• Medical Entities Dictionary
• Proof of concepts
What is “data to knowledge”?
• Start with patient data in the medical record
• Enhance knowledge by:
– gaining a better understanding of the patient
– learning relevant knowledge
– bringing smart systems to bear to apply knowledge
– discovering new knowledge from health data
Knowledge Representation
• Terminology for representing symbols
• Format for arranging the symbols
Knowledge Representation Choices
• Guideline implementation
Guideline Implementation
• Starren and Xie, SCAMC, 1994
• National Cholesterol Education Panel Guideline
National Cholesterol Education Panel Guideline
Measure Cholesterol & Assess Risk Factors
Cholesterol <200 | Cholesterol 200 to 239 | Cholesterol >239
HDL >35, <2 Risks | HDL <35 or 2 Risks
Provide dietary information; reevaluate in 2 years
Cholesterol 200 to 239
HDL >35, <2 Risks
Guideline Implementation
• Starren and Xie, SCAMC, 1994
• National Cholesterol Education Panel Guideline
• Three representations:
– PROLOG (first-order logic)
NCEP Guideline in PROLOG
rule_j(PID) :-
    check_lab(PID, hdl, HDL, _), !,
    HDL >= 35,
    total_risk(PID, Risk), !,
    Risk < 2,
    check_lab(PID, cholesterol, C, _),
    C >= 200,
    C =< 239,
    print_rule_j.
NCEP Guideline in CLASSIC
(CL-DEFINE-CONCEPT 'C-PATIENT
  '(AND
     (ALL CHOL
       (AND INTEGER
         (MIN 200) (MAX 239)))))
(CL-DEFINE-CONCEPT 'G-PATIENT
  '(AND C-PATIENT LOW-RISK-PATIENT
     (ALL HDL (AND INTEGER (MIN 35)))))
NCEP Guideline in CLIPS
(defrule C2G2J "Rules to reach box J"
  ?f1 <- (calculated-patient (state c)
           (done no) (hdl ?hdl) (name ?name))
  (test (>= ?hdl 35))
  =>
  (printout t "Patient " ?name " needs treatment" crlf))
Guideline Implementation
• Starren and Xie, SCAMC, 1994
• National Cholesterol Education Panel Guideline
• Three representations:
– PROLOG (first-order logic)
– CLASSIC (frames)
– CLIPS (production rules)
• “All three representations proved adequate for encoding the guideline”
Knowledge Representation Choices
• Guideline implementation
• Terminologic knowledge
Terminology Representation Choices
• Frame-based
Frame-Based Representation
Serum Glucose Test
is-a: Lab Test
Measures: Glucose
Specimen: Serum
Units: “mg/dl”
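A frame like the one above can be sketched as a small data structure whose slots are inherited along is-a links. This is only an illustration: the `Frame` class, its `get` method, and the "Reported by" slot on Lab Test are assumptions made for the example, not part of the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    """A named frame with local slots and an optional is-a parent."""
    name: str
    is_a: Optional["Frame"] = None
    slots: dict = field(default_factory=dict)

    def get(self, slot):
        """Look up a slot locally, then inherit along the is-a chain."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.is_a
        return None


# "Reported by" is a made-up slot used only to demonstrate inheritance.
lab_test = Frame("Lab Test", slots={"Reported by": "Laboratory"})
serum_glucose_test = Frame(
    "Serum Glucose Test",
    is_a=lab_test,
    slots={"Measures": "Glucose", "Specimen": "Serum", "Units": "mg/dl"},
)

print(serum_glucose_test.get("Units"))        # local slot
print(serum_glucose_test.get("Reported by"))  # inherited from Lab Test
```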
Semantic Network Representation
Serum Glucose Test -(is-a)-> Lab Test
Glucose -(is-a)-> Chemical
Serum -(is-a)-> Body Substance
Conceptual Graph Representation
[Serum Glucose Test] -
(is-a) -> [Lab Test]
(measures) -> [Glucose]
(specimen) -> [Serum]
Terminology Representation Choices
• Frame-based
• Semantic network
• Conceptual graphs
Knowledge Representation
• Terminology for representing symbols
• Format for arranging the symbols
• Terminology and format for representing terminologic knowledge
Knowledge-Based Terminology Efforts
• Jochen Bernauer, SCAMC, 1991
Jochen Bernauer, SCAMC, 1991
• Conceptual graphs to model findings
[increased_uptake] -
(site) -> [femur] -> (site_attr) -> [right]
(during) -> [bone_phase]
Rector, Nolan and Glowinski, SCAMC, 1993
• GALEN project
conditions grammatically haveLocation bodyparts
fractures sensibly haveLocation bones
femurs sensiblyAndNecessarily haveDivision neck
Campbell and Musen, SCAMC, 1993
• Conceptual graphs and SNOMED
• Pain + Chest + Radiation to + Left + Arm
[Pain] -
(located in) -> [Chest]
(radiating to) -> [Arm] -> (with laterality) -> [Left]
Lindberg, Humphreys, McCray, Methods 1993
• Unified Medical Language System
Concept
– Lexical group
  – String
  – String
– Lexical group
  – String
  – String
Rocha, Huff, et al., CBM, 1994
• VOSER
• A server architecture for managing terminologic knowledge
Campbell, Cohn, Chute, et al., SCAMC 1996
• Convergent Medical Terminology
• SNOMED/Kaiser/Mayo
• Galapagos
Brown, O’Neil and Price, Methods, 1997
• Read Codes
• Representation with GALEN model
Spackman, Campbell, and Côte, SCAMC 1997
• SNOMED RT (Reference Terminology)
• Convergent Medical Terminology
• Description Logic Format
Huff, Rocha, McDonald, et al., JAMIA 1998
• Logical Observation Identifiers Names and Codes (LOINC)
4764-5 | GLUCOSE^3H POST 100 G GLUCOSE PO | SCNC | PT | SER/PLAS | QN|
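The LOINC line above packs several axes of the name into one pipe-delimited record. A minimal sketch of splitting it into named parts follows; the `LoincName` type, its field names, and `parse_loinc_line` are illustrative assumptions, and the record shown has no method field.

```python
from typing import NamedTuple


class LoincName(NamedTuple):
    """Parts of the record on the slide (field names are assumptions)."""
    code: str
    component: str
    kind_of_property: str
    time_aspect: str
    system: str
    scale: str


def parse_loinc_line(line: str) -> LoincName:
    """Split a pipe-delimited LOINC line into its named parts."""
    parts = [p.strip() for p in line.strip().strip("|").split("|")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    return LoincName(*parts)


record = parse_loinc_line(
    "4764-5 | GLUCOSE^3H POST 100 G GLUCOSE PO | SCNC | PT | SER/PLAS | QN|"
)
print(record.component)
print(record.system)
```

Because each axis is a separate field, two tests that differ only in specimen or scale can be compared field by field rather than by string matching on the whole name.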
Pharmacy System Knowledge Base Vendors
[Diagram: drug terminology model connected by is-a links. Concepts shown: Ingredient, Ingredient Class, Drug Class, Not-Fully-Specified Drug, Clinical Drug, Composite Clinical Drug, Trademark Drug, Composite Trademark Drug, Manufactured Components, Country-Specific Packaged Product, International Package Identifiers]
Knowledge-Based Terminology Efforts
• Jochen Bernauer, SCAMC, 1991
• Rector, Nolan and Glowinski, SCAMC, 1993
• Campbell and Musen, SCAMC, 1993
• Lindberg, Humphreys, McCray, Methods 1993
• Rocha, Huff, et al., CBM, 1994
• Campbell, Cohn, Chute, et al., SCAMC 1996
• Brown, O’Neil and Price, Methods, 1997
• Spackman, Campbell, and Côte, SCAMC 1997
• Huff, Rocha, McDonald, et al., JAMIA 1998
• Pharmacy system knowledge base vendors
Medical Entities Dictionary (MED)
• New York Presbyterian Hospital
• 60,000 concepts (procedures, results, drugs, problems)
• 208,242 synonyms
• 84,677 hierarchical links
• 113,906 semantic links
• 238,040 other attributes
• 66,404 translations (ICD9-CM, LOINC, MeSH, UMLS)
Central Controlled Terminology
MED Data Structures
• Semantic network
MED Semantic Network
[Diagram: semantic network. Concepts shown: Medical Entity, Substance, Chemical, Carbohydrate, Glucose, Bioactive Substance, Anatomic Substance, Plasma, Laboratory Specimen, Plasma Specimen, Event, Diagnostic Procedure, Laboratory Procedure, CHEM-7, Laboratory Test, Plasma Glucose. Link labels shown: Substance Sampled, Part of, Has Specimen, Substance Measured]
MED MUMPS Global
^med(1600) <SERUM GLUCOSE MEASUREMENT>
^med(1600,1) <C0202041>
. . ,4) <32703,50000>
. . ,5) <>
. . ,6) <Serum Glucose Measurement>
. . ,7) <>
. . ,8) <1724>
. . ,12) <GLUC>
. . ,14) <169>
. . ,16) <31987>
. . ,17) <mg/dl>
. . ,20) <C000006>
. . ,23) <1178>
. . ,50) <Serum Glucose>
. . ,138) <40444,40445,40446,59165>
. . ,156) <MCNC>
. . ,161) <QN>
MED DB2 Tables
Entities: 1, 2, 3, 4
Slots: 10 Name; 20 UMLS; 30 Part-of; 40 Specimen
Entity-Slots: (1, 10); (2, 10); (2, 20); (2, 30)
Entity/Slot/Values: (1, 10, Entity); (2, 10, C0001); (2, 40, 1234); (2, 50, mg/dl)
Ancestry: (1, 1); (1, 2); (1, 3); (2, 3)
MED UNIX Data Structure
1600|SERUM GLUCOSE MEASUREMENT|1|C0202041|4|32703|4|50000|12|GLUC|17|mg/dl|........
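The flat record above appears to hold a concept code, a name, and then alternating slot numbers and values, with a repeated slot number (4) carrying a multi-valued slot. Under that assumed layout, a reader for it might look like the sketch below. `parse_med_record` is a hypothetical helper, the elided tail of the record is left out, and the UMLS identifier is written as it appears in the MUMPS global.

```python
from collections import defaultdict


def parse_med_record(line: str):
    """Parse 'code|name|slot|value|slot|value|...' into (code, name, slots)."""
    fields = line.rstrip("\n").split("|")
    code, name = int(fields[0]), fields[1].strip()
    slots = defaultdict(list)
    rest = fields[2:]
    # Walk slot/value pairs; a repeated slot number adds another value.
    for i in range(0, len(rest) - 1, 2):
        slots[int(rest[i])].append(rest[i + 1])
    return code, name, dict(slots)


# Only the fields visible on the slide; the elided tail is not guessed at.
code, name, slots = parse_med_record(
    "1600|SERUM GLUCOSE MEASUREMENT |1|C0202041|4|32703|4|50000|12|GLUC|17|mg/dl"
)
print(code, name)
print(slots[4])   # multi-valued slot
print(slots[17])  # units
```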
MED Data Structures
• Semantic network
• MUMPS global
• DB2
• UNIX
Proof of Concepts
• Merging data and application knowledge
Merging Data and Application Knowledge
[Diagram: Lab Test hierarchy (Intravascular Glucose Test, Plasma Glucose Test, Serum Glucose Test, Fingerstick Glucose Test) linked to Lab Display (Chem20 Display)]
• Class-based, reusable lab summaries
DOP Summary
WebCIS Summary
• Expert system for application maintenance
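The idea of a class-based, reusable summary can be sketched as a display defined by terminology classes rather than by a fixed list of test codes. The is-a arrangement below is an assumption (the slide's exact hierarchy is not recoverable), and `IS_A`, `DISPLAY_CLASSES`, `ancestors`, `shown_on`, and "New Glucose Test" are hypothetical names.

```python
# Assumed is-a links (child -> parent); illustrative arrangement only.
IS_A = {
    "Intravascular Glucose Test": "Lab Test",
    "Plasma Glucose Test": "Intravascular Glucose Test",
    "Serum Glucose Test": "Intravascular Glucose Test",
    "Fingerstick Glucose Test": "Intravascular Glucose Test",
}

# A display is defined by classes, not by an enumerated list of tests.
DISPLAY_CLASSES = {"Chem20 Display": {"Intravascular Glucose Test"}}


def ancestors(concept):
    """Return the concept followed by all of its is-a ancestors."""
    chain = [concept]
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def shown_on(display, test):
    """True if the test, or any ancestor class, belongs to the display."""
    return any(c in DISPLAY_CLASSES[display] for c in ancestors(test))


print(shown_on("Chem20 Display", "Serum Glucose Test"))

# A test added later under the same class needs no change to the display.
IS_A["New Glucose Test"] = "Intravascular Glucose Test"  # hypothetical term
print(shown_on("Chem20 Display", "New Glucose Test"))
```

The design point is that maintenance moves into the terminology: adding a term under the right class is enough for every display that refers to that class.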
Smarter Retrievals from the Record
• Repository stores events and results
• Clinical problems at a different level of granularity
• Re-use knowledge to map from problems to clinical data
• Produce problem-specific views of the medical record
[Diagram: MED concepts linking clinical problems to clinical data. Concepts shown: Congestive Heart Failure, Angina, Chest X ray, Chest X ray 2 View, Cardiac Enzyme, Creatine Kinase, Intravascular CK Test]
Lab: 1/1/99 Cardiac Enzyme Test
Radiology: 2/23/99 Chest X Ray
Radiology: 2/28/96 Head CT
Lab: 12/28/96 Sickle Cell Test
Admission: 3/14/96 Stroke
Admission: 2/14/98 Angina
Lab: 1/1/99 Blood Type Test
Radiology: 2/1/97 Knee X Ray
Concept-oriented (Heart)
Heart Disease
Chest
Discharge: 1/15/99 CHF
Admission: 2/14/98 Angina
Lab: 1/1/99 Cardiac Enzyme Test
Radiology: 2/23/99 Chest X Ray
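A problem-specific view like the one above can be sketched as a filter that follows terminology links from each recorded item toward the concept of interest. The events are taken from the chronological list on the slide; the `RELATED_TO` links and the helper names are hypothetical stand-ins for the knowledge held in the terminology.

```python
# Events from the slide's chronological list: (source, date, item).
EVENTS = [
    ("Lab", "1/1/99", "Cardiac Enzyme Test"),
    ("Radiology", "2/23/99", "Chest X Ray"),
    ("Radiology", "2/28/96", "Head CT"),
    ("Lab", "12/28/96", "Sickle Cell Test"),
    ("Admission", "3/14/96", "Stroke"),
    ("Admission", "2/14/98", "Angina"),
    ("Lab", "1/1/99", "Blood Type Test"),
    ("Radiology", "2/1/97", "Knee X Ray"),
]

# Hypothetical terminology links: item -> broader concepts it relates to.
RELATED_TO = {
    "Cardiac Enzyme Test": {"Heart Disease"},
    "Angina": {"Heart Disease"},
    "Chest X Ray": {"Chest"},
    "Heart Disease": {"Heart"},
    "Chest": {"Heart"},
}


def relates_to(item, target, seen=None):
    """Follow RELATED_TO links transitively from item toward target."""
    if seen is None:
        seen = set()
    if item == target:
        return True
    seen.add(item)
    return any(
        relates_to(parent, target, seen)
        for parent in RELATED_TO.get(item, ())
        if parent not in seen
    )


def concept_view(events, target):
    """Keep only the events whose item is linked to the target concept."""
    return [e for e in events if relates_to(e[2], target)]


for source, date, item in concept_view(EVENTS, "Heart"):
    print(f"{source}: {date} {item}")
```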
“Just-in-time” Education
• Medline button
• Infobuttons
• Text-to-Web
“Just-in-time” Education
[Diagram: Clinical Info System data (Laboratory Test Results, Medication Orders, X-ray Reports, ICD9) linked to information resources: DXplain, Medline, Cholesterol Guideline, Dietary Interactions, PDR, Micromedex, Webpath, CHORUS, Radiol Museum of South Bank]
Expert Systems
• Hripcsak, et al., Ann. Int. Med., 1995
Hripcsak, et al., Ann. Int. Med., 1995
• Identify chest x-ray reports suspicious for 6 clinical conditions to trigger alerts
Method               Sens     Spec
Laypersons           22-47%   97-99%
Radiologists         73-98%   96-99%
Internists           68-98%   97-99%
Keyword              51-79%   79-92%
NLP/MED/Rule-based   81%      98%
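For readers less familiar with the two columns, sensitivity and specificity follow from a confusion matrix as shown in this short sketch. The counts are made up to illustrate the arithmetic and are not data from the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly positive reports that the method flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly negative reports that the method leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts chosen only to illustrate the calculation.
print(f"{sensitivity(81, 19):.0%}")
print(f"{specificity(98, 2):.0%}")
```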
Clinical Decision Support System
• Data monitor runs rules against incoming reports
• Tuberculosis cultures come back 4-8 weeks later
• One day, hundreds of TB alerts came in
What Happened to the Tuberculosis Alert?
No Growth
Medical Logic Module
No Growth to Date
No Growth after ...
How We Outsmarted the Lab
[Diagram: the Medical Logic Module refers to the class “No Growth” Results, which groups:]
• No Growth
• No Growth after 24 Hours
• No Growth after 48 Hours
• No Growth after 72 Hours
• No Growth to Date
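The difference between matching one literal term and testing membership in a terminology class can be sketched as follows. The rule logic here is an assumption about how such a Medical Logic Module might be written, and `literal_rule` and `class_rule` are hypothetical names; only the result terms come from the slide.

```python
# Members of the "No Growth" Results class, as listed on the slide.
NO_GROWTH_RESULTS = {
    "No Growth",
    "No Growth after 24 Hours",
    "No Growth after 48 Hours",
    "No Growth after 72 Hours",
    "No Growth to Date",
}


def literal_rule(result_term: str) -> bool:
    """Assumed original logic: alert unless the result is exactly 'No Growth'."""
    return result_term != "No Growth"


def class_rule(result_term: str) -> bool:
    """Class-based logic: alert unless the result is any no-growth term."""
    return result_term not in NO_GROWTH_RESULTS


# A term the lab starts using without telling the rule author.
term = "No Growth to Date"
print(literal_rule(term))  # fires a spurious alert
print(class_rule(term))    # stays quiet
```

With the class-based test, a new lab term only has to be placed under the class in the terminology; the rule itself does not change.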
DXplain Button
• Elhanan, et al., SCAMC 1997
• Conversion of test results to clinical findings
• Pass findings to DXplain
[Diagram: MED concepts used to map a test result to a finding. Concepts shown: Serum Cholesterol Test, Serum Specimen, Serum, Cholesterol, Abnormalities of Serum Cholesterol, Hypercholesterolemia]
Expert Systems
• Hripcsak, et al., Ann. Int. Med., 1995
• Clinical decision support system
• DXplain Button
Data Mining
• Wilcox and Hripcsak, SCAMC 1997
Wilcox and Hripcsak, SCAMC 1997
Wilcox and Hripcsak, SCAMC 1998
• Compare traditional coding methods with NLP to identify conditions in a set of patient records (x-ray reports)
Method               Sens     Spec
Laypersons           36%      86%
Expert-coded cases   27-37%   95-98%
ICD-9-coded cases    12-29%   86-90%
Physicians           85%      98%
NLP/MED/Rule-based   81%      98%
Data Mining
• Wilcox and Hripcsak, SCAMC 1997
• Wilcox and Hripcsak, SCAMC 1998
Database Maintenance and Use
• Tables, columns, events all modeled in the MED
• Allows linkage of data model to controlled terminology
• Terminologies can be reused
• Impact of terminology changes on data model can be tracked
Terminology Maintenance and Use
• Integrating terminologies from merging hospitals
• Automated update of medication terminology
• Detection of errors and inconsistencies
Proof of Concepts
• Merging data and application knowledge
• Smarter retrievals from the record
• “Just-in-Time” education
• Expert systems
• Data mining
• Database maintenance and use
• Terminology maintenance and use
Is it Worth the Trouble?
Meed:
• noun
• 1 archaic : an earned reward or wage
• 2 : a fitting return or recompense
• Date: before 12th century
• Etymology: from Old English:
MED
Summary
• Putting knowledge in your terminology gets you:
– Better ways to get knowledge out of your EMR
– Better ways to get knowledge out of resources
– Better ways to use other knowledge bases
– Better ways to use terminology
– Better ways to manage applications
– Better ways to manage data and terminology
• Representation scheme is less important
• Desiderata for controlled terminology
Desiderata
• Desirable qualities for terminology
“Go placidly amid the noise and haste, and remember what peace there may be in silence.”
“I’d rather be sailing”