From free text to clinical data
Language and Computing
Davide Zaccagnini, MD
Karen Doyle, RN
October 23, 2007
Outline
• Reality of Applying NLP to AHLTA documents
• Use Cases
• Ontology-Based NLP
Use Cases
• PRIMARY Use Case for Health Care Documentation compared with documentation produced for Biomedical Research
– Collect information to determine diagnosis(es), execute a plan of treatment, and communicate with the healthcare team.
• By-products of Electronic Documentation
– Coding for Billing
– Problem Lists
– Past Medical History
– Social History; 14 elements, including tobacco use, ETOH, toxin exposure, marital status
– Family History
– Medications
– Allergies
– Bio-surveillance
– Quality Metrics; Pay for Performance, Joint Commission, HEDIS
– Research
AHLTA offers Structured Documentation Tool
Medcin Terms in Blue
Structured and Unstructured Text DoD HA Policy Guidance
Ref ASAD Health Affairs August 7, 2007
Blue is the original code calculated from the structured documentation. Pink shows how the doctor can change the subscores, but the document itself does not change.
Background of TATRC HPI Free Text DUMMY
• Lost Data in S/O sections: What is the value?
• Patient History
– Patient’s “story”, reflects signs and symptoms
– History of Present Illness
– Review of Systems
– Past Family, Social and Medical History
– Used to calculate Evaluation and Management (E&M) Billing Codes
• HPI: History of Present Illness
– Definition: A chronological description of the present illness from the first sign or symptom, or from last encounter
– Comprised of 8 Elements used in the calculation of E&M code: location, quality, severity, duration, timing, context, modifying factors, associated signs and symptoms
(HPI Dummy # 1) Free text Section Extracted manually for Analysis
100 Texts for Processing
Free Text to Data: What is desirable?
• HPI 1 45yo G4P4, POD14 s/p TAH, doing well. Denies f/c. Denies any pain. Not taking any pain meds. Staples removed on 9May. Appetite good. No N/V. Normal bowel/bladder function. She is very happy with the outcome of surgery. Only concern is incision -very small area that has not healed completely. has been keeping the incision clean and dry.
• Expand Abbreviations
• Codify Terms to Vocabularies: ICD-9, SNOMED, MEDCIN
• Negation
• Modality
• Applying Rules
– Financial Billing
– Obtain: age, height, weight, blood pressure, dates
– Quality Metrics
– Surveillance
– History, Family, Past Medical, Current Problems?
Free Text Example
[Figure: the HPI text annotated in four steps: Expand Abbreviations, Code to Vocabularies, Evaluate for Negation, Apply Rules. Highlighted terms: appetite good, good, very, f/c, n/v, TAH, pain, happy, taking pain meds, negation.]
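The first and third steps named above (expanding abbreviations and evaluating negation) can be sketched on sentences from the HPI 1 example. This is a minimal illustration, not the presenters' system: the abbreviation table, the list of negation cues and the function names `expand_abbreviations` and `analyze` are all assumptions made for this example.

```python
import re

# Hypothetical abbreviation table; a real system would use a curated
# clinical lexicon rather than four hand-picked entries.
ABBREVIATIONS = {
    "s/p": "status post",
    "TAH": "total abdominal hysterectomy",
    "f/c": "fever/chills",
    "N/V": "nausea/vomiting",
}

# Hypothetical negation cues at the start of a sentence; whatever follows
# the cue is treated as the negated finding.
NEGATION_CUE = re.compile(r"^(denies any|denies|no|not)\s+(.*)$", re.IGNORECASE)


def expand_abbreviations(sentence: str) -> str:
    """Replace each known abbreviation token, keeping trailing commas."""
    expanded = []
    for token in sentence.split():
        core = token.rstrip(",")
        trailing = token[len(core):]
        expanded.append(ABBREVIATIONS.get(core, core) + trailing)
    return " ".join(expanded)


def analyze(sentence: str) -> dict:
    """Expand abbreviations, then flag the finding as negated or not."""
    expanded = expand_abbreviations(sentence)
    match = NEGATION_CUE.match(expanded)
    if match:
        return {"finding": match.group(2), "negated": True}
    return {"finding": expanded, "negated": False}


for sentence in ["Denies f/c", "Denies any pain", "No N/V", "Appetite good"]:
    print(analyze(sentence))
```

Cue matching at the start of a sentence is only enough for telegraphic notes like this one; the later slides describe negation identified through syntax, which this sketch does not attempt.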
Ontology-based NLP
Natural Language Processing and Understanding
“…natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.” (Wikipedia)
DATA MODELS: AGREED UPON TERMS
• PREDEF. USE
• DATA DRIVEN
• PREDEF. CONTEXT
• SPECIALIZED MODEL
ONTOLOGY: FORMALLY DEFINED CONCEPTS
• NO PREDEF. USE
• REALITY DRIVEN
• NO PREDEF. CONTEXT
• INFERRED MODEL
Representations (formal or otherwise)
What is fever?
All definitions are accurate within their model, but what is fever?
Does the patient have fever?
The world according to a database
Patients {ID#, ZIP code, BP}

ID#      ZIP code   BP
001123   02139      80/120
001223   24425      65/130

The world according to an ontology
patient has (identifier (is_a (ID#))) ∩ lives_in (geographic_area) ∩ has (blood_pressure (is_measured_by (blood pressure measurement (…))))

[Figure: graph linking patient, identifier, ID#, geographical area, ZIP code, blood pressure, blood pressure measurement and the values 80/120 and 65/130 through the relations is_a, is_identified_by, has, lives in, is_measured_by and generates.]
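The contrast between the two views can be made concrete with a small sketch. It stores the first table row once as a flat record and once as subject-relation-object triples, using the relation names from the slide. The instance names (`patient_1`, `bp_1`, `measurement_1`), the helper functions and the exact set of triples are assumptions made for illustration, not LinKBase content.

```python
# A flat database row: what each column means lives outside the data.
row = {"ID#": "001123", "ZIP code": "02139", "BP": "80/120"}

# The same facts as triples; the relations carry the meaning explicitly.
TRIPLES = [
    ("patient_1", "is_a", "patient"),
    ("patient_1", "is_identified_by", "001123"),
    ("001123", "is_a", "ID#"),
    ("patient_1", "lives_in", "02139"),
    ("02139", "is_a", "ZIP code"),
    ("ZIP code", "is_a", "geographical area"),
    ("patient_1", "has", "bp_1"),
    ("bp_1", "is_a", "blood pressure"),
    ("bp_1", "is_measured_by", "measurement_1"),
    ("measurement_1", "is_a", "blood pressure measurement"),
    ("measurement_1", "generates", "80/120"),
]


def objects(subject: str, relation: str) -> list:
    """All objects linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def is_a(entity: str, concept: str) -> bool:
    """True if `concept` is reachable from `entity` by following is_a links."""
    frontier, seen = [entity], set()
    while frontier:
        current = frontier.pop()
        if current == concept:
            return True
        if current not in seen:
            seen.add(current)
            frontier.extend(objects(current, "is_a"))
    return False


# The database only knows "02139" is in a column called "ZIP code";
# the triples let a program infer that it denotes a geographical area.
print(is_a("02139", "geographical area"))
print(objects("measurement_1", "generates"))
```

The point of the design is that a question phrased in terms of geographical areas can be answered without a programmer knowing in advance that the column is named "ZIP code".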
Formal representations
Ontologies: the meaning of data
An ontology:
• Explicitly specifies meaning
• Represents reality, not data
• Is a formal schema
• Its consistency can be automatically enforced and checked
NLP Workflow
• Example Pipeline
Input handler -> Fetches document and passes it to the first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Maps tokens and multi-words to ontology. Rewriting to enhance mapping
Section labeler -> Assigns section labels to paragraphs
Syntactic parser -> Performs syntactic parsing, validating against grammar
Fragment labeler -> Assigns fragment labels to pieces of text within sections
Lexeme filter -> Filters out function words (e.g. determiners) to reduce false-positive mappings
Vital signs extractor -> Extracts vital signs
Labs extractor -> Extracts lab results
FreePharma -> Extracts medications
Disambiguator -> Disambiguates concepts
Coder -> Codes to standard classification systems like SNOMED-CT, ICD-9,…
Concept filters -> Marks concepts that belong to different filters (e.g. diagnoses, procedures)
Relevance ranker -> Calculates relevance of concepts
Output handler -> Creates XML/HTML/… output
Negation/modality -> Identifies negation, modality and future
Semantic tagger -> Further deduces concepts based on syntax, rewriting, full definitions and so on
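A pipeline of this kind can be sketched as a list of components that each take a document and return it enriched. The sketch below is an assumption about the general architecture, not the product's code: the `Document` dictionary, the two toy components and `run_pipeline` are hypothetical names, and only a paragrapher and a lexeme filter are implemented.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Component = Callable[[Document], Document]

# Hypothetical function-word list for the toy lexeme filter.
FUNCTION_WORDS = {"the", "a", "an", "of"}


def paragrapher(doc: Document) -> Document:
    """Split the raw text into paragraphs on blank lines."""
    doc["paragraphs"] = [p.strip() for p in doc["text"].split("\n\n") if p.strip()]
    return doc


def lexeme_filter(doc: Document) -> Document:
    """Drop function words so they cannot produce false mappings."""
    doc["tokens"] = [
        [w for w in p.split() if w.lower() not in FUNCTION_WORDS]
        for p in doc["paragraphs"]
    ]
    return doc


def run_pipeline(doc: Document, components: List[Component]) -> Document:
    """Pass the document through each component in order."""
    for component in components:
        doc = component(doc)
    return doc


result = run_pipeline(
    {"text": "HPI\n\nThe patient denies a fever"},
    [paragrapher, lexeme_filter],
)
print(result["tokens"])
```

Because every component shares one signature, different applications can be assembled by choosing a different component list, which is how the later slides present indexing, extraction and coding.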
Semantic Tagging
Concept: SNOMED CT : 29074008 : POLYP OF ANTRUM (DISORDER)
Sample: “Demonstrated benign small polyps in the antrum”
Morphological Variations: antrum > antral; polyp < polyps
Word Clustering: antral polyp; polyp antral
Known Synonyms: maxillary sinus polyp, antral polyp
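The three mechanisms listed for this concept can be approximated in a few lines: a variant table handles morphology, unordered word sets handle clustering, and extra entries handle synonyms. This is a rough sketch under those assumptions; the tables and the `normalize` and `tag` functions are hypothetical, and only the SNOMED CT identifier comes from the slide.

```python
import re

# Morphological variants mapped to a base form (from the slide's examples).
VARIANTS = {"antral": "antrum", "polyps": "polyp"}

# Hypothetical function-word list ignored during matching.
STOP_WORDS = {"in", "the", "of", "a"}

# Unordered word sets per concept, so word order does not matter;
# the second entry is the known synonym from the slide.
CONCEPTS = {
    frozenset({"polyp", "antrum"}): "29074008",
    frozenset({"maxillary", "sinus", "polyp"}): "29074008",
}


def normalize(text: str) -> set:
    """Lowercase, drop function words and reduce variants to base forms."""
    words = re.findall(r"[a-z]+", text.lower())
    return {VARIANTS.get(w, w) for w in words if w not in STOP_WORDS}


def tag(text: str) -> list:
    """Return the codes of every concept whose words all occur in the text."""
    found = normalize(text)
    return sorted({code for terms, code in CONCEPTS.items() if terms <= found})


print(tag("Demonstrated benign small polyps in the antrum"))
print(tag("polyp antral"))
print(tag("maxillary sinus polyp"))
```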
Types of Disambiguation
by STRING: lexical match between a term (or its inflections) and a concept in the ontology.
Ex.: “Patient presents fever”
[Figure: fever and cough shown alongside symptom in the ontology]
by DEFINITION: match between terms and concepts in the ontology, where these concepts meet necessary and sufficient conditions (logic-based reasoning)
Ex.: “Patient underwent a liver biopsy”
liver biopsy = has_location (liver) Λ is_a (biopsy)
[Figure: liver is an organ, biopsy is a procedure; both conditions evaluate to true]
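Matching by definition can be sketched as checking whether the conditions asserted by the words of a sentence satisfy every condition in a concept's definition. The lexicon, the definition table and `match_by_definition` below are hypothetical; a production reasoner would use a description-logic classifier rather than dictionary comparison.

```python
# Hypothetical lexicon: each word asserts one (relation, concept) condition.
LEXICON = {
    "liver": ("has_location", "liver"),
    "biopsy": ("is_a", "biopsy"),
}

# Full definitions: the necessary and sufficient conditions of a concept.
DEFINITIONS = {
    "liver biopsy": {"has_location": "liver", "is_a": "biopsy"},
}


def match_by_definition(text: str) -> list:
    """Return concepts whose every defining condition is asserted by the text."""
    asserted = {}
    for word in text.lower().split():
        if word in LEXICON:
            relation, concept = LEXICON[word]
            asserted[relation] = concept
    return [
        name
        for name, conditions in DEFINITIONS.items()
        if all(asserted.get(rel) == value for rel, value in conditions.items())
    ]


print(match_by_definition("Patient underwent a liver biopsy"))
print(match_by_definition("Patient underwent a biopsy"))
```

The second call returns no concept because only one of the two conditions is met, which is the difference between a string match on "biopsy" and a match on the full definition.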
Types of Disambiguation
by RELATIONSHIPS: match between SOME of the term(s), assigned to different concepts in the ontology, where these concepts compose the full definition of the concept using a ‘suggested parent’.
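Ex.: “CT of thyroid”
[Figure: “CT” and “thyroid” satisfy is_a (CT scan) Λ has_location (thyroid), both true, but no concept with that definition is shown (?). The thyroid has_location neck, and CT of Neck = is_a (CT scan) Λ has_location (neck), so CT of Neck is offered through is_a as the suggested parent.]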
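The suggested-parent idea can be sketched as a search that relaxes the location until a defined concept is found. This reading of the slide is an assumption, as are the `LOCATED_IN` table (thyroid within neck), the `DEFINED` table and the `suggest_parent` function.

```python
from typing import Optional

# Concepts with full definitions in a hypothetical ontology fragment.
DEFINED = {
    "CT of neck": {"is_a": "CT scan", "has_location": "neck"},
}

# Hypothetical anatomical relation: each structure and its containing region.
LOCATED_IN = {"thyroid": "neck"}


def suggest_parent(is_a: str, has_location: str) -> Optional[str]:
    """Walk up the anatomy until a defined concept matches both conditions."""
    location: Optional[str] = has_location
    while location is not None:
        for name, definition in DEFINED.items():
            if definition == {"is_a": is_a, "has_location": location}:
                return name
        location = LOCATED_IN.get(location)
    return None


# "CT of thyroid" has no exact concept, so the nearest defined one is proposed.
print(suggest_parent("CT scan", "thyroid"))
```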
Examples of disambiguation
Ontology and NLP
LinKBase® Medical Ontology
• Natural language processing: concepts are mapped to terms in multiple languages
• Terminologies and data integration: cross-mapped to multiple coding systems
[Figure labels: Spanish, English; Lexicon, Grammar; Proprietary, ICD-9, MEDCIN, SNOMED CT, CPT, Radlex (partial)]
Conclusion
• Ontologies are powerful NLP tools for:
– Segmentation
– Disambiguation
– Higher level inference
– Interoperability of extracted data
• They require human resources for maintenance, but reduce the need for annotated data
• They are “white boxes”
– Models that can be expanded and changed
• Combined with stochastic algorithms, they provide both formality and scalability
Thank you
“Patients in the North East have higher blood pressure than the average population”
[Figure: the ontology graph of patient, identifier, ID#, geographical area, ZIP code, blood pressure and blood pressure measurement, with values 80/120 and 65/130 and relations is_a, is_identified_by, has, lives in, is_measured_by and generates]
NLP/U, formal representations
Disambiguation
• Words in the document are mapped to concepts in the ontology
• When more than one candidate exists in the ontology, the system builds a graph of concept relations using:
1. Nearness in sentence
2. IS_A Relationships
3. Horizontal relationships
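A much reduced version of this idea scores each candidate concept by whether it is related to other words in the sentence, weighted by how near they are. The candidate list, the `RELATED` pairs and the inverse-distance score are assumptions chosen for illustration; the slides do not say how the real graph is scored.

```python
from typing import List, Optional

# Hypothetical candidate concepts for an ambiguous word.
CANDIDATES = {
    "depressed": ["depressed mood", "depressed fracture morphology"],
}

# Hypothetical ontology relations between a candidate and a context word.
RELATED = {
    ("depressed mood", "patient"),
    ("depressed fracture morphology", "fracture"),
}


def disambiguate(tokens: List[str], index: int) -> Optional[str]:
    """Pick the candidate best supported by related words nearby."""
    best, best_score = None, 0.0
    for candidate in CANDIDATES.get(tokens[index], []):
        score = 0.0
        for position, other in enumerate(tokens):
            if position != index and (candidate, other) in RELATED:
                # Nearness in the sentence: closer words count more.
                score += 1.0 / abs(position - index)
        if score > best_score:
            best, best_score = candidate, score
    return best


tokens = "the depressed patient has a depressed fracture".split()
print(disambiguate(tokens, 1))
print(disambiguate(tokens, 5))
```

With these assumptions the first "depressed" resolves to the mood sense and the second to the fracture sense, even though both context words occur in the same sentence.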
Syntactic Parsing
«A very young patient was given a double dose by his mother.»
[Figure annotations: the subject; the predicate; note the passive construction]
Negation via Syntax
Modality via Syntax
Reference Resolution
“TeSSI” understands indirect reference to patient
The system is able to disambiguate between two different meanings of “depressed” in one and the same sentence. While it defines the “depressed” in “depressed patient” as a state of mind, it recognizes “depressed” as a part of “depressed fracture” and tags this noun phrase with the corresponding SNOMED code.
Disambiguation
Fragment Labeling
• Sentences and phrases are labeled
– History, exam, impression, etc.
• Independent of superficial formatting
• One label – one type of information
“HPI: The patient whose mother had breast cancer presents with loss of hearing”
Labels shown: Family History, Chief Complaint
Fragment Labeling
FreePharma
Medication Extraction
• Example
Semantic Indexing
TeSSI: Terminology Supported Semantic Indexing
Input handler -> Fetch document and pass to first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to ontology
Disambiguator -> Disambiguate concepts
Relevance ranker -> Calculate relevance of concepts
Indexer -> Write information to index for quick access.
Information Extraction
Input handler -> Fetch document and pass to first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to ontology
Section labeler -> Assign section labels to paragraphs
Syntactic parser -> Perform syntactic parsing validating against grammar
Fragment labeler -> Assign fragment labels to pieces of text within sections
Vital signs extractor -> Extract vital signs
Labs extractor -> Extract lab results
FreePharma -> Extract medications
Output handler -> Create XML/HTML/… output
Negation/modality -> Identify negation, modality and future
Semantic tagger -> Further deduce concepts based on syntax, rewriting, full definitions and so on
Knowledge Discovery
Input handler -> Fetch document and pass to first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to ontology
Section labeler -> Assign section labels to paragraphs
Syntactic parser -> Perform syntactic parsing validating against grammar
Fragment labeler -> Assign fragment labels to pieces of text within sections
Vital signs extractor -> Extract vital signs
Labs extractor -> Extract lab results
FreePharma -> Extract medications
Ontology writer -> Add discovered knowledge to ontology
Negation/modality -> Identify negation, modality and future
Semantic tagger -> Further deduce concepts based on syntax, rewriting, full definitions and so on
Rules Engine -> XML-structured rules for interpreting syntactic structure and forming semantic representations
Automatic coding
Input handler -> Fetch document and pass to first processing component
Paragrapher -> Paragraph and title detection
Segmenter -> Map tokens and multi-words to ontology
Section labeler -> Assign section labels to paragraphs
Syntactic parser -> Perform syntactic parsing validating against grammar
Fragment labeler -> Assign fragment labels to pieces of text within sections
Vital signs extractor -> Extract vital signs
Labs extractor -> Extract lab results
FreePharma -> Extract medications
Negation/modality -> Identify negation, modality and future
Semantic tagger -> Further deduce concepts based on syntax, rewriting, full definitions and so on
Rules Engine -> XML-structured rules for interpreting syntactic structure and forming semantic representations
Code Calculator -> Calculate codes: E&M, ICD-9, CPT
Output handler -> Create XML/HTML/… output
NLP-based applications and products
Quality
Projects: CPR Technologies, JCAHO, Eclipsys
• Extraction of CMS Core Measures
• National Patient Safety Network
• Datawarehousing
Coding
Projects: Kaiser Permanente, Convergent Solutions
• E&M Coding
• SNOMED Coding
• ICD-9 Coding
• CPT in development
Medication Extraction
Projects: The Marshfield Clinic, Medquist, UAB
• Medication Reconciliation
• Personalized Medication Project
• Validation of therapies from literature
Interoperability
Projects: Integic/DoD, Revolution Health
• Semantic Integration of the military health systems
• Tie together free text content and portal applications
Web Search and Retrieval
Projects: Revolution Health, Merck
• Ontology-enhanced search
• Concept-based indexing
Radiology
Projects: FUJIFILM MEDICAL SYSTEMS
• Findings and pertinent negatives extracted from radiology reports
Radiology
• Observation Types
– Findings
– Pertinent Negatives
– Quality Assurance
– Unclassified
• Observation Components
– Fundamentals
– Modifiers
– Qualifiers
• Observation Status
– (Present) / Historical
– Changed/Not Changed/(not stated)
Observation Types
• Findings
– E.g. “bilateral infiltrates”
• Pertinent Negatives
– E.g. “the lungs are clear”
• Quality Assurance
– E.g. “poor inspiration”
• Unclassified
– E.g. “the lungs are unchanged”
Observation Components
• Fundamentals
– Pathologic Entities
– Physiologic entities
– Devices
– Procedure
• Modifiers
– Location
– Qualitative
– Quantitative
• Uncertainty (modal)
• Negation
Observation Status
• Historical
• (non-Historical)
• Change Stated
• No Change Stated
• (Change not stated)
• Grouped
• Contains Uncertain (modal) Element
Example PN and F (Modal)
Example Hx and Grouped
Example CS and NCS
Example Quality Assurance
Findings
• Modifier in long distance dependency
• Finding of PE in historical context
• Finding of devices
Pertinent Negatives
• Knowledge that lungs should be clear
• Negation of abnormalities
• Statement of normality