Tracy Edinger, ND, MS Oregon Health & Science University Twitter: #AMIA2017 Evaluation of Clinical Text Segmentation to Facilitate Cohort Retrieval Enhanced Cohort Identification and Retrieval S105


Tracy Edinger, ND, MS

Oregon Health & Science University

Twitter: #AMIA2017

Evaluation of Clinical Text Segmentation to Facilitate Cohort Retrieval

Enhanced Cohort Identification and Retrieval

S105


Co-Authors

Dina Demner-Fushman, MD, PhD (National Library of Medicine)

Aaron Cohen, MD, MS (Oregon Health & Science University)

Steven Bedrick, PhD (Oregon Health & Science University)

William Hersh, MD (Oregon Health & Science University)

AMIA 2017 | amia.org

Acknowledgements

DMICE Faculty, Staff, and Students

NLM 2 T15 LM 7088-21

National Library of Medicine OHSU

NLM Scientists, Staff, and Fellows

Disclosure

I and my spouse/partner have no relevant relationships with commercial interests to disclose.

Learning Objectives

After participating in this session the learner should be better able to:

• Understand the importance of identifying document section headings for natural language processing

• Understand rule-based identification of document section headings

Use of Clinical Data

• Secondary use of EHR data
  - Quality improvement
  - Regulatory reporting
  - Disease surveillance
  - Research

• To use these data, it is important to be able to retrieve specific patient cohorts

Image from http://epidemiologystudy.com/study.php

Structured and Unstructured Data for Cohort Retrieval

• Structured data including diagnosis and procedure codes are commonly used to identify clinical cohorts

• Relying solely on structured data may not retrieve the full cohort

Patients who had colonoscopies during the last 10 years

Denny JC (2012) Chapter 13: Mining Electronic Health Records in the Genomics Era. PLoS Comput Biol 8(12): e1002823. doi:10.1371/journal.pcbi.1002823

Cohort Retrieval from Clinical Text

• Cohort retrieval from clinical text is difficult
  - Terminology and spelling differences
  - Multiple meanings for terms
  - Temporality
  - Negation
  - References to illnesses in other people

• Clinical text may provide clues to help resolve some of these issues

Structure of Clinical Text

SOAP Format

S: Patient reports not much sleep last night; no complaints this morning.

O: T 99 F, HR 68, RR 16, BP 107/75

Chest – CTA, bilateral breath sounds

CV – RRR without murmur

A: Ovarian carcinoma – POD #1 for staging laparotomy. Adequate UOP, incision in good condition.

P: Clear liquids today. D/C foley catheter.

Structure of Clinical Text

Chief Complaint: Sent from NWH with left sided hemorrhage

History of Present Illness: The pt is a 44 year-old right handed woman with no significant PMH and family history significant for stroke (father, paternal uncle and sister @ 46 years) who was transferred from [**Hospital 1771**] Hospital with a left sided intraparenchymal hemorrhage. The patient was in her USOH ...

Past Medical History: Had an ulcer at age 10

Social History: Works at the [**Last Name (un) 10457**] Laboratories in [**Location (un) 2997**]. Married. Has a son. No ETOH, TOBACCO, or Drugs.

Family History: Father died of multiple strokes at age 63. Paternal Uncle died of stroke. Patient sister died of stroke at age 46.

Facilitating Retrieval by Segmenting Clinical Text

Past Medical History: Had an ulcer at age 10

Family History: Father died of multiple strokes at age 63. Paternal Uncle died of stroke. Patient sister died of stroke at age 46.

Several algorithms have been published that segment clinical documents

- Segmenting was validated

- No published studies evaluate whether segmenting improves recall and precision

Sections provide clues that may avoid some retrieval issues

- Temporal differences

- References to illnesses in other people

Project Overview

• Segmented a set of clinical documents

• Developed topics for several patient cohorts

• Developed queries with and without sections

• Judged a subset of documents for performance

• Analyzed results

Methods - Data

• MIMIC-II database – neonatal and adult patients

• De-identified ICU records developed by MIT, Philips Medical Systems, and Beth Israel Deaconess Medical Center

• Relational database containing structured data and unstructured documents

  - 25,000 patients
  - Discharge summaries
  - MD notes
  - Radiology reports
  - Nursing notes

Methods – Segmenting Documents

• Identified section indicators, for example three ways an allergies section may be introduced:

Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Sex: M Service: SURGERY Allergies: Penicillin Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to

Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Sex: M Service: SURGERY Allergies - penicillin Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to

Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Sex: M Service: SURGERY Allergic to penicillin Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to

• Searched for indicators and inserted XML tags

Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Sex: M Service: SURGERY <allergies>Allergic to penicillin</allergies>Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to

Methods – Segmenting Documents

Original format

<TEXT>Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Date of Birth: [**3312-11-5**] Sex: M Service: SURGERY Allergies: Penicillin Attending:[**First Name3 (LF) 2679**] Addendum: Pt is discharged to [**Hospital3 **] Hospital [**3391-6-1**]. This is an updated medication list, which has been faxed to [**Hospital3 **]. Discharge Medications: 1. Acetaminophen 325 mg Tablet Sig: 1-2 Tablets PO Q6H (every 6 hours) as needed. 2. Atorvastatin 20 mg Tablet Sig: One (1) Tablet PO DAILY (Daily). 3. Insulin Lispro 100 unit/mL Solution Sig: One (1) injection Subcutaneous ASDIR (AS DIRECTED). Discharge Disposition: Extended Care Facility: [**Hospital6 694**] – [**Location (un) 695**] [**First Name11 (Name Pattern1) 531**] [**Last Name (NamePattern1) 2684**] MD [**MD Number 2685**]</TEXT>

Segmented text

<TEXT>
<preamble>Admission Date: [**3391-5-21**] Discharge Date: [**3391-6-1**] Date of Birth: [**3312-11-5**] Sex: M Service: SURGERY</preamble>
<allergies>Allergies: Penicillin</allergies>
<addendum>Addendum: Pt is discharged to [**Hospital3 **] Hospital [**3391-6-1**]. This is an updated medication list, which has been faxed to [**Hospital3 **]. </addendum>
<dc_meds>Discharge Medications: 1. Acetaminophen 325 mg Tablet Sig: 1-2 Tablets PO Q6H (every 6 hours) as needed. 2. Atorvastatin 20 mg Tablet Sig: One (1) Tablet PO DAILY (Daily). 3. Insulin Lispro 100 unit/mL Solution Sig: One (1) injection Subcutaneous ASDIR (AS DIRECTED).
</dc_meds>
<dc_disposition>Discharge Disposition: Extended Care Facility: [**Hospital6 694**] – [**Location (un) 695**] [**First Name11 (Name Pattern1) 531**] [**Last Name (NamePattern1) 2684**] MD [**MD Number 2685**]
</dc_disposition>
</TEXT>
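The rule-based tagging shown on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the indicator patterns and the `segment` function below are hypothetical, since the slides do not give the actual rule set, and the tag names are simply borrowed from the segmented example.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical indicator rules (tag name, regex). The real rule set used in
# the study is not shown in the slides; these only cover the example note.
SECTION_INDICATORS = [
    ("allergies", r"Allerg(?:ies\s*[:-]|ic to)"),
    ("addendum", r"Addendum\s*:"),
    ("dc_meds", r"Discharge Medications\s*:"),
    ("dc_disposition", r"Discharge Disposition\s*:"),
]


def segment(text):
    """Wrap each detected section in an XML tag.

    Text before the first indicator becomes <preamble>; each section runs
    from its indicator to the start of the next indicator.
    """
    # Collect (position, tag) for every indicator match, in document order.
    hits = sorted(
        (m.start(), tag)
        for tag, pattern in SECTION_INDICATORS
        for m in re.finditer(pattern, text)
    )
    parts = []
    first = hits[0][0] if hits else len(text)
    if text[:first].strip():
        parts.append(f"<preamble>{escape(text[:first].strip())}</preamble>")
    for i, (start, tag) in enumerate(hits):
        end = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        # escape() keeps the output well-formed if the note contains < or &.
        parts.append(f"<{tag}>{escape(text[start:end].strip())}</{tag}>")
    return "<TEXT>\n" + "\n".join(parts) + "\n</TEXT>"


note = (
    "Admission Date: [**3391-5-21**] Sex: M Service: SURGERY "
    "Allergies: Penicillin Addendum: Pt is discharged to [**Hospital3 **]. "
    "Discharge Medications: 1. Acetaminophen 325 mg Tablet"
)
print(segment(note))
```

A sketch like this treats everything up to the next indicator as part of the current section, so an unrecognized heading is absorbed into the preceding section. That is one way the segmentation inaccuracies listed under Limitations could arise.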

Methods – Search Engine

NLM’s Essie

• Developed to facilitate searching of medical literature by non-clinicians through use of UMLS

• UMLS relates terms by concept

• Allows matching even if different words used

• Maps text corpus to the UMLS and indexes the corpus on these concepts

• Maps the search concepts to the UMLS

• Returns a ranked, scored list of documents
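To make the idea of concept-based matching concrete, here is a toy Python sketch. It is not Essie and does not use the UMLS: the term table, the concept identifiers, and the function names are all made up for illustration, and the sketch returns an unranked set rather than a ranked, scored list.

```python
import re
from collections import defaultdict

# Hypothetical term-to-concept table standing in for the UMLS.
# The concept identifiers are invented for this example.
TERM_TO_CONCEPT = {
    "heart attack": "CONCEPT_MI",
    "myocardial infarction": "CONCEPT_MI",
    "high blood pressure": "CONCEPT_HTN",
    "hypertension": "CONCEPT_HTN",
}


def concepts_in(text):
    """Return the set of concepts whose terms appear in the text."""
    lowered = text.lower()
    return {
        concept
        for term, concept in TERM_TO_CONCEPT.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def build_index(docs):
    """Index documents by concept instead of by literal word."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for concept in concepts_in(text):
            index[concept].add(doc_id)
    return index


def search(index, query):
    """Map the query to concepts and return documents containing all of them."""
    query_concepts = concepts_in(query)
    if not query_concepts:
        return set()
    return set.intersection(*(index.get(c, set()) for c in query_concepts))


docs = {
    "d1": "Pt with history of heart attack.",
    "d2": "Hypertension, well controlled.",
    "d3": "Prior myocardial infarction and high blood pressure.",
}
index = build_index(docs)
print(search(index, "myocardial infarction"))
```

Because both the corpus and the query are mapped to concepts, a search for "myocardial infarction" also finds the note that only says "heart attack".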

Methods – Clinical Topics

• Began with topics from TRECMed 2012 and adapted them to the MIMIC ICU data

• Modified or eliminated topics that retrieved few documents

Methods – Clinical Topic Examples

• Patients who develop thrombocytopenia in pregnancy

• Patients taking atypical antipsychotics without a diagnosis of schizophrenia or bipolar depression

• Patients with delirium, hypertension, and tachycardia

• Patients with thyrotoxicosis treated with beta-blockers

• Final set included 22 topics

Methods – Query Development

• Developed initial query without sections

• Ran queries against data

• Examined retrieved documents to refine query

• Rewrote query using sections

• Ran queries against data

• Examined retrieved documents to refine query

• Ran all queries and recorded documents returned and scores

Methods – Query Development

Topic: Patients with diabetes who also have thrombocytosis

• Baseline query

diabetes AND thrombocytosis

• With sections we could avoid Family History

thrombocytosis AND AREA[AdmissionDiagnosis] diabetes OR AREA[ChiefComplaint] diabetes OR AREA[Course] diabetes …
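The difference between the two queries can be illustrated with a small Python sketch over a segmented document. Several things here are assumptions: the tag names reuse the area names from the query, `FamilyHistory` is a hypothetical tag, matching is plain substring search rather than Essie's concept matching, and the sectioned query is read as "thrombocytosis anywhere AND diabetes in any of the listed areas".

```python
import xml.etree.ElementTree as ET


def sections(xml_text):
    """Parse a segmented note into {section name: lowercased text}."""
    secs = {}
    for child in ET.fromstring(xml_text):
        # Concatenate text if the same section tag appears more than once.
        secs[child.tag] = (secs.get(child.tag, "") + " " + (child.text or "")).lower()
    return secs


def mentions(secs, term, areas=None):
    """True if the term appears in any section, or only in the given areas."""
    names = list(secs) if areas is None else areas
    return any(term.lower() in secs.get(name, "") for name in names)


def baseline_query(secs):
    # diabetes AND thrombocytosis, anywhere in the document
    return mentions(secs, "diabetes") and mentions(secs, "thrombocytosis")


def sectioned_query(secs, areas=("AdmissionDiagnosis", "ChiefComplaint", "Course")):
    # thrombocytosis anywhere AND diabetes restricted to the listed areas
    return mentions(secs, "thrombocytosis") and mentions(secs, "diabetes", areas)


doc = (
    "<TEXT><ChiefComplaint>Thrombocytosis noted on labs</ChiefComplaint>"
    "<FamilyHistory>Mother with diabetes</FamilyHistory></TEXT>"
)
secs = sections(doc)
print(baseline_query(secs), sectioned_query(secs))
```

In this example the baseline query matches because "diabetes" appears in the family history, while the sectioned query does not, which is the behavior the slide describes as avoiding Family History.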

Methods – Document Sampling

• Samples selected for each topic based on difference in scores

• Total sample size was 574 documents

• Sample sizes ranged from 10 to 40

• Average sample size 26 documents

[Figure: sampling diagram; labels: Segmented Documents (0-10 docs), Whole Document (0-10 docs), 0-10 high, 0-10 low]

Methods – Document Evaluation

1. Was the document relevant to the topic?

2. Why were non-relevant documents retrieved?

3. Did segmentation help retrieval and why?

Results – Document Relevance

574 Documents Analyzed

[Venn diagram: Queries of Segmented Documents and Queries of Whole Documents; region counts 328, 220, and 26, which sum to 574]

Results – Document Relevance

343 Relevant Documents

[Venn diagram: Segmented Documents and Whole Document; region counts 246, 77, and 20]

231 Non-relevant Documents

[Venn diagram: Segmented Documents and Whole Document; region counts 82, 143, and 6]

Results – Reasons for Retrieving Non-relevant Documents

Non-relevant reference to condition 84

Past or possible future condition 70

Condition mentioned but not diagnosed 23

Condition denied or ruled out 22

Issue with term mapping 20

Query issue 11

Results – Effect of Segmenting on Document Retrieval

Segmenting avoided retrieval of non-relevant document by avoiding specific sections 132

Segmenting allowed retrieval of relevant document by focusing on specific sections 20

Performance unrelated to segmenting 320

Query error—did not look in the right section 80

Document not segmented correctly 18

Condition included in incorrect section of notes 1

Results

Segmenting avoided retrieval of non-relevant documents

Topic: Patients who develop thrombocytopenia in pregnancy

Issue: Neonatal notes often document mother’s pregnancy history

Solution: Look in sections containing the patient’s diagnosis

Results

Segmenting allowed retrieval of relevant documents by focusing on specific sections

Topic: Patients taking atypical antipsychotics without a diagnosis of schizophrenia or bipolar depression

Issue: Need to ignore mentions of these conditions in family members

Solution: Look in sections containing the patient’s diagnosis; avoid family-history section

Quantitative Analysis

• Correlation to indicate whether querying the segmented documents impacted performance

• Precision and recall

Analysis – Matthews Correlation Coefficient

                        Segmented score higher than base    Segmented score lower than base
Document relevant       True Positive                       False Negative
Document not relevant   False Positive                      True Negative

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

Values range from -1 to 1
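As a worked illustration of the formula above, the following Python sketch builds the four counts from judged documents and computes the MCC. The input layout (segmented score, baseline score, relevance flag) and the function names are assumptions made for this example; the slides do not say how ties between the two scores were handled, so ties are skipped here, and a zero denominator is reported as 0.

```python
import math


def confusion_counts(judged):
    """Count TP, FP, FN, TN from (segmented_score, base_score, relevant) tuples.

    Follows the table on the slide: a relevant document that scores higher
    with the segmented query is a true positive, and so on.
    """
    tp = fp = fn = tn = 0
    for seg_score, base_score, relevant in judged:
        if seg_score == base_score:
            continue  # assumption: ties are left out of the table
        higher = seg_score > base_score
        if relevant and higher:
            tp += 1
        elif relevant:
            fn += 1
        elif higher:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 if the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


judged = [
    (0.9, 0.5, True),   # relevant, segmented score higher
    (0.2, 0.6, False),  # not relevant, segmented score lower
    (0.8, 0.4, True),
    (0.1, 0.7, False),
]
print(mcc(*confusion_counts(judged)))
```

A value near 1 means the segmented query tends to score relevant documents higher and non-relevant documents lower than the baseline; a value near 0 means no association.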

Analysis – Matthews Correlation Coefficient

[Figure: Matthews correlation coefficient chart; axis from -0.2 to 1; includes an Average entry; asterisks mark significance at p<0.05 and p<0.01]

Analysis – Recall and Precision

• $$\text{Recall} = \frac{\text{Number of relevant documents retrieved}}{\text{All relevant documents judged}}$$

• $$\text{Precision} = \frac{\text{Number of relevant documents retrieved}}{\text{All retrieved documents judged}}$$

• Values range from 0 to 1
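These two measures are easy to compute once each query's retrieved documents and the relevance judgments are available as sets. The sketch below is a minimal Python example, assuming that recall is measured against the judged relevant documents for the topic and that the precision denominator is the set of judged documents the query retrieved (the usual definition of precision). The function name and document IDs are hypothetical.

```python
def recall_precision(retrieved, judged_relevant):
    """Return (recall, precision) for one query on one topic.

    retrieved: IDs of judged documents returned by the query.
    judged_relevant: IDs of all documents judged relevant for the topic.
    """
    retrieved = set(retrieved)
    judged_relevant = set(judged_relevant)
    hits = retrieved & judged_relevant  # relevant documents retrieved
    recall = len(hits) / len(judged_relevant) if judged_relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


relevant = {"d1", "d2", "d5"}
whole_doc_hits = {"d1", "d2", "d3", "d4"}
segmented_hits = {"d1", "d2"}

print(recall_precision(whole_doc_hits, relevant))
print(recall_precision(segmented_hits, relevant))
```

In this made-up example both queries find the same two relevant documents, so recall is equal, but the segmented query retrieves fewer non-relevant documents and therefore has higher precision.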

Analysis - Recall

[Figure: recall chart comparing Whole Document and Segmented Document queries, with average; axis from 0 to 1]

Analysis - Precision

[Figure: precision chart comparing Whole Document and Segmented Document queries, with average; axis from 0 to 1]

Discussion

• Queries of segmented documents retrieved fewer documents

• These documents were more likely to be relevant and less likely to be non-relevant

• Some queries performed better

• Some documents were easier to segment accurately

Limitations

• Small sample size

• Only one person writing queries and doing relevance judgments

• Inaccuracies in identifying note segments

• Some queries did not perform well

Future Work

• Use validated algorithm to segment text

• Use larger sample and independent relevance judges

• Develop queries for specific type of clinical note

• Identify specific types of information that benefit from searching specific sections

• Search unstructured and structured data together to reflect real-world EHR data use


Thank you!

Email me at: [email protected]