Understanding Clinical Forms: Structure Discovery and SNOMED CT Annotation

Ritu Khare

Drexel University College of Medicine, Philadelphia, PA

NCBI, NLM, NIH, May 11, 2012


Presentation Order

1. Motivation: a flexible EHR

2. Form understanding: structure discovery and form annotation

3. Contributions and Plans

Clinicians & Electronic Health Records (EHRs)

Clinician

Electronic Health Records

IT professionals and vendors

Inconsistent (Gurses et al., 2009)

Inflexible (Gurses et al., 2009; An et al., 2009)

Unintended consequences (Ash et al., 2004; Lee, 2007; Harrison et al., 2007)

Data Collection Needs

Integration of New Needs

Overall Workflow

The flexible Electronic Health Record (fEHR): a form-based approach, using “forms” as design artifacts

1. Clinicians are highly familiar with forms

2. Forms embed rich information that can guide database design

“I want to collect the patient’s information: personal details, vital signs, etc.”

EHR Database

Clinician

The fEHR System

Form Design (or Import) Interface

Form Understanding and Mapping

The flexible EHR: Key Challenges

EHR Database

The fEHR System

Form Design Interface

Form Understanding

Form Mapping

Clinician

1. Usability (form design interface)

2. Information extraction (form understanding: structure discovery and form annotation)

3. Schema and data integration (form mapping)

Presentation Order

1. Motivation: a flexible EHR

2. Form understanding: form structure discovery (Hidden Markov Models); form annotation

3. Contributions and Plans

Structure Discovery

A form tree accurately captures the contextual associations among form elements (Dragut et al., 2009; Wu et al., 2009).

[Figure: a clinical form and the corresponding form tree; node legend: text label, format, value]
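A form tree can be held in a small recursive node type. The sketch below is illustrative only: the `FormNode` class, its fields, and the example labels are hypothetical, with node roles borrowed from the figure legend (text label, format, value). The root-to-node path is what carries the context that a flat list of form elements lacks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FormNode:
    """One element of a form tree (hypothetical structure for illustration)."""
    label: str                  # text shown on the form, e.g. "Gender"
    role: str                   # "text label", "format" or "value"
    children: List["FormNode"] = field(default_factory=list)

    def add(self, child: "FormNode") -> "FormNode":
        self.children.append(child)
        return child

    def edges(self) -> int:
        """Count parent-child edges in the subtree rooted here."""
        return sum(1 + c.edges() for c in self.children)

    def path_to(self, label: str, prefix=()) -> Optional[List[str]]:
        """Labels from this node down to `label` (its context), or None."""
        here = prefix + (self.label,)
        if self.label == label:
            return list(here)
        for c in self.children:
            found = c.path_to(label, here)
            if found:
                return found
        return None


root = FormNode("root", "text label")
patient = root.add(FormNode("Patient", "text label"))
gender = patient.add(FormNode("Gender", "text label"))
gender.add(FormNode("M", "value"))
gender.add(FormNode("F", "value"))

print(root.edges())        # 4
print(root.path_to("F"))   # ['root', 'Patient', 'Gender', 'F']
```

A value such as "F" is ambiguous on its own; its path (Patient > Gender) is what disambiguates it.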

Challenges of Automatic Structure Discovery

Designed for human understanding: readers rely on visual arrangement and past experience

For a machine, a form is an unstructured document: its source code contains only presentation/formatting structure

Existing approaches (Zhang et al., 2004; He et al., 2004) target short search forms and rely on rules and heuristics

Analysis of the Form Design Process

Elements and their sequence: visible

[Figure: a clinical form divided into Demographics, Medical decision, Assessment, and Orders segments; elements play roles such as category label, subcategory label, field, subfield, format, subformat, and misc. text]

Segment boundaries and roles: hidden and arbitrarily laid out

The form design process can therefore be modeled using Hidden Markov Models.

Using Hidden Markov Models (HMMs)

HMM: a finite state automaton with stochastic state transitions and symbol emissions (Rabiner, 1989)

Used to model and decode real-world processes whose states are implicit and unobservable

2-layered HMM

• T-HMM: assigns tags to elements, e.g., category, field, format, etc.

• S-HMM: creates groups of contextually related elements.
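The generative view can be made concrete with a toy HMM that "designs" a form by sampling hidden tags and the visible element types they emit. All states, element types, and probabilities below are invented for illustration and are not the trained T-HMM; structure discovery runs this process in reverse, seeing only the element types.

```python
import random

# Hypothetical T-HMM parameters: hidden tags emit visible element types.
START = {"category": 0.7, "field": 0.3}
TRANS = {
    "category": {"field": 0.9, "category": 0.1},
    "field": {"format": 0.8, "field": 0.2},
    "format": {"field": 0.5, "category": 0.3, "format": 0.2},
}
EMIT = {
    "category": {"text": 1.0},
    "field": {"text": 1.0},
    "format": {"textbox": 0.5, "checkbox": 0.4, "text area": 0.1},
}


def draw(dist, rng):
    """Sample one key from a {key: probability} dict."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def design_form(n_elements, seed=0):
    """Toy 'artificial designer': sample (hidden tag, visible element) pairs."""
    rng = random.Random(seed)
    tag = draw(START, rng)
    out = []
    for _ in range(n_elements):
        out.append((tag, draw(EMIT[tag], rng)))
        tag = draw(TRANS[tag], rng)
    return out


for tag, element in design_form(6):
    print(f"{tag:9} -> {element}")
```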

[Figure: HMM-based artificial designer; the T-HMM tags elements (category, field, format, category) and the S-HMM groups them]

Inner Functionality of the 2-layered HMM

[Figure: a parser turns the form into a sequence of elements (text, text area, checkbox); the T-HMM tags each element (category, sub-category, field, format, misc-text); the S-HMM assigns segment states (begin segment, inside segment, begin sub segment, inside sub segment, end segment/end sub segment)]

Algorithms

Supervised Training: Expectation Maximization

Testing: Viterbi
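Viterbi decoding for the two layers can be sketched with one function applied twice: first to recover T-HMM tags from the parsed element types, then to recover S-HMM segment states from those tags. The state names are simplified (no sub-segments) and every probability is a made-up toy value, so this illustrates the technique rather than the trained models.

```python
import math


def viterbi(obs, states, start, trans, emit):
    """Return the most likely hidden-state sequence for `obs` (log space)."""
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[s] = (log-probability of the best path ending in s, that path)
    best = {s: (lg(start.get(s, 0)) + lg(emit[s].get(obs[0], 0)), [s])
            for s in states}
    for o in obs[1:]:
        step = {}
        for s in states:
            score, prev = max((best[p][0] + lg(trans[p].get(s, 0)), p)
                              for p in states)
            step[s] = (score + lg(emit[s].get(o, 0)), best[prev][1] + [s])
        best = step
    return max(best.values())[1]


# Layer 1 (T-HMM): element types -> tags. Toy parameters.
T_STATES = ["category", "field", "format"]
T_START = {"category": 0.6, "field": 0.4}
T_TRANS = {"category": {"field": 1.0},
           "field": {"format": 0.9, "field": 0.1},
           "format": {"category": 0.4, "field": 0.6}}
T_EMIT = {"category": {"text": 1.0}, "field": {"text": 1.0},
          "format": {"textbox": 0.6, "checkbox": 0.4}}

# Layer 2 (S-HMM): tags -> segment states. Toy parameters.
S_STATES = ["begin", "inside", "end"]
S_START = {"begin": 1.0}
S_TRANS = {"begin": {"inside": 1.0},
           "inside": {"inside": 0.5, "end": 0.5},
           "end": {"begin": 1.0}}
S_EMIT = {"begin": {"category": 1.0},
          "inside": {"field": 0.6, "format": 0.4},
          "end": {"format": 0.8, "field": 0.2}}

elements = ["text", "text", "textbox", "text", "text", "checkbox"]
tags = viterbi(elements, T_STATES, T_START, T_TRANS, T_EMIT)
segments = viterbi(tags, S_STATES, S_START, S_TRANS, S_EMIT)
print(tags)      # ['category', 'field', 'format', 'category', 'field', 'format']
print(segments)  # ['begin', 'inside', 'end', 'begin', 'inside', 'end']
```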

Tree Generation: Overall Approach
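The slide gives no detail on tree generation, so the following is only one plausible reading, under hypothetical rules: a `begin` element opens a segment and becomes a child of the root, fields attach under the open segment, each `format` attaches under the most recent field, and `end` closes the segment. Sub-segments are omitted for brevity; `build_tree` is an illustrative name, not the author's code.

```python
def build_tree(elements, tags, segments):
    """Assemble a nested tree from decoded T-HMM tags and S-HMM states.

    elements: visible text of each element, e.g. "Gender"
    tags:     T-HMM output ("category", "field", "format")
    segments: S-HMM output ("begin", "inside", "end")
    """
    root = {"label": "root", "children": []}
    segment = None      # node heading the currently open segment
    last_field = None   # most recent field inside that segment
    for text, tag, seg in zip(elements, tags, segments):
        node = {"label": text, "children": []}
        if seg == "begin" or segment is None:
            root["children"].append(node)          # new segment under root
            segment, last_field = node, None
        elif tag == "format" and last_field is not None:
            last_field["children"].append(node)    # format belongs to its field
        else:
            segment["children"].append(node)
            if tag == "field":
                last_field = node
        if seg == "end":
            segment, last_field = None, None       # close the segment
    return root


def show(node, depth=0):
    print("  " * depth + node["label"])
    for child in node["children"]:
        show(child, depth + 1)


tree = build_tree(
    ["Patient", "Gender", "M/F", "Vitals", "BP", "___"],
    ["category", "field", "format", "category", "field", "format"],
    ["begin", "inside", "end", "begin", "inside", "end"],
)
show(tree)
```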

Datasets (52 forms from 6 medical institutions)

Dataset | Avg. #Text | Avg. #Inputs
1 Walk-in clinic encounter forms (3 forms) | 32.33 | 49.33
2 Nursing patient admission forms (6 forms) | 17.17 | 33
3 OB/GYN forms (7 forms) | 16.14 | 37.29
4 Adult visit encounter forms (18 forms) | 47.83 | 65.22
5 Family practice forms (13 forms) | 82.61 | 100.46
6 Child visit encounter forms (5 forms) | 53 | 67.4

Home-grown interface: a DIY interface that captures the designer’s on-the-fly intentions

HMM training data: T-HMM and S-HMM state sequences for each form

Gold benchmark: 52 gold-standard trees for evaluating the results

T-HMM: category, field, format, category, field, format, …

S-HMM: begin, inside, end, begin, inside, end, …
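Because the training data contain the complete T-HMM and S-HMM state sequences for every form, the hidden states are observed during training. In that fully labeled setting, the maximum-likelihood start, transition, and emission probabilities are simply normalized counts, as this minimal sketch shows (the function name and the two toy forms are hypothetical; no smoothing is applied).

```python
from collections import Counter, defaultdict


def normalize(counter):
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()}


def estimate_hmm(sequences):
    """Maximum-likelihood HMM parameters from fully labeled sequences.

    sequences: one list of (observation, state) pairs per form.
    """
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for seq in sequences:
        start[seq[0][1]] += 1
        for obs, state in seq:
            emit[state][obs] += 1
        for (_, a), (_, b) in zip(seq, seq[1:]):
            trans[a][b] += 1
    return (normalize(start),
            {s: normalize(c) for s, c in trans.items()},
            {s: normalize(c) for s, c in emit.items()})


# Two tiny labeled "forms" for the T-HMM (observation = element type).
forms = [
    [("text", "category"), ("text", "field"), ("textbox", "format")],
    [("text", "category"), ("text", "field"), ("checkbox", "format"),
     ("text", "field"), ("textbox", "format")],
]
start, trans, emit = estimate_hmm(forms)
print(start)              # {'category': 1.0}
print(trans["field"])     # {'format': 1.0}
print({k: round(v, 3) for k, v in emit["format"].items()})
# {'textbox': 0.667, 'checkbox': 0.333}
```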

Results: Tree Extraction (Structure Discovery) Accuracy

An average tree (135 edges) is generated in 0.08 seconds.

Dataset | Total tree edges | Accuracy
Dataset 1 | 272 | 95.22%
Dataset 2 | 362 | 97.51%
Dataset 3 | 461 | 100%
Dataset 4 | 2606 | 97.58%
Dataset 5 | 2674 | 98.46%
Dataset 6 | 644 | 96.11%

HMM testing: leave-one-out cross-validation

Conclusions

• HMMs are very effective for structure discovery

• They subsume existing approaches
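The per-dataset accuracies can be combined in two ways. Rounding each dataset's correct-edge count to an integer and dividing by all 7,019 edges (a micro-average) gives about 97.85%, which appears to match the overall accuracy quoted in the summary; the plain mean of the six percentages is about 97.48%. The snippet reproduces this arithmetic from the table above; which averaging the author used is an inference, not stated on the slide.

```python
# Per-dataset results copied from the table above.
edges = [272, 362, 461, 2606, 2674, 644]
accuracy = [0.9522, 0.9751, 1.0, 0.9758, 0.9846, 0.9611]

total_edges = sum(edges)
# Percentages are rounded, so the correct-edge counts are estimates.
correct = sum(round(n * a) for n, a in zip(edges, accuracy))
micro = correct / total_edges                 # edge-weighted accuracy
macro = sum(accuracy) / len(accuracy)         # unweighted mean over datasets

print(total_edges, round(total_edges / 52))   # 7019 135
print(f"micro {micro:.2%}, macro {macro:.2%}")  # micro 97.85%, macro 97.48%
```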

Presentation Order

1. Motivation: a flexible EHR

2. Form understanding: form structure discovery (Hidden Markov Models); form annotation (Bayesian classifier)

3. Contributions and Plans

Form Annotation

Semantic heterogeneity across clinical data sources (Halevy, 2005; Henry et al., 1993; Hernandez et al., 2005; Wright et al., 1999)

The same concept appears under different terms:

• Blood Pressure, BP, Diastolic/Systolic

• Medical Record Number, MRN, Med Rec #

• Vital Signs, Constitutional, Physical Status

Controlled medical vocabularies should be incorporated into the design artifacts of healthcare systems (Jean et al., 2007; Sugumaran and Storey, 2002).

[Figure: fEHR; form template (design artifact) and EHR database]

Form Annotation

SNOMED CT: the Systematized Nomenclature of Medicine - Clinical Terms (International Health Terminology Standards Development Organisation)

The most comprehensive clinical vocabulary (SNOMED CT User Guide, 2009)

More than 360,000 logically defined clinical concepts (Hina et al., 2010; Stenzhorn et al., 2009)

Example: annotating a clinical encounter form with SNOMED CT

Form Term | SNOMED CT Concept
Patient | 11615400: Patient (person)
MRN | 398225001: Medical record number (observable entity)

SNOMED CT Concepts

concept id: 0231832
Fully-specified name: Respiratory Rate (Observable Entity)
Preferred term: Respiratory Rate
Synonym: Respiration Frequency

concept id: 362508001
Fully-specified name: Both eyes, entire (Body Structure)
Preferred term: Both eyes, entire
Synonym: OU - Both eyes

SNOMED CT semantic categories

• Attribute
• Body Structure
• Disorder
• Finding
• Observable Entity
• Occupation
• Person
• Physical Object
• Procedure
• Racial Group
• Situation
• …
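In SNOMED CT, a concept's semantic category appears as the parenthesized semantic tag at the end of its fully specified name, as in the two examples above. A small helper, with a hypothetical name and a deliberately simple regular expression, can split the two parts:

```python
import re

# The semantic tag is the final parenthesized part of a fully specified name.
SEMANTIC_TAG = re.compile(r"^(?P<term>.*\S)\s*\((?P<tag>[^()]+)\)\s*$")


def split_fsn(fsn):
    """Split a fully specified name into (term, lower-cased semantic tag)."""
    m = SEMANTIC_TAG.match(fsn)
    if m is None:
        raise ValueError(f"no semantic tag in {fsn!r}")
    return m.group("term"), m.group("tag").lower()


print(split_fsn("Respiratory Rate (Observable Entity)"))
# ('Respiratory Rate', 'observable entity')
print(split_fsn("Both eyes, entire (Body Structure)"))
# ('Both eyes, entire', 'body structure')
```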

Existing annotation services: SNOMED CT browsers (Rogers and Bodenreider, 2008)

• General search

• Category-specific search

Form Annotation Challenges

Diversity challenge: different clinicians use different terms for the same concept (e.g., MRN vs. Med. Rec. #; Vital signs vs. Constitutional vs. Physical status)

Context challenge: the same form term can correspond to different concepts depending on its context

Solution Premises

The key is to identify the SNOMED CT semantic category appropriate for a given term.

The first, i.e., the most string-similar, result retrieved by the category-specific search is usually the desired concept.
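The second premise can be illustrated with a toy category-specific search: keep only candidates of one semantic category, rank them by string similarity to the form term, and take the first. The candidate list and the use of Python's `difflib.SequenceMatcher` ratio as the similarity measure are assumptions for illustration; the original system queried a SNOMED CT category-specific search API.

```python
from difflib import SequenceMatcher

# Hypothetical (term, semantic category) candidates a search might return.
CANDIDATES = [
    ("Respiratory rate", "observable entity"),
    ("Respiratory rate finding", "finding"),
    ("Respiratory system structure", "body structure"),
    ("Medical record number", "observable entity"),
]


def category_search(form_term, category, candidates=CANDIDATES):
    """Rank candidates of one category by similarity to the form term."""
    def similarity(term):
        return SequenceMatcher(None, form_term.lower(), term.lower()).ratio()

    in_category = [t for t, c in candidates if c == category]
    return sorted(in_category, key=similarity, reverse=True)


print(category_search("Respiratory", "observable entity")[0])  # Respiratory rate
print(category_search("Respiratory", "body structure")[0])     # Respiratory system structure
```

The same term yields different concepts under different categories, which is why choosing the right category is the key step.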

How can the SNOMED CT semantic category appropriate for a given form term be determined automatically?

Naïve Bayes Classifier: based on Bayes' theorem (Han and Kamber, 2006)

Class labels (SNOMED CT semantic categories): attribute, body structure, disorder, …

Classification features (local structure): node type, parent node type, child node type, parent semantic category, grandparent semantic category

The implicit relationship between a term's context (i.e., the form tree) and the desired semantic category can be formally captured in a statistical model.

[Figure: example form tree (root; Patient, Examination; Name, Gender, Respiratory; M, F, nl, perc.) annotated with semantic categories such as Person, Procedure, Observable Entity, Qualifier Value, and Finding]
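A categorical Naïve Bayes classifier over the five local-structure features listed above can be written in a few lines. This is a minimal sketch with add-one (Laplace) smoothing; the training rows are invented and loosely modeled on the example tree, the feature encoding is an assumption, and the original classifier may differ in details such as smoothing.

```python
import math
from collections import Counter, defaultdict

FEATURES = ["node_type", "parent_type", "child_type",
            "parent_category", "grandparent_category"]


class NaiveBayes:
    """Categorical Naive Bayes with add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)   # (label, feature) -> value counts
        self.values = defaultdict(set)             # feature -> values seen in training
        for row, y in zip(rows, labels):
            for f in FEATURES:
                self.value_counts[(y, f)][row[f]] += 1
                self.values[f].add(row[f])
        return self

    def log_scores(self, row):
        """Unnormalized log P(category | features) for every category."""
        n = sum(self.class_counts.values())
        scores = {}
        for y, cy in self.class_counts.items():
            s = math.log(cy / n)
            for f in FEATURES:
                k = len(self.values[f]) + 1        # +1 slot for unseen values
                s += math.log((self.value_counts[(y, f)][row[f]] + 1) / (cy + k))
            scores[y] = s
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


def row(*values):
    return dict(zip(FEATURES, values))


# Invented training rows loosely modeled on the example tree.
train = [
    (row("category", "root", "field", "none", "none"), "person"),                  # Patient
    (row("field", "category", "none", "person", "none"), "observable entity"),     # Name
    (row("field", "category", "value", "person", "none"), "observable entity"),    # Gender
    (row("value", "field", "none", "observable entity", "person"), "qualifier value"),  # M
    (row("value", "field", "none", "observable entity", "person"), "qualifier value"),  # F
    (row("category", "root", "field", "none", "none"), "procedure"),               # Examination
    (row("field", "category", "value", "procedure", "none"), "observable entity"), # Respiratory
]
model = NaiveBayes().fit([r for r, _ in train], [y for _, y in train])
print(model.predict(row("field", "category", "value", "procedure", "none")))
# observable entity
```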

Form Annotation Algorithm and Implementation

[Figure: the structure analyzer extracts features from the form tree; a classification model trained on manual annotations (training data) outputs category membership probabilities; the category picker selects a semantic category; the form term and that category are passed to the SNOMED CT category-specific search (API), which returns a SNOMED CT concept. Example tree: root; Patient (Person), Examination (Procedure); Name, Gender, Respiratory (Observable Entity)]

Hybrid = Contextual Structure + Linguistics
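The hybrid flow can be summarized in code: structure yields category membership probabilities, the category picker takes the most probable category, and linguistics (the category-specific search) supplies the first, most string-similar concept. In this sketch both the classifier and the search are hypothetical stand-ins passed in as plain functions, so only the wiring is shown.

```python
from typing import Callable, Dict, List, Optional


def annotate(term: str,
             context: dict,
             membership: Callable[[dict], Dict[str, float]],
             search: Callable[[str, str], List[str]]) -> Optional[str]:
    """Hybrid annotation: structure picks the category, search picks the concept."""
    probs = membership(context)                 # category membership probabilities
    category = max(probs, key=probs.get)        # category picker: most probable
    results = search(term, category)            # category-specific search
    return results[0] if results else None      # first = most string-similar


# Hypothetical stand-ins for the trained model and the search API.
def toy_membership(context):
    if context["parent_category"] == "observable entity":
        return {"qualifier value": 0.8, "observable entity": 0.2}
    return {"observable entity": 0.7, "finding": 0.3}


def toy_search(term, category):
    index = {"observable entity": ["Respiratory rate"],
             "finding": ["Respiratory finding"]}
    return [c for c in index.get(category, []) if term.lower() in c.lower()]


print(annotate("Respiratory", {"parent_category": "procedure"},
               toy_membership, toy_search))     # Respiratory rate
```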

Data

Manual (gold) annotations: in total, 4,235 form terms were manually studied, and 2,506 (59%) had a corresponding SNOMED CT concept

Some Unmapped Terms

no scleral icterus

chronic back pain

Follow up with PCP

Sent to ER

Term | Concept ID
Patient | 11615400: Patient (person)
MRN | 398225001: Medical record number (observable entity)
… | …

Dataset | Avg. # Terms | SNOMED CT Mappability
1 Walk-in clinic encounter forms (3 forms) | 32.33 | 75.77%
2 Nursing patient admission forms (6 forms) | 17.17 | 63.98%
3 Labor & delivery DB data-entry forms (7 forms) | 16.14 | 58.8%
4 Adult visit encounter forms (18 forms) | 47.83 | 56.2%
5 Family practice forms (13 forms) | 82.61 | 59.38%
6 Child visit encounter forms (5 forms) | 53 | 62.21%

Experiment Design

Goal: to study whether structure can improve annotation performance.

Measures

• Precision = # correct annotations / # annotations

• Recall = # correct annotations / # gold annotations

[Figure: the three configurations compared.
• Baseline (linguistics only): the form term is sent to the SNOMED CT general search, which returns a SNOMED CT concept.
• Hybrid (linguistics + structure): the structure analyzer extracts features; the classification model outputs category membership probabilities; the category picker selects a semantic category; the form term and that category go to the SNOMED CT category-specific search, which returns a SNOMED CT concept.
• Hybrid++: the Hybrid pipeline with candidate set expansion added at the category picker.]

Results

Baseline: p = 0.60, r = 0.46

Baseline to Hybrid: precision improved 26%

Hybrid to Hybrid++: precision improved 13%, recall improved 17%

Hybrid++: p = 0.86, r = 0.60 (F-score = 0.71)
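The reported F-scores are consistent with the harmonic mean of precision and recall, F = 2pr / (p + r); the check below recomputes them from the p and r values on this slide. The `precision` and `recall` helpers follow the definitions on the Experiment Design slide; the helper names are illustrative.

```python
def precision(correct, produced):
    """# correct annotations / # annotations produced."""
    return correct / produced if produced else 0.0


def recall(correct, gold):
    """# correct annotations / # gold annotations."""
    return correct / gold if gold else 0.0


def f_score(p, r):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r) if p + r else 0.0


for name, p, r in [("Baseline", 0.60, 0.46),
                   ("Hybrid++", 0.86, 0.60),
                   ("Final", 0.89, 0.76)]:
    print(f"{name:9} F = {f_score(p, r):.2f}")
# Baseline  F = 0.52
# Hybrid++  F = 0.71
# Final     F = 0.82
```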

Term processing component

• Remove special characters (e.g., -, #, /)

• Expand acronyms, e.g., BTL (Bilateral Tubal Ligation), VTE (Venous Thromboembolism)

Precision improved only slightly (3-5%), while recall improved substantially (25%). Final: p = 0.89, r = 0.76 (F-score = 0.82)
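The two term-processing steps can be sketched as a small normalization function. The character set uses the hyphen, hash, and slash from the slide; the acronym table holds only the two examples shown, whereas a real system would need a much larger, curated dictionary. Replacing special characters with spaces rather than deleting them is a design assumption of this sketch, and `normalize_term` is a hypothetical name.

```python
import re

# Acronym examples from the slide; a real table would be far larger.
ACRONYMS = {
    "BTL": "Bilateral Tubal Ligation",
    "VTE": "Venous Thromboembolism",
}
SPECIAL = re.compile(r"[-#/]")


def normalize_term(term):
    """Replace special characters with spaces, expand known acronyms."""
    words = SPECIAL.sub(" ", term).split()
    return " ".join(ACRONYMS.get(w.upper(), w) for w in words)


print(normalize_term("Hx of BTL"))            # Hx of Bilateral Tubal Ligation
print(normalize_term("Med. Rec. #"))          # Med. Rec.
print(normalize_term("VTE/DVT prophylaxis"))  # Venous Thromboembolism DVT prophylaxis
```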

Annotation time per form: 1-11 s

Implications

• Contextual structure improves overall annotation performance

• Linguistic processing mainly improves recall

Presentation Order

1. Motivation: a flexible EHR

2. Form understanding: form structure discovery (Hidden Markov Models); form annotation (Bayesian classifier)

3. Contributions and Plans

Summary: Clinical Form Understanding

1. Structure Discovery

2. SNOMED CT Annotation

Hidden Markov Models: high accuracy (97.85%)

Limitations: supervised learning; weak entities and other constraints; advanced form features

Naïve Bayes classifier: 0.89 precision and 0.76 recall; structure helps improve annotation (by 43% in precision and 29% in recall)

Limitations: supervised learning; leverages only limited semantics from SNOMED CT

Related Publications: CIKM 2009, SIGMOD Record 2010, IHI 2010, ER 2011, IHI 2012

Application: the flexible EHR

[Figure: the fEHR system: 1. design or import form (clinician), 2. form understanding, 3. mapping algorithms, EHR database]

• Discover semantic correspondences

• Evolve existing database

Experiments

52 forms (from 6 clinics) generate 6 databases (35-450)

Annotation helps improve the integration process (database quality by 13%, merging scenario identification by 19%)

Other Applications

Structure discovery

• Web search form understanding: deep Web visibility, meta-search engines

• Usable in any domain (movies, health, automobiles, …)

• Biological forms

SNOMED CT annotation

• Clinical form-driven database design, in which database elements are named after form terms

• Preparing databases for future integration

Current and Future Projects: Improving Form Annotation; Unstructured EHR/Web Data

• Involve expert annotators to prepare gold standards

• Specialty-specific forms (OB/GYN)

• Use other UMLS terminologies

• Post-coordinated mapping

• Extract structure from narrative data (visit notes, discharge summaries)

• Error control algorithms

[Figure: a typical patient visit note (created by a physician)]

Acknowledgements

Computer and information scientists: Dr Yuan An, Dr Tony Hu, Dr Jason Li, Dr Min Song, Dr Il-Yeol Song, Dr Christopher Yang

Physicians and clinical researchers: Dr Prudence Dalrymple, Dr Kalatu Davies, Dr Michele Follen, Dr Sandra Hartmann, Dr Paul Nyirjesy, Dr Sandra Wolf

Thank you
