Automating the formalization of clinical guidelines using information extraction: an overview of recent lexical approaches

05 August 2011

Phil Gooch
Centre for Health Informatics, City University, London, UK


Formalizing guideline text into a computable model, and linking clinical terms and recommendations in clinical guidelines to concepts in the electronic patient record (EPR), is difficult because both the guideline text and the EPR content may be ambiguous or inconsistent and may rely on implicit and background medical knowledge. How can lexical-based information extraction (IE) approaches help to automate this task? This presentation discusses various design patterns and presents some tools.



Clinical guidelines

• Contain recommendations for best practice based on systematic reviews of clinical evidence, consensus statements and expert opinion.

• Goal is to reduce variation in medical care by promoting the most effective treatments, and to provide a means of quality control in clinical practice via audit.

• Produced by a variety of organizations (e.g. NICE, RCP, SIGN) in a variety of document formats, usually not conducive to use at the point of care.


Clinical decision support (CDS)

• Aims to provide diagnostic and treatment recommendations and advice at the point of care, i.e. information tailored for the specific patient under consideration by the clinician during a consultation

• CDS systems require a knowledge base (KB), usually derived from guidelines, consisting of declarative knowledge (penicillin is-a antibiotic) and procedural (if…then) rules, and some sort of electronic patient record (EPR) system


Computer-interpretable guidelines

• Early systems ‘computerized’ guidelines by making them available ‘on the computer’, e.g. as HTML or PDF

• Did not lead to improved guideline compliance or use!

• To standardize the format of the knowledge base, ease development of CDS, and improve guideline use at the point of care, a number of formalisms for representing guidelines have been developed


Computer-interpretable guidelines (CIGs)

Rule-based: ‘if ... then’, e.g. Arden Syntax for individual clinical decisions

    LET Last_HgA1C BE READ LATEST {"HgA1C Value"};
    LET Diabetic_Patient BE READ LATEST {"Problem: Diabetes"};
    if Diabetic_Patient
        and Last_HgA1C Occurred not within past 6 months
        and Last_HgA1C is less than or equal 7
    then conclude true;

Document-based, e.g. GEM, for complete guideline documents in XML

OO expression query languages, e.g. GELLO:

    observation.code == 'SBP' AND observation.value > 140 AND assessment.code == 'LVF'

Task-network models (TNM), e.g. GLIF, Asbru, PROforma, for workflow-like modelling of decisions over time
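To make the rule-based style concrete, the Arden Syntax rule above can be paraphrased in a general-purpose language. The following is only a minimal Python sketch: the `Observation` type, the function name, the 183-day approximation of "6 months" and the treatment of a missing result are all assumptions made for illustration, not part of Arden Syntax or of the presentation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Observation:
    """Hypothetical stand-in for a value read from the patient record."""
    value: float
    taken_on: date


def rule_fires(is_diabetic: bool,
               last_hba1c: Optional[Observation],
               today: date,
               max_age_days: int = 183) -> bool:
    """Mirror the three conditions of the rule above.

    Assumption: "6 months" is approximated as 183 days, and a missing
    result simply means the rule does not fire.
    """
    if not is_diabetic or last_hba1c is None:
        return False
    # "Occurred not within past 6 months"
    is_old = (today - last_hba1c.taken_on) > timedelta(days=max_age_days)
    # "is less than or equal 7"
    return is_old and last_hba1c.value <= 7.0
```

For example, `rule_fires(True, Observation(6.8, date(2011, 1, 1)), date(2011, 8, 5))` evaluates all three conditions as true, so the rule would conclude true.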


Formalization of guidelines into a CIG model

• Declarative: Mapping clinical concepts in the guideline to terms within a controlled vocabulary (e.g. UMLS) or ‘virtual medical record’

• Procedural: Identification and extraction of eligibility criteria, clinical actions (tests, treatment regimes, referrals), temporal constraints and if…then decision rules

• Translation to a formal model, e.g. PROforma, GLIF, Asbru

• Time-consuming, iterative, manual process, as the guideline text tends to assume background knowledge, is incomplete, or contains ambiguity and vague terms


Example CIG fragment (Asbru)

<plan name="Doxycycline : 100 mg orally twice a day for 7 days" plan_id="plan52769441">
  <cyclical_plan plan_id="plan5675512">
    <frequency value="12" unit="hour"/>
  </cyclical_plan>
  <duration>
    <min value="7" unit="day"/>
    <max value="7" unit="day"/>
  </duration>
</plan>
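As an illustration of how a dosing phrase might be turned into a fragment of this shape, here is a small Python sketch. It is not the tooling behind the slide: the regular expression, the `FREQUENCY_HOURS` lookup and the function name are assumptions, only the "once/twice a day for N days" wording is handled, and the `plan_id` attributes are omitted because the source does not say how they are generated.

```python
import re
import xml.etree.ElementTree as ET

# Assumed lookup from a frequency word to an interval in hours.
FREQUENCY_HOURS = {"once": 24, "twice": 12}

# Assumed surface pattern; real guideline text is far more varied.
REGIMEN = re.compile(
    r"(?P<times>once|twice) a day for (?P<days>\d+) days", re.IGNORECASE
)


def regimen_to_plan(drug: str, regimen: str) -> ET.Element:
    """Build a <plan> element from a drug name and a dosing phrase."""
    match = REGIMEN.search(regimen)
    if match is None:
        raise ValueError(f"unrecognised regimen: {regimen!r}")
    plan = ET.Element("plan", {"name": f"{drug} : {regimen}"})
    cyclical = ET.SubElement(plan, "cyclical_plan")
    hours = FREQUENCY_HOURS[match["times"].lower()]
    ET.SubElement(cyclical, "frequency", {"value": str(hours), "unit": "hour"})
    duration = ET.SubElement(plan, "duration")
    # A fixed-length course has identical minimum and maximum durations.
    for bound in ("min", "max"):
        ET.SubElement(duration, bound, {"value": match["days"], "unit": "day"})
    return plan


plan = regimen_to_plan("Doxycycline", "100 mg orally twice a day for 7 days")
print(ET.tostring(plan, encoding="unicode"))
```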


Examples of vague guideline statements

Underspecification:

• Avoid the use of highly intensive management strategies to achieve an HbA1c level less than 6.5% (48 mmol/mol)

• Monitor HbA1c every 2–6 months (according to individual need) until it is stable on unchanging treatment

Qualitative terms requiring mapping to numeric values or ranges:

• The moderate use of alcohol may increase HDL-cholesterol

• If blood pressure remains uncontrolled on adequate doses of three drugs, consider adding a fourth and/or seeking expert advice
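One simple lexical aid is to flag such statements for human review before formalization. The Python sketch below does this with a look-up list; the list itself is an assumption, drawn only from the example statements above, and it makes no attempt to map the terms to numeric values.

```python
import re
from typing import List

# Illustrative list taken from the example statements above; a real list
# would need to be curated by guideline and clinical experts.
VAGUE_TERMS = [
    "highly intensive",
    "according to individual need",
    "stable",
    "moderate",
    "adequate",
    "uncontrolled",
]

VAGUE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in VAGUE_TERMS) + r")\b",
    re.IGNORECASE,
)


def flag_vague_terms(sentence: str) -> List[str]:
    """Return the vague terms found in a sentence, in order of appearance."""
    return [m.group(1).lower() for m in VAGUE_PATTERN.finditer(sentence)]


print(flag_vague_terms("The moderate use of alcohol may increase HDL-cholesterol"))
```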


Information extraction for guideline formalization

• Helpful to automate:

- Knowledge base construction: text to formal model translation

- Identification of opportunities for decision support: mapping guideline concepts and rules to concepts in the EPR

- Measurement of guideline compliance


Information extraction approaches

• Bottom-up: identification of individual clinical terms, temporal expressions, units of measure

- Look-up lists, regular expressions
- Shallow parsing to identify noun phrases
- Terminology services: UMLS, MetaMap
- Co-reference resolution: WordNet

• Top-down: identification of guideline structure: preamble, eligibility, recommendations, ‘action’ sentences and rules

- Shallow parsing to identify verb phrases
- Ontologies for semantic relations, e.g. UMLS Semantic Network
- Use of linguistic guideline patterns (see later)
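The bottom-up use of look-up lists and regular expressions can be sketched in a few lines of Python. The unit list, the temporal keywords and the function names below are assumptions chosen to cover the example sentences in this presentation, not a complete grammar.

```python
import re
from typing import List, Tuple

# Assumed look-up list of units; longer units come first so that "mmol/mol"
# is tried before "mg".
UNITS = ["mmol/mol", "mmHg", "mg", "%"]

MEASUREMENT = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>"
    + "|".join(re.escape(unit) for unit in UNITS)
    + r")"
)

# Assumed temporal pattern: keyword, a number or range, then a time unit.
TEMPORAL = re.compile(
    r"(?:every|for|within)\s+\d+(?:\s*[–-]\s*\d+)?\s+"
    r"(?:hour|day|week|month|year)s?",
    re.IGNORECASE,
)


def extract_measurements(text: str) -> List[Tuple[float, str]]:
    """Return (value, unit) pairs such as (6.5, '%')."""
    return [(float(m["value"]), m["unit"]) for m in MEASUREMENT.finditer(text)]


def extract_temporal(text: str) -> List[str]:
    """Return temporal expressions such as 'every 2–6 months'."""
    return [m.group(0) for m in TEMPORAL.finditer(text)]


print(extract_measurements("an HbA1c level less than 6.5% (48 mmol/mol)"))
print(extract_temporal("Monitor HbA1c every 2–6 months"))
```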


Mapping text to UMLS concepts - problems

• Identification of clinical terms is dependent on context:

- family history of congestive heart failure

- probable diagnosis of congestive heart failure

- no evidence of congestive heart failure

- patient does not have established cardiovascular disease

• Clearly, just identifying the raw concepts congestive heart failure and cardiovascular disease and mapping them to UMLS terms is inadequate.
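A minimal way to capture this context is to look for trigger phrases that precede the concept. The Python sketch below is a simple trigger-phrase look-up written for illustration, not the method used in the presentation; the trigger list and the context labels are assumptions based on the four examples above.

```python
import re

# Assumed trigger phrases and labels, taken from the examples above.
CONTEXT_TRIGGERS = [
    (re.compile(r"\bfamily history of\b", re.IGNORECASE), "family_history"),
    (re.compile(r"\bprobable diagnosis of\b", re.IGNORECASE), "uncertain"),
    (re.compile(r"\bno evidence of\b", re.IGNORECASE), "negated"),
    (re.compile(r"\bdoes not have\b", re.IGNORECASE), "negated"),
]


def classify_context(phrase: str, concept: str) -> str:
    """Label a concept mention by the trigger phrase that precedes it."""
    position = phrase.lower().find(concept.lower())
    if position == -1:
        raise ValueError(f"{concept!r} not found in phrase")
    preceding = phrase[:position]
    for pattern, label in CONTEXT_TRIGGERS:
        if pattern.search(preceding):
            return label
    # No trigger found: treat the mention as a plain assertion.
    return "affirmed"


print(classify_context("no evidence of congestive heart failure",
                       "congestive heart failure"))
```

Only text before the concept is inspected, so trailing qualifiers ("congestive heart failure is unlikely") would be missed by this sketch.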


Mapping guideline text to UMLS concepts - problems

• Guideline documents are typically large (100 pages), in PDF or XML format

• Requires guideline text to be segmented to enable efficient processing

- How best to segment the text so as to maximize contextual clinical concept identification?


Solutions: Text segmentation

• Customised phrase chunker to identify candidate terms:

- Noun phrases (NP), prepositional phrases (PP), verb phrases (VP)

- Neoclassical combining-form phrases (token groups containing Latin/Greek prefixes, roots, suffixes)

- Past-participle and gerund NPs:
  - 'results in increased blood pressure', 'fasting blood glucose'

- List expansion:
  - 'mild, moderate and severe hypertension' → 'mild hypertension, moderate hypertension and severe hypertension'
  - 'lowering of heart rate and blood pressure' → 'lowering of heart rate and lowering of blood pressure'

- Abbreviation expansion: 'waist circumference (WC)'
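Two of these steps, list expansion and abbreviation detection, can be sketched with regular expressions alone. The Python below is a rough illustration under stated assumptions: it handles only single-word modifiers joined by commas and 'and' (the first list-expansion example, not the 'lowering of ...' case), and it accepts an abbreviation only when its letters match the initials of the preceding words. The function names are hypothetical; the presentation's chunker works on parsed phrases rather than raw strings.

```python
import re
from typing import List, Optional, Tuple

# Assumed shape: "<mod>, <mod> and <mod> <head noun phrase>".
MODIFIER_LIST = re.compile(
    r"^(?P<mods>\w+(?:,\s*\w+)*\s+and\s+\w+)\s+(?P<head>.+)$"
)


def expand_modifier_list(phrase: str) -> List[str]:
    """Distribute a shared head over a list of single-word modifiers."""
    match = MODIFIER_LIST.match(phrase.strip())
    if match is None:
        return [phrase]
    modifiers = re.split(r",\s*|\s+and\s+", match["mods"])
    return [f"{mod} {match['head']}" for mod in modifiers]


def find_abbreviation(text: str) -> Optional[Tuple[str, str]]:
    """Return (short form, long form) for a 'long form (SF)' mention."""
    match = re.search(r"\(([A-Z]{2,})\)", text)
    if match is None:
        return None
    short = match.group(1)
    words = text[:match.start()].split()
    candidate = words[-len(short):]
    # Accept only if each letter matches the initial of a preceding word.
    if len(candidate) == len(short) and all(
        word[0].upper() == letter for word, letter in zip(candidate, short)
    ):
        return short, " ".join(candidate)
    return None


print(expand_modifier_list("mild, moderate and severe hypertension"))
print(find_abbreviation("measure waist circumference (WC) annually"))
```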


Solutions: GATE-MetaMap Server integration plugin

- Extracts clinical concepts, in context, from large guideline texts in multiple formats and encodings (PDF, XML, RTF, ASCII, UTF-8)

- Exchanges data/annotations with a MetaMap server

- Implements Unicode Normalization Forms for UTF-8 → ASCII

- Provides flexible text chunking options

- Optimises input data to MetaMap for mapping to UMLS concepts

- Integrates with other information extraction pipelines
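The UTF-8 → ASCII step can be illustrated with Python's standard `unicodedata` module. This is a sketch of the general technique only; how the plugin itself implements normalization is not shown in the slides, and the `REPLACEMENTS` table and function name are assumptions.

```python
import unicodedata

# Assumed substitutions for characters that have no ASCII decomposition
# and would otherwise be silently dropped.
REPLACEMENTS = {
    "\u2013": "-",   # en dash
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2265": ">=",  # greater-than or equal to
    "\u2264": "<=",  # less-than or equal to
}


def to_ascii(text: str) -> str:
    """Convert text to ASCII using compatibility decomposition (NFKD)."""
    for source, target in REPLACEMENTS.items():
        text = text.replace(source, target)
    # NFKD splits accented letters into a base letter plus combining marks;
    # encoding with errors="ignore" then drops the non-ASCII marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Monitor HbA1c every 2\u20136 months"))
```

Any character that is neither decomposable nor listed in the table is removed, so the table would need extending for real guideline text.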


GATE-MetaMap integration module


Guideline patterns

Serban et al. (2007), examples:

(med_context, target_group, recommendation_operator, med_action)

In the event of [pregnancy]med_context, [patients with diabetes]target_group [should]recommendation_operator be [prescribed calcium channel blocker]med_action

(target_group, med_context, med_goal)

For [diabetic patients]target_group with [kidney damage]med_context the [blood pressure target is 130/80]med_goal
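The two example patterns above can be approximated with named-group regular expressions. The Python sketch below is only an illustration of the idea: the surface wording of each pattern and the operator list ("should not", "must", "may") are assumptions, and Serban et al. work with linguistic annotations rather than raw regular expressions over sentences.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed surface forms for the two example patterns; slot names follow
# the tuples shown above.
PATTERNS = {
    "recommendation": re.compile(
        r"^In the event of (?P<med_context>.+?), (?P<target_group>.+?) "
        r"(?P<recommendation_operator>should not|should|must|may) be "
        r"(?P<med_action>.+?)\.?$",
        re.IGNORECASE,
    ),
    "goal": re.compile(
        r"^For (?P<target_group>.+?) with (?P<med_context>.+?),? the "
        r"(?P<med_goal>.+?)\.?$",
        re.IGNORECASE,
    ),
}


def match_guideline_pattern(sentence: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (pattern name, slot values) for the first pattern that matches."""
    for name, pattern in PATTERNS.items():
        match = pattern.match(sentence.strip())
        if match is not None:
            return name, match.groupdict()
    return None


print(match_guideline_pattern(
    "For diabetic patients with kidney damage the blood pressure target is 130/80"
))
```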


Extracting guideline recommendations


Extracting guideline recommendations

… and rules from guideline text


Information extraction from patient data


Patient data: automatic spelling correction


Patient data: automatic spelling correction


Patient data: WordNet mappings for coreferencing