Extraction of Adverse Drug Effects from Clinical Records


E. ARAMAKI* Ph.D., Y. MIURA**, M. TONOIKE** Ph.D., T. OHKUMA** Ph.D., H. MASHUICHI** Ph.D., K. WAKI* Ph.D. M.D., K. OHE* Ph.D. M.D.

* University of Tokyo, Japan
** Fuji Xerox, Japan

Our material: discharge summaries

Background

• The use of Electronic Health Records (EHR) in hospitals is increasing rapidly everywhere
• They contain much clinical information about a patient’s health

BUT: many natural language texts!


Extracting clinical information from the reports is difficult because they are written in natural language

NLP-based Adverse Effect Detection System

• We are developing an NLP system that extracts medical information, especially adverse effects, from the natural language parts of clinical records

• INPUT
– a medical text (discharge summary)

• OUTPUT
– Date Time
– Medication Event
– Adverse Effect Event

≒ i2b2 Medication Challenge, but our target focuses only on adverse effects

Adverse Effect Relation (AER)

Why Adverse Effect Relations?

• Clinical trials usually target only a single drug.
• BUT: real patients sometimes take multiple medications, leading to a gap between clinical trials and the actual use of drugs
• To ensure patient safety, it is extremely important to capture new/unknown AEs at an early stage.

DEMO is available on

http://mednlp.jp

Adverse Effect Relation Estimation: System Demo

has no complications at the time of diagnosis 6/23-25 FOLFOX6 2nd.6/24, 25: moderate fever (38℃) again. a fever reducer….

[Demo screenshot legend: Medication, Adverse Effect, Relation]

The Point of This Study

• (1) Preliminary Investigation: How much information actually exists?
– We annotated adverse effect information in discharge summaries

• (2) NLP Challenge: Could current NLP retrieve it?
– We investigated the accuracy with which current techniques could extract adverse effect information

Outline

• Introduction
• Preliminary Investigation
– How much information actually exists in discharge summaries?

• NLP Challenge

• Conclusions

Material & Method

• Material: 3,012 Japanese discharge summaries
• 3 human annotators marked possible adverse effects in the following 2 steps

Step 1: Event Annotation (XML tag = Event)

<D>Lasix</D> for <S>hypertension</S> is stopped due to <S>his headache</S>.

Step 2: Relation Annotation (XML attribute = Relation)

<D rel="1">Lasix</D> for <S>hypertension</S> is stopped due to <S rel="1">his headache</S>.
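The two annotation layers can be read back mechanically. A minimal sketch, assuming well-formed tags (`<D>` for medications, `<S>` for symptoms) and that a shared `rel` value links a medication to its adverse effect; the wrapper element `doc` and the function name `read_annotation` are illustrative and not part of the annotation scheme:

```python
import xml.etree.ElementTree as ET

def read_annotation(sentence):
    """Return (events, relations) from one annotated sentence."""
    # Wrap the sentence in a hypothetical root element so it parses as XML.
    root = ET.fromstring("<doc>" + sentence + "</doc>")
    # Step 1 layer: every tagged span is an event.
    events = [(el.tag, el.text) for el in root]
    # Step 2 layer: group events by their rel attribute.
    meds, effects = {}, {}
    for el in root:
        rel = el.get("rel")
        if rel is None:
            continue
        target = meds if el.tag == "D" else effects
        target.setdefault(rel, []).append(el.text)
    # Pair every medication and symptom that share a rel value.
    relations = [(m, s) for rel in meds
                 for m in meds[rel] for s in effects.get(rel, [])]
    return events, relations

sentence = ('<D rel="1">Lasix</D> for <S>hypertension</S> is stopped '
            'due to <S rel="1">his headache</S>.')
events, relations = read_annotation(sentence)
print(events)
print(relations)
```

Here "hypertension" is kept as an event but takes part in no relation, because it carries no `rel` attribute.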

Annotation Policy & Process

• We regarded only MedDRA/J (an adverse effect terminology) terms as events.

• We regarded even a suspicion of an adverse effect as positive data.

• Annotating the entire data set is time-consuming → we split the data into 2 sets
– SET-A (event-rich parts): contains keywords such as Stop, Change, Adverse effect, Side effect
– SET-B: the rest

SET-A: fully annotated
SET-B: randomly sampled & annotated

Estimated share of summaries with adverse effect information:
14.5% × 53.5% (SET-A) + 85.5% × 11.3% (SET-B) = 17.4%
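The 17.4% figure is a weighted sum over the two sets. A small sketch of that arithmetic, reading the slide's numbers as (share of all summaries, share of those containing adverse effect information) for SET-A and SET-B; the function name `estimate_share` is illustrative:

```python
def estimate_share(strata):
    """Sum of (stratum weight x positive rate within the stratum)."""
    return sum(weight * rate for weight, rate in strata)

# (share of all summaries, share of those containing AE information)
strata = [
    (0.145, 0.535),  # SET-A, as read from the slide
    (0.855, 0.113),  # SET-B, as read from the slide
]
overall = estimate_share(strata)
print(f"{overall:.1%}")
```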

Results of Preliminary Investigation

• About 17% of discharge summaries contain adverse effect information.
– Even considering that the result includes mere suspicions of adverse effects, the summaries are a valuable resource for AE information.

• We can say that discharge summaries are suitable resources for our purpose.

Outline

• Introduction
• Preliminary Investigation

• NLP Challenge
– Could current NLP techniques retrieve the AEs?

• Conclusions

Combination of 2 NLP Steps

• The 2 NLP steps correspond directly to the 2 annotation steps

Example: Lasix (medication) for hyperpiesia (symptom) is stopped due to the pain in the head (symptom).

• Event Annotation ≒ Named Entity Recognition task
• Relation Annotation = Relation Extraction task, one of the hottest NLP research topics

Step1: Event Identification

• Machine Learning Method
– CRF (Conditional Random Field) based Named Entity Recognition (a state-of-the-art method at the i2b2 de-identification task)

• Features (standard feature set)
– Lexicon (stemming), POS, dictionary-based feature (MedDRA), window size = 5

• Material
– SET-A corpus with event annotations
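To make the feature set concrete, here is a hedged sketch of per-token feature extraction of the kind usually fed to a CRF tagger. It assumes "window size = 5" means five tokens centered on the current one, uses lowercasing as a stand-in for stemming, and takes a plain Python set as a stand-in for the MedDRA dictionary; the function and feature names are illustrative, not taken from the original system:

```python
def token_features(tokens, pos_tags, i, lexicon, window=5):
    """Feature dict for token i over a window of `window` tokens centered on i."""
    half = window // 2
    feats = {}
    for offset in range(-half, half + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            word = tokens[j].lower()                      # stand-in for stemming
            feats[f"word[{offset}]"] = word               # lexical feature
            feats[f"pos[{offset}]"] = pos_tags[j]         # part-of-speech feature
            feats[f"in_dict[{offset}]"] = word in lexicon # dictionary feature
        else:
            feats[f"word[{offset}]"] = "<PAD>"            # outside the sentence
    return feats

tokens = ["Lasix", "is", "stopped", "due", "to", "headache"]
pos_tags = ["NNP", "VBZ", "VBN", "IN", "TO", "NN"]
toy_lexicon = {"headache"}  # hypothetical mini-dictionary
feats = token_features(tokens, pos_tags, 5, toy_lexicon)
print(feats)
```

One such dictionary per token, in sentence order, is the usual input shape for CRF toolkits.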

Step1: Result of Event Identification

• Result Summary

Cat. of Event      Precision  Recall  F-measure
Medication Event   86.99      81.34   0.84
AE Event           85.56      80.24   0.82

• All accuracies (P, R) > 80%, F > 0.80, demonstrating the feasibility of our approach

• Considering that the corpus is small (435 summaries), we can say that event detection is a relatively easy task

Step2: Relation Extraction Method

• Basic Approach ≒ Protein-Protein Interaction (PPI) task [BioNLP2009 Shared Task]

• Example: Lasix for hypertension is stopped due to his headache

For each m (Medications)
  For each a (Adverse Effects)
    judge_it_has_AER (m, a)

(1) judge_it_has_AER (Lasix, hypertension)
(2) judge_it_has_AER (Lasix, headache)
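The nested loop over candidate pairs can be sketched as follows. The `judge` argument stands for whichever judgment method is plugged in; `find_aers` and `toy_judge` are hypothetical names used only for illustration:

```python
from itertools import product

def find_aers(medications, adverse_effects, judge):
    """Return every (medication, adverse effect) pair the judge accepts."""
    return [(m, a) for m, a in product(medications, adverse_effects) if judge(m, a)]

def toy_judge(m, a):
    # Toy stand-in for judge_it_has_AER: accepts only the pair from the example.
    return (m, a) == ("Lasix", "headache")

pairs = find_aers(["Lasix"], ["hypertension", "headache"], toy_judge)
print(pairs)
```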

Two judgment methods

• (1) PTN-BASED: heuristic rules using a set of keywords & word distance
– Example: "..is on ACTOS but stopped for relief of the edema."
– Pattern: <medication> (n=1) keyword (n=4) <adverse effect>
– judge_it_has_AER (m, a, keyword=stopped, windowsize=5)

• (2) SVM-BASED: machine learning approach
– Features: distance & words between the two events (medication & adverse effect)

See the proceedings for details
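The keyword-and-distance rule can be illustrated with a short sketch. It assumes single-token events, and it assumes the rule fires when a keyword lies between the two events with at most `windowsize` intervening words on each side; the exact rule is in the proceedings, so this reading is a guess and the lowercase name `judge_it_has_aer` is illustrative:

```python
def judge_it_has_aer(tokens, m_idx, a_idx, keywords=frozenset({"stopped"}), windowsize=5):
    """True if a keyword sits between the two events within the word-distance limit."""
    lo, hi = sorted((m_idx, a_idx))
    for k in range(lo + 1, hi):
        if tokens[k].lower() in keywords:
            gap_before = k - lo - 1   # words between first event and keyword
            gap_after = hi - k - 1    # words between keyword and second event
            if gap_before <= windowsize and gap_after <= windowsize:
                return True
    return False

tokens = "is on ACTOS but stopped for relief of the edema".split()
m_idx, a_idx = tokens.index("ACTOS"), tokens.index("edema")
print(judge_it_has_aer(tokens, m_idx, a_idx))                 # gaps are 1 and 4
print(judge_it_has_aer(tokens, m_idx, a_idx, windowsize=3))   # gap of 4 exceeds 3
```

A wider window or a larger keyword set accepts more pairs, which is one way such a rule ends up with high recall and low precision.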

Step2: Result of Relation Extraction

Method      Precision  Recall  F-measure
PTN-BASED   41.1%      91.7%   0.650
SVM-BASED   57.6%      62.3%   0.598

• Both PTN & SVM accuracies are low (F ≤ 0.65) → the relation extraction task is difficult!

• SVM accuracy is significantly (p=0.05) lower than PTN: (1) the corpus size is small, (2) positive data << negative data

• Machine learning suffers from such small, imbalanced data

Outline

• Introduction
• Preliminary Investigation
• NLP Challenge
• Discussions
– (1) Overall Accuracy
– (2) Controllable Performance
– (3) Event Distribution

• Conclusions

Discussion (1/3) Overall Accuracy

• The overall accuracy is estimated by combining the accuracies of step 1 & step 2

Overall (= step1 × step2)
Precision 0.289 (= 0.855 × 0.869 × 0.390)
Recall 0.597 (= 0.802 × 0.813 × 0.917)

• Each NLP step is imperfect, so combining such imperfect results leads to low accuracy (especially many false positives, i.e. low precision)
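The overall figures are plain products of the per-step scores. A minimal sketch of that arithmetic, using the numbers shown on the slide; treating the steps as independent stages whose scores multiply is the slide's own estimate, and the function name `combine` is illustrative:

```python
import math

def combine(*step_scores):
    """Overall score estimated as the product of the per-step scores."""
    return math.prod(step_scores)

# AE event, medication event, relation extraction
precision = combine(0.855, 0.869, 0.390)
recall = combine(0.802, 0.813, 0.917)
print(f"precision={precision:.4f} recall={recall:.4f}")
```

The products agree with the slide's 0.289 and 0.597 up to the last digit.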

Discussion (2/3) Performance is Controllable

• The performance balance between recall & precision can be controlled

[Figure: precision & recall curve for the SVM, marking a high-precision setting and a high-recall setting]

That is a strong advantage of NLP
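One common way to move along such a curve is to vary the decision threshold applied to the classifier's scores. The sketch below uses made-up scores and labels purely for illustration; it does not reproduce the system's actual settings or results:

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy decision scores and gold labels (1 = true adverse effect relation).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

high_precision = precision_recall(scores, labels, threshold=0.7)
high_recall = precision_recall(scores, labels, threshold=0.35)
print(high_precision, high_recall)
```

Raising the threshold keeps only confident pairs (fewer false positives); lowering it recovers more true pairs at the cost of precision.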

Discussion (3/3) Event Distribution

• We investigated the overall AE frequency for each medication category.

[Figure: AE frequency distribution of Drug #1, comparing the distribution acquired from annotated real data with the distribution acquired from our system results]

Discussion (3/3) AER Distribution

• Then, we applied a goodness-of-fit test, which measures the similarity between two distributions

Medication  P-value
Med. 1      0.023
Med. 2      0.013
Med. 3      0.010
Med. 4      0.006
Med. 5      0.005
Total       0.011

• A high p-value (p=0.011 > 0.01) indicates that the two distributions are similar.
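The slides do not say which goodness-of-fit test was used. As an illustration only, the sketch below assumes a Pearson chi-square test that compares system counts against the distribution observed in annotated data, with made-up counts. To stay dependency-free it computes the p-value with the closed-form chi-square tail that holds for an even number of degrees of freedom:

```python
import math

def chi_square_stat(observed, expected):
    """Pearson chi-square statistic for observed vs expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_even_df(x, df):
    """Upper-tail probability of chi-square; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Toy AE counts over three AE categories for one medication (not real data).
annotated = [30, 50, 20]   # counts from annotated data
system = [28, 55, 17]      # counts from system output
n = sum(system)
expected = [n * c / sum(annotated) for c in annotated]

stat = chi_square_stat(system, expected)
p_value = chi2_sf_even_df(stat, df=len(system) - 1)
print(f"chi2={stat:.3f} p={p_value:.3f}")
```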

Outline

• Introduction
• Preliminary Investigation
• NLP Challenge
• Discussions

• Conclusions

Conclusions (1/2)

• Preliminary Investigation:
– About 17% of discharge summaries contain adverse effect information.
– We can say that discharge summaries are suitable resources for AERs

• NLP Challenge:
– Could NLP retrieve the AE information?
– Difficult! Overall accuracy is low

Conclusions (2/2)

• BUT: 2 positive findings:
(1) We can control the performance balance
(2) Even though the accuracy is low, the aggregation of the results is similar to the real distribution

• IN THE FUTURE:
– A practical system using the above advantages
– A more accurate method for relation extraction

Thank you

Contact Info
– Eiji ARAMAKI Ph.D.
– University of Tokyo
– [email protected]
– http://mednlp.jp

