Automatic Extraction of Side Effect Information from Consumer Drug Reviews

AUTOMATIC EXTRACTION OF SIDE EFFECT INFORMATION FROM CONSUMER DRUG REVIEWS

SUPERVISED BY:

Assoc Prof Khoo Soo Guan, Christopher

Wee Kim Wee School of Communication and Information

20 April, 2015

PRESENTED BY:

Abdul Rachman (G1400808F)

Paudel Sunil (G1400834A)

Sathasivamoorthy Nirathan (G1301369K)

Introduction

• Text mining and information extraction from drug reviews posted on social media (www.webmd.com).

• Extracting side effect information on psychotropic drugs.

• Psychotropic drugs alter chemical levels in the brain and affect behavior, emotions and mood.

• In the past, pharmacies provided side effect information based on clinical trials.

• These days, trusted health sites (such as www.fda.gov) provide lists of probable side effects.

• Sometimes, users experience side effects that are not mentioned on the label of the medicine.

http://www.fda.gov/

Reviews from www.webmd.com

Objectives

• To develop an information extraction method that extracts side effect information from online drug reviews (www.webmd.com)

• To compare the extracted side effects with the ones listed on www.fda.gov

http://www.webmd.com/

http://www.fda.gov/

Information extraction method

• Side effect information: awful headache

• Pattern: the only side effect has been ____________________


• Side effect information: shaking, restlessness and dizziness

• Pattern: side effects are _______________


• Side effect information: nausea (misspelled by the user; misspellings are a pain point in text mining)

• Pattern: _________ is a side effect


• Text extracted as the side effect by the proposed method:

From the end of the pattern up to the full stop, when the information follows the pattern

From the beginning of the sentence up to the pattern, when the information precedes the pattern
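The extraction rule above can be sketched in Python. This is a minimal illustration, not the presenters' implementation: the three pattern strings are the slide examples, while the function names, the two pattern lists and the naive sentence splitter are assumptions made for the sketch.

```python
import re

# Patterns where the side effect FOLLOWS the pattern (slide examples).
INFO_AFTER_PATTERN = ["the only side effect has been", "side effects are"]
# Patterns where the side effect PRECEDES the pattern (slide example).
INFO_BEFORE_PATTERN = ["is a side effect"]


def split_sentences(review):
    """Naive splitter on '.', '!' and '?' (an assumption for this sketch)."""
    return [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]


def extract_side_effects(review):
    """Return the text spans captured by the patterns in one review."""
    found = []
    for sentence in split_sentences(review):
        lowered = sentence.lower()
        for pattern in INFO_AFTER_PATTERN:
            pos = lowered.find(pattern)
            if pos != -1:
                # Take everything after the pattern, up to the full stop.
                span = sentence[pos + len(pattern):].strip()
                if span:
                    found.append(span)
        for pattern in INFO_BEFORE_PATTERN:
            pos = lowered.find(pattern)
            if pos > 0:
                # Take everything from the sentence start up to the pattern.
                span = sentence[:pos].strip()
                if span:
                    found.append(span)
    return found


print(extract_side_effects("Side effects are shaking, restlessness and dizziness."))
```

Because the whole span up to the sentence boundary is kept, any extra words in that span are extracted along with the side effect.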

Overall approach for constructing extraction patterns

• Goal: construct a set of good candidate patterns, i.e. patterns that are accurate and have good coverage

Good coverage: the pattern must occur several times (more than 2)

Accuracy: more than 60% in the training sample
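The two selection criteria can be expressed as a simple filter. The sketch below assumes that a pattern's accuracy is the share of its matches that yield a relevant side effect; that definition, the function name and the counts in the example are hypothetical and are not taken from the study.

```python
def select_good_patterns(stats, min_count=2, min_accuracy=0.60):
    """Keep patterns that occur more than min_count times and whose
    accuracy exceeds min_accuracy.

    stats maps each pattern to (number of matches, number of correct matches).
    """
    good = []
    for pattern, (matches, correct) in stats.items():
        if matches > min_count and correct / matches > min_accuracy:
            good.append(pattern)
    return good


# Hypothetical counts, for illustration only.
example_stats = {
    "side effects are": (10, 8),    # frequent and accurate: kept
    "a side effect of": (2, 2),     # accurate but too rare: dropped
    "side effect that i": (5, 2),   # frequent but inaccurate: dropped
}
print(select_good_patterns(example_stats))
```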

Overall approach for constructing extraction patterns

• Generation of n-grams ranging in length from 3 to 6

• For this study, we investigate only one seed word: “side effect”
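A possible way to generate such n-grams is sketched below. It assumes that n-gram length is counted in words, that only n-grams containing the seed are kept as pattern candidates, and that the plural "side effects" also counts as the seed; none of these details is stated on the slides, and the function names are hypothetical.

```python
import re


def contains_seed(gram):
    """True if the tokens contain 'side' followed by 'effect' or 'effects'."""
    return any(a == "side" and b in ("effect", "effects")
               for a, b in zip(gram, gram[1:]))


def seed_ngrams(sentence, n_min=3, n_max=6):
    """Return all word n-grams of length n_min..n_max that contain the seed."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    grams = set()
    for n in range(n_min, n_max + 1):
        for start in range(len(tokens) - n + 1):
            gram = tuple(tokens[start:start + n])
            if contains_seed(gram):
                grams.add(" ".join(gram))
    return grams


for gram in sorted(seed_ngrams("The only side effect has been awful headache.")):
    print(gram)
```

Counting how often each generated n-gram occurs across the training reviews would then give the frequency needed for the coverage criterion.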

Extraction Method

• Side effect information is extracted using the generated patterns

• Patterns are matched against the reviews and the side effects are extracted automatically

Challenges Faced

• Extraction of negative information

Challenges Faced

• Users don’t follow a proper structure in their writing

Analysis of Extracted information

• Total number of patterns: 505

• Total number of reviews: 801

• Total number of side effect information items retrieved: 63

• Total number of relevant side effect information items retrieved: 50

• Total number of relevant side effect information items available: 71

Precision, Recall and F1 measure

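• $\text{Precision} = \dfrac{\text{Total number of relevant side effect information retrieved}}{\text{Total number of side effect information retrieved}} \times 100 = \dfrac{50}{63} \times 100 = 79.37\%$

• $\text{Recall} = \dfrac{\text{Total number of relevant side effect information retrieved}}{\text{Total number of relevant side effect information available}} \times 100 = \dfrac{50}{71} \times 100 = 70.42\%$

• $F_1 = 2 \times \dfrac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = 2 \times \dfrac{79.37 \times 70.42}{79.37 + 70.42} = 74.63\%$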
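The three measures can be recomputed from the reported counts (63 items retrieved, 50 relevant items retrieved, 71 relevant items available). The function name below is hypothetical; the sketch only reproduces the arithmetic.

```python
def precision_recall_f1(relevant_retrieved, retrieved, relevant_available):
    """Return precision, recall and F1, all as percentages."""
    precision = relevant_retrieved / retrieved * 100
    recall = relevant_retrieved / relevant_available * 100
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = precision_recall_f1(50, 63, 71)
print(f"{p:.2f} {r:.2f} {f1:.2f}")  # prints "79.37 70.42 74.63"
```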

Error Analysis

• 21 relevant side effects were missed

• Reasons:

Free-form writing by users (1)

No pattern could be constructed (2)

The pattern's accuracy in the training sample was below 60% (3)

Error Analysis

• 13 non-relevant pieces of side effect information were extracted

• Reason:

Even good patterns may extract some incorrect information

• The accuracy of all these patterns was above 60% in the training sample

Comparison of Side Effects

• The extracted side effects of 15 drugs were compared with those listed on www.fda.gov

• Drug selection criterion:

Minimum of 30 reviews in the training sample

• A few of the side effects users complained about are similar in meaning to those listed

http://www.fda.gov/

Comparison of Side Effects

• A few of the extracted side effects are not mentioned in the list at all
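The comparison step can be illustrated with plain set operations. This is a sketch under stated assumptions: exact matching after lowercasing is a simplification, the function name is hypothetical, and the listed side effects in the example are made-up placeholders, not data from www.fda.gov.

```python
def compare_with_label(extracted, listed):
    """Split extracted side effects into those found in the listed set
    and those not listed, using exact matching after lowercasing."""
    extracted_norm = {e.strip().lower() for e in extracted}
    listed_norm = {item.strip().lower() for item in listed}
    common = sorted(extracted_norm & listed_norm)
    unlisted = sorted(extracted_norm - listed_norm)
    return common, unlisted


# Placeholder values for illustration only.
common, unlisted = compare_with_label(
    ["Headache", "dizziness", "restlessness"],
    ["headache", "dizziness"],
)
print(common, unlisted)
```

Exact matching cannot recognise side effects that are only similar in meaning to a listed term, so those cases would still need manual review or a synonym list.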

Conclusion & Future Work

• Side effects were extracted from the reviews using the candidate patterns

• The extracted side effects were compared with those listed on www.fda.gov; a few of them are not listed on the site

• The extracted information contains a lot of noise; future work is needed to extract only the side effects and leave the noise behind

• Other seed words, such as downside, bad news, symptom and ill effect, could be used to increase the accuracy of the end results

http://www.fda.gov/

Thank You !!!

Q & A
