AUTOMATIC EXTRACTION OF SIDE EFFECT
INFORMATION FROM CONSUMER DRUG
REVIEWS
SUPERVISED BY:
Assoc Prof Khoo Soo Guan, Christopher
Wee Kim Wee School of Communication and Information
20 April, 2015
PRESENTED BY:
Abdul Rachman (G1400808F)
Paudel Sunil (G1400834A)
Sathasivamoorthy Nirathan (G1301369K)
Introduction
• Text mining and information extraction from user reviews on social media
(www.webmd.com).
• Extracting side effect information for psychotropic drugs.
• Psychotropic drugs alter chemical levels in the brain and affect behavior,
emotions, and mood.
• In the past, pharmacies provided side effect information based on clinical trials.
• These days, trusted health sites (such as www.fda.gov) provide lists of probable side
effects.
• Sometimes users experience side effects that are not mentioned on the medicine's label.
Reviews from www.webmd.com
Objectives
• To develop an information extraction method that extracts side effect information from
online drug reviews (www.webmd.com)
• To compare the extracted side effects with those listed on www.fda.gov
Information extraction method
• Side effect information : awful headache
• Pattern : the only side effect has been ____________________
Information extraction method
• Side effect Information : shaking, restlessness and dizziness
• Pattern : side effects are _______________
Information extraction method
• Side effect information : nausea (misspelled by the user; misspellings are a pain point in text mining)
• Pattern : _________ is a side effect
Information extraction method
• Text extracted as the side effect by the proposed method:
When the information follows the pattern: everything after the pattern up to the full stop.
When the information precedes the pattern: everything from the beginning of the sentence up to the pattern.
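The extraction rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenters' implementation: the function name `extract_side_effect` and the `side_effect_after` flag are hypothetical, the input is assumed to be a single sentence already, and patterns are matched case-insensitively as literal strings.

```python
import re

def extract_side_effect(sentence, pattern, side_effect_after=True):
    """Return the text next to `pattern` in `sentence`, or None if absent.

    Hypothetical helper: `side_effect_after` says whether the blank in the
    pattern sits after the pattern text (True) or before it (False).
    """
    match = re.search(re.escape(pattern), sentence, flags=re.IGNORECASE)
    if match is None:
        return None
    if side_effect_after:
        # Take everything after the pattern, up to the first full stop.
        tail = sentence[match.end():]
        return tail.split(".", 1)[0].strip()
    # Take everything from the start of the sentence up to the pattern.
    return sentence[:match.start()].strip()

print(extract_side_effect("The only side effect has been awful headache.",
                          "the only side effect has been"))
print(extract_side_effect("Nausea is a side effect",
                          "is a side effect", side_effect_after=False))
```

Because the rule keeps everything up to the full stop or back to the sentence start, any extra words in the sentence are extracted along with the side effect itself.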
Overall approach for constructing extraction patterns
• Goal: construct a set of good candidate patterns (accurate and with good coverage)
Good coverage: the pattern must occur several times (more than 2)
Accuracy: more than 60%
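The two selection criteria can be expressed as a simple filter. The sketch below assumes that accuracy means the fraction of a pattern's matches that yielded a correct side effect in the training sample; the slides do not define it further. The function name `select_patterns` and the counts in `example_stats` are hypothetical and only illustrate the thresholds.

```python
def select_patterns(stats, min_count=2, min_accuracy=0.60):
    """Keep patterns that occur more than `min_count` times and whose
    accuracy (correct / occurrences) exceeds `min_accuracy`."""
    selected = []
    for pattern, (occurrences, correct) in stats.items():
        if occurrences > min_count and correct / occurrences > min_accuracy:
            selected.append(pattern)
    return selected

# Hypothetical counts: pattern -> (occurrences, correct extractions).
example_stats = {
    "side effects are": (10, 8),   # passes both thresholds
    "is a side effect": (2, 2),    # accurate, but occurs too rarely
    "side effect of the": (5, 2),  # frequent enough, but accuracy is 40%
}
print(select_patterns(example_stats))
```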
Overall approach for constructing extraction patterns
• Generation of n-grams with lengths ranging from 3 to 6
• This study investigates only one seed term: “side effect”
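As a rough illustration of this step, the sketch below generates n-grams of length 3 to 6 from a review and keeps those containing the seed term. It assumes word-level n-grams and plain whitespace tokenization, neither of which is specified on the slides; the function name `seed_ngrams` is hypothetical.

```python
def seed_ngrams(text, seed="side effect", n_min=3, n_max=6):
    """Return all word n-grams (n_min..n_max words) that contain `seed`."""
    tokens = text.lower().split()  # assumption: whitespace tokenization
    found = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            # Substring test, so the plural "side effects" also matches.
            if seed in gram:
                found.append(gram)
    return found

for gram in seed_ngrams("The only side effect has been awful headache"):
    print(gram)
```

Each n-gram kept this way is a candidate pattern, which would then be checked against the coverage and accuracy criteria.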
Extraction Method
• Side effect information extracted using the generated patterns
• Patterns are matched against the reviews and the side effects are extracted
automatically
Challenges Faced
• Extraction of negative information
Challenges Faced
• Users don’t follow a proper structure in their writing
Analysis of Extracted information
• Total number of patterns: 505
• Total number of reviews: 801
• Total number of side effect information items retrieved: 63
• Total number of relevant side effect information items retrieved: 50
• Total number of relevant side effect information items available: 71
Precision, Recall and F1 measure
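• Precision:
\[ \text{Precision} = \frac{\text{Total number of relevant side effect information retrieved}}{\text{Total number of side effect information retrieved}} \times 100 = \frac{50}{63} \times 100 = 79.37\% \]
• Recall:
\[ \text{Recall} = \frac{\text{Total number of relevant side effect information retrieved}}{\text{Total number of relevant side effect information available}} \times 100 = \frac{50}{71} \times 100 = 70.42\% \]
• F1:
\[ F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = 2 \times \frac{79.37 \times 70.42}{79.37 + 70.42} = 74.63\% \]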
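The three figures can be re-derived from the reported counts (50 relevant retrieved, 63 retrieved, 71 relevant available). The short check below is a sketch; the function name `precision_recall_f1` is just an illustrative choice.

```python
def precision_recall_f1(relevant_retrieved, retrieved, relevant_available):
    """Compute precision, recall and F1 as fractions between 0 and 1."""
    precision = relevant_retrieved / retrieved
    recall = relevant_retrieved / relevant_available
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(50, 63, 71)
print(f"{p:.2%} {r:.2%} {f1:.2%}")  # prints: 79.37% 70.42% 74.63%
```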
Error Analysis
• 21 relevant side effects were missed
• Reasons:
Free-form writing by users (1)
No pattern could be constructed (2)
Pattern accuracy in the training sample was below 60% (3)
Error Analysis
• 13 non-relevant pieces of side effect information were extracted
• Reason:
Even good patterns can extract some bad information
• The accuracy of all these patterns was above 60% in the training sample
Comparison of Side Effects
• Extracted side effects of 15 drugs compared with those listed in www.fda.gov
• Drugs Selection Criteria:
Minimum 30 reviews in training sample
• A few of the reported side effects are similar in meaning to those listed
Comparison of Side Effects
• A few of the extracted side effects are not mentioned in the list at all
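At its simplest, the comparison is a set difference between the extracted side effects and the listed ones. The sketch below uses placeholder values only: the `listed` set is invented for illustration and is not real www.fda.gov data, and the function name `unlisted_side_effects` is hypothetical. Exact string matching is assumed, so side effects that are merely similar in meaning would still need manual review.

```python
def unlisted_side_effects(extracted, listed):
    """Return extracted side effects that do not appear in the listed set."""
    listed_normalized = {item.strip().lower() for item in listed}
    return sorted(
        item for item in {e.strip().lower() for e in extracted}
        if item not in listed_normalized
    )

# Placeholder values for illustration only (not real FDA label data).
extracted = ["Awful headache", "dizziness", "restlessness"]
listed = ["Dizziness", "Nausea"]
print(unlisted_side_effects(extracted, listed))
```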
Conclusion & Future Work
• The side effects were extracted using the candidate patterns
• The extracted side effects were compared with those on www.fda.gov, and a few of them
were found not to be listed on the site
• The extracted information contains a lot of noise; future work is needed to extract only the
side effects and leave the noise behind.
• Other seed words such as downside, bad news, symptom, and ill effect could be used to increase the
accuracy of the end results.
Thank You !!!
Q & A