Slides of the opening presentation at the Macquarie University workshop on Text Mining and Health, http://comp.mq.edu.au/research/collaboration-workshops/2014-mq-clinical-nlp/
Macquarie University Workshop on Text Mining and Health
Diego Molla
Macquarie University, Sydney, Australia
http://comp.mq.edu.au/research/collaboration-workshops/2014-mq-clinical-nlp/
26 September 2014
About the Workshop Text Mining for Evidence Based Medicine Our Research In Progress / Future Research
Contents
1 About the Workshop
2 Text Mining for Evidence Based Medicine
  The Scenario
3 Our Research
  A Corpus for EBM Summarisation
  Single-document Query-based Summarisation
  Evidence Grading
  Clustering
4 In Progress / Future Research
Text Mining and Health 2014 Diego Molla 2/36
Aims of the Workshop
Bring together
Medical researchers and practitioners
Researchers in text mining and related areas
Why?
Find ideas for collaboration
Some Statistics
Registered: 50+
Presentations: 13 + 1
Institutions represented
1 Macquarie University
2 IBM Research
3 The University of Melbourne
4 Defence Science and Technology Organisation
5 The University of Queensland
6 RMIT University
7 Monash University
8 Royal Melbourne Hospital
9 Alfred Health
10 Queensland University of Technology
11 The Commonwealth Scientific and Industrial Research Organisation
12 Semantic Software Asia Pacific
13 The University of New South Wales
14 Bond University
Program
Time            Session
8:45 – 9:00     Registration
9:00 – 9:30     Diego Molla: Introduction and research ideas - Text Mining for Evidence Based Medicine
9:30 – 10:30    Session 1 (6 presentations)
                Antonio Jimeno: Text analytics for Healthcare at IBM Research - Australia
                Karin Verspoor: Syndromic Surveillance from Emergency Department triage notes
                Tudor Groza: Phenotype concept recognition: State of the art and future directions
                Simon Kocbek: Topic modeling of Emergency Department Triage notes for characterising pain-related chief complaints
                Lawrence Cavedon: Text mining for lung cancer cases over large patient admission data
                Reza Haffari: Intelligent Analysis of Health Record Data
10:30 – 10:45   Break
10:45 – 11:55   Session 2 (7 presentations)
                Guido Zuccon: Towards Exploiting Inference from Semantic Annotations for Medical Information Retrieval
                Laurianne Sitbon: Delivering Clinical Information Extraction Tools to Practitioners
                Dung Xuan Thi Le: A Transformation of Free Text to Semantic Data for Analysis Purposes
                Mark Johnson: Extracting and Exploiting Relational Information in Text Data Mining
                Guy Tsafnat: Agent-based evidence gathering, synthesis and dissemination
                Miew Keen Choong: Automatic clinical evidence discovery with citation networks
                Adam Dunn: Automatic classification of published clinical articles using metadata instead of content
11:55 – 12:30   Discussion and closing
Thanks to . . .
Department of Computing
Centre for Language Sciences (CLaS)
. . . you all!
Evidence Based Medicine
http://laikaspoetnik.wordpress.com/2009/04/04/evidence-based-medicine-the-facebook-of-medicine/
The Search Space is Huge
Suggested Steps in EBM
http://hlwiki.slais.ubc.ca/index.php?title=Five_steps_of_EBM
Where can Research in Text Processing Help?
Questions:
  Help formulate answerable questions.
  Question analysis and classification.
Search:
  Retrieve and rank relevant literature.
  Extract the evidence-based information.
  Summarise the results.
Appraisal:
  Classify the evidence.
Journal of Family Practice’s “Clinical Inquiries”
Components of the Corpus
Question Direct extract from the source.
Answer Split from the source and manually checked.
Evidence Extracted from the source.
Additional text Manually extracted from the source and massaged.
References PMID looked up in PubMed (automatic and manual procedure).
Corpus Statistics
Size
456 questions (“records”).
1,396 answer parts (“snips”).
3,036 answer justifications (“longs”).
3,705 references:
  2,908 unique references.
  2,657 XML abstracts from PubMed.
Our Vision
Which treatments work best for hemorrhoids?
(SOR B) Excision is the most effective treatment for thrombosed external hemorrhoids
(SOR A) Hemorrhoidectomy is the best treatment for prolapsed internal hemorrhoids
(SOR A) Rubber band ligation produces the lowest level of recurrence among nonoperative techniques
Single-document Query-based Summarisation
Input
Which treatments work best for hemorrhoids?
Abstract of Greenspon J, Williams SB, Young HA, et al. Thrombosed external hemorrhoids: outcome after conservative or surgical management. Dis Colon Rectum. 2004; 47: 1493-1498.
Output
A retrospective study of 231 patients treated conservatively or surgically found that the 48.5% of patients treated surgically had a lower recurrence rate than the conservative group (number needed to treat [NNT]=2 for recurrence at mean follow-up of 7.6 months) and earlier resolution of symptoms (average 3.9 days compared with 24 days for conservative treatment).
Extractive Summarisation by Sarker et al. (CBMS 2012)
Input
Which treatments work best for hemorrhoids?
Abstract of Greenspon J, Williams SB, Young HA, et al. Thrombosed external hemorrhoids: outcome after conservative or surgical management. Dis Colon Rectum. 2004; 47: 1493-1498.
Output
The aim was to test the efficacy of local application of nifedipine ointment in healing acute thrombosed external hemorrhoids.
Results obtained were as follows: complete relief of pain in 43 patients (86 percent) of the nifedipine-treated group as opposed to 24 patients (50 percent) of the control group after 7 days of therapy (P < 0.01); oral analgesics were used by 4 patients (8 percent) in the nifedipine-treated group as opposed to 26 patients (54.1 percent) of the control group after 7 days of therapy (P < 0.01); and resolution of acute thrombosed external hemorrhoids was achieved after 14 days of therapy in 46 patients (92 percent) of the nifedipine-treated group, as opposed to 22 patients (45.8 percent) of the control group (P < 0.01).
Our study clearly demonstrates that the use of topical nifedipine, which at present is for treatment of cardiovascular disorders, is a reliable new option in the conservative treatment of thrombosed external hemorrhoids.
General Approach (Sarker et al., CBMS 2012)
In a Nutshell
1 Gather statistics from the best 3-sentence extracts.
  Exhaustive search to find these best extracts.
  Used ROUGE to automatically compare the extracts with the target output.
2 Build three classifiers, one per sentence in the final extract.
  Classifier 1 based on statistics from best 1st sentence.
  Classifier 2 based on statistics from best 2nd sentence.
  Classifier 3 based on statistics from best 3rd sentence.
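The exhaustive search in step 1 can be illustrated with a minimal Python sketch. It assumes a simple unigram-overlap F-score as a stand-in for ROUGE (the slides do not say which ROUGE variant or toolkit was used), and the function names `unigram_f1` and `best_extract` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence, Tuple


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F-score: a simplified stand-in for ROUGE, not ROUGE itself."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def best_extract(sentences: Sequence[str], target: str, size: int = 3) -> Tuple[int, ...]:
    """Try every `size`-sentence subset and return the indices of the best one."""
    size = min(size, len(sentences))
    return max(
        combinations(range(len(sentences)), size),
        key=lambda idx: unigram_f1(" ".join(sentences[i] for i in idx), target),
    )
```

Because abstracts are short, enumerating every 3-sentence combination stays cheap; the positions of the selected sentences are the kind of information from which per-position statistics could then be gathered.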
Results
System  F-Score  95% CI         Percentile (%)
L3      0.159    0.155–0.163    60.3
O3      0.161    0.158–0.165    77.5
R       0.158    0.154–0.161    50.3
O       0.159    0.155–0.164    60.3
PI      0.160    0.157–0.164    69.4
PD      0.166    0.162–0.170    97.3

L3 = Last three sentences.
O3 = Last three PIBOSO outcome sentences.
R = Random.
O = All outcome sentences.
PI = Sentence position independent.
PD = Sentence position dependent (our proposal).
The ALTA 2011 Shared Task
The ALTA Shared Tasks
Competitions where all participants are evaluated on the same data.
The ALTA 2011 shared task was based on evidence grading.
The Data
Clusters of abstracts.
The SOR grade of each cluster.
The SORT Taxonomy
A Consistent and good-quality patient-oriented evidence.
B Inconsistent or limited-quality patient-oriented evidence.
C Consensus, usual practice, opinion, disease-oriented evidence, or case series for studies of diagnosis, treatment, prevention, or screening.
Data Fragment
41711 B 10553790 15265350
53581 C 12804123 16026213 14627885
53583 B 15213586
52401 A 15329425 9058342 11279767
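A minimal Python sketch of reading the data fragment above. It assumes each line holds a cluster identifier, the SOR grade, and then the PubMed IDs of the abstracts in the cluster, separated by whitespace; the `Cluster` type and its field names are hypothetical.

```python
from typing import List, NamedTuple


class Cluster(NamedTuple):
    cluster_id: str    # assumed: first column identifies the cluster
    grade: str         # SOR grade: "A", "B" or "C"
    pmids: List[str]   # assumed: remaining columns are PubMed IDs


def parse_cluster_line(line: str) -> Cluster:
    """Parse one whitespace-separated line of the fragment."""
    cluster_id, grade, *pmids = line.split()
    if grade not in {"A", "B", "C"}:
        raise ValueError(f"unexpected SOR grade: {grade!r}")
    return Cluster(cluster_id, grade, pmids)


fragment = """\
41711 B 10553790 15265350
53581 C 12804123 16026213 14627885
53583 B 15213586
52401 A 15329425 9058342 11279767
"""
clusters = [parse_cluster_line(line) for line in fragment.splitlines() if line.strip()]
```

Each parsed cluster pairs one grade with a variable number of abstracts, which is why grading is a classification task over sets of documents rather than single documents.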
Cascaded Classification (Molla & Sarker, ALTA 2011)
Process: Cascaded SVMs
1 Default class: B.
2 SVMs with abstract n-grams to identify A and C.
3 SVMs with publication types to identify A and C.
4 SVMs with title n-grams to identify A and C.
Results
Method          Accuracy   CI
Majority (B)    48.63%     41.5 – 55.83
Cascaded SVMs   62.84%
http://corine13.c.o.pic.centerblog.net/h7f1xcsu.jpg
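One way to read the cascade is sketched below in Python. This is an assumption-laden illustration, not the authors' implementation: it assumes the stages are consulted in the listed order and that the first stage to recognise A or C decides the grade, with B as the fallback. The trained SVMs are replaced by placeholder stages that look up a precomputed prediction; `cascade_grade` and `make_stage` are hypothetical names.

```python
from typing import Callable, Mapping, Optional, Sequence

# A stage inspects one cluster and returns "A", "C", or None when it
# recognises neither grade.
Stage = Callable[[Mapping[str, Optional[str]]], Optional[str]]


def cascade_grade(cluster: Mapping[str, Optional[str]],
                  stages: Sequence[Stage],
                  default: str = "B") -> str:
    """Return the grade from the first stage that fires, else the default."""
    for stage in stages:
        grade = stage(cluster)
        if grade is not None:
            return grade
    return default


def make_stage(key: str) -> Stage:
    """Placeholder stage: reads a precomputed prediction instead of running an SVM."""
    def stage(cluster: Mapping[str, Optional[str]]) -> Optional[str]:
        return cluster.get(key)
    return stage


# Order assumed from the slide: abstract n-grams, publication types, title n-grams.
stages = [make_stage("abstract_ngrams"),
          make_stage("publication_types"),
          make_stage("title_ngrams")]
```

With this reading, a cluster that no stage recognises keeps the majority class B, so the cascade can only move away from the baseline when some stage makes a positive decision.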
Clustering for EBM Summarisation
Input
QUESTION: Which treatments work best for hemorrhoids?
DOCUMENTS: [11289288] [12972967] [1442682] [15486746] [16235372] [16252313] [17054255] [17380367]
clustering =⇒
Output
1 [11289288] [12972967] [15486746]
2 [17054255] [17380367]
3 [1442682] [16252313] [16235372]
Clustering Approach (Shash & Molla 2013)
K-means (non-overlapping clustering).
Unigram-based features.
Lowercased, stopwords removed, tf.idf of remaining words.
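The feature extraction described above can be sketched in plain Python. The stopword list is a toy placeholder and the weighting (raw term frequency times log(N/df)) is one common tf.idf variant chosen for illustration; the slides do not specify either. The function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Toy stopword list for illustration only.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "for", "is", "to"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split into alphanumeric unigrams, drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Return one sparse tf.idf vector (word -> weight) per document."""
    tokenised = [tokenize(d) for d in docs]
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in tokenised for w in set(toks))
    n = len(docs)
    return [
        {w: tf * math.log(n / df[w]) for w, tf in Counter(toks).items()}
        for toks in tokenised
    ]
```

Vectors of this kind would then be handed to K-means together with a distance measure such as those compared in the results.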
Results
Table 1: Average entropy for optimal K clusters.

Measure      Whole XML  Abstract only  UMLS concepts only  UMLS semantic types
Euclidean    0.260      0.264          0.274               0.310
Correlation  0.348      0.362          0.349               0.347
Cosine       0.249      0.266          0.277               0.298
Dice         0.332      0.328          0.324               0.334
Jaccard      0.320      0.330          0.317               0.327
Manhattan    0.288      0.299          0.305               0.296

Entropy of pure random clustering is −log2(1/K) = 1.263.
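For readers unfamiliar with the measure, here is a minimal Python sketch of one common definition of clustering entropy: the Shannon entropy of the reference labels inside each cluster, averaged with weights proportional to cluster size. Whether the table uses exactly this weighting is an assumption, and the function names are hypothetical. Lower values mean purer clusters.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cluster_entropy(labels: Sequence[Hashable]) -> float:
    """Base-2 Shannon entropy of the reference labels inside one cluster."""
    total = len(labels)
    return sum(
        (n / total) * math.log2(total / n)  # p * log2(1/p)
        for n in Counter(labels).values()
    )


def average_entropy(clusters: Sequence[Sequence[Hashable]]) -> float:
    """Size-weighted mean of the per-cluster entropies (assumed weighting)."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * cluster_entropy(c) for c in clusters if c)
```

A cluster whose members all share one reference label has entropy 0, while a cluster that mixes K labels evenly has entropy log2(K), which is the same quantity as the −log2(1/K) baseline quoted above.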
In Progress: A Proof-of-Concept System (Michael van Treeck, Masters of IT) I
In Progress: A Proof-of-Concept System (Michael van Treeck, Masters of IT) II
In Progress: Identifying Keywords of the Answer (Jiwei Guan, Masters of Research)
Keyword Extraction Techniques
tf.idf
Using Part of Speech
Using information from the answer
. . .
Keyphrase Extraction Techniques
C-Value, NC-Value
Part of Speech Patterns
. . .
Future Research
Fine-tune search techniques
Incorporate question types
Label the clusters
Combine single summaries
Test with real people
Thank You
Questions?
Further information about our research: http://web.science.mq.edu.au/~diego/medicalnlp/
Diego