Upload
manuel-de-la-villa
View
795
Download
5
Embed Size (px)
Citation preview
Manuel de la VillaManuel Millán
Alejandro MuñozManuel J. Maña
1
This work has been partially funded by the Spanish Ministry of Science and Innovation and the European Union from the ERDF (TIN2009-14057-C03-03)
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile DevicesMain index
2
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
3
Main index
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
Medical staff mobility “Hospitals staff might be distributed in space or time and their
information needs are highly dependent on contextual conditions.“ (Muñoz et al, 2003)
“…PDA use by health professionals shows an evolution in the use ranging from 30% in 2000 to 60% in 2006” (Garrity and El Emam, 2006)
the available resources accessible from the PDA at the bedside provided response to 86% of clinical questions, most of them (88.9% - 97.7%) during the rounds of visits (Hauser et al. 2007)
Introduction
4
Motivation
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
Evidence-based medicine “Evidence based medicine is the conscientious, explicit, and
judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic Research. “(Sackett et Al., 1996)
IntroductionMotivation
Is useful an IRS to locate the best available external clinical evidence?
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
Wi-fi
Efficient access
PDA
Best evidence
Point –of-care
Mobility
Information overload
CLUstering on Mobile MEdical Devices
IntroductionMotivation
YAIRS? Yet Another IRS? Novelty?
Post-retrieval clustering, orientation to biomedical documentsand mobile devices
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile DevicesIntroduction
7
Post-retrieval clustering is a known tecnique that improve theorganization of the search results and facilitate navigation betweenthem.
Previous experiences on Search Engines using clustering:Univesity Carnegie-Mellon -> Vivisimo -> Clusty
Interesting? No? And if I tell you that on May Yippy has paid $5.5M for Clusty…?
Post-retrieval clustering?
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile DevicesIntroduction
Post-retrieval clustering on Biomedicine?
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
9
Index
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
Queryrepresentation
Analysis
Textprocessing
Documentsrepresentation
DocumentsRepository
RelevantDocuments
Similaritycalculation
Informationneeded
Information Retrieval System
Given a set of documents and an information need, the goal of IR is to obtain thedocuments relevant to that need, sort by any criteria and show them to the user.
1. Indexing
2. Query
3. Searching
4. Evaluation
5. Ranking
Search and Information RetrievalA tipical Schema
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
Document sources: Biomed Central (web crawling in progress)
Text Processing: lowercasing, stemming, stop-words ,…
Search and Information RetrievalOur implementation
Lucene for indexing…
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile DevicesSearch and Information Retrieval
Our implementation (and II)
… and Lucene for searching
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
13
Index
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
Clustering
The post-processing clustering is to associate, according to their similarity, a set of documents retrieved from a query in different subsets
ClusteringOur implementation
14
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
Clustering algorithm:
Simple-K-Means vs Expectation Maximization
Algorithms
Querys (Documents)Simple-K-means EM
Ligaments (10) 1 2
Cancer Skin (25) 4 12
Cancer (46) 5 26
Disease (62) 8 57
Time it takes to perform the grouping in seconds
K? It depends on the number of documents retrieved.
ClusteringWhy Simple-K-Means?
15
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
16
Index
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
Visualization on Mobile Devices
Restrictions:
Compatibility problems Screen size Multiples windows lack Limited navigation Limited memory size Javascript, cookies Accesibility etc.
Solutions:
World Wide Web Consortium (W3C): Mobile Web Initiative. "The Mobile Web Initiative’s goal is to make browsing the Web from mobile devices a reality, to improve Web content production and access for mobile users” (Tim Berners-
Lee, W3C Director)
Mobile Web Best Practice 1.0.
Visualization on Mobile DevicesTypical problems
17
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
One webmaking, as far as is reasonable, the same information and services available to users irrespective of the device they are using
Trust in web standardsHTML compatible with different browsers, Use Stylesheet, Content in blocks (<DIV>).
Avoid known risksNo pop-ups, No frames, No tables
Controlling limitationsNo scripting, Standards fonts, Use of color
Optimized navigationMinimal navigation at the top of the pageAvoid lengthy URI’s
Probe images and coloursReduced image size and resolution, good contrast
Do it smallSmall pages, only one-direction (vertical) scrolling, easy entry forms (reduced keytyping)
Limited use of networkNo external links (images…), no download
Think in usersSimple language, relevant and limited content, error messages
And many more…!!!18
Visualization on Mobile DevicesSome best practices
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
19
Visualization on Mobile DevicesHigh resolution interface
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
20
Visualization on Mobile DevicesLow resolution interface
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
21
Visualization on Mobile DevicesChecking ClusterMed
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
Our proposal obteins at his first version a 76%
Visualization on Mobile DevicesChecking our proposal
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
Cancer skin
23
Visualization on Mobile DevicesOur interface
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
24
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
Conclusions
Objectives Fulfilled
It works!!!
First Milestone
A working prototype of an Information retrieval
System adapted for Medical Devices based on Post-
retrieval clustering
25
Conclusions and future workConclusions
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
A lot of…
Focus on using medical ontologies like UMLS
Metathesarus for:
Improve the clustering quality
Enhance the labelling of the groups
Visual help, graph with concepts/cluster relationship
26
… and future work:
Conclusions and future workFuture work
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
… and future work (and II):
Monodocument Summarization
Freebase Ajax connection for search support
Bilingual (Spanish-English documents)
Put into production (crawling and indexing…) any sponsor at the hall???
27
Conclusions and future workFuture work (and II)
BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010
A Biomedical Information Retrieval System based on Clustering for Mobile Devices
Manuel de la VillaManuel J. Maña{manuel.villa, manuel.mana}@dti.uhu.es
Manuel MillánAlejandro Muñoz{manuel.millan, alejandro.munoz}@alu.uhu.es
28
This work has been partially funded by the Spanish Ministry of Science and Innovation and the European Union from the ERDF (TIN2009-14057-C03-03)