28
Manuel de la Villa Manuel Millán Alejandro Muñoz Manuel J. Maña 1 This work has been partially funded by the Spanish Ministry of Science and Innovation and the European Union from the ERDF (TIN2009-14057-C03-03)

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

Embed Size (px)

Citation preview

Page 1: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

Manuel de la VillaManuel Millán

Alejandro MuñozManuel J. Maña

1

This work has been partially funded by the Spanish Ministry of Science and Innovation and the European Union from the ERDF (TIN2009-14057-C03-03)

Page 2: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile DevicesMain index

2

Page 3: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

3

Main index

Page 4: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

Medical staff mobility “Hospitals staff might be distributed in space or time and their

information needs are highly dependent on contextual conditions.“ (Muñoz et al, 2003)

“…PDA use by health professionals shows an evolution in the use ranging from 30% in 2000 to 60% in 2006” (Garrity and El Emam, 2006)

the available resources accessible from the PDA at the bedside provided response to 86% of clinical questions, most of them (88.9% - 97.7%) during the rounds of visits (Hauser et al. 2007)

Introduction

4

Motivation

Page 5: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

Evidence-based medicine “Evidence based medicine is the conscientious, explicit, and

judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic Research. “(Sackett et Al., 1996)

IntroductionMotivation

Is useful an IRS to locate the best available external clinical evidence?

Page 6: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

Wi-fi

Efficient access

PDA

Best evidence

Point –of-care

Mobility

Information overload

CLUstering on Mobile MEdical Devices

IntroductionMotivation

YAIRS? Yet Another IRS? Novelty?

Post-retrieval clustering, orientation to biomedical documentsand mobile devices

Page 7: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile DevicesIntroduction

7

Post-retrieval clustering is a known tecnique that improve theorganization of the search results and facilitate navigation betweenthem.

Previous experiences on Search Engines using clustering:Univesity Carnegie-Mellon -> Vivisimo -> Clusty

Interesting? No? And if I tell you that on May Yippy has paid $5.5M for Clusty…?

Post-retrieval clustering?

Page 8: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile DevicesIntroduction

Post-retrieval clustering on Biomedicine?

Page 9: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

9

Index

Page 10: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

Queryrepresentation

Analysis

Textprocessing

Documentsrepresentation

DocumentsRepository

RelevantDocuments

Similaritycalculation

Informationneeded

Information Retrieval System

Given a set of documents and an information need, the goal of IR is to obtain thedocuments relevant to that need, sort by any criteria and show them to the user.

1. Indexing

2. Query

3. Searching

4. Evaluation

5. Ranking

Search and Information RetrievalA tipical Schema

Page 11: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

Document sources: Biomed Central (web crawling in progress)

Text Processing: lowercasing, stemming, stop-words ,…

Search and Information RetrievalOur implementation

Lucene for indexing…

Page 12: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile DevicesSearch and Information Retrieval

Our implementation (and II)

… and Lucene for searching

Page 13: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

13

Index

Page 14: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

Clustering

The post-processing clustering is to associate, according to their similarity, a set of documents retrieved from a query in different subsets

ClusteringOur implementation

14

Page 15: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

Clustering algorithm:

Simple-K-Means vs Expectation Maximization

Algorithms

Querys (Documents)Simple-K-means EM

Ligaments (10) 1 2

Cancer Skin (25) 4 12

Cancer (46) 5 26

Disease (62) 8 57

Time it takes to perform the grouping in seconds

K? It depends on the number of documents retrieved.

ClusteringWhy Simple-K-Means?

15

Page 16: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

16

Index

Page 17: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

Visualization on Mobile Devices

Restrictions:

Compatibility problems Screen size Multiples windows lack Limited navigation Limited memory size Javascript, cookies Accesibility etc.

Solutions:

World Wide Web Consortium (W3C): Mobile Web Initiative. "The Mobile Web Initiative’s goal is to make browsing the Web from mobile devices a reality, to improve Web content production and access for mobile users” (Tim Berners-

Lee, W3C Director)

Mobile Web Best Practice 1.0.

Visualization on Mobile DevicesTypical problems

17

Page 18: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

One webmaking, as far as is reasonable, the same information and services available to users irrespective of the device they are using

Trust in web standardsHTML compatible with different browsers, Use Stylesheet, Content in blocks (<DIV>).

Avoid known risksNo pop-ups, No frames, No tables

Controlling limitationsNo scripting, Standards fonts, Use of color

Optimized navigationMinimal navigation at the top of the pageAvoid lengthy URI’s

Probe images and coloursReduced image size and resolution, good contrast

Do it smallSmall pages, only one-direction (vertical) scrolling, easy entry forms (reduced keytyping)

Limited use of networkNo external links (images…), no download

Think in usersSimple language, relevant and limited content, error messages

And many more…!!!18

Visualization on Mobile DevicesSome best practices

Page 19: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

19

Visualization on Mobile DevicesHigh resolution interface

Page 20: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

20

Visualization on Mobile DevicesLow resolution interface

Page 21: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

21

Visualization on Mobile DevicesChecking ClusterMed

Page 22: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

Our proposal obteins at his first version a 76%

Visualization on Mobile DevicesChecking our proposal

Page 23: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

Cancer skin

23

Visualization on Mobile DevicesOur interface

Page 24: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

24

Page 25: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

Conclusions

Objectives Fulfilled

It works!!!

First Milestone

A working prototype of an Information retrieval

System adapted for Medical Devices based on Post-

retrieval clustering

25

Conclusions and future workConclusions

Page 26: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

A lot of…

Focus on using medical ontologies like UMLS

Metathesarus for:

Improve the clustering quality

Enhance the labelling of the groups

Visual help, graph with concepts/cluster relationship

26

… and future work:

Conclusions and future workFuture work

Page 27: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

… and future work (and II):

Monodocument Summarization

Freebase Ajax connection for search support

Bilingual (Spanish-English documents)

Put into production (crawling and indexing…) any sponsor at the hall???

27

Conclusions and future workFuture work (and II)

Page 28: A Biomedical Information Retrieval System  based on Clustering for Mobile Devices

BioSEPLN10. Workshop on Language Technology applied to biomedical and health documents. Valencia, 6th september 2010

A Biomedical Information Retrieval System based on Clustering for Mobile Devices

Manuel de la VillaManuel J. Maña{manuel.villa, manuel.mana}@dti.uhu.es

Manuel MillánAlejandro Muñoz{manuel.millan, alejandro.munoz}@alu.uhu.es

28

This work has been partially funded by the Spanish Ministry of Science and Innovation and the European Union from the ERDF (TIN2009-14057-C03-03)