21
Proximity-based Ranking of Biomedical Texts Rey-Long Liu * and Yi-Chih Huang * Dept. of Medical Informatics Tzu Chi University Taiwan

Proximity-based Ranking of Biomedical Texts

  • Upload
    vine

  • View
    47

  • Download
    0

Embed Size (px)

DESCRIPTION

Proximity-based Ranking of Biomedical Texts. Rey-Long Liu * and Yi-Chih Huang * Dept. of Medical Informatics Tzu Chi University Taiwan. Outline. Research background Problem definition The proposed approach: PRE Empirical evaluation Conclusion. Research Background. - PowerPoint PPT Presentation

Citation preview

Page 1: Proximity-based Ranking of Biomedical Texts

Proximity-based Ranking of Biomedical

Texts

Rey-Long Liu* and Yi-Chih Huang*Dept. of Medical Informatics

Tzu Chi University

Taiwan

Page 2: Proximity-based Ranking of Biomedical Texts

Outline

• Research background

• Problem definition

• The proposed approach: PRE

• Empirical evaluation

• Conclusion

A Proximity-based Ranker Enhancer 2

Page 3: Proximity-based Ranking of Biomedical Texts

Research Background

A Proximity-based Ranker Enhancer 3

Page 4: Proximity-based Ranking of Biomedical Texts

Biomedical Information Need

• Biomedical research requires relevant evidences in the huge and ever-growing biomedical literature

• Retrieval of the evidences requires a system that – Accepts a natural language query for a biomedical

information need, and – Ranks relevant texts higher for access or processing

A Proximity-based Ranker Enhancer 4

Page 5: Proximity-based Ranking of Biomedical Texts

An Example Info Need

• Query: urinary tract infection, criteria for treatment and admission (from OHSUMED) – A disease as the target concept (i.e., urinary tract infection)

– Two concepts about the scenario of the information need (i.e., treatment and admission)

• Neither special nor related to any disease

A Proximity-based Ranker Enhancer 5

Page 6: Proximity-based Ranking of Biomedical Texts

Problem Definition

A Proximity-based Ranker Enhancer 6

Page 7: Proximity-based Ranking of Biomedical Texts

Goals• Explore how text rankers may be improved by

considering the completeness of query concepts appearing in a nearby area of the text being ranked

• Develop a technique PRE (Proximity-based Ranker Enhancer) that – Measures contextual completeness of query

concepts appearing in a nearby area in the text– Serves as a supplement to improve existing rankers

A Proximity-based Ranker Enhancer 7

Page 8: Proximity-based Ranking of Biomedical Texts

Related Work• Biomedical text ranking

– Using synonyms and considering diversity of passages, without considering term proximity

• Text ranking– Individual text scoring techniques (e.g., BM25)

and learning to rank techniques (e.g., Ranking SVM), without considering term proximity

• Improving ranking by term proximity– Term proximity is employed, but contextual

completeness was not consideredA Proximity-based Ranker Enhancer 8

Page 9: Proximity-based Ranking of Biomedical Texts

The Proposed Approach: PRE

A Proximity-based Ranker Enhancer 9

Page 10: Proximity-based Ranking of Biomedical Texts

System Overview

A Proximity-based Ranker Enhancer 10

Text Ranker Development

TrainingTesting

Underlying RankerPRE

Text Ranking TF in d

User

Query (q)

Text (d)

TF (Term Frequency) Assessment

Training Data

Ranked Texts

Page 11: Proximity-based Ranking of Biomedical Texts

TF Assessment

A Proximity-based Ranker Enhancer 11

• Three types of term proximity– Overall proximity (QTermTF)– Individual proximity (IndiP)– Collective proximity (CollP)

• A term t may get a large TF increment in d, if – Many query terms appear frequently in d– Query terms are individually near to t at some

places, and– Query terms collectively appear at a place near to t

Page 12: Proximity-based Ranking of Biomedical Texts

A Proximity-based Ranker Enhancer 12

•RTF(t,d,q) = TF(t,d)+TFincrement(t,d,q)•TFincrement(t,d,q) = QtermTF(d,q)IndiP(t,d,q)×CollP(t,d,q)•QtermTF(d,q) = Total TF of query terms in d•IndiP(t,d,q) =ΣmM -

{t}SigmoidWeight(Mindist(t,m))/ MaxIndiP•Mindist(x,y) = shortest distance between x and y in d•SigmoidWeight(dt) = 1/(1+e-((|q|-1)-dt))•CollP(t,d,q) = MaxkK{mM - {t}

SigmoidWeight(dist(t,k,m))}/MaxCollP, where K is the set positions at which t appears in d•dist(t,k,m) = Distance between t (at position k) and m

Page 13: Proximity-based Ranking of Biomedical Texts

Empirical Evaluation

A Proximity-based Ranker Enhancer 13

Page 14: Proximity-based Ranking of Biomedical Texts

Experimental Data• OHSUMED

– A popular database of biomedical queries and references

– 106 queries– 348,566 references– 16,140 query-reference pairs

• Definitively relevant• Possibly relevant• Not relevant

A Proximity-based Ranker Enhancer 14

Page 15: Proximity-based Ranking of Biomedical Texts

Underlying Rankers

A Proximity-based Ranker Enhancer 15

Page 16: Proximity-based Ranking of Biomedical Texts

Baseline Ranker Enhancer

• Three state-of-the-art techniques that enhanced text rankers by term proximity– The t-function

• t() by [Tao & Zhai, 2007]

– The p-function • p() by [Cummins & O’Riordan, 2009]

– The proximity language model • PLM by [Zhao & Yun, 2009].

A Proximity-based Ranker Enhancer 16

Page 17: Proximity-based Ranking of Biomedical Texts

Evaluation Criteria

• Evaluating how relevant references are ranked higher for users to access– Mean average precision (MAP)– Normalized discount cumulative gain at x

(NDCG@X)

A Proximity-based Ranker Enhancer 17

Page 18: Proximity-based Ranking of Biomedical Texts

Results

A Proximity-based Ranker Enhancer 18

Page 19: Proximity-based Ranking of Biomedical Texts

A Proximity-based Ranker Enhancer 19

Page 20: Proximity-based Ranking of Biomedical Texts

Conclusion

A Proximity-based Ranker Enhancer 20

Page 21: Proximity-based Ranking of Biomedical Texts

• Term proximity may be comprehensively applied to improving various kinds of text rankers

• It is helpful to integrate three types of term proximity– Overall proximity– Individual proximity– Collective proximity

• Term proximity information may be encoded to re-assess TF of each term

A Proximity-based Ranker Enhancer 21