Evaluating different query reformulation techniques for the GIR task considering geospatial entities as textual terms

José M. Perea-Ortega, M.A. García-Cumbreras and L.A. Ureña-López

SINAI research group, Computer Science Department
University of Jaén (Spain)

Outline

GeoDoc'2012 - Kuala Lumpur, Malaysia, May 29, 2012

• Introduction and motivation
• Background
• Overview of a GIR architecture
• Evaluation framework: GeoCLEF
• Experiments and results
• Conclusions

Introduction

• Geographical Information Retrieval (GIR) is concerned with the retrieval of documents according to two main criteria of relevance:
  ◦ Thematic
  ◦ Geographical
• GIR is a multidisciplinary field

Example query: “Airplane crashes close to Russian cities” = <theme, spatial relation, location>

[Figure: GIR related to GIS, IR & NLP, and Knowledge Management]
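The <theme, spatial relation, location> structure can be illustrated with a deliberately naive splitter. This is a minimal sketch, not the SINAI-GIR query processor: the relation phrases in `SPATIAL_RELATIONS` and the names `GeoQuery` and `split_geo_query` are hypothetical, and the query is simply cut at the leftmost relation phrase found.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical, hand-picked relation phrases; a real GIR system would need a far
# richer list plus proper recognition of the location part.
SPATIAL_RELATIONS = ["close to", "north of", "south of", "near", "in", "of"]

# Longest phrases first, so that at a given position "close to" wins over shorter ones.
RELATION_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(r) for r in sorted(SPATIAL_RELATIONS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


class GeoQuery(NamedTuple):
    theme: str
    relation: Optional[str]
    location: Optional[str]


def split_geo_query(query: str) -> GeoQuery:
    """Cut the query at the leftmost spatial-relation phrase: theme before it, location after it."""
    match = RELATION_PATTERN.search(query)
    if match is None:
        return GeoQuery(query.strip(), None, None)  # no spatial relation recognized
    return GeoQuery(
        theme=query[: match.start()].strip(),
        relation=match.group(1).lower(),
        location=query[match.end():].strip(),
    )


print(split_geo_query("Airplane crashes close to Russian cities"))
# GeoQuery(theme='Airplane crashes', relation='close to', location='Russian cities')
```

Cutting at the leftmost match keeps the sketch short but is easily fooled (e.g. by a thematic "of"), which is one reason geographic entity recognition matters in GIR.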

Introduction

• GIR vs. GIS
  ◦ GIS: exact spatial representations and complex analysis at the level of individual spatial objects
    ▪ Structured, unambiguous information (expert users)
  ◦ GIR: retrieving geo-referenced documents that may be relevant to a geographic query
    ▪ Unstructured, ambiguous information
• GIR vs. IR
  ◦ GIR is considered an extension of the IR field

Motivation

• A GIR system can be treated as a traditional IR system
  ◦ Are there effective methods for query reformulation in GIR from an NLP point of view?
  ◦ Can IR query reformulation techniques be applied in GIR?
• Aim of the work: evaluate several query reformulations (QRs) for the GIR task, assuming that a GIR system can operate as an IR system
  ◦ QRs from an NLP perspective, combining both thematic and geographical aspects, but always treating geospatial entities as textual terms

Background

• Query Reformulation (QR): the process of altering a query in order to improve retrieval performance [Jansen et al. 09]
• Geographic queries: <theme, location>
  ◦ [Gravano et al. 03]: most search engines ignore the geographical scope of queries
    ▪ Simple keyword-matching approach
    ▪ Less relevant results are retrieved
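Why plain keyword matching misses the geographical scope can be shown with a toy example. The query, the two documents and the term-overlap score are hypothetical stand-ins, not any particular engine's ranking function: a fire near Porto (inside Portugal) and a fire near Los Angeles (outside it) tie, because neither document contains the literal term "portugal".

```python
def keyword_score(query: str, document: str) -> int:
    """Number of distinct query terms that also appear in the document (bag of words, no geography)."""
    return len(set(query.lower().split()) & set(document.lower().split()))


QUERY = "forest fires portugal"
DOCS = {
    "porto": "forest fires spread near porto",              # inside the query's geographic scope
    "los_angeles": "forest fires spread near los angeles",  # outside the scope
}

scores = {doc_id: keyword_score(QUERY, text) for doc_id, text in DOCS.items()}
print(scores)  # {'porto': 2, 'los_angeles': 2}
```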

Background

• QR approaches
  ◦ Term substitutions
  ◦ Relevance feedback
  ◦ Query expansion
• [Kohler 03]: the addition of geo terms helps to differentiate between places that share the same name
• [Cardoso et al. 07]: expansion based on the use of feature types
• [Fu et al. 05]: expansion based on a geographical ontology
• [Buscaldi et al. 05]: use of WordNet to add synonyms and holonyms
• [Stokes et al. 08]: all query concepts (not just geospatial ones) should be expanded
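The effect described by [Kohler 03] can be sketched with the same kind of overlap score. The documents, the query and the added term "massachusetts" are hypothetical, and the scoring is a simple stand-in for a real ranking function: before expansion the two places named Cambridge tie; after adding one geographical term, the intended one ranks first.

```python
from typing import Dict, List, Tuple


def keyword_score(query: str, document: str) -> int:
    """Number of distinct query terms that also appear in the document."""
    return len(set(query.lower().split()) & set(document.lower().split()))


def expand(query: str, geo_terms: List[str]) -> str:
    """Append geographical context terms to the query (a plain term-addition reformulation)."""
    return " ".join([query, *geo_terms])


# Hypothetical documents about two places that share the name "Cambridge".
DOCS: Dict[str, str] = {
    "cambridge_uk": "cambridge university research in england",
    "cambridge_us": "cambridge massachusetts research at mit",
}


def rank(query: str) -> List[Tuple[str, int]]:
    """Score every document and sort by descending score (ties keep insertion order)."""
    scored = [(doc_id, keyword_score(query, text)) for doc_id, text in DOCS.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(rank("cambridge research"))
# [('cambridge_uk', 2), ('cambridge_us', 2)]
print(rank(expand("cambridge research", ["massachusetts"])))
# [('cambridge_us', 3), ('cambridge_uk', 2)]
```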

Overview of a GIR architecture

• Typical steps involved in GIR

[Figure: GIR pipeline with offline processing and online processing stages; components: recognizing and disambiguating geographical entities from the document collection, geographical index, geographical knowledge base, processing queries and using geographical scopes for retrieval]
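The offline/online split can be sketched in a few lines. Everything here is hypothetical and much simpler than SINAI-GIR: the `GAZETTEER` dictionary stands in for the geographical knowledge base, "recognition" is exact token lookup with no disambiguation, and every place is mapped only to its country.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical toy knowledge base: place name -> country containing it.
GAZETTEER = {"berlin": "germany", "hamburg": "germany", "porto": "portugal", "lisbon": "portugal"}
COUNTRIES = set(GAZETTEER.values())


def build_geo_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Offline: find known place names in each document and index the document under its country."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if token in GAZETTEER:
                index[GAZETTEER[token]].add(doc_id)
            elif token in COUNTRIES:
                index[token].add(doc_id)
    return index


def retrieve(theme: str, scope: str, docs: Dict[str, str], geo_index: Dict[str, Set[str]]) -> List[str]:
    """Online: keep documents in the query's geographic scope, rank them by thematic term overlap."""
    theme_terms = set(theme.lower().split())
    candidates = geo_index.get(scope.lower(), set())
    scored = [(len(theme_terms & set(docs[d].lower().split())), d) for d in candidates]
    # Highest overlap first, ties broken by document id; drop documents with no thematic match.
    return [d for score, d in sorted(scored, key=lambda x: (-x[0], x[1])) if score > 0]


DOCS = {
    "d1": "american president visits berlin",
    "d2": "american president visits lisbon",
    "d3": "dock strike in hamburg",
}
geo_index = build_geo_index(DOCS)  # offline step
print(retrieve("president visits", "Germany", DOCS, geo_index))  # online step: ['d1']
```

Document d2 matches the theme just as well as d1 but is filtered out by the geographical index, which is the behaviour plain keyword matching cannot provide.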

Overview of a GIR architecture

• SINAI-GIR: an example of a GIR system

[Figure: SINAI-GIR architecture, offline and online processing]

Overview of a GIR architecture

• SINAI-GIR: an example of a GIR system

Evaluation framework: GeoCLEF

• 2005–2008 (CLEF conferences)
• Collection: 169,747 textual documents
  ◦ Glasgow Herald (1995)
  ◦ Los Angeles Times (1994)
• Queries: 100 textual topics (25 per year)
  ◦ Fields: Title (T), Description (D) and Narrative (N)
  ◦ Examples: “vegetable exporters of Europe”, “forest fires in north of Portugal”, “natural disasters in the Western USA”

Evaluation framework: GeoCLEF

• Evaluation measures: relevance judgements + TREC evaluation method
• Typical IR evaluation measures
  ◦ Mean Average Precision (MAP)
  ◦ Recall (R)
  ◦ Precision at n (P@n)
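The three measures can be written down directly. This sketch follows the standard definitions used in TREC-style evaluation; the function names and the toy ranking are hypothetical, and real GeoCLEF scores would come from the official relevance judgements and evaluation tool rather than from this code.

```python
from typing import Dict, List, Sequence, Set


def precision_at_n(ranking: Sequence[str], relevant: Set[str], n: int = 10) -> float:
    """P@n: relevant documents among the top n, divided by n (missing slots count as non-relevant)."""
    return sum(1 for doc in ranking[:n] if doc in relevant) / n


def recall(ranking: Sequence[str], relevant: Set[str]) -> float:
    """R: fraction of all relevant documents that appear anywhere in the ranking."""
    return len(set(ranking) & relevant) / len(relevant) if relevant else 0.0


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """AP: sum of precision at each relevant document's rank, divided by the number of relevant docs."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0  # unretrieved relevant docs add 0


def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """MAP: AP averaged over every judged topic (a topic with no ranking scores 0)."""
    return sum(average_precision(runs.get(t, []), rel) for t, rel in qrels.items()) / len(qrels)


ranking = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
print(precision_at_n(ranking, relevant, n=2))  # 0.5
print(recall(ranking, relevant))               # 2 of 3 relevant documents retrieved
print(average_precision(ranking, relevant))    # (1/1 + 2/3) / 3 = 5/9
```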

Experiments and results

• QRs proposed in this work
  ◦ QR1: only the thematic part, discarding the geographical part
  ◦ QR2: thematic expansion by repeating its keywords (nouns)
  ◦ QR3: thematic expansion using synonyms of the keywords
  ◦ QR4: geographical expansion using synonyms
  ◦ QR5: geographical expansion using places that match the geographical scope of the query
  ◦ QR6: QR3 + QR5
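The six reformulations can be produced mechanically once a query is split into thematic and geographical terms. In this sketch the synonym and place lists are copied from the example for “visits of the American president to Germany”, since the slides do not say which thesaurus or gazetteer produced them; `reformulate`, `or_group` and the parameter names are hypothetical. The "(a | b)" notation is emitted as plain text, as in the example, and "presid" appears to be the stemmed form of "president".

```python
from typing import Dict, List


def or_group(term: str, alternatives: Dict[str, List[str]]) -> str:
    """Render a term plus its alternatives as "(term | alt1 | ...)", or the bare term if it has none."""
    alts = alternatives.get(term, [])
    return "(" + " | ".join([term, *alts]) + ")" if alts else term


def reformulate(
    thematic: List[str],
    geographic: List[str],
    theme_synonyms: Dict[str, List[str]],
    geo_synonyms: Dict[str, List[str]],
    geo_places: Dict[str, List[str]],
) -> Dict[str, str]:
    theme = " ".join(thematic)
    geo = " ".join(geographic)
    theme_syn = " ".join(or_group(t, theme_synonyms) for t in thematic)
    geo_syn = " ".join(or_group(g, geo_synonyms) for g in geographic)
    geo_place = " ".join(or_group(g, geo_places) for g in geographic)
    return {
        "Original": f"{theme} {geo}",
        "QR1": theme,                       # thematic part only
        "QR2": f"{theme} {theme} {geo}",    # thematic keywords repeated
        "QR3": f"{theme_syn} {geo}",        # thematic synonyms
        "QR4": f"{theme} {geo_syn}",        # geographical synonyms
        "QR5": f"{theme} {geo_place}",      # places within the geographical scope
        "QR6": f"{theme_syn} {geo_place}",  # QR3 + QR5
    }


qrs = reformulate(
    thematic=["visit", "American", "presid"],
    geographic=["Germany"],
    theme_synonyms={"visit": ["meet", "stay"]},
    geo_synonyms={"Germany": ["Federal Republic of Germany", "Deutschland", "FRG"]},
    geo_places={"Germany": ["Berlin", "Hamburg", "Muenchen", "Koeln", "Frankfurt", "Essen"]},
)
for name, text in qrs.items():
    print(f"{name:<9}{text}")
```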

Experiments and results

• Example: QRs generated for the original query “visits of the American president to Germany”

QR        Text of the query
Original  visit American presid Germany
QR1       visit American presid
QR2       visit American presid visit American presid Germany
QR3       (visit | meet | stay) American presid Germany
QR4       visit American presid (Germany | Federal Republic of Germany | Deutschland | FRG)
QR5       visit American presid (Germany | Berlin | Hamburg | Muenchen | Koeln | Frankfurt | Essen)
QR6       (visit | meet | stay) American presid (Germany | Berlin | Hamburg | Muenchen | Koeln | Frankfurt | Essen)

Experiments and results

Topic set  QR        P@10    R       MAP
2005       Original  0.4560  0.8364  0.3514
           QR2       0.4920  0.8276  0.3353
           QR4       0.2800  0.6552  0.2242
2006       Original  0.1920  0.7288  0.2396
           QR2       0.2040  0.6796  0.2314
           QR4       0.1720  0.6984  0.2064
2007       Original  0.2560  0.7156  0.2311
           QR2       0.2120  0.6656  0.1871
           QR5       0.2000  0.6720  0.1874
2008       Original  0.2680  0.7368  0.2484
           QR2       0.2680  0.7196  0.2381
           QR6       0.2280  0.7028  0.2028
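Relative differences against the original query make the results table easier to read. The sketch below only reuses the figures above (the dictionary layout and `relative_change` are hypothetical). In the rows shown, the original query keeps the highest MAP and recall in every topic set, while QR2 raises P@10 in 2005 (about +7.9%) and 2006 (+6.25%) and ties it in 2008.

```python
from typing import Dict, Tuple

METRICS = ("P@10", "R", "MAP")
# Figures copied from the results table: topic set -> QR -> (P@10, R, MAP).
RESULTS: Dict[int, Dict[str, Tuple[float, float, float]]] = {
    2005: {"Original": (0.4560, 0.8364, 0.3514), "QR2": (0.4920, 0.8276, 0.3353), "QR4": (0.2800, 0.6552, 0.2242)},
    2006: {"Original": (0.1920, 0.7288, 0.2396), "QR2": (0.2040, 0.6796, 0.2314), "QR4": (0.1720, 0.6984, 0.2064)},
    2007: {"Original": (0.2560, 0.7156, 0.2311), "QR2": (0.2120, 0.6656, 0.1871), "QR5": (0.2000, 0.6720, 0.1874)},
    2008: {"Original": (0.2680, 0.7368, 0.2484), "QR2": (0.2680, 0.7196, 0.2381), "QR6": (0.2280, 0.7028, 0.2028)},
}


def relative_change(year: int, qr: str) -> Dict[str, float]:
    """Percentage change of each metric for `qr` relative to the original query in the same topic set."""
    base, run = RESULTS[year]["Original"], RESULTS[year][qr]
    return {m: 100.0 * (r - b) / b for m, b, r in zip(METRICS, base, run)}


for year, runs in RESULTS.items():
    for qr in runs:
        if qr != "Original":
            changes = relative_change(year, qr)
            print(year, qr, "  ".join(f"{m} {c:+.1f}%" for m, c in changes.items()))
```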

Experiments and results

Topic set  Total rel. docs  Rel. docs retrieved by original query
2005       1,028            908 (~88%)
2006       378              284 (~75%)
2007       650              543 (~83%)
2008       747              588 (~78%)
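The percentages in this table are ratios of the two counts, assuming the second column counts the relevant documents retrieved by the original query. Recomputing them with a hypothetical helper, `coverage`, suggests they were truncated rather than rounded: 543/650 is about 83.5% and 588/747 about 78.7%, reported as ~83% and ~78%.

```python
import math
from typing import Dict, Tuple

# Topic set -> (total relevant documents, relevant documents retrieved by the original query).
REL_DOCS: Dict[int, Tuple[int, int]] = {2005: (1028, 908), 2006: (378, 284), 2007: (650, 543), 2008: (747, 588)}


def coverage(total: int, retrieved: int) -> float:
    """Percentage of the relevant documents that the original query retrieved."""
    return 100.0 * retrieved / total


for year, (total, retrieved) in REL_DOCS.items():
    pct = coverage(total, retrieved)
    print(f"{year}: {retrieved}/{total} = {pct:.1f}% (truncated: {math.floor(pct)}%)")
```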

Conclusions

• The proposed QRs seem to work well with GeoCLEF as the evaluation framework
• Geographical query expansion
  ◦ Synonyms of the geospatial scope detected in the query (QR4)
  ◦ Places or locations that match the geospatial scope (QR5)
• Thematic query expansion should be taken into account
  ◦ Repeating the keywords worked surprisingly well (QR2)
  ◦ Synonyms of the keywords can sometimes obtain good results
  ◦ Discarding the geographical part is not a good strategy (as expected)
• Combination: thematic + geographical expansion (QR6)

Thank you for your attention