Evaluating different query reformulation techniques for the GIR
task considering geospatial entities as textual terms
José M. Perea-Ortega, M.A. García-Cumbreras and L.A. Ureña-López
SINAI research group, Computer Science Department
University of Jaén (Spain)
Outline
May 29, 2012 GeoDoc'2012 - Kuala Lumpur, Malaysia
Introduction and motivation
Background
Overview of a GIR architecture
Evaluation framework: GeoCLEF
Experiments and results
Conclusions
Introduction
Geographical Information Retrieval (GIR) is concerned with the retrieval of documents according to two main criteria of relevance: thematic and geographical
GIR is a multidisciplinary field
Example query: “Airplane crashes close to Russian cities” = <theme, spatial rel., location>
[Figure: GIR shown in relation to GIS, IR & NLP, and Knowledge Management]
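The <theme, spatial rel., location> form above can be illustrated with a small data structure. This is only a sketch: the `GeoQuery` class, the `split_query` helper and the short list of spatial relation phrases are hypothetical names introduced here for illustration, not part of the system described in these slides.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately tiny list of spatial relation phrases.
SPATIAL_RELATIONS = ("close to", "north of", "south of", "near", "in", "of")


@dataclass(frozen=True)
class GeoQuery:
    theme: str
    spatial_relation: Optional[str]
    location: Optional[str]


def split_query(text: str) -> GeoQuery:
    """Split a query at the leftmost known spatial relation phrase."""
    lowered = text.lower()
    best = None  # (position, phrase) of the leftmost match so far
    for phrase in SPATIAL_RELATIONS:
        pos = lowered.find(f" {phrase} ")
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, phrase)
    if best is None:
        # No spatial relation found: treat the whole text as the theme.
        return GeoQuery(text.strip(), None, None)
    pos, phrase = best
    theme = text[:pos].strip()
    # Skip the phrase plus its two surrounding spaces.
    location = text[pos + len(phrase) + 2:].strip()
    return GeoQuery(theme, phrase, location)


print(split_query("Airplane crashes close to Russian cities"))
```

A real GIR system would rely on named entity recognition and a gazetteer rather than string matching; the split here only shows what the three parts of the triple are.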
GIR vs. GIS
GIS: exact spatial representations and complex analysis at the level of individual spatial objects. Structured and unambiguous information (expert users)
GIR: retrieving geo-referenced documents that may be relevant to a geographic query. Unstructured and ambiguous information
GIR vs. IR
GIR is considered an extension of the IR field
Motivation
A GIR system can be treated as a traditional IR system
Are there effective methods for query reformulation in GIR from an NLP point of view?
Can IR query reformulation techniques be applied in GIR?
Aim of the work: evaluate several query reformulations (QRs) for the GIR task, considering that a GIR system can perform as an IR system
QRs from an NLP perspective, combining both thematic and geographical aspects, but always considering geospatial entities as textual terms
Background
Query Reformulation (QR): the process of altering a query in order to improve retrieval performance [Jansen et al. 09]
Geographic queries <theme, location> [Gravano et al. 03]: most search engines ignore the geographical scope of the queries
Simple keyword-matching approach
Retrieving less relevant results
QR approaches: term substitutions, relevance feedback, query expansion
[Kohler 03]: the addition of geo terms helps to differentiate between places that share the same name
[Cardoso et al. 07]: expansion based on the use of feature types
[Fu et al. 05]: expansion based on a geographical ontology
[Buscaldi et al. 05]: use WordNet by adding synonyms and holonyms
[Stokes et al. 08]: all query concepts (not just geospatial ones) should be expanded
Overview of a GIR architecture
Typical steps involved in GIR
Offline processing: recognizing and disambiguating geographical entities from the document collection
Online processing: processing queries and using geographical scopes for retrieval
[Figure: GIR architecture, including a geographical index and a geographical knowledge base]
SINAI-GIR: an example of a GIR system
[Figure: SINAI-GIR architecture, with offline and online processing stages]
Evaluation framework: GeoCLEF
2005 – 2008 (CLEF conferences)
Collection: 169,747 textual documents
  Glasgow Herald (1995)
  Los Angeles Times (1994)
Queries: 100 textual topics (25 per year)
  Title (T), Description (D) and Narrative (N)
  “vegetable exporters of Europe”
  “forest fires in north of Portugal”
  “natural disasters in the Western USA”
Evaluation measures
Relevance judgements + TREC evaluation method
Typical IR evaluation measures:
  Mean Average Precision (MAP)
  Recall (R)
  Precision at n (P@n)
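The three measures listed above can be sketched in a few lines. This is a minimal illustration using their standard textbook definitions, not the TREC evaluation tooling itself; the function names and the toy document identifiers are assumptions made for the example, and each ranked list is assumed to contain no duplicate documents.

```python
from typing import Dict, Sequence, Set


def precision_at_n(ranked: Sequence[str], relevant: Set[str], n: int) -> float:
    """Fraction of the top-n retrieved documents that are relevant."""
    if n <= 0:
        return 0.0
    return sum(1 for doc in ranked[:n] if doc in relevant) / n


def recall(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of all relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(ranked) & relevant) / len(relevant)


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision values at each relevant hit, divided by the number of relevant docs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, Sequence[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of average precision over every topic in the relevance judgements."""
    if not qrels:
        return 0.0
    return sum(average_precision(runs.get(topic, []), rel)
               for topic, rel in qrels.items()) / len(qrels)


# Toy data (hypothetical document ids, not GeoCLEF data).
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
print(precision_at_n(ranked, relevant, 2))
print(recall(ranked, relevant))
print(average_precision(ranked, relevant))
```

Relevant documents that are never retrieved (here `d5`) still count in the denominators of recall and average precision, which is why both stay below 1 in the toy example.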
Experiments and results
QRs proposed in this work
QR1: only thematic part, discarding geographical part
QR2: thematic expansion repeating its keywords (nouns)
QR3: thematic expansion using synonyms of the keywords
QR4: geographical expansion using synonyms
QR5: geographical expansion using places that match the geographical scope of the query
QR6: QR3 + QR5
Example: QRs generated for the original query “visits of the American president to Germany”
QR        Text of the query
Original  visit American presid Germany
QR1       visit American presid
QR2       visit American presid visit American presid Germany
QR3       (visit | meet | stay) American presid Germany
QR4       visit American presid (Germany | Federal Republic of Germany | Deutschland | FRG)
QR5       visit American presid (Germany | Berlin | Hamburg | Muenchen | Koeln | Frankfurt | Essen)
QR6       (visit | meet | stay) American presid (Germany | Berlin | Hamburg | Muenchen | Koeln | Frankfurt | Essen)
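The example above can be reproduced with a short sketch that builds the six reformulations from an already preprocessed query. The function names and the hand-filled dictionaries are assumptions for illustration; the slides do not state which resources supply the synonyms and places, and this is not the SINAI-GIR implementation. QR2 repeats the whole thematic part, mirroring the example rather than selecting nouns.

```python
from typing import Dict, List


def or_group(term: str, alternatives: List[str]) -> str:
    """Render a term and its alternatives as '(a | b | c)', or the bare term if there are none."""
    if not alternatives:
        return term
    return "(" + " | ".join([term] + alternatives) + ")"


def reformulate(theme: List[str], location: str,
                theme_synonyms: Dict[str, List[str]],
                geo_synonyms: Dict[str, List[str]],
                geo_places: Dict[str, List[str]]) -> Dict[str, str]:
    """Build the original query plus QR1..QR6 as plain text strings."""
    plain_theme = " ".join(theme)
    expanded_theme = " ".join(or_group(t, theme_synonyms.get(t, [])) for t in theme)
    geo_by_synonyms = or_group(location, geo_synonyms.get(location, []))
    geo_by_places = or_group(location, geo_places.get(location, []))
    return {
        "Original": f"{plain_theme} {location}",
        "QR1": plain_theme,                                # thematic part only
        "QR2": f"{plain_theme} {plain_theme} {location}",  # thematic part repeated
        "QR3": f"{expanded_theme} {location}",             # thematic synonyms
        "QR4": f"{plain_theme} {geo_by_synonyms}",         # geographical synonyms
        "QR5": f"{plain_theme} {geo_by_places}",           # places within the scope
        "QR6": f"{expanded_theme} {geo_by_places}",        # QR3 + QR5
    }


# Hand-filled resources copied from the example (hypothetical stand-ins
# for whatever thesaurus and gazetteer a real system would query).
queries = reformulate(
    theme=["visit", "American", "presid"],
    location="Germany",
    theme_synonyms={"visit": ["meet", "stay"]},
    geo_synonyms={"Germany": ["Federal Republic of Germany", "Deutschland", "FRG"]},
    geo_places={"Germany": ["Berlin", "Hamburg", "Muenchen", "Koeln", "Frankfurt", "Essen"]},
)
for name, text in queries.items():
    print(name, text)
```

Because every expansion is rendered as text, the geospatial entities stay ordinary query terms, which is the premise of the experiments.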
Topic Set  QR        P@10    R       MAP
2005       Original  0.4560  0.8364  0.3514
2005       QR2       0.4920  0.8276  0.3353
2005       QR4       0.2800  0.6552  0.2242
2006       Original  0.1920  0.7288  0.2396
2006       QR2       0.2040  0.6796  0.2314
2006       QR4       0.1720  0.6984  0.2064
2007       Original  0.2560  0.7156  0.2311
2007       QR2       0.2120  0.6656  0.1871
2007       QR5       0.2000  0.6720  0.1874
2008       Original  0.2680  0.7368  0.2484
2008       QR2       0.2680  0.7196  0.2381
2008       QR6       0.2280  0.7028  0.2028
Topic Set  No. Total Rel. Docs  No. Rel. Docs Orig. Q.
2005       1,028                908 (~88%)
2006       378                  284 (~75%)
2007       650                  543 (~83%)
2008       747                  588 (~78%)
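The percentages in the table above can be recomputed from the two counts. The sketch below reads the second column as the relevant documents found with the original query, which is an interpretation of the abbreviated header; the variable and function names are hypothetical.

```python
# (total relevant documents, relevant documents for the original query) per topic set
counts = {
    2005: (1028, 908),
    2006: (378, 284),
    2007: (650, 543),
    2008: (747, 588),
}


def share_percent(total_relevant: int, found_relevant: int) -> float:
    """Percentage of the relevant documents accounted for by the original query."""
    return 100.0 * found_relevant / total_relevant


for year, (total, found) in counts.items():
    print(f"{year}: {share_percent(total, found):.1f}%")
```

This gives 88.3%, 75.1%, 83.5% and 78.7%, so the approximate figures in the table appear to be rounded down.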
Conclusions
The proposed QRs seem to work well using GeoCLEF as the evaluation framework
Geographical query expansion
  Synonyms of the geospatial scope detected in the query (QR4)
  Places or locations that match the geospatial scope (QR5)
Thematic query expansion should be taken into account
  Repeating the keywords worked surprisingly well (QR2)
  Synonyms of the keywords might sometimes obtain good results
  Discarding the geographical part is not a good strategy (as expected)
Combination: thematic + geographical expansion (QR6)
Thank you for your attention