Extracting Metadata for Spatially-Aware Information Retrieval on the Internet. Research Paper Presentation – CS572 Summer 2011. Presented by Donghee Sung. Paper by Paul Clough (University of Sheffield, Western Bank).


Page 1: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Research Paper Presentation – CS572 Summer 2011

Presented by Donghee Sung

Paper by Paul Clough (University of Sheffield, Western Bank)

Page 2: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Short Overview

SPIRIT: adding spatial awareness to information systems, e.g.

- transport timetables
- routing systems for motorists
- map-based web sites
- location-based services

Key part: extraction and use of geospatial information

Page 3: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Short Overview

Criteria: speed, reliability, flexibility, multilingualism

Geo-Parsing:
- Identifying geographic references
- Gazetteer lookup with context rules to filter out common-usage words and personal names

Geo-Coding:
- Assigning spatial coordinates
- Based on information from a geographic resource
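The two steps above can be sketched as follows. This is a minimal illustration, not the SPIRIT implementation: the `GAZETTEER` table, its approximate coordinates, and the function names `geo_parse` and `geo_code` are all assumptions made for the example, and the context-rule filtering is left out.

```python
# Hypothetical mini-gazetteer: lower-cased place name -> approximate (lat, lon).
GAZETTEER = {
    "sheffield": (53.38, -1.47),
    "london": (51.51, -0.13),
}


def geo_parse(tokens):
    """Geo-parsing: return (index, token) pairs found in the gazetteer."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok.lower() in GAZETTEER]


def geo_code(candidates):
    """Geo-coding: attach gazetteer coordinates to each parsed candidate."""
    return [(tok, GAZETTEER[tok.lower()]) for _, tok in candidates]


tokens = "Trains from Sheffield to London".split()
candidates = geo_parse(tokens)
coded = geo_code(candidates)
```

Keeping the two steps as separate functions mirrors the slide: parsing only decides which tokens are place references, while coding depends on whatever geographic resource supplies the coordinates.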

Page 4: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

What’s the SPIRIT?

< http://www.geo-spirit.org/ >

Page 5: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

What’s the SPIRIT?

SPIRIT: SPatially-Aware Information Retrieval on the InterneT

A search engine to find documents and datasets on the web relating to places or regions

Page 6: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

What’s the SPIRIT?

Existing web search facilities are poor at finding information related to a particular location.

- Vicinity, www.somewherenear.com: find other places within a radius
- Yellow pages services: find a specific place or post code
- Buyukkokten: associated the site admin’s IP address with a telephone area code
- Stanford Research Institute: proposed a ‘.geo’ domain with cells based on latitude and longitude
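A "places within a radius" search of the kind listed above can be sketched with the haversine great-circle distance. This is only an illustration under stated assumptions: the `PLACES` table, its approximate coordinates, and the function names are hypothetical, and nothing here describes how any of the named services actually work.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Hypothetical places with approximate (latitude, longitude) in degrees.
PLACES = {
    "Sheffield": (53.38, -1.47),
    "Leeds": (53.80, -1.55),
    "London": (51.51, -0.13),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_radius(centre, places, radius_km):
    """Return the sorted names of places no further than radius_km from centre."""
    return sorted(
        name for name, coord in places.items()
        if haversine_km(centre, coord) <= radius_km
    )


nearby = within_radius(PLACES["Sheffield"], PLACES, 60)
```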

Page 7: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

What’s the SPIRIT?

Resources relating to a place:
- may not be found
- may not be places nearby
- may have another name

Major shortcoming: cannot recognize alternative names
- modern/historical variants
- informal names
- names of contained places
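One way to picture the alternative-name problem is as a lookup that expands a place name into everything a search should also match. The sketch below is an assumption-laden illustration: the `ALT_NAMES` and `CONTAINS` tables, their entries, and the `expand_place` function are hypothetical and not taken from the paper.

```python
# Hypothetical variant table: alternative name -> preferred name.
ALT_NAMES = {
    "bombay": "mumbai",            # historical variant
    "the big apple": "new york",   # informal name
}

# Hypothetical containment table: place -> names of places it contains.
CONTAINS = {
    "london": ["soho", "camden"],
}


def expand_place(name):
    """Return the set of lower-cased names to search for, given one place name."""
    key = name.lower()
    preferred = ALT_NAMES.get(key, key)
    names = {key, preferred}
    # Add every other variant that maps to the same preferred name.
    names.update(v for v, p in ALT_NAMES.items() if p == preferred)
    # Add the names of contained places.
    names.update(CONTAINS.get(preferred, []))
    return names
```

With these tables, a query for "Mumbai" would also match documents that only say "Bombay", and a query for "London" would also match "Soho" and "Camden".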

Page 8: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

What’s the SPIRIT?

SPIRIT Project

- Query expansion / relevance ranking procedures
- Machine learning techniques
  - extraction of geographical context
  - generating metadata
- Multi-modal user interface
  - textual input
  - interactive map feedback
- Spatial indices for web collections

Page 9: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Data Sources

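Sources of spatial data: TGN (Getty Thesaurus of Geographic Names), OS (Ordnance Survey), SABE (Seamless Administrative Boundaries of Europe)

The SPIRIT collection: a large web collection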

Page 10: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Data Sources

Page 11: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Data Sources

Page 12: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Data Sources

Page 13: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Geo-Parsing Techniques

- Tokenization issues
- Stop-words
- Named-Entity Recognition (NER)
- Gazetteers

Page 14: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Geo-Parsing Techniques

Page 15: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Geo-Parsing Techniques

Named-Entity Recognition (NER)

Processing a text and identifying particular categories of Named Entities (NE):

People, Organization, Location, etc.

Page 16: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Geo-Parsing Techniques

Tokenization Procedure

1) Tokenize on whitespace (Perl regular expression):

@words = split(/\s+/, $sentence);

"Isn't it ashame." -> Isn't / it / ashame.

2) Stemming / case conversion: isn't / it / asham

3) Removing stop-words
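The three steps can be sketched end to end in Python. This is a minimal sketch under assumptions: the stop-word list is invented for the example, and `crude_stem` is a toy suffix stripper standing in for a real stemmer (it is not the Porter algorithm or whatever the paper used).

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"it", "is", "a", "the"}


def tokenize(sentence):
    """Step 1: split on runs of whitespace, like the Perl split above."""
    return [tok for tok in re.split(r"\s+", sentence.strip()) if tok]


def normalize(token):
    """Step 2a: lower-case and strip surrounding punctuation."""
    return token.lower().strip(".,;:!?\"'()")


def crude_stem(token):
    """Step 2b: toy suffix stripper, a stand-in for a real stemmer."""
    for suffix in ("ers", "er", "ing", "es", "s", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(sentence):
    """Run steps 1 to 3 and return the remaining stems."""
    normalized = [normalize(tok) for tok in tokenize(sentence)]
    stems = [crude_stem(tok) for tok in normalized if tok]
    # Step 3: remove stop-words.
    return [stem for stem in stems if stem not in STOP_WORDS]
```

On the slide's sentence, `tokenize` yields the three whitespace-separated tokens, and `preprocess` drops "it" as a stop-word, leaving "isn't" and "asham".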

Page 17: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Geo-Parsing Techniques

Default settings for indexing and retrieval:

- Case sensitivity: Off
- Stop-word removal: Off
- Stemming: Off

Stop-word removal / stemming -> reduce the size of index files

But these can be useful:
- Stop-words: ‘in’, ‘inside’, or ‘of’
- Stemming: “London” from “London” & “Londoner”
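One reading of this slide is that words normally treated as stop-words can carry spatial meaning, and that stemming can conflate a place with words derived from it. The sketch below illustrates both ideas; the word lists, the "er" suffix rule, and the function names are assumptions made for the example.

```python
# The three stop-words named on the slide, treated here as spatial cues.
SPATIAL_STOP_WORDS = {"in", "inside", "of"}


def spatial_relations(tokens, places):
    """Return (stop_word, place) pairs where a spatial stop-word precedes a known place."""
    pairs = []
    for prev_tok, tok in zip(tokens, tokens[1:]):
        if prev_tok.lower() in SPATIAL_STOP_WORDS and tok.lower() in places:
            pairs.append((prev_tok.lower(), tok))
    return pairs


def match_place(token, places):
    """Return the known place a token refers to, allowing a crude 'er' suffix."""
    lowered = token.lower()
    if lowered in places:
        return lowered
    if lowered.endswith("er") and lowered[:-2] in places:
        return lowered[:-2]
    return None


places = {"london", "sheffield"}
relations = spatial_relations("hotels in Sheffield".split(), places)
```

If "in" had been removed as a stop-word before this step, the relation between "hotels" and "Sheffield" would no longer be recoverable from the token stream.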

Page 18: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Geo-Parsing Techniques

Filtering candidate locations using context rules to remove:

- stop-words
- references to people and organizations
- links to emails/URLs
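A few context rules of this kind can be sketched as simple checks on a candidate token and its neighbours. The rules and word lists below (`PERSON_TITLES`, `ORG_SUFFIXES`, `STOP_WORDS`, `LINK_PATTERN`) are hypothetical examples, not the rule set from the paper.

```python
import re

# Hypothetical word lists, for illustration only.
PERSON_TITLES = {"mr", "mrs", "ms", "dr", "prof"}
ORG_SUFFIXES = {"ltd", "plc", "inc", "university", "council"}
STOP_WORDS = {"over", "bath", "reading"}  # lower-case common-usage words
LINK_PATTERN = re.compile(r"https?://|www\.|@", re.IGNORECASE)


def keep_candidate(tokens, i):
    """Return True if the candidate at index i survives all context rules."""
    token = tokens[i]
    prev_tok = tokens[i - 1].lower().rstrip(".") if i > 0 else ""
    next_tok = tokens[i + 1].lower().rstrip(".,") if i + 1 < len(tokens) else ""
    if token in STOP_WORDS:            # lower-case common-usage word
        return False
    if LINK_PATTERN.search(token):     # part of an email address or URL
        return False
    if prev_tok in PERSON_TITLES:      # "Dr Sheffield" is a person
        return False
    if next_tok in ORG_SUFFIXES:       # "Bath Ltd" is an organization
        return False
    return True


def filter_candidates(tokens, candidate_indices):
    """Keep only the candidate indices that pass every context rule."""
    return [i for i in candidate_indices if keep_candidate(tokens, i)]


tokens = "Dr Sheffield visited Sheffield and Bath Ltd via www.london.com".split()
kept = filter_candidates(tokens, [1, 3, 5, 8])
```

In this example only the second "Sheffield" (index 3) is kept: the first follows a personal title, "Bath" is followed by an organization suffix, and the last token is a URL.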

Page 19: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Conclusion

The geo-parsing method could be improved by enhancing the gazetteer matching and filtering.

False hits could be reduced by generating a better list of stop-words and using further context rules.

The need for creating rules would be alleviated by generating further context rules from features using machine learning.

Page 20: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

References

[3] Jones C.B., R. Purves, A. Ruas, M. Sanderson, M. Sester, M.J. van Kreveld, R. Weibel (2002). Spatial information retrieval and geographical ontologies: an overview of the SPIRIT project. In SIGIR'02, Tampere, Finland, 387-388.

[6] Joho, H. and Sanderson, M. (2004). The SPIRIT collection: an overview of a large web collection. In SIGIR Forum, 38(2), 57-61.

[8] Mikheev A., Moens M. and Grover C. (1999). Named Entity recognition without gazetteers. In Proceedings of the Annual Meeting of the European Association for Computational Linguistics EACL'99, Bergen, Norway, 1-8.

Spatially-Aware Information Retrieval on the Internet - A Working Searching System

Page 21: Extracting Metadata for Spatially-Aware Information Retrieval on the Internet

Thank You!