3/20/2000 Principles of Information Retrieval Digital Libraries – Issues & Geographic Information Retrieval University of California, Berkeley School of Information Management and Systems SIMS 240: Principles of Information Retrieval

Page 1:

3/20/2000 Principles of Information Retrieval

Digital Libraries – Issues & Geographic Information Retrieval

University of California, Berkeley

School of Information Management and Systems

SIMS 240: Principles of Information Retrieval

Page 2:

Mini-TREC

• Proposed Schedule
  – February 27: Database and previous Queries
  – March 6: report on system acquisition and setup
  – March 18: New Queries for testing…
  – April 29: Results due
  – May 1: Results and system rankings
  – May 6 & 8: Group reports and discussion

Page 3:

Review

• Application of IR to Digital Library Environments
• Image Retrieval using Blobworld
• Derived from a paper presented at the 1999 ASIS Annual Meeting

Page 4:

Today

• More on Digital Libraries
  – Demo of DL search and features
• Geographic Information Retrieval
  – Parts of this lecture were presented at the invitational conference “The ‘I’ in Geographic Information Science”, Manchester, U.K., July 2001.

Page 5:

User Interface Paradigms: Multivalent Documents

• An approach to new document types and their authoring.

• Supports active, distributed, composable transformations of multimedia documents.

• Enables sophisticated annotations, intelligent result handling, user-modifiable interface, composite documents.

Page 6:

Multivalent Documents

[Figure labels: Cheshire Layer, OCR Layer, OCR Mapping Layer, GIS Layer, Table Layer, Scanned Page Image, Network Protocols & Resources. Sample page text: “History of The Classical World”.]

Valence: 2: The relative capacity to unite, react, or interact (as with antigens or a biological substrate). (Webster’s 7th Collegiate Dictionary)

Page 7:

Image Retrieval Research

• Finding “Stuff” vs “Things”

• BlobWorld

Page 8:

Page 9:

Overview of Cheshire II

• The Cheshire II system is intended to provide an easy-to-use, standards-compliant system capable of retrieving any type of information in a wide variety of settings.

Page 10:

Cheshire II Searching

[Diagram labels: Z39.50, Internet, Images, Scanned Text, Local, Remote]

Page 11:

GIS in the MVD Framework

• Layers are georeferenced data sets.
• Behaviors are
  – display semi-transparently
  – pan
  – zoom
  – issue query
  – display context
  – “spatial hyperlinks”
  – annotations
• Written in Java

Page 12:

GIS Viewer Example

http://elib.cs.berkeley.edu/annotations/gis/buildings.html

Page 13:

Geographic Information Retrieval and Spatial Browsing

Ray R. Larson

School of Library and Information Studies
University of California, Berkeley

Page 14:

Concerns for Digital Libraries

• Excellent summary in Distributed Geolibraries from NRC.
  – Distributed resources
  – Distributed users
  – Distributed services
• Access for a broad population is critical for many Digital Libraries

Page 15:

Concerns for Digital Libraries

• Georeferenced Information (geoinformation) provides one organizational perspective

• Other common perspectives include Topical Classification schemes, Temporal/Historical organization (ECAI)

• DLs can provide multiple views of the same information

Page 16:

Concerns for Digital Libraries

• Most DLs are intended for a broad user base:
  – varying levels of expertise in the contents
  – varying requirements for access methods
  – simple expressions of interest in natural language should be supported
  – Mapping NL to controlled vocabularies (including Digital Gazetteers)

Page 17:

Digital Library Needs

• Geographic and Spatial Querying

• Spatial Browsing

• Geographic and Spatial Indexing

• (Berkeley DL contents and examples)

Page 18:

Overview

• What is Geographic Information Retrieval?

• Geographic and Spatial Querying and Browsing.

• Geographic and Spatial Indexing.

• Examples of GIR Systems and Geographically Indexed Information.

Page 19:

Introduction

• What is Geographic Information Retrieval?
  – GIR is concerned with providing access to georeferenced information sources. It includes all of the areas of traditional IR research with the addition of spatially and geographically oriented indexing and retrieval.
  – It combines aspects of DBMS research, User Interface Research, GIS research, and Information Retrieval research.

Page 20:

Introduction

• The need for Geographic and Spatial Information Retrieval.
  – Digital Libraries
    • Sequoia 2000
    • UC Berkeley NSF/NASA/ARPA Digital Library Project
    • UC Santa Barbara Alexandria Project
    • NSDI - National Spatial Data Infrastructure
  – Next-Generation Online Catalogs
    • Cheshire II

Page 21:

Geographic and Spatial Querying

• Both imply querying on relationships within a particular coordinate system

• Spatial querying is the more general term

• Can be defined as queries about the spatial relationships (intersection, containment, boundary, adjacency, proximity) of entities geometrically defined and located in space
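
The relationships named above can be made concrete with a minimal sketch. It assumes each entity is reduced to an axis-aligned bounding box in a shared planar coordinate system; the `Box` class and the function names are illustrative and do not come from any system described in this lecture. Exact geometries would need more careful tests than these.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle standing in for a geometrically defined entity."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: Box, b: Box) -> bool:
    """Intersection: the boxes share at least one point (boundaries included)."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x and
            a.min_y <= b.max_y and b.min_y <= a.max_y)


def contains(a: Box, b: Box) -> bool:
    """Containment: b lies entirely inside a."""
    return (a.min_x <= b.min_x and a.min_y <= b.min_y and
            a.max_x >= b.max_x and a.max_y >= b.max_y)


def adjacent(a: Box, b: Box) -> bool:
    """Adjacency: the boxes touch along a boundary but their interiors do not overlap."""
    overlap_w = min(a.max_x, b.max_x) - max(a.min_x, b.min_x)
    overlap_h = min(a.max_y, b.max_y) - max(a.min_y, b.min_y)
    return intersects(a, b) and (overlap_w == 0 or overlap_h == 0)


def gap(a: Box, b: Box) -> float:
    """Shortest distance between the boxes; 0.0 when they touch or overlap."""
    dx = max(a.min_x - b.max_x, b.min_x - a.max_x, 0.0)
    dy = max(a.min_y - b.max_y, b.min_y - a.max_y, 0.0)
    return math.hypot(dx, dy)


def near(a: Box, b: Box, max_distance: float) -> bool:
    """Proximity: the boxes are no farther apart than max_distance."""
    return gap(a, b) <= max_distance


# Hypothetical entities, in arbitrary map units.
city = Box(0, 0, 10, 10)
park = Box(2, 2, 4, 4)
suburb = Box(10, 0, 15, 10)
airport = Box(13, 14, 20, 20)
```

With these sample boxes, `contains(city, park)` and `adjacent(city, suburb)` hold, while `near(city, airport, 5)` depends on the diagonal gap between the two rectangles.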

Page 22:

Geographic and Spatial Querying

• Geographical coordinates are geometric relationships (distance and direction can be measured on a continuous scale)
  – E.g. “5.21 miles north of Champaign”
• Spatial relations may be both geometric and topological (spatially related but without measurable distance or absolute direction)
  – E.g.: “inside the city limits”
  – “left side of Beckman Institute”

Page 23:

Geographic and Spatial Querying

• Types of spatial queries
  – Point-in-polygon: “What do we have at this X,Y point?”
  – Region Queries: “What do we have in this region?”
    • Which point-encoded items lie within the region
    • What lines (borders, etc.) lie within or cross the region
    • What areas overlap the region area

[Figure: map region with X and Y axes]
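
A point-in-polygon test and a simple region query over point-encoded items might be sketched as follows. This uses the standard ray-casting (even-odd) rule on planar coordinates; the function names and the sample items are hypothetical, and points lying exactly on the boundary are not treated specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses.

    `polygon` is a list of (x, y) vertices in order; an odd crossing count
    means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def items_in_region(points, polygon):
    """Region query: names of the point-encoded items that lie within the polygon."""
    return [name for name, (x, y) in points.items()
            if point_in_polygon(x, y, polygon)]


# Hypothetical region and items, in arbitrary map units.
region = [(0, 0), (4, 0), (4, 4), (0, 4)]
items = {"library": (1, 1), "harbor": (5, 2), "museum": (3, 3.5)}
```

The line and area variants of the region query (lines crossing the region, areas overlapping it) need segment-intersection and polygon-overlap tests, which are not shown here.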

Page 24:

Geographic and Spatial Querying

• Types of spatial queries, cont.
  – Distance and Buffer Zone Queries
    • What cities lie within 40 miles of the border of Northern and Southern Ireland?
    • What wetlands lie within 50 miles of London?
  – Path Queries
    • What is the shortest route from San Francisco to Los Angeles?
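
A distance query such as the London example can be approximated with great-circle distances. The sketch below assumes a spherical Earth with a mean radius of about 3958.8 miles and uses the haversine formula; the site names and their coordinates are invented for illustration. Buffering a line such as a border would additionally need a point-to-line distance, which is not shown.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius; a spherical approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_buffer(center, sites, radius_miles):
    """Names of the sites whose distance from `center` is at most `radius_miles`."""
    lat0, lon0 = center
    return [name for name, (lat, lon) in sites.items()
            if haversine_miles(lat0, lon0, lat, lon) <= radius_miles]


LONDON = (51.5074, -0.1278)

# Hypothetical wetland sites placed due north of London for easy checking:
# half a degree of latitude is roughly 35 miles, a full degree roughly 69.
wetlands = {
    "marsh_a": (52.0074, -0.1278),
    "marsh_b": (52.5074, -0.1278),
}
```

Here `within_buffer(LONDON, wetlands, 50)` keeps only the nearer of the two invented sites.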

Page 25:

Geographic and Spatial Querying

• Types of spatial queries, cont.
  – Multimedia Queries: Use non-map georeferenced information.
    • What are the names of farmers affected by flooding in Monterey and Santa Cruz Counties?

Page 26:

Spatial Browsing

• Combines ad hoc spatial querying with interactive displays

• HyperMap concept

• Pseudo-HyperMaps

Page 27:

Spatial Browsing

• Advantages:
  – May not need the accuracy of a full GIS
  – Comprehensible searching metaphor for many materials
• Problems:
  – Clutter and differing scales.
  – Requires good (and preferably accurate) geographical indexing
  – Assumes that the user knows some geography

Page 28:

Geographic and Spatial Indexing

• Traditional geographic indexing involves using place names from LCSH and name authorities. These have some problems:
  – Names are not unique
  – The places referred to change size, shape and names over time
  – Spelling variations
  – Some places are temporary conventions (study areas, etc.)

Page 29:

Digital Gazetteers

• Geographic names are and will remain the primary Entry Vocabulary for DL spatial queries
  – The gazetteer must support as many variant forms of the name as possible
    • Including temporal ranges for particular names
  – Querying must support spatial reasoning based on gazetteer and other geographic and temporal information in the system or accessible by network access
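
One way to picture a gazetteer that holds variant names with temporal ranges is the small in-memory sketch below. The `PlaceName` and `Gazetteer` classes are hypothetical, not part of any system named in this lecture, and the year ranges in the sample data are simplified for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlaceName:
    """One name variant for a place, valid over an optional range of years."""
    name: str
    place_id: str
    start_year: Optional[int] = None  # None means open-ended
    end_year: Optional[int] = None


class Gazetteer:
    def __init__(self):
        self._by_name = {}   # case-folded name -> list of PlaceName entries
        self._coords = {}    # place_id -> (lat, lon)

    def add_place(self, place_id, lat, lon):
        self._coords[place_id] = (lat, lon)

    def add_name(self, entry):
        self._by_name.setdefault(entry.name.casefold(), []).append(entry)

    def lookup(self, name, year=None):
        """Return (place_id, (lat, lon)) for each place known by `name`.

        When `year` is given, only name variants valid in that year match.
        """
        hits = []
        for entry in self._by_name.get(name.casefold(), []):
            if year is not None:
                if entry.start_year is not None and year < entry.start_year:
                    continue
                if entry.end_year is not None and year > entry.end_year:
                    continue
            hits.append((entry.place_id, self._coords[entry.place_id]))
        return hits


# Illustrative data: one place, three names, simplified date ranges.
gaz = Gazetteer()
gaz.add_place("istanbul", 41.01, 28.98)
gaz.add_name(PlaceName("Byzantium", "istanbul", end_year=330))
gaz.add_name(PlaceName("Constantinople", "istanbul", start_year=330, end_year=1930))
gaz.add_name(PlaceName("Istanbul", "istanbul", start_year=1930))
```

A lookup by any of the three names resolves to the same coordinates, and supplying a year restricts the match to names in use at that time.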

Page 30:

Page 31:

Geographic and Spatial Indexing

• Geographic coordinates have some advantages over names:
  – They are persistent regardless of name, political boundary or other changes
  – They can be simply connected to spatial browsing interfaces and GIS data.
  – They provide a consistent framework for GIR applications and spatial queries.
• However, the geographic extents and boundaries of entities also change over time
  – This may be the primary interest of historical scholarship

Page 32:

Geographic and Spatial Indexing

• GIPSY: Automatic georeferencing of texts (Geographic Info Processing System)
  – The work of Allison Woodruff and Christian Plaunt. Later DBMS-based version by Jolly Chen. New version planned.
  – Designed to operate on the full text of documents
  – Extracts geographic terms and attempts to identify the coordinates of the places discussed in the text using a combination of evidence

Page 33:

Geographic and Spatial Indexing

• GIPSY cont.
  – Used the USGS Geographic Names Information System (GNIS) and Geographic Information Retrieval and Analysis System (GIRAS) to associate names with coordinates of named places, geographic features and land use characteristics.

Page 34:

Geographic and Spatial Indexing

• GIPSY cont.
  – Identified places are added as “elevations” with each place adding a weight based on its frequency in the text and database characteristics
  – The resulting map is analysed to identify the most likely locations, and coordinates for those locations are extracted
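
As a rough illustration of the “elevation” idea, and not the actual GIPSY implementation, the sketch below stacks place footprints on a coarse grid. It assumes each recognised term maps to a rectangular footprint of grid cells and that the weight is simply the term's frequency in the text; the database characteristics mentioned above are ignored. The footprints, counts, and function names are invented for the example.

```python
def build_elevation_map(term_counts, footprints, width, height):
    """Add each place's footprint to a grid, weighted by how often it is mentioned.

    term_counts: {term: frequency in the text}
    footprints:  {term: (x0, y0, x1, y1)} inclusive grid-cell ranges (hypothetical)
    """
    grid = [[0.0] * width for _ in range(height)]
    for term, count in term_counts.items():
        box = footprints.get(term)
        if box is None:
            continue  # term has no known footprint
        x0, y0, x1, y1 = box
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                grid[y][x] += count
    return grid


def peak_cells(grid, fraction=1.0):
    """Cells whose elevation is at least `fraction` of the highest elevation."""
    top = max(max(row) for row in grid)
    if top == 0:
        return []
    cutoff = top * fraction
    return [(x, y) for y, row in enumerate(grid)
            for x, value in enumerate(row) if value >= cutoff]


# Invented footprints on a 6 x 4 grid and invented term frequencies.
footprints = {
    "santa barbara county": (2, 0, 4, 1),
    "san luis obispo": (1, 1, 3, 3),
}
term_counts = {"santa barbara county": 2, "san luis obispo": 1, "coastal branch": 1}

elevation = build_elevation_map(term_counts, footprints, width=6, height=4)
```

The highest cells are those where the two footprints overlap, which is the kind of area the analysis step would report as the most likely location.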

Page 35:

Geographic and Spatial Indexing

• GIPSY Map Overlay

“The proposed project is the construction of a new State Water Project facility, the coastal branch... by water purveyors of northern Santa Barbara County... delivering water to San Luis Obispo ...”

Page 36:

Geographic and Spatial Indexing

• To be useful for the range of cultural and humanities materials being collected in digital libraries, the GIPSY gazetteer must
  – Support many different time ranges, location and boundary changes
  – Support synonymous and variant names with differing locations for the same entity
  – Support names in multiple languages, scripts and usages

Page 37:

ECAI

• The Electronic Cultural Atlas Initiative is a collaboration between IT professionals and humanities scholars

• ECAI is developing a globally distributed spatio-temporal library of cultural and historical resources with a centralized metadata catalogue and a GIS viewer

• Currently the ECAI consortium includes over 250 projects

Page 38:

ECAI

• Projects range from small works by individual scholars to large nationally and internationally funded efforts. E.g.:
  – geography of Greco-Roman culture (Perseus project)
  – toponym locations for over 300,000 images of Buddhist art and architecture
  – Seals of the Sasanian Empire
  – historical trade routes of Eurasia
  – the map of Hideyoshi’s invasion of Korea
  – historical GIS projects for China, Great Britain, the United States, the Black Sea and Tibet

Page 39:

Perseus

Page 40:

The Sasanian Empire

Page 41:

Opening shot of the Sasanian Empire ECAI project, showing a map with diverse resources, a timeline, and a menu of available map layers.

Page 42:

Users may zoom in to see resources that are only visible at a higher level of detail.

Page 43:

Spatial objects on the map are linked to a table of attributes, which may include any information about the objects. Note that this is a scholarly tool. By creating a “name quality” field, the author has noted that there is disagreement about the locations and names of places in the Sasanian Empire.

Page 44:

Sites on the map may be linked to resources elsewhere on the internet. In this case, important archaeological sites on the map are linked to web-based tours.

Page 45:

The map interface may be used to show change over time. The “Sasanian Empire ca. 270s” resource is highlighted, and the “Sasanian Empire ca. 570s” is greyed out. If a user slides the timeline bar, the new boundary of the empire will appear.

Page 46:

In a different time range, not only do the boundaries of the empire appear different, but the sites that were active during the earlier era (the red dots) have moved as well.
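
The timeline behavior described in these two slides amounts to filtering map resources by the year selected on the slider. A minimal sketch, assuming each resource carries a start and end year, is given below; the `Resource` class is hypothetical, and the year ranges are only inferred from the resource names, not taken from the TimeMap data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A map resource that is valid over an inclusive range of years."""
    name: str
    start_year: int
    end_year: int


def visible_resources(resources, year):
    """Names of the resources whose time range includes the selected year."""
    return [r.name for r in resources if r.start_year <= year <= r.end_year]


# Year ranges assumed from the resource names shown in the screenshots.
resources = [
    Resource("Sasanian Empire ca. 270s", 270, 279),
    Resource("Sasanian Empire ca. 570s", 570, 579),
]
```

Moving the selected year from 275 to 575 swaps which boundary is returned, mirroring the highlighted and greyed-out entries in the interface.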

Page 47:

TimeMap is a user authoring tool, not merely a viewer. Users can control the look of the icons, the map layers that comprise a project, and, as shown here, the map scale at which different layers will become visible.

Page 48:

This screen displays the metadata for a part of the Sasanian Empire project. The metadata includes functional (tm.) metadata to enable connection to the map interface in addition to cataloguing (dc. and ecai.) metadata. Using the menu on the left, users may choose to map individual map layers or packaged projects.

Page 49:

Historic Sydney

Page 50:

The Mongol Empire