Digital Libraries Issues & Geographic Information Retrieval
  University of California, Berkeley
  School of Information Management and Systems
  SIMS 240: Principles of Information Retrieval
Mini-TREC
  Proposed Schedule
    February 27: Database and previous queries
    March 6: Report on system acquisition and setup
    March 18: New queries for testing
    April 29: Results due
    May 1: Results and system rankings
    May 6 & 8: Group reports and discussion
Review
  Application of IR to Digital Library Environments
  Image Retrieval using Blobworld
  Derived from a paper presented at the 1999 ASIS Annual Meeting
Today
  More on Digital Libraries
  Demo of DL search and features
  Geographic Information Retrieval
  Parts of this lecture were presented at the invitational conference The I in Geographic Information Science, Manchester, U.K., July 2001.
User Interface Paradigms: Multivalent Documents
  An approach to new document types and their authoring.
  Supports active, distributed, composable transformations of multimedia documents.
  Enables sophisticated annotations, intelligent result handling, user-modifiable interface, composite documents.
Image Retrieval Research
  Finding Stuff vs Things
  BlobWorld
Overview of Cheshire II
  The Cheshire II system is intended to provide an easy-to-use, standards-compliant system capable of retrieving any type of information in a wide variety of settings.
Cheshire II Searching
GIS in the MVD Framework
  Layers are georeferenced data sets.
  Behaviors are:
    display semi-transparently
    pan
    zoom
    issue query
    display context
    spatial hyperlinks
    annotations
  Written in Java
GIS Viewer Example http://elib.cs.berkeley.edu/annotations/gis/buildings.html
Geographic Information Retrieval and Spatial Browsing
  Ray R. Larson
  School of Library and Information Studies
  University of California, Berkeley
Concerns for Digital Libraries
  Excellent summary in Distributed Geolibraries from NRC.
  Distributed resources
  Distributed users
  Distributed services
  Access for a broad population is critical for many Digital Libraries
Concerns for Digital Libraries
  Georeferenced Information (geoinformation) provides one organizational perspective
  Other common perspectives include Topical Classification schemes, Temporal/Historical organization (ECAI)
  DLs can provide multiple views of the same information
Concerns for Digital Libraries
  Most DLs are intended for a broad user base:
    varying levels of expertise in the contents
    varying requirements for access methods
    simple expressions of interest in natural language should be supported
  Mapping NL to controlled vocabularies (including Digital Gazetteers)
Digital Library Needs
  Geographic and Spatial Querying
  Spatial Browsing
  Geographic and Spatial Indexing
  (Berkeley DL contents and examples)
Overview
  What is Geographic Information Retrieval?
  Geographic and Spatial Querying and Browsing.
  Geographic and Spatial Indexing.
  Examples of GIR Systems and Geographically Indexed Information.
Introduction
  What is Geographic Information Retrieval?
    GIR is concerned with providing access to georeferenced information sources. It includes all of the areas of traditional IR research with the addition of spatially and geographically oriented indexing and retrieval.
    It combines aspects of DBMS research, User Interface Research, GIS research, and Information Retrieval research.
Introduction
  The need for Geographic and Spatial Information Retrieval.
  Digital Libraries
  Sequoia 2000
  UC Berkeley NSF/NASA/ARPA Digital Library Project
  UC Santa Barbara Alexandria Project
  NSDI - National Spatial Data Infrastructure
  Next-Generation Online Catalogs
  Cheshire II
Geographic and Spatial Querying
  Both imply querying on relationships within a particular coordinate system
  Spatial querying is the more general term
    Can be defined as queries about the spatial relationships (intersection, containment, boundary, adjacency, proximity) of entities geometrically defined and located in space
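The spatial relationships listed above can be illustrated on the simplest geometric entities, axis-aligned bounding boxes. This is a minimal sketch, not taken from any system discussed in this lecture: the `Box` layout `(xmin, ymin, xmax, ymax)` and all function names are assumptions made for illustration, and real systems work with arbitrary polygons rather than rectangles.

```python
import math
from typing import Tuple

# Hypothetical layout for illustration: (xmin, ymin, xmax, ymax)
Box = Tuple[float, float, float, float]


def intersects(a: Box, b: Box) -> bool:
    """True if the closed rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(a: Box, b: Box) -> bool:
    """True if b lies entirely within a (boundaries may touch)."""
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]


def adjacent(a: Box, b: Box) -> bool:
    """True if the boxes touch along an edge or corner but their interiors do not overlap."""
    if not intersects(a, b):
        return False
    overlap_w = min(a[2], b[2]) - max(a[0], b[0])
    overlap_h = min(a[3], b[3]) - max(a[1], b[1])
    return overlap_w == 0 or overlap_h == 0


def gap(a: Box, b: Box) -> float:
    """Proximity: shortest distance between the boxes (0 if they touch or overlap)."""
    dx = max(0.0, b[0] - a[2], a[0] - b[2])
    dy = max(0.0, b[1] - a[3], a[1] - b[3])
    return math.hypot(dx, dy)
```

For example, with `a = (0, 0, 2, 2)` and `c = (2, 0, 4, 2)`, the boxes share only the edge at x = 2, so `adjacent(a, c)` holds while their interiors stay disjoint.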
Geographic and Spatial Querying
  Geographical coordinates express geometric relationships (distance and direction can be measured on a continuous scale)
    E.g. 5.21 miles north of Champaign
  Spatial relations may be both geometric and topological (spatially related but without measurable distance or absolute direction)
    E.g.: inside the city limits
    left side of Beckman Institute
Geographic and Spatial Querying
  Types of spatial queries
    Point-in-polygon: What do we have at this X,Y point?
    Region Queries: What do we have in this region?
      Which point-encoded items lie within the region?
      What lines (borders, etc.) lie within or cross the region?
      What areas overlap the region area?
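Both the point-in-polygon query and the "which point-encoded items lie within the region" query reduce to the same containment test. The sketch below uses the standard ray-casting (even-odd) method on planar coordinates. The function names and the `items` dictionary are hypothetical, chosen only for illustration, and points exactly on a boundary are not treated specially.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: a point is inside if a horizontal ray from it
    crosses the polygon boundary an odd number of times."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_in_region(items: Dict[str, Point], region: List[Point]) -> List[str]:
    """Region query: names of all point-encoded items inside the region."""
    return sorted(name for name, pt in items.items()
                  if point_in_polygon(pt, region))
```

Line and area overlap queries need additional tests (segment intersection, polygon clipping) that this sketch does not cover.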
Geographic and Spatial Querying
  Types of spatial queries, cont.
    Distance and Buffer Zone Queries
      What cities lie within 40 miles of the border of Northern and Southern Ireland?
      What wetlands lie within 50 miles of London?
    Path Queries
      What is the shortest route from San Francisco to Los Angeles?
Geographic and Spatial Querying
  Types of spatial queries, cont.
    Multimedia Queries: Use non-map georeferenced information.
      What are the names of farmers affected by flooding in Monterey and Santa Cruz Counties?
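A distance query such as "what lies within 50 miles of London?" can be sketched with the haversine great-circle formula. This is an illustrative sketch only: it assumes a spherical Earth with a mean radius of 3958.8 miles, the function names and the `places` dictionary are hypothetical, and it handles distance from a single point. A buffer around a line such as a border would additionally need a point-to-line distance.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_MILES = 3958.8  # assumed mean radius of a spherical Earth

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_miles(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def within_distance(center: LatLon, places: Dict[str, LatLon],
                    radius_miles: float) -> List[str]:
    """Names of all places no farther than radius_miles from center."""
    return sorted(name for name, loc in places.items()
                  if haversine_miles(center, loc) <= radius_miles)
```

One degree of latitude is roughly 69 miles under this model, so a site half a degree north of the center falls inside a 50-mile radius and a site a full degree north does not.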
Spatial Browsing
  Combines ad hoc spatial querying with interactive displays
  HyperMap concept
  Pseudo-HyperMaps
Spatial Browsing
  Advantages:
    May not need the accuracy of a full GIS
    Comprehensible searching metaphor for many materials
  Problems:
    Clutter and differing scales.
    Requires good (and preferably accurate) geographical indexing
    Assumes that the user knows some geography
Geographic and Spatial Indexing
  Traditional geographic indexing involves using place names from LCSH and name authorities. These have some problems:
    Names are not unique
    The places referred to change size, shape and names over time
    Spelling variations
    Some places are temporary conventions (study areas, etc.)
Digital Gazetteers
  Geographic names are and will remain the primary Entry Vocabulary for DL spatial queries
  The gazetteer must support as many variant forms of the name as possible
    Including temporal ranges for particular names
  Querying must support spatial reasoning based on gazetteer and other geographic and temporal information in the system or accessible by network access
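The requirement for variant names with temporal ranges can be sketched as a small in-memory gazetteer. Everything here is a hypothetical illustration rather than the design of any gazetteer named in this lecture: the class and field names are invented, and the sample dates and coordinates in the usage note are rough values included only to show the lookup.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Place:
    place_id: str
    lat: float
    lon: float


@dataclass
class NameRecord:
    name: str
    place_id: str
    start_year: Optional[int] = None  # None means open-ended
    end_year: Optional[int] = None    # None means open-ended


class Gazetteer:
    """Maps variant names, each valid for a time range, to places."""

    def __init__(self) -> None:
        self.places: Dict[str, Place] = {}
        self.names: List[NameRecord] = []

    def add_place(self, place: Place) -> None:
        self.places[place.place_id] = place

    def add_name(self, record: NameRecord) -> None:
        self.names.append(record)

    def lookup(self, name: str, year: Optional[int] = None) -> List[Place]:
        """Places known by this name; if year is given, only names valid then."""
        key = name.casefold()
        hits: List[Place] = []
        for rec in self.names:
            if rec.name.casefold() != key:
                continue
            if year is not None:
                if rec.start_year is not None and year < rec.start_year:
                    continue
                if rec.end_year is not None and year > rec.end_year:
                    continue
            place = self.places[rec.place_id]
            if place not in hits:
                hits.append(place)
        return hits
```

Usage, with approximate illustrative values:

```python
gaz = Gazetteer()
gaz.add_place(Place("istanbul", 41.01, 28.98))
gaz.add_name(NameRecord("Constantinople", "istanbul", 330, 1930))
gaz.add_name(NameRecord("Istanbul", "istanbul", 1930, None))
gaz.lookup("constantinople", year=1200)  # finds the place
gaz.lookup("Istanbul", year=1200)        # empty: name not valid in that year
```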
Geographic and Spatial Indexing
  Geographic coordinates have some advantages over names:
    They are persistent regardless of name, political boundary or other changes
    They can be simply connected to spatial browsing interfaces and GIS data.
    They provide a consistent framework for GIR applications and spatial queries.
  However, the geographic extents and boundaries of entities also change over time
    This may be the primary interest of historical scholarship
Geographic and Spatial Indexing
  GIPSY: Automatic georeferencing of texts (Geographic Info Processing System)
    The work of Allison Woodruff and Christian Plaunt
    Later DBMS-based version by Jolly Chen
    New version planned
  Designed to operate on the full text of documents
  Extracts geographic terms and attempts to identify the coordinates of the places discussed in the text using a combination of evidence
Geographic and Spatial Indexing
  GIPSY cont.
    Used the USGS Geographic Names Information System (GNIS) and Geographic Information Retrieval and Analysis System (GIRAS) to associate names with coordinates of named places, geographic features and land use characteristics.
Geographic and Spatial Indexing
  GIPSY cont.
    Identified places are added as elevations with each place adding a weight based on its frequency in the text and database characteristics
    The resulting map is analysed to identify the most likely locations, and coordinates for those locations are extracted
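The "places as elevations" idea can be sketched on a coarse grid. This is not the GIPSY implementation: it assumes each place is a rectangular footprint of grid cells, weights each place only by its term frequency (the database characteristics mentioned above are ignored), and takes the highest cells as the most likely location. All names and the grid layout are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
# Hypothetical footprint: (row_min, col_min, row_max, col_max), inclusive
Footprint = Tuple[int, int, int, int]


def build_elevation_map(rows: int, cols: int,
                        footprints: Dict[str, Footprint],
                        term_counts: Dict[str, int]) -> List[List[float]]:
    """Stack each place found in the text onto the grid, raising the
    cells under its footprint by the place's frequency in the text."""
    grid = [[0.0] * cols for _ in range(rows)]
    for name, count in term_counts.items():
        box = footprints.get(name)
        if box is None:
            continue  # term has no known location, so it adds no evidence
        r0, c0, r1, c1 = box
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                grid[r][c] += count
    return grid


def peak_cells(grid: List[List[float]]) -> List[Cell]:
    """Cells with the highest accumulated elevation (empty if nothing matched)."""
    top = max(max(row) for row in grid)
    if top == 0:
        return []
    return [(r, c) for r, row in enumerate(grid)
            for c, value in enumerate(row) if value == top]
```

Where the footprints of several mentioned places overlap, their weights add up, so the peak falls in the area that the most textual evidence points to.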
Geographic and Spatial Indexing
  GIPSY Map Overlay
    The proposed project is the construction of a new State Water Project facility, the coastal branch... by water purveyors of northern Santa Barbara County... delivering water to San Luis Obispo ...
Geographic and Spatial Indexing
  To be useful for the range of cultural and humanities materials being collected in digital libraries, the GIPSY gazetteer must:
    Support many different time ranges, location and boundary changes
    Support synonymous and variant names with differing locations for the same entity
    Support names in multiple languages, scripts and usages
ECAI
  The Electronic Cultural Atlas Initiative is a collaboration between IT professionals and humanities scholars
  ECAI is developing a globally distributed spatio-temporal library of cultural and historical resources with a centralized metadata catalogue and a GIS viewer
  Currently the ECAI consortium includes over 250 projects
ECAI
  Projects range from small works by individual scholars to large nationally and internationally funded efforts. E.g.:
    geography of Greco-Roman culture (Perseus project)
    toponym locations for over 300,000 images of Buddhist art and architecture
    Seals of the Sasanian Empire
    historical trade routes of Eurasia
    the map of Hideyoshi's invasion of Korea
    historical GIS projects for China, Great Britain, the United States, the Black Sea and Tibet
The Sasanian Empire
Opening shot of the Sasanian Empire ECAI project, showing a map with diverse resources, a timeline, and a menu of available map layers.
Users may zoom in to see resources that are only visible at a higher level of detail.
Spatial objects on the map are linked to a table of attributes, which may include any information about the objects. Note that this is a scholarly tool. By creating a name quality field, the author has noted that there is disagreement about the lo