Alexandria Digital Library Project
Goals and Challenges in
Georeferenced Digital Libraries
Greg Janée
Goals
Digital library: “an integrated set of services for capturing, cataloging,
storing, searching, protecting, and retrieving information”
ADL: a lightweight, distributed digital library for heterogeneous, georeferenced information
– a system and an infrastructure
  – supports personal collections ... institutions
  – provides interoperability across spatial data providers
Adjectives
Heterogeneous
– remotely-sensed imagery; textual documents
– multimedia instructional materials; executable models
– gazetteer placenames
Georeferenced
– generalizes to “scientific data”: any highly-structured, metadata-rich information
Distributed
– for scalability
Lightweight
– accommodate small, cheap (i.e., free) implementations
– include non-traditional spatial data sources
Where we are today
Downloadable server software, two clients
In operational use by MIL
Other (potential) users:
– Bren/ESSW
– Scripps
– DLESE
– Norwegian National Library
– Auckland University of Technology
Challenges
– Discovery
– Gazetteers
– Ranking
– Scalability
– Context
– Client integration
More at http://www.dlib.org/
Challenge 1: discovery
Can’t beat word search when it works
– I want a map of Boulder
– “Downtown street map of Boulder, Colorado”
But there are so many names for a place...
– Boulder, Boulder County, Colorado
– Chautauqua, Mapleton Hill, Pearl Street Mall
– Area code 303, ZIP code 80305, UTM grid 13S
– Flatirons, Rocky Mountains, Front Range
– Landers earthquake, hurricane Hugo
If you’re still not convinced...
Remote-sensing imagery is nameless
– “AVHRR NOAA-13 2002-06-03 14:33 UTC”
Challenge: exactly which two words will find a USGS map of the Flatirons in the Rocky Mountains behind Boulder, Boulder County, Colorado?
– Eldorado Springs
ADL approach
Coordinate-based representation and discovery
– generic lat/lon coordinates
– rich geometry: polygons, polylines
– spatial operators: overlaps, contains
Gazetteer
– defines representation of places
– maps placenames to coordinates
[Diagram: client, gazetteer, and library, connected by arrows labeled “placenames” and “coordinates”]
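The flow on this slide (placename to gazetteer to coordinates to library) can be illustrated with a minimal Python sketch. It is not ADL's implementation: the `Box` class, the `GAZETTEER` and `LIBRARY` dictionaries, and the `search` function are hypothetical, footprints are simplified to lat/lon bounding boxes rather than polygons, and all coordinates are rough, illustrative values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned lat/lon bounding box (a simplification of a footprint)."""
    west: float
    south: float
    east: float
    north: float

    def overlaps(self, other: "Box") -> bool:
        # True when the two boxes share any area or boundary.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)

    def contains(self, other: "Box") -> bool:
        # True when `other` lies entirely inside this box.
        return (self.west <= other.west and other.east <= self.east
                and self.south <= other.south and other.north <= self.north)


# Hypothetical gazetteer: placename -> footprint (illustrative coordinates).
GAZETTEER = {
    "boulder": Box(-105.30, 39.96, -105.18, 40.09),
}

# Hypothetical library catalog: item title -> footprint.
LIBRARY = {
    "Eldorado Springs quadrangle": Box(-105.375, 39.875, -105.25, 40.0),
    "World map": Box(-180.0, -90.0, 180.0, 90.0),
    "Santa Barbara street map": Box(-119.80, 34.38, -119.63, 34.47),
}


def search(placename: str) -> list[str]:
    """Resolve a placename via the gazetteer, then query the library by coordinates."""
    region = GAZETTEER[placename.lower()]
    return [title for title, fp in LIBRARY.items() if fp.overlaps(region)]


results = search("Boulder")
print(results)
```

In this toy data the query for “Boulder” finds the Eldorado Springs quadrangle even though the two names share no words, which is the point of coordinate-based discovery. The sketch ignores boxes that cross the antimeridian.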
Challenge 2: gazetteers: necessary evil
Few (public) sources of gazetteer data
Lousy quality
– digitized from maps
Difficult problems
– conflation
– classification
– boundary determination
– change over time
Conclusion
– gazetteer-based spatial reasoning seems unlikely
– interaction will likely remain client-centric
Final thoughts on discovery
Coordinate-based approach is costly
– burden on users and catalogers
– limits potential collections
– relies on gazetteer’s weakest aspect: footprints
– continuous coordinate space adds complexity
Gazetteer improvements
– federated gazetteers
– new gazetteer models: topological as opposed to metric
Other coordinate spaces, grids, etc.
Challenge 3: ranking
Observed phenomenon: World Map is first result of every query
Idea: rank by spatial similarity to query region
[Figure: a query region and result footprints numbered 1 to 4]
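One way to make “spatial similarity” concrete is the ratio of shared area to combined area (intersection over union). The Python sketch below assumes that measure and bounding-box footprints given as `(west, south, east, north)` tuples. The slides do not say which measure ADL uses, so the measure, the function names, and the sample data are all illustrative assumptions.

```python
def area(box):
    """Area of a (west, south, east, north) box in square degrees; 0 if empty."""
    west, south, east, north = box
    return max(0.0, east - west) * max(0.0, north - south)


def intersection(a, b):
    """Box shared by a and b; it has zero area when they do not overlap."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def similarity(query, footprint):
    """Intersection over union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    shared = area(intersection(query, footprint))
    union = area(query) + area(footprint) - shared
    return shared / union if union > 0 else 0.0


def rank(query, items):
    """Return item titles ordered from most to least similar to the query."""
    return sorted(items, key=lambda title: similarity(query, items[title]),
                  reverse=True)


# Illustrative query region and footprints (rough coordinates).
query = (-105.30, 39.96, -105.18, 40.09)
items = {
    "World map": (-180.0, -90.0, 180.0, 90.0),
    "Eldorado Springs quadrangle": (-105.375, 39.875, -105.25, 40.0),
    "Boulder city map": (-105.31, 39.95, -105.17, 40.10),
}

ranking = rank(query, items)
print(ranking)
```

Under this measure the world map still matches every query, but its enormous footprint gives it a similarity near zero, so it sinks to the bottom of the list. Areas here are in square degrees, which is adequate for comparing items but is not a true surface area.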
Challenge 4: scalability
Easy to accumulate lots of data
– satellites image continuously
– 1 m resolution, Earth’s surface area = 5 × 10¹⁴ m²
Support for scalability
– text: amazingly good
– spatial: not so good
  – indexing becomes unwieldy at 10⁶ items
  – combining spatial with other constraint types is difficult
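To give a feel for the data volume implied by the figures above, here is a back-of-the-envelope calculation in Python. The surface area and resolution come from the slide; the one byte per pixel and single-band, single-pass coverage are assumptions made only for illustration.

```python
EARTH_SURFACE_M2 = 5e14   # from the slide
RESOLUTION_M = 1.0        # from the slide: 1 m pixels
BYTES_PER_PIXEL = 1       # assumption: one 8-bit band

# One pixel per square metre of surface.
pixels = EARTH_SURFACE_M2 / RESOLUTION_M ** 2

# Size of a single complete coverage, in terabytes (10^12 bytes).
terabytes = pixels * BYTES_PER_PIXEL / 1e12
print(f"{pixels:.0e} pixels, about {terabytes:.0f} TB per coverage")
```

Under these assumptions one pass over the whole surface is on the order of 500 TB, and continuous imaging multiplies that by the number of passes and bands.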
ADL approach
Partition and distribute the problem
Multiple levels of discovery
– find relevant collections
– search just those collections
Support multiple implementation strategies
– spatial engine
– relational database
– home-grown
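The two-level discovery idea can be sketched in Python as follows. This is an illustration under stated assumptions, not ADL's protocol: each collection is assumed to advertise one overall bounding box, and the `COLLECTIONS` structure, the `discover` function, and all coordinates are hypothetical.

```python
def overlaps(a, b):
    """True when two (west, south, east, north) boxes share area or boundary."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical distributed collections, each with a collection-level footprint.
COLLECTIONS = {
    "colorado_maps": {
        "footprint": (-109.05, 37.0, -102.05, 41.0),
        "items": {
            "Eldorado Springs quadrangle": (-105.375, 39.875, -105.25, 40.0),
            "Denver street map": (-105.11, 39.61, -104.60, 39.91),
        },
    },
    "california_imagery": {
        "footprint": (-124.4, 32.5, -114.1, 42.0),
        "items": {
            "Santa Barbara aerial photo": (-119.80, 34.38, -119.63, 34.47),
        },
    },
}


def discover(query):
    """Level 1: pick relevant collections. Level 2: search only those."""
    searched, hits = [], []
    for name, collection in COLLECTIONS.items():
        if not overlaps(collection["footprint"], query):
            continue  # skip the whole collection without touching its items
        searched.append(name)
        hits.extend(title for title, footprint in collection["items"].items()
                    if overlaps(footprint, query))
    return searched, hits


searched, hits = discover((-105.30, 39.96, -105.18, 40.09))
print(searched, hits)
```

Only the collection whose footprint overlaps the query is searched item by item, so the per-item cost is paid in one place instead of everywhere. Each collection could sit behind a different backend (spatial engine, relational database, or home-grown code) as long as it answers the same two questions.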
Challenge 5: context
Context is critical for evaluation
Textual context:
[Figure labels: poem, software]
Geospatial context
Does this answer your question?
[Map labels: Flatirons 1-5, Flagstaff Rd., Green Mountain]
Challenge 6: client integration
“Click here” approach places large burden on users
– navigate
– interpret
– evaluate
– download
Service-based access will become predominant
– just as the WWW replaced FTP
Needed:
– description/access standards, protocols
– integration with search constraints