Alexandria Digital Library Project

ADEPT Retreat November 2002

ADEPT KOS Activities

KOS = Knowledge Organization Systems

Outline
• KOS in DLs
• what has been done
• what activities are planned
• the main groups involved
• the problems being faced

Digital Library Components

[Diagram: components of a digital library]
• CATALOG OF METADATA
• SERVICES: ACCESSING, ANALYZING, ARCHIVING, CATALOGING, DIGITIZING, RETRIEVING, SEARCHING, VISUALIZING
• KNOWLEDGE ORGANIZATION SYSTEMS: AUTHORITY FILES, CLASSIFICATION SYSTEMS, CONCEPT SPACES, DICTIONARIES, GAZETTEERS, GLOSSARIES, ONTOLOGIES, SUBJECT HEADING SETS, THESAURI
• DATA STORE OF OBJECTS
• Libraries, Collections

Digital Gazetteer Essentials

(controlled vocabulary)

• None of these elements are unique identifiers of a particular place

KOS Generalization

[Diagram labels: Relationships; Label; Type; Definition; Meaning; Navigation; Translation; Sense-making]

KOS: what has been done

Knowledge Base (KB)

Gazetteers
• ADL Gazetteer Content Standard XML Schema
• ADL Gazetteer Service Protocol
• ADL Gazetteer (4.2 million entries; two user interfaces)
• Prototype duplicate detection process
• In-process development of a gazetteer ingest system

Thesauri
• ADL Feature Type Thesaurus
• ADL Thesaurus Protocol

Textual Geospatial Integration (TGI) Project
• High-level process design
• Initial results from experiment with GeoRef records

TGI Service

[Process diagram] Stages: PARSE, LOOKUP, ANALYZE, EVALUATE

Diagram labels, in order of appearance:
• text document
• type thesaurus
• gazetteer
• potential names, types, coordinates
• gazetteer entries (known places)
• ranked footprints and placenames
• “best” name(s)
• composite footprint
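
The four stages can be read as a simple pipeline. The sketch below is one possible interpretation, not the ADL implementation: the toy gazetteer and its coordinates, the capitalized-phrase parser, the mention-count ranking, the intersection rule for the composite footprint, and every function name are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer: name -> bounding box (west, south, east, north).
# Coordinates are rough illustrative values, not ADL Gazetteer data.
TOY_GAZETTEER = {
    "California": (-124.5, 32.5, -114.1, 42.0),
    "Klamath Mountains": (-124.0, 40.5, -122.0, 42.5),
    "Yreka": (-122.7, 41.7, -122.6, 41.8),
}

def parse(text):
    """PARSE: propose candidate placenames (here, runs of capitalized words)."""
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)

def lookup(candidates, gazetteer):
    """LOOKUP: keep candidates that exactly match a gazetteer entry."""
    return [(name, gazetteer[name]) for name in candidates if name in gazetteer]

def analyze(entries):
    """ANALYZE: rank known places by how often the text mentions them."""
    counts = Counter(name for name, _ in entries)
    boxes = dict(entries)
    return [(name, boxes[name]) for name, _ in counts.most_common()]

def evaluate(ranked):
    """EVALUATE: pick the top-ranked name and intersect all footprints."""
    if not ranked:
        return None, None
    boxes = [box for _, box in ranked]
    west = max(b[0] for b in boxes)
    south = max(b[1] for b in boxes)
    east = min(b[2] for b in boxes)
    north = min(b[3] for b in boxes)
    if west > east or south > north:
        # Footprints do not all overlap: fall back to the top-ranked box.
        return ranked[0][0], ranked[0][1]
    return ranked[0][0], (west, south, east, north)

def tgi(text, gazetteer=TOY_GAZETTEER):
    """Chain the four stages over one text document."""
    return evaluate(analyze(lookup(parse(text), gazetteer)))

text = ("Schist near Yreka in the Klamath Mountains of Northern California. "
        "Yreka is in California.")
best_name, footprint = tgi(text)
print(best_name, footprint)
```

A real parser would also use the feature type thesaurus and any coordinates found in the text; this sketch uses names only.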

Main applications of TGI

Query enhancement
• Placenames -> footprints and/or additional placenames
• Footprints -> placenames

Cataloging assistance
• Textual evidence -> footprint representing what the object is “about”

Example GeoRef Record

Structure and petrography of the schist of Skookum Gulch, Callahan-Yreka area, eastern Klamath Mountains, Northern California
<key>blueschist | California | Callahan California | foliation | Klamath Mountains | melange | metamorphic rocks | Ordovician | Paleozoic | petrology | schists | Silurian | Siskiyou County California | Skookum Gulch | United States | Yreka California</key>
<ab>The schist of Skookum Gulch (SSG) is an informal name applied to a fault-bounded melange composed mainly of schistose metamorphic rocks and less abundant sedimentary and igneous rocks located in the eastern Klamath Mountains of Northern California. The SSG features outcrops of lawsonite+sodic amphibole blueschist and epidote+sodic amphibole rocks transitional to the greenschist facies. Isotopic dating indicates that the schist was metamorphosed during the Ordovician. The SSG is the oldest known Paleozoic blueschist-bearing melange in California and one of the oldest preserved blueschist terranes in North America. Tonalitic rocks associated with the schist have Early Cambrian ages and are among the oldest rocks yet dated within the Klamath Mountains. Field relations indicate that the schist of Skookum Gulch is a complex tectonic melange composed of metavolcanic, carbonate, and metasedimentary blocks and lenses of diverse sizes and shapes dispersed without apparent stratigraphic coherency in a sheared matrix of clastic to pelitic schist, metavolcanic schist, and discontinuous thin lenses of marble. Rocks of the matrix have been metamorphosed to chlorite-grade greenschist facies, whereas the blocks have been metamorphosed under a variety of pressure-temperature conditions. Some blocks have been feebly metamorphosed and retain features of the original protolith material; others have been thoroughly recrystallized under blueschist, transitional, and greenschist facies conditions. Blueschist blocks within the schistose matrix reveal six deformation events (D1-D6): four are folding events, and at least two are ductile and brittle shear deformations. One period of metamorphism under blueschist-facies conditions is recorded in the blueschist blocks. The blocks lack evidence of prograde, greenschist-facies overprinting. Schistose rocks of the matrix are less deformed than the blueschist blocks. Matrix schists show at least two phases of folding. The predominant foliation is the result of transposition of an early foliation or compositional layering. Other deformations include kink folding, ductile shearing, and brittle fracturing. The polydeformed tectonic blocks are hypothesized to have been incorporated into the melange matrix along a system of faults and rotated into a preferred alignment with the pervasive foliation of the matrix during D3. Feebly deformed and metamorphosed blocks such as chert, marble, and tonalite were incorporated prior to the time of brittle shearing.</ab>
<coord>N410000N420000W1220000W1230000</coord>
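
The `<coord>` field of the example record appears to pack four bounding coordinates. A minimal sketch of decoding it, assuming each value is a hemisphere letter followed by degrees, minutes, and seconds (DDMMSS for latitude, DDDMMSS for longitude); that reading is inferred from this one record, and the function names are hypothetical.

```python
import re

def dms_to_decimal(hemisphere, digits):
    """Convert a packed D..DMMSS string to signed decimal degrees."""
    degrees = int(digits[:-4])
    minutes = int(digits[-4:-2])
    seconds = int(digits[-2:])
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by the usual convention.
    return -value if hemisphere in "SW" else value

def parse_coord(coord):
    """Return the footprint as (west, south, east, north) in decimal degrees."""
    values = re.findall(r"([NSEW])(\d{6,7})", coord)
    lats = [dms_to_decimal(h, d) for h, d in values if h in "NS"]
    lons = [dms_to_decimal(h, d) for h, d in values if h in "EW"]
    if len(lats) != 2 or len(lons) != 2:
        raise ValueError(f"expected two latitudes and two longitudes: {coord!r}")
    return (min(lons), min(lats), max(lons), max(lats))

print(parse_coord("N410000N420000W1220000W1230000"))
# (-123.0, 41.0, -122.0, 42.0)
```

Taking the minimum and maximum of each pair avoids relying on the order in which the two latitudes and two longitudes are written, but it does not handle boxes that cross the antimeridian.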

Lookup Example: Gazetteer

Place Name            exact   partial
Skookum Gulch             1         0
Klamath Mountains         1         0
Northern California       0         1
California                1       492
Callahan*                 1         1
Silurian                  0         5
Siskiyou County*          1        14
United States             1       273
Yreka*                    1        12
North America             0         8

*within footprint of California
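
The exact and partial columns can be illustrated with a small counting function. This is a sketch under stated assumptions: "exact" is taken to mean a case-insensitive match on the whole name and "partial" a match inside a longer name. The slides do not define either term, and the name list below is hypothetical rather than drawn from the ADL Gazetteer.

```python
def count_matches(query, gazetteer_names):
    """Return (exact, partial) match counts for one candidate placename."""
    q = query.casefold()
    folded = [name.casefold() for name in gazetteer_names]
    exact = sum(1 for name in folded if name == q)
    # Partial: the query occurs inside a longer gazetteer name.
    partial = sum(1 for name in folded if q in name and name != q)
    return exact, partial

# Hypothetical gazetteer names, for illustration only.
names = ["California", "Baja California", "California City", "Yreka", "Yreka Creek"]

for candidate in ["California", "Yreka", "Silurian"]:
    print(candidate, count_matches(candidate, names))
```

A candidate with no exact match and few partial matches, such as a geologic period name, is a hint that the term is probably not a placename.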

TGI Evaluation Example

[Map figure] Footprints shown:
• Skookum Gulch
• Klamath Mountains
• California
• Callahan in California
• Siskiyou County in California
• United States
• Yreka in California

Additional placenames
• Shasta Butte City
• Yreka City
• Thompson's Dry Diggings

• Eastern Klamath Mountains
• Area of Callahan-Yreka
• Skookum Gulch

Derived footprint

The Light at the End of the Tunnel

You submit: a document (could be a query)

You get: geospatial location + placenames

– Best
– Also-rans
– Alternatives

You apply this output to your processes

KOS: what activities are planned

Knowledge Base for ADEPT

TGI
• Computer processing of geoparsing output to derive estimated footprints for GeoRef records
• Evaluate similarity of derived footprints to those assigned by GeoRef
• Refine TGI process based on evaluation results
• Run additional textual objects through the TGI process
• Publish a TGI service specification
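
One way to score the similarity between a derived footprint and the GeoRef-assigned one is the ratio of shared area to combined area (intersection over union). The slides do not name a measure, so this choice is an assumption, as are the function names and the "derived" box used in the example.

```python
def box_area(box):
    """Area of a (west, south, east, north) box in square degrees; 0 if empty."""
    west, south, east, north = box
    return max(0.0, east - west) * max(0.0, north - south)

def footprint_similarity(a, b):
    """Intersection over union of two bounding boxes, from 0.0 to 1.0."""
    intersection = (max(a[0], b[0]), max(a[1], b[1]),
                    min(a[2], b[2]), min(a[3], b[3]))
    shared = box_area(intersection)
    combined = box_area(a) + box_area(b) - shared
    return shared / combined if combined > 0 else 0.0

georef_box = (-123.0, 41.0, -122.0, 42.0)   # decoded from the example record
derived_box = (-123.0, 41.0, -122.5, 42.0)  # hypothetical TGI output

print(footprint_similarity(georef_box, derived_box))  # 0.5
```

The sketch treats degrees as planar units and ignores boxes that cross the antimeridian, which is adequate for comparing small regional footprints but not for global use.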

KOS: what activities are planned

Gazetteers
• Duplicate detection and ingest software for gazetteers
• Augmentation of ADL Gazetteer with polygonal footprints
• Improved database and searching support for ADL Gazetteer
• Growth of a network of distributed gazetteers
• Use of Gazetteer Protocol in ADL/ADEPT as basis for new gazetteer client
• Proposal for ITR funding to support gazetteer research and development

Thesauri
• Use the thesaurus protocol in an ADL/ADEPT client (e.g., to access the Feature Type Thesaurus from a Gazetteer client)

KOS: the main groups involved

Knowledge Base
• Knowledge Organization Team
• San Diego Supercomputer Center (SDSC)
• DLESE

TGI
• Terry Smith, Jim Frew, Linda Hill, Greg Janée
• Illinois Institute of Technology

Gazetteers
• Gazetteer Development Team
• ESRI
• ECAI
• University of Redlands, MSGIS program

Thesauri
• Greg Janée and Linda Hill
• USGS Gateway Vocabulary Team

KOS: the problems being faced

KOS
• Integration of KOS as a class of objects into DL architecture
– User interface issues
– Managing change through time for KOS and collections
• Balance of effort between building actual content and building a suite of tools for use by others
• Flexible, customizable tools for building KOS
• Establishing/implementing standards for KOS structures/representations
• Handling data and queries in multiple languages and scripts
• Building time-related data (e.g., historical data in gazetteer entries) & better presentation of time range searching in clients

KOS: the problems being faced

Gazetteers
• Which model to follow: humongous centralized ADL gazetteer vs. distributed gazetteers?
• Should we be building ingest systems to support building “personal” gazetteers entry-by-entry or ingesting blocks of gazetteer data from other sources or both?
• Spatial data representation in gazetteers
– Are bounding box generalizations ‘good enough’?
– What is the processing cost for spatial matching using generalized polygons that are more faithful to shape?
• ‘Qualified’ placenames
– How to provide administrative parent for unqualified placenames in gazetteer
  – Add type of relationship linking place to its ‘conventional’ administrative parent
  – Use ‘contained-in’ search operator to find the administrative entities containing the place
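
The ‘contained-in’ option can be sketched with bounding boxes. Everything here is an assumption for illustration: the entry layout, the feature type labels, the approximate coordinates, and the function names; the actual ADL Gazetteer operator is not specified in these slides.

```python
def box_contains(outer, inner):
    """True if box `outer` fully encloses box `inner` (west, south, east, north)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def contained_in(place_box, gazetteer,
                 admin_types=("countries", "states", "counties")):
    """Names of administrative entries whose footprint contains the place."""
    return [entry["name"] for entry in gazetteer
            if entry["type"] in admin_types
            and box_contains(entry["box"], place_box)]

# Hypothetical entries with rough bounding boxes, not ADL Gazetteer data.
gazetteer = [
    {"name": "United States", "type": "countries", "box": (-125.0, 24.0, -66.0, 50.0)},
    {"name": "California", "type": "states", "box": (-124.5, 32.5, -114.1, 42.0)},
    {"name": "Oregon", "type": "states", "box": (-124.6, 42.0, -116.5, 46.3)},
    {"name": "Siskiyou County", "type": "counties", "box": (-123.7, 41.2, -121.4, 42.0)},
    {"name": "Klamath Mountains", "type": "mountains", "box": (-124.0, 40.5, -122.0, 42.5)},
]
yreka_box = (-122.7, 41.7, -122.6, 41.8)

print(contained_in(yreka_box, gazetteer))
```

Because a bounding box usually covers more ground than the region it generalizes, this test can report parents that do not actually contain the place, which is one concrete form of the ‘good enough’ question above.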

KOS: the problems being faced

TGI
• Identifying causes of success and failure in automatic footprint generation
– Effect of density/frequency of spatial references in the text
– Effect of the geoparsing process applied
– Effect of analysis process that derives the best estimate
– Effect of the quality of the gazetteer and feature type thesaurus
• Value of set of additional placenames for text retrieval

Related URLs

KOS as DL components
• Position paper for Classification Research workshop: http://www.alexandria.ucsb.edu/~lhill/KOSpaper7-2-final.doc

Knowledge Base

Textual Geospatial Integration
• PowerPoint presentation: http://nkos.slis.kent.edu/2002workshop/frew.ppt

Gazetteers
• ADL Gazetteer Development page: http://www.alexandria.ucsb.edu/~lhill/adlgaz/

Thesauri
• Gazetteer Service Protocol: http://www.alexandria.ucsb.edu/thesaurus/protocol/